Re: [AVTCORE] Clarifying the H.264 RTP payload format specification's text on when to set the RTP "M" bit

"Mo Zanaty (mzanaty)" <mzanaty@cisco.com> Wed, 31 July 2013 23:58 UTC

From: "Mo Zanaty (mzanaty)" <mzanaty@cisco.com>
To: Ross Finlayson <finlayson@live555.com>, "avt@ietf.org" <avt@ietf.org>, "payload@ietf.org" <payload@ietf.org>
Date: Wed, 31 Jul 2013 23:58:22 +0000
Message-ID: <3879D71E758A7E4AA99A35DD8D41D3D91D4C7AA7@xmb-rcd-x14.cisco.com>
References: <C1F36850-2B72-4A98-97B7-8847C9C90CB0@live555.com>
In-Reply-To: <C1F36850-2B72-4A98-97B7-8847C9C90CB0@live555.com>

Hi Ross,

If your application interfaces with an encoder or systems layer (container file format, transport stream, CDN stream, etc.) that indicates neither frame boundaries nor any timing information, then how are you setting the RTP timestamp? That seems like a much more fundamental issue than the marker bit.

If you are willing to detect the last NALU of a frame by waiting for the first VCL NALU of the next frame, that is very simple in both H.264 and H.265. The first bit after the VCL NALU header is set if it's the first VCL NALU of a frame. This works in H.265 because the slice segment header explicitly starts with such a flag (first_slice_segment_in_pic_flag). It works in H.264 because the slice header starts with the macroblock address (first_mb_in_slice), which will be 0 at the start of the frame (assuming no ASO/FMO), and the value 0 is coded as the single bit 1 under ue(v) encoding rules. Keep in mind the NALU header is 2 bytes in H.265 vs. 1 byte in H.264, so the slice header offsets are different.
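
To make the check above concrete, here is a minimal sketch in Python. The function name and the codec strings are mine, purely for illustration. It assumes each NALU is passed without a start code or length prefix, and that ASO/FMO are not in use for H.264. I have also assumed that H.264 data partitions B and C (types 3 and 4) should be skipped, since they do not begin with a slice header.

```python
def starts_new_picture(nalu: bytes, codec: str) -> bool:
    """Return True if this NALU looks like the first VCL NALU of a frame.

    Hypothetical helper: 'nalu' is one NAL unit with no start code or
    length prefix; 'codec' is "h264" or "h265".
    """
    if codec == "h264":
        if len(nalu) < 2:
            return False
        nal_unit_type = nalu[0] & 0x1F          # 1-byte NALU header
        # Types 1, 2 and 5 begin with a slice header (assumption: types
        # 3 and 4, data partitions B/C, are ignored here).
        if nal_unit_type not in (1, 2, 5):
            return False
        # first_mb_in_slice == 0 is coded as the single bit '1' in ue(v).
        return bool(nalu[1] & 0x80)
    if codec == "h265":
        if len(nalu) < 3:
            return False
        nal_unit_type = (nalu[0] >> 1) & 0x3F   # 2-byte NALU header
        if nal_unit_type > 31:                  # 32 and above are non-VCL
            return False
        # first_slice_segment_in_pic_flag is the first slice header bit.
        return bool(nalu[2] & 0x80)
    raise ValueError("codec must be 'h264' or 'h265'")
```

A sender would hold back the previous NALU until this returns True for the next one (or a non-VCL NALU that starts a new access unit arrives), then send the held NALU with the marker bit set.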

If you can't wait for the first NALU of the next frame, it is much harder to detect the last NALU of a frame without full VCL bitstream parsing, unless you use heuristics, which may be simpler but are less reliable. This is part of the reason why the marker bit in video payloads indicates end of frame: without it, RTP receivers may find the frame boundary very difficult to detect without deep parsing. Note that RTP receivers are technically forbidden from relying on the marker bit. But many implementations detect whether it is reliable, then use it as an optimization to minimize latency if it is.

I don't think the draft needs any more text beyond what it currently says.

Regards,
Mo


From: avt-bounces@ietf.org [mailto:avt-bounces@ietf.org] On Behalf Of Ross Finlayson
Sent: Wednesday, July 31, 2013 2:19 PM
To: avt@ietf.org
Subject: [AVTCORE] Clarifying the H.264 RTP payload format specification's text on when to set the RTP "M" bit

As I noted in Wednesday morning's AVTCORE session, the proposed H.265 RTP payload format specification is not as clear as it could be about how an RTP sender knows when to set the "M" bit on an outgoing RTP packet.

Currently, the text says:

   Marker bit (M): 1 bit

      Set for the last packet of the access unit indicated by the RTP
      timestamp, in line with the normal use of the M bit in video
      formats, to allow an efficient playout buffer handling.  Decoders
      can use this bit as an early indication of the last packet of an
      access unit.
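
For illustration only, the sender-side meaning of this text can be sketched as follows in Python. The function names and the payload type 96 are hypothetical, and the sketch assumes every NAL unit of the access unit fits in its own single-NAL-unit packet (no aggregation or fragmentation). The M bit is the top bit of the second byte of the fixed RTP header.

```python
import struct

RTP_VERSION = 2

def build_rtp_header(seq, timestamp, ssrc, marker, payload_type=96):
    """Build the 12-byte fixed RTP header (no padding, extension or CSRCs)."""
    byte0 = RTP_VERSION << 6
    byte1 = (int(marker) << 7) | (payload_type & 0x7F)   # M bit + PT
    return struct.pack("!BBHII", byte0, byte1,
                       seq & 0xFFFF, timestamp & 0xFFFFFFFF, ssrc & 0xFFFFFFFF)

def packetize_access_unit(nalus, timestamp, ssrc, first_seq):
    """One packet per NAL unit; all share the access unit's timestamp.

    Only the last packet of the access unit carries M=1.
    """
    packets = []
    last = len(nalus) - 1
    for i, nalu in enumerate(nalus):
        header = build_rtp_header(first_seq + i, timestamp, ssrc,
                                  marker=(i == last))
        packets.append(header + nalu)
    return packets
```

Note that this only works if the caller already knows which NAL units make up the access unit, which is exactly the information the sender may not have.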

There is nothing inherently wrong with this.  However (because we're not in the MPEG world here :-) it's not sufficient to fully specify only how the receivers of RTP packets should behave; we should also try to specify, as clearly as possible, how the *senders* of RTP packets should behave.  We should try to give at least some guidance about when the NAL unit that's being sent ends an "access unit" because - for many implementations - it is not immediately obvious.

If I were amending the H.264 payload format specification - which of course I'm not - then I would, hypothetically, add a paragraph something like the following:

----------
Unfortunately the contents of a NAL unit, alone, do not tell an RTP sender implementation whether or not the NAL unit ends an access unit.  Instead, the implementation can obtain this information separately, from the encoder.  If, however, this information is not directly available from the encoder (e.g., because the implementation is sending data that consists solely of a sequence of pre-encoded NAL units), then it must instead inspect the next NAL unit, to determine whether or not the current NAL unit ends an access unit.  The following rule can be used:
    The current NAL unit ends an access unit if it is a VCL NAL unit (i.e., if its "nal_unit_type" is in the range [1..5]), and if either:
            - the next NAL unit is not a VCL NAL unit, or
            - either the current NAL unit's "nal_unit_type" or the next NAL unit's "nal_unit_type" - but not both - is 5 (i.e., "Coded slice of an IDR picture"), or
            - the next NAL unit's "nal_ref_idc" field differs from the current NAL unit's "nal_ref_idc" field, with one of them being equal to zero, or
            - both the current and next NAL units begin with slice headers, and their "frame_num" fields differ, or
            - both the current and next NAL units begin with slice headers, and their "pic_parameter_set_id" fields differ, or
            - both the current and next NAL units begin with slice headers, and their "field_pic_flag" fields differ, or
            - both the current and next NAL units begin with slice headers, and their "bottom_field_flag" fields differ, or
            - both the current and next NAL units' "nal_unit_type" field is 5, and their "idr_pic_id" fields differ.
----------
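
As a rough sketch of how the rule above might look in code (Python, with names of my own choosing), assume the NAL unit header and slice header fields have already been parsed into a small record. Parsing "frame_num" and the field flags needs values from the active SPS, so that step is deliberately left out. I read the last condition as "nal_unit_type" equal to 5 for both NAL units, as in Section 7.4.1.2.4, since "idr_pic_id" is only present in IDR slices.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class NalInfo:
    """Hypothetical record of already-parsed H.264 fields."""
    nal_unit_type: int
    nal_ref_idc: int
    # Slice header fields; None when the NAL unit has no slice header.
    frame_num: Optional[int] = None
    pic_parameter_set_id: Optional[int] = None
    field_pic_flag: Optional[int] = None
    bottom_field_flag: Optional[int] = None
    idr_pic_id: Optional[int] = None

def is_vcl(n: NalInfo) -> bool:
    return 1 <= n.nal_unit_type <= 5

def ends_access_unit(cur: NalInfo, nxt: NalInfo) -> bool:
    """Apply the rule above to the current and next NAL units."""
    if not is_vcl(cur):
        return False
    if not is_vcl(nxt):
        return True
    cur_idr, nxt_idr = cur.nal_unit_type == 5, nxt.nal_unit_type == 5
    if cur_idr != nxt_idr:                      # exactly one is an IDR slice
        return True
    if cur.nal_ref_idc != nxt.nal_ref_idc and 0 in (cur.nal_ref_idc,
                                                    nxt.nal_ref_idc):
        return True
    if cur.frame_num is not None and nxt.frame_num is not None:
        # Both begin with slice headers: compare the listed fields.
        cur_key = (cur.frame_num, cur.pic_parameter_set_id,
                   cur.field_pic_flag, cur.bottom_field_flag)
        nxt_key = (nxt.frame_num, nxt.pic_parameter_set_id,
                   nxt.field_pic_flag, nxt.bottom_field_flag)
        if cur_key != nxt_key:
            return True
    if cur_idr and nxt_idr and cur.idr_pic_id != nxt.idr_pic_id:
        return True
    return False
```

For example, two slices with identical header fields give False, while a slice followed by an SPS (type 7) gives True.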

(I derived this rule from Section 7.4.1.2.4 - "Detection of the first VCL NAL unit of a primary coded picture" - of the H.264 specification.  Perhaps it could be simplified?)

I suggest that an analogous paragraph be added to the H.265 RTP payload format specification.  Alternatively, instead of writing a detailed rule like this, perhaps a reference to an appropriate part of the H.265 specification could be made instead (can we do this in an IETF RFC?).

Ross Finlayson
Live Networks, Inc.
http://www.live555.com/