[AVTCORE] Clarifying the H.264 RTP payload format specification's text on when to set the RTP "M" bit

Ross Finlayson <finlayson@live555.com> Wed, 31 July 2013 18:19 UTC

As I noted in Wednesday morning's AVTCORE session, the proposed H.265 RTP payload format specification is not as clear as it could be about how an RTP sender knows when to set the "M" bit on an outgoing RTP packet.

Currently, the text says:

   Marker bit (M): 1 bit

      Set for the last packet of the access unit indicated by the RTP
      timestamp, in line with the normal use of the M bit in video
      formats, to allow an efficient playout buffer handling.  Decoders
      can use this bit as an early indication of the last packet of an
      access unit.

There is nothing inherently wrong with this.  However (because we're not in the MPEG world here :-) it's not sufficient to fully specify only how the receivers of RTP packets should behave; we should also try to specify, as clearly as possible, how the *senders* of RTP packets should behave.  We should try to give at least some guidance about when the NAL unit that's being sent ends an "access unit" because - for many implementations - it is not immediately obvious.

If I were amending the H.264 payload format specification - which of course I'm not - then I would, hypothetically, add a paragraph something like the following:

----------
Unfortunately, the contents of a NAL unit alone do not tell an RTP sender implementation whether or not the NAL unit ends an access unit.  Instead, the implementation can obtain this information separately, from the encoder.  If, however, this information is not directly available from the encoder (e.g., because the implementation is sending data that consists solely of a sequence of pre-encoded NAL units), then it must instead inspect the next NAL unit to determine whether or not the current NAL unit ends an access unit.  The following rule can be used:
    The current NAL unit ends an access unit if it is a VCL NAL unit (i.e., if its "nal_unit_type" is in the range [1..5]), and if either:
	- the next NAL unit is not a VCL NAL unit, or
	- either the current NAL unit's "nal_unit_type" or the next NAL unit's "nal_unit_type" - but not both - is 5 (i.e., "Coded slice of an IDR picture"), or
	- the next NAL unit's "nal_ref_idc" field differs from the current NAL unit's "nal_ref_idc" field, with one of them being equal to zero, or
	- both the current and next NAL units begin with slice headers, and their "frame_num" fields differ, or
	- both the current and next NAL units begin with slice headers, and their "pic_parameter_set_id" fields differ, or
	- both the current and next NAL units begin with slice headers, and their "field_pic_flag" fields differ, or
	- both the current and next NAL units begin with slice headers, and their "bottom_field_flag" fields differ, or
	- both the current and next NAL units' "nal_unit_type" is 5, and their "idr_pic_id" fields differ.
----------

(I derived this rule from Section 7.4.1.2.4 - "Detection of the first VCL NAL unit of a primary coded picture" - of the H.264 specification.  Perhaps it could be simplified?)
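
To make the rule above concrete, here is a rough Python sketch of it.  This is only an illustration, not text for any specification.  It assumes that the slice header fields have already been parsed elsewhere (frame_num cannot be decoded without the active SPS, so parsing is left out), and the names "SliceHeader", "NalInfo", "parse_nal_header" and "ends_access_unit" are hypothetical, chosen just for this example.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class SliceHeader:
    """Hypothetical holder for slice header fields that were parsed elsewhere."""
    frame_num: int
    pic_parameter_set_id: int
    field_pic_flag: int = 0
    bottom_field_flag: int = 0
    idr_pic_id: Optional[int] = None  # only meaningful when nal_unit_type == 5


@dataclass
class NalInfo:
    """Hypothetical summary of one NAL unit, as seen by the RTP sender."""
    nal_ref_idc: int
    nal_unit_type: int
    header: Optional[SliceHeader] = None  # None if no slice header is available


def parse_nal_header(first_byte: int) -> Tuple[int, int]:
    """Split the one-byte H.264 NAL unit header into (nal_ref_idc, nal_unit_type)."""
    return (first_byte >> 5) & 0x03, first_byte & 0x1F


def is_vcl(nal: NalInfo) -> bool:
    """VCL NAL units have nal_unit_type in the range [1..5]."""
    return 1 <= nal.nal_unit_type <= 5


def ends_access_unit(cur: NalInfo, nxt: NalInfo) -> bool:
    """Apply the look-ahead rule: does 'cur' end an access unit, given 'nxt'?"""
    if not is_vcl(cur):
        return False
    if not is_vcl(nxt):
        return True
    # Exactly one of the two NAL units is an IDR slice.
    if (cur.nal_unit_type == 5) != (nxt.nal_unit_type == 5):
        return True
    # nal_ref_idc differs, with one of the two values being zero.
    if cur.nal_ref_idc != nxt.nal_ref_idc and 0 in (cur.nal_ref_idc, nxt.nal_ref_idc):
        return True
    a, b = cur.header, nxt.header
    if a is not None and b is not None:
        if a.frame_num != b.frame_num:
            return True
        if a.pic_parameter_set_id != b.pic_parameter_set_id:
            return True
        if a.field_pic_flag != b.field_pic_flag:
            return True
        if a.bottom_field_flag != b.bottom_field_flag:
            return True
        # Both are IDR slices (the mixed case was handled above).
        if cur.nal_unit_type == 5 and a.idr_pic_id != b.idr_pic_id:
            return True
    return False
```

A sender built along these lines would hold back each NAL unit until the next one has arrived, and would set the "M" bit on the last RTP packet carrying the current NAL unit whenever ends_access_unit() returns True.  What to do for the final NAL unit of a stream, where there is no next NAL unit to inspect, is left open here.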

I suggest that an analogous paragraph be added to the H.265 RTP payload format specification.  (Alternatively, instead of writing out a detailed rule like this, perhaps we could refer to the appropriate part of the H.265 specification.  Can we do this in an IETF RFC?)

Ross Finlayson
Live Networks, Inc.
http://www.live555.com/