Re: [AVTCORE] [payload] Clarifying the H.265 RTP payload format specification's text on when to set the RTP "M" bit

"Mo Zanaty (mzanaty)" <mzanaty@cisco.com> Sat, 10 August 2013 03:21 UTC

From: "Mo Zanaty (mzanaty)" <mzanaty@cisco.com>
To: Ross Finlayson <finlayson@live555.com>, "avt@ietf.org" <avt@ietf.org>, "payload@ietf.org" <payload@ietf.org>
Date: Sat, 10 Aug 2013 03:15:02 +0000

Hi Ross,

If we focus on the RTP timestamp rather than the marker bit, I think we will solve your problem. The timestamp MUST be set reliably since it is critical for many RTP functions. The marker bit SHOULD be set reliably to allow receivers to detect end of frame without waiting for the next frame/timestamp, but this is not a MUST because it is just a latency optimization rather than a critical function.

If you are unable to set the RTP timestamp reliably, that is a much bigger issue than the marker bit. That would be an entirely different discussion, so I assume that is not the case.

If you are able to set the RTP timestamp reliably, then you must already have some sort of frame boundary identification. In that case, you can set the marker bit based on the timestamp changing, assuming you are willing to wait for the next frame/timestamp before sending the last packet of a frame.
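As a minimal sketch of the approach just described (Python; the packet shape and the function name are hypothetical, not taken from any implementation discussed in this thread), a sender can hold back one packet and set the marker bit only once it sees whether the next packet carries a different timestamp:

```python
from typing import Iterable, Iterator, Optional, Tuple

Packet = Tuple[int, bytes]        # (rtp_timestamp, payload); hypothetical shape
Marked = Tuple[int, bytes, bool]  # (rtp_timestamp, payload, marker_bit)


def mark_by_timestamp_change(packets: Iterable[Packet]) -> Iterator[Marked]:
    """Hold back one packet so the M bit can be set when the timestamp changes."""
    held: Optional[Packet] = None
    for pkt in packets:
        if held is not None:
            # The held packet ends its frame only if the next timestamp differs.
            yield held[0], held[1], held[0] != pkt[0]
        held = pkt
    if held is not None:
        # Assumption: at end of stream the final packet closes its frame.
        yield held[0], held[1], True
```

Note that the last packet of every frame is released only after the first packet of the next frame has arrived, which is exactly the wait mentioned above.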

But you should realize that, in doing this, you have defeated the marker bit optimization. You have just shifted the latency from the receiver to the sender, but the latency remains the same. In both cases you incur the extra latency between the last packet of a frame and the first packet of the next frame.

If the sender incurs this latency to make the marker bit reliable, the receiver will hopefully detect that it is reliable and not incur further latency. However, a simple receiver may incur further latency if it only looks for timestamp changes and has no logic to detect marker reliability to optimize out this latency. Also, if the receiver jitter buffer is currently longer than the extra latency added by the sender, then the sender added the extra latency unnecessarily.

If, to avoid adding latency, the sender never sets the marker bit, the receiver will incur the latency once it sees that the marker bit is unreliable (or unconditionally, for simple receivers that only look for timestamp changes).
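The two receiver behaviours can be sketched as follows (Python; the packet shape and function name are hypothetical, and in-order, loss-free delivery is assumed). The marker bit is used as the fast path and a timestamp change as the fallback; the sketch does not attempt the marker-reliability detection mentioned above:

```python
from typing import Iterable, Iterator, List, Tuple

Packet = Tuple[int, bytes, bool]  # (rtp_timestamp, payload, marker_bit); hypothetical


def assemble_frames(packets: Iterable[Packet]) -> Iterator[List[bytes]]:
    """Yield each frame's payloads as soon as the frame is known to be complete."""
    current: List[bytes] = []
    current_ts = None
    for ts, payload, marker in packets:
        if current and ts != current_ts:
            # Fallback: a new timestamp proves the previous frame has ended.
            yield current
            current = []
        current_ts = ts
        current.append(payload)
        if marker:
            # Fast path: the M bit ends the frame without waiting for the next one.
            yield current
            current = []
    if current:
        yield current  # flush whatever is left at end of stream
```

With a marked stream, each frame is delivered on its last packet; with an unmarked stream, each frame is delivered only when the first packet of the following frame arrives.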

So there is not much benefit in making the marker bit reliable in this case, and there is even the risk that it will add unnecessary latency.

Regards,
Mo

From: avt-bounces@ietf.org [mailto:avt-bounces@ietf.org] On Behalf Of Ross Finlayson
Sent: Friday, August 09, 2013 8:48 PM
To: avt@ietf.org; payload@ietf.org
Subject: Re: [AVTCORE] [payload] Clarifying the H.265 RTP payload format specification's text on when to set the RTP "M" bit

Now I understand your intention better - did not see that the note was intended for sender implementers.

OK, thanks.



For this purpose, indeed (as Mo pointed out earlier?), the sender first of all needs to make sure that the RTP timestamp for each RTP packet is correct. As long as the RTP timestamp is correct, it is also easy to know when the M bit should be set. Or are you saying that there is no problem with setting the RTP timestamp but there is a problem with setting the M bit? If yes, please clarify why.

Figuring out the RTP timestamp isn't (nearly as much of) a problem, because this can be done by knowing the stream's frame rate.  But in this email thread, I'm focusing only on the RTP 'M' bit, because for it (unlike the timestamp) there's less of an incentive for a sender implementation to 'get it right'.  I'm worried that if the RTP payload format specification doesn't make it crystal clear when a NAL unit ends an 'access unit', then an implementer might be tempted to just not bother setting the bit at all.  I don't want to see that happen.  Hence this email thread.



I have no problem with putting in some information along the lines you suggested, if the suggested text is good. However, the suggested wording below still has a few problems (or shortcomings).

----------
Unfortunately the contents of a NAL unit, alone, do not tell an RTP sender implementation whether or not the NAL unit ends an access unit.  Instead, the implementation can obtain this information separately, from the encoder.  If, however, this information is not available directly from the encoder (e.g., because the implementation is sending data that consists solely of a sequence of pre-encoded NAL units), then it must instead inspect subsequent NAL units, to determine whether or not the NAL unit ends an access unit.  The following rule can be used:
    A NAL unit ends an access unit if it is a VCL NAL unit, and the next-occurring VCL NAL unit has the high-order bit of the first byte after its NALU header equal to 1.
----------

The problems (or shortcomings) are:
- I don't think it is good to call it unfortunate that a NAL unit does not itself indicate whether it is the last one of an AU.

Why not?  It *is* "unfortunate".  I'm not intending to 'insult' H.265 here :-)



- Saying that the information can be obtained from the encoder is vague to me. I guess you meant the bitstream instead, as the encoder may be absent, e.g. when dealing with pre-encoded video.

I think you may have been confused by my use of the word "can" in the 2nd sentence of my proposed paragraph.  Here is a rewording of the 2nd (& 3rd) sentences; I hope this will make things clearer:

            "Instead, the implementation may be able to obtain this information separately, from the encoder.  If, however, the implementation cannot obtain this information directly from the encoder (e.g., because the implementation is sending data that consists solely of a sequence of pre-encoded NAL units), then it must instead inspect subsequent NAL units, to determine whether or not the NAL unit ends an access unit."



- Last but not least, an AU may also end with a non-VCL NAL unit, e.g. a filler data NAL unit, an end of sequence NAL unit, an end of bitstream NAL unit, a suffix SEI NAL unit, etc.

Arggh!  Now, I hope, you see the problem :-)  Can you then help me figure out a more accurate wording for the rule?  In particular:
            1/ Suppose we have a VCL NAL unit, followed immediately by another VCL NAL unit with the first slice header bit set.  In this case, there's no doubt: The first VCL NAL unit definitely ends an 'access unit' - correct?
            2/ Suppose we have a VCL NAL unit, followed by a sequence of one or more non-VCL NAL units, followed immediately by a VCL NAL unit with the first slice header bit set.  In this case, which (if any) of the non-VCL NAL units ends an 'access unit'?  Can we figure out a rule (ideally, an easy rule, based solely on the "nal_unit_type"s) for this?
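
To make the two cases concrete, here is a minimal sketch (Python; all function names are hypothetical) of the proposed rule exactly as worded earlier in this message. It assumes NAL units are passed without start codes, relies on the two-byte H.265 NAL unit header in which nal_unit_type values 0 to 31 are VCL, and takes the "first slice header bit" to be the high-order bit of the first byte after that header (first_slice_segment_in_pic_flag in H.265). It covers case 1 only: non-VCL NAL units are never marked, so case 2 stays open, and the last VCL NAL unit of the stream is left unmarked because no later VCL NAL unit exists to test.

```python
from typing import List, Sequence


def nal_unit_type(nalu: bytes) -> int:
    # The H.265 NAL unit header is two bytes; the type is bits 1..6 of the first.
    return (nalu[0] >> 1) & 0x3F


def is_vcl(nalu: bytes) -> bool:
    # nal_unit_type values 0..31 are VCL in H.265.
    return nal_unit_type(nalu) < 32


def starts_picture(nalu: bytes) -> bool:
    # High-order bit of the first byte after the two-byte NAL unit header.
    return is_vcl(nalu) and len(nalu) > 2 and bool(nalu[2] & 0x80)


def ends_access_unit(nalus: Sequence[bytes]) -> List[bool]:
    """Apply the proposed rule as written; non-VCL NAL units are never marked."""
    flags = [False] * len(nalus)
    last_vcl = None
    for i, nalu in enumerate(nalus):
        if not is_vcl(nalu):
            continue
        if last_vcl is not None and starts_picture(nalu):
            # The previous VCL NAL unit is followed by the start of a new picture.
            flags[last_vcl] = True
        last_vcl = i
    return flags
```

For a sequence such as slice, slice, suffix SEI, slice-with-first-bit-set, this marks the second slice and not the SEI, which is precisely the situation that case 2 asks about.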

Ross Finlayson
Live Networks, Inc.
http://www.live555.com/