[AVT] Re: Comments on draft-ietf-avt-rtp-svc-02

Magnus Westerlund <magnus.westerlund@ericsson.com> Mon, 03 September 2007 14:45 UTC

Message-ID: <46DC1E06.20709@ericsson.com>
Date: Mon, 03 Sep 2007 16:45:26 +0200
From: Magnus Westerlund <magnus.westerlund@ericsson.com>
To: Ye-Kui.Wang@nokia.com
References: <46D82939.8000300@ericsson.com> <1C1F3D15859526459B4DD0A7A9B2268B03CF1E4C@trebe101.NOE.Nokia.com>
In-Reply-To: <1C1F3D15859526459B4DD0A7A9B2268B03CF1E4C@trebe101.NOE.Nokia.com>
Cc: stewe@stewe.org, miska.hannuksela@nokia.com, avt@ietf.org
Subject: [AVT] Re: Comments on draft-ietf-avt-rtp-svc-02

Hi,

See comments inline. I removed all issues where I am satisfied with
the resolution or answer.

Ye-Kui.Wang@nokia.com wrote:
> Hi Magnus,
> 
> Thanks for your careful reading, thinking, and valuable comments. See replies inline. 
> 
>> 1. Section 1, last sentence. I think you need to clarify the following.
>> - That it uses its own identity rather than extending the identity 
>> of H.264.
> 
> Do you mean to clarify that it uses its own media subtype name "H264-SVC" rather than "H264"?

Yes.

> 
>> 2. Section 3.2: I think this text is a bit unclear. It can too easily be 
>> interpreted as saying that once a parameter set has been used it can't 
>> be changed. But I assume that this hasn't changed and that you can 
>> change the active set and have the sets changed during the session by 
>> overwriting them. I think you should state when the sets can be 
>> changed, to make this clear, because that does create some extra 
>> complexity.
>>
> 
> The content of an active sequence parameter set can only be changed at IDR access units (and each IDR access unit starts a new "coded video sequence"), while the content of an active picture parameter set can be changed at any layer picture (called "layer representation" according to the SVC spec). A sequence parameter set may be repeated at non-IDR access units but shall not be overwritten at non-IDR access units. So what was written there is correct. Anyway, will try to make this point clearer. 

Well, my reading of the text is that it becomes a bit too hard to
interpret correctly. So please look at what you can do to make the text
more readable.

> 
>> 3. Section 3.3, R and RR bits. Is this a MUST for both sending and
>> receiving? And what do you do if you find a violation? To be extensible,
>> a clear receiver policy needs to be defined: choose between discarding
>> the NAL unit and ignoring the bit.
>>
> 
> Added "Receivers SHOULD discard NAL units with R equal to 0." and "Receivers SHOULD discard NAL units with RR not equal to '11'." for the two fields, respectively. 

Okay, that clarifies it.
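
To illustrate the receiver policy quoted above, here is a minimal sketch in Python. It assumes a three-byte NAL unit header extension in which R is the most significant bit of the first byte and RR is the two least significant bits of the third byte. That layout is an assumption made for the example and may not match the draft revision under discussion; the function name is hypothetical.

```python
def should_discard(ext: bytes) -> bool:
    """Return True if a receiver following the SHOULD rules would drop the NAL unit.

    ext holds the three header extension bytes (assumed layout, see above).
    """
    if len(ext) != 3:
        raise ValueError("expected exactly three extension bytes")
    r = (ext[0] >> 7) & 0x1   # assumed position: MSB of the first byte
    rr = ext[2] & 0x3         # assumed position: two LSBs of the third byte
    # Discard when R is 0 or when RR is anything other than binary 11.
    return r == 0 or rr != 0b11
```

A sender that sets R to 1 and RR to '11' passes this check; any other combination is dropped rather than having the bits silently ignored.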


> 
>> 4. Section 5.1.2:
>>
>> "   RTP packet stream: A sequence of RTP packets with increasing
>>   sequence numbers, identical PT and SSRC, carried in one RTP session.
>>   Within the scope of this memo, one RTP packet stream is utilized to
>>   transport an integer number of SVC Layers."
>>
>> I don't agree that the PT needs to be identical. However, for your 
>> purposes I assume that it is a great simplification not having to think 
>> about people using multiple PTs configured for carrying SVC NAL units 
>> for the same video sequence. I think you are introducing a limitation 
>> that doesn't need to exist, but has little practical value.
>>
> 
> OK, will enable PT multiplexing in the new version. 

I hope this only means that you will remove the restriction you have
entered, not recommending specific behaviors regarding using multiple PTs.

>> 8. Section 6.6:
>>   [Ed.Note(YkW): I think we need more thinking on the
>>   value of the parameters. For example, requiring the parameters be
>>   the same for all the RTP streams and clients might be overkill for
>>   receivers of only lower layers.]
>>
>>
>>   [Edt. Note (StW): In RFC3984, the aforementioned codepoints are
>>   optional.  It appears that for SVC, when used in conjunction with
>>   session mux, they are mandatory.  I don't know how to express this
>>   in the MIME registration; we'll cross that bridge once we are
>>   getting to it.]"
>>
>>
>> Assume that you have multiple layers in multiple RTP sessions. As I 
>> see it, the only good way of getting the buffer handling to work will 
>> be max-don-diff, which needs to be the same for all layers. However, 
>> deint-buf-req can increase and be the sum over all layers included in 
>> this session and the layers it depends on. That way one can have 
>> an increasing buffer requirement depending on the number of RTP 
>> sessions being received in a multi-layer structure.
>>
> 
> Requiring max-don-diff to be the same for all sessions could have been OK if the mandating of interleaved packetization in layered multicast had not been relaxed at the Chicago meeting. However, it has now been agreed to allow any packetization mode for the base layer. Therefore, if the session for the base layer does not use interleaved mode while other sessions do, max-don-diff cannot be identical for all the sessions. 
> 
> Personally, I think each session should have its own parameters, optionally, as in RFC 3984. A receiver that subscribes to a certain set of sessions only considers the parameters of the session of the highest layer. In this case, for example, sprop-deint-buf-req would increase for sessions from low to high layers, as you described above. But we need more thinking after we have the NAL unit order recovery process for layered multicast on the table, which we are currently working on. We authors will work hard to find a viable solution to be included in the new version. Suggestions are welcome and would be appreciated. 

Without having seen how you really are going to address this issue, it
is hard to know if you are making it too hard or only allowing greater
flexibility. If one likes to support interleaving within a layer, then
clearly the interleaving and layer reordering is a form of overloading
of the DON field. I think that will work under certain restrictions. And
I don't think using interleaving in some layers and not in others will
be an issue for using DON to recover the decoding order. Please do
remember that the DON number usage can be very sparse seen from a
particular layer. Also, interleaving is simply changing the transmission
order of things. That some layers are reordered and others are not does
not make using the same max-don-diff a problem.
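
As a rough illustration of recovering decoding order from DON values that are sparse within each layer, here is a Python sketch. The don_diff function follows the 16-bit wraparound comparison defined in RFC 3984. The (DON, layer) tuples and the decoding_order helper are hypothetical, and the sort is only meaningful while all buffered DONs span less than half of the DON number space.

```python
from functools import cmp_to_key


def don_diff(m: int, n: int) -> int:
    """Signed distance from DON m to DON n with 16-bit wraparound (RFC 3984)."""
    if m == n:
        return 0
    if m < n and n - m < 32768:
        return n - m
    if m > n and m - n >= 32768:
        return 65536 - m + n
    if m < n and n - m >= 32768:
        return -(m + 65536 - n)
    return -(m - n)  # m > n and m - n < 32768


def decoding_order(nal_units):
    """Sort (don, layer) tuples from all sessions into decoding order.

    A positive don_diff(a, b) means b follows a, so a must sort first.
    """
    return sorted(nal_units, key=cmp_to_key(lambda a, b: -don_diff(a[0], b[0])))


# Each layer only sees some of the DON values, and the counter wraps.
received = [(65534, "base"), (2, "enh"), (65535, "enh"), (0, "base")]
print(decoding_order(received))
```

Whether a layer's packets were interleaved before transmission does not matter to this step; only the DON values do.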

I think the real issue is how mixing non-DON-supporting packetizations
with DON-supporting ones will be resolved, if that is allowed. It might
be that you are making life unnecessarily complex for no real gain. But
I do understand the desire to be able to use a non-DON-supporting
packetization in the base layer.
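
For the deint-buf-req part of comment 8, the idea of a buffer requirement that grows with the number of sessions received can be sketched as below. This is only an illustration in Python: it assumes a strictly linear dependency chain (session k depends on all sessions below it), and the per-layer byte counts and function names are made up for the example.

```python
from itertools import accumulate


def cumulative_deint_buf_req(per_layer_bytes):
    """Value each session would signal: its own layer plus all layers it depends on."""
    return list(accumulate(per_layer_bytes))


def receiver_buffer_bytes(per_layer_bytes, highest_session_index):
    """A receiver only looks at the value of the highest session it subscribes to."""
    return cumulative_deint_buf_req(per_layer_bytes)[highest_session_index]


# Hypothetical per-layer requirements in bytes: base layer, then two enhancement layers.
per_layer = [1000, 2000, 4000]
print(cumulative_deint_buf_req(per_layer))
print(receiver_buffer_bytes(per_layer, 1))
```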


> 
>> 9. Section 6.9, why is the F bit redefined here? It seems much better 
>> to leave this bit alone to not complicate processing by having specific 
>> processing for it for this NALU type. Isn't the reasonable usage the 
>> same as for the aggregation NALU types in RFC 3984?
>>
> 
> The current definition of this F bit is the same as that for the aggregation NALU bit, i.e. they always have the same value. Do you mean that we should say something like "The value of this bit is unspecified."?

Sorry, the definition is okay; however, you have logically inverted the
description. I should have checked better, rather than relying on memory.

> 
>> 13. Section 6.9, there is little concrete motivation why the PACSI NALU 
>> and the additional signalling flags are needed.
>> Making an observation, it seems that the PACSI has only limited 
>> applicability. First of all, the level of aggregation within a single 
>> RTP packet will not be so high that digging out the NALU headers will 
>> be a significant problem. Can we really expect more than a few VCL 
>> NALUs to be present in the same packet? In some cases some SEI and 
>> parameter set NALUs may also appear, but that will not be so common 
>> that it becomes a significant issue. It is after all read and jump 
>> operations based on the NALU length field.
>> Thus the jumping-around operation per NALU becomes the 
>> following:
>>
>> 1. Read NALU type.
>> 2. Read NALU length field (dependent on NALU type).
>> 3. Perform step 1 at current + NALU size + C.
>>
> 
> As you mentioned, the number of SEI and parameter set NALUs included in one packet could be big. In SVC, there are more NALUs that are small. The first is prefix NALU, which is mandatory for each VCL NALU in the base layer. The second is a coded slice NALU with slice_skip_flag equal to 1, which contains only a slice header. In addition, in SVC, SEI and parameter set NALUs can be specific to particular dependency and layer representations - which can further increase the number of NALUs in one packet. 
> 
>> Is there any great benefit from grouping the SEI NALUs in 
>> this NALU type? To me it seems that they could just as well be 
>> placed in the normal position. And if this information is so 
>> important, please clarify which SEI messages should be 
>> included here.
>>
> 
> Including SEI messages in PACSI NAL units allows transporting SEI messages with each coded slice of a picture for error resilience purposes, just to name one example. When multiple copies of the same SEI message are received, those contained in PACSI NAL units can be easily identified and discarded to keep only one copy. This applies to any SEI messages. You could also do this by repeating the SEI messages directly in aggregation packets. However, this method has two problems: one is that it is not easy to identify whether the SEI message is a repeated one, the other is that it is not compatible with RFC 3984. Discarding of redundant copies is important because it cannot be taken for granted that a bitstream containing repeated SEI messages conforms to the SVC specification, as the implications for HRD conformance are unpredictable, and some SEI messages may have semantics that assume only one message per access unit.

Okay, but I don't understand how a receiver will be able to determine
which is the redundant copy. If you lose a primary transmission, you
still have to go through all the SEI messages in the PACSI NALU to see if
you are missing any. In addition, I have some problem understanding
how one is going to ensure that the SEI message is inserted in the right
position in relation to the other NALUs present in the payload.
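
As an illustration of the read-and-jump operation from comment 13, here is a minimal Python sketch that walks the NAL units aggregated in an RFC 3984 STAP-A payload. STAP-A is used only because its framing (a one-byte header of type 24, then a 16-bit size in front of each NAL unit) is well defined; the SVC aggregation formats in the draft may differ, and the function name is hypothetical. Note that in STAP-A the size is read before the NAL unit type, not after it.

```python
import struct

STAP_A_TYPE = 24  # NAL unit type of a STAP-A aggregation packet in RFC 3984


def iter_stap_a(payload: bytes):
    """Yield (nal_type, nal_unit_bytes) for each NAL unit in a STAP-A payload."""
    if not payload or payload[0] & 0x1F != STAP_A_TYPE:
        raise ValueError("not a STAP-A payload")
    offset = 1  # skip the STAP-A header byte
    while offset < len(payload):
        if offset + 2 > len(payload):
            raise ValueError("truncated NAL unit size field")
        (size,) = struct.unpack_from("!H", payload, offset)  # 16-bit size, network order
        offset += 2
        if size == 0 or offset + size > len(payload):
            raise ValueError("NAL unit size exceeds payload")
        nal = payload[offset:offset + size]
        yield nal[0] & 0x1F, nal  # the type is the low five bits of the first byte
        offset += size  # jump to the next size field
```

Each iteration costs one size read and one header-byte read, which is the per-NALU work the comment refers to.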


> 
>> Also, regarding the additional signalling information, I am missing 
>> how hard it is (if at all possible) to derive it from the NALUs 
>> themselves. Is this additional information that doesn't exist 
>> otherwise in the bitstream?
>> Also, how useful is it?
>>
> 
> The additional fields (bytes 5-6) are either not present in the bitstream or difficult to derive. 
> 
> The A bit tells whether you can do quality or spatial layer switching at a non-IDR intra layer picture (i.e. layer representation). When a coding pattern like IBBP is in use, non-IDR intra pictures are used for random access. Compared to using only IDR pictures, higher coding efficiency can be achieved. The H.264/AVC or SVC solution to indicate the random accessibility of a non-IDR intra picture is the recovery point SEI message. The A bit is much easier to parse than the recovery point SEI message, which may even be buried deeply in an SEI NAL unit. And the SEI message may not be present in the bitstream. 
> 
> The T bit tells whether you can do temporal layer switching (basically, changing of frame rate). SVC specifies Temporal Layer Switching Point SEI message for the signaling when needed. Again, parsing of SEI message is harder and they may not be present in the bitstream.
> 
> The P bit tells whether you may discard the packet because it contains redundant slice NAL units. The information itself is buried in the slice header, not in the fixed-length NAL unit header. 
> 
> The C bit tells whether the packet contains intra slices, which may be the only packets to be forwarded for a fast forward playback, e.g. when the network condition is extremely bad.
> 
> The S or E bit tells whether the first or last slice of a layer picture in decoding order is in the packet, to enable a MANE to quickly detect slice loss and take proper action such as requesting a retransmission, as well as to allow efficient playout buffer handling, similar to the M bit in the RTP header. The RTP header M bit in SVC still indicates the end of an access unit, not of a layer picture. 
> 
> The TL0PICIDX field indicates the index of a lowest temporal layer picture. This enables detection of loss of picture in the most important temporal layer, by receivers as well as MANEs. Again, SVC includes an SEI message solution, which is harder to parse and may not be present in the bitstream. 
> 
>> My big question is: Is the PACSI motivated? Despite being 
>> optional to use, it complicates the specification and 
>> implementations to a smaller or bigger degree.
>>

Thanks, that helps clarify a lot.
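
The loss detection that TL0PICIDX is said to enable could be sketched as follows in Python. The sketch assumes the field is 8 bits wide and increases by one, with wraparound, for each picture of the lowest temporal layer; both points are assumptions for the example, not statements from the draft, and the function name is hypothetical.

```python
def missing_tl0_pictures(prev_idx: int, cur_idx: int, modulus: int = 256) -> int:
    """Count lowest-temporal-layer pictures lost between two received indices.

    prev_idx and cur_idx are the index values carried by consecutively
    received packets; equal values mean both belong to the same picture.
    """
    if cur_idx == prev_idx:
        return 0
    return (cur_idx - prev_idx - 1) % modulus
```

A receiver or MANE seeing a non-zero result knows that at least one picture in the most important temporal layer was lost, without parsing any SEI message.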

> 
> See above. I hope the above motivations more or less resolve your concern. And we will add more motivation text and usage examples to the draft. 

Okay, I know too little about the SVC codec to determine if the provided
flags and the movement of the SEI to the front provide sufficiently
improved performance to motivate the PACSI construct. It does add
complexity to something that already is quite complex.

> 
>> 14. Section 9. I don't think SVC should change anything with 
>> the media type for AVC. Use it as is, for the base layer. Some 
>> explanation of how it relates to the SVC layers is of course 
>> necessary.
>>
> 
> Yes, it is specified that RFC 3984 is used for the base layer transported in its own session, which of course includes using the AVC media type. Will include some sentences to clarify the relationship of the two media types. 

It was more about the formulation in the section, which can be
interpreted as changing RFC 3984.

> 
>> 15. Section 9.1: sprop-interleaving-depth: Will usage of 
>> interleaving be allowed in combination with layering? They are 
>> after all to a certain degree orthogonal. You can after 
>> all have interleaving-depth 0 and still use layering with 
>> DONs. I think there needs to be a separation between the buffering 
>> required for inserting the layers and that required for interleaving.
>>
>> I would propose that this is thought over and possibly new 
>> parameters are defined to better indicate properties of the 
>> layered stream when it comes to buffering.
>>
> 
> Yes, layering and interleaving can be applied at the same time. See my response to comment #8 regarding the parameters. 

Yes, needs to be worked out.


>> 20. Section 10, I think you will need to elaborate on why 
>> there are no extra security issues due to the layering.
>>
> 
> OK, will elaborate why, or describe what the extra security issues are, if any. 

Thanks. If you do not at least explain a bit, it is hard for reviewers to
determine what issues have been considered.

Cheers

Magnus Westerlund

IETF Transport Area Director & TSVWG Chair
----------------------------------------------------------------------
Multimedia Technologies, Ericsson Research EAB/TVM/M
----------------------------------------------------------------------
Ericsson AB                | Phone +46 8 4048287
Torshamsgatan 23           | Fax   +46 8 7575550
S-164 80 Stockholm, Sweden | mailto: magnus.westerlund@ericsson.com
----------------------------------------------------------------------
