[AVT] RE: Comments on draft-ietf-avt-rtp-svc-02
<Ye-Kui.Wang@nokia.com> Mon, 03 September 2007 12:54 UTC
Date: Mon, 03 Sep 2007 15:53:27 +0300
Message-ID: <1C1F3D15859526459B4DD0A7A9B2268B03CF1E4C@trebe101.NOE.Nokia.com>
In-Reply-To: <46D82939.8000300@ericsson.com>
References: <46D82939.8000300@ericsson.com>
From: Ye-Kui.Wang@nokia.com
To: magnus.westerlund@ericsson.com, stewe@stewe.org, schierl@hhi.fhg.de, avt@ietf.org
Cc: miska.hannuksela@nokia.com

Hi Magnus,

Thanks for your careful reading, thinking, and valuable comments. See
replies inline.

>1. Section 1, last sentence. I think you need to clarify the following.
> - That it uses its own identity rather than extending on the identity
>of H.264.

Do you mean to clarify that it uses its own media subtype name
"H264-SVC" rather than "H264"?

> - That the deprecation only affects usage under SVC and not the normal
>H.264 payload format. I would change the word deprecates, removes to
>indicate that it doesn't change any existing only removes a not needed
>functionality.

Already addressed in the new version we are working on. See also
comment #7 below.

>2. Section 3.2: I think this text is a bit unclear. It can too easily be
>interpreted as meaning that once a parameter set has been used it can't
>be changed. But I assume that this hasn't been changed and that you can
>change the active set and have the sets changed during the session by
>overwriting them. I think you should state when the sets can be changed,
>to make it clear, because that does create some extra complexity.

The content of an active sequence parameter set can only be changed at
IDR access units (and each IDR access unit starts a new "coded video
sequence"), while the content of an active picture parameter set can be
changed at any layer picture (called "layer representation" in the SVC
spec). A sequence parameter set may be repeated at non-IDR access units
but shall not be overwritten at non-IDR access units. So what was
written there is correct. Anyway, we will try to make this point
clearer.

>3. Section 3.3, R and RR bits. Is this MUST in both send and receive?
>And what do you do if you find a violation? To be extendable a clear
>receiver policy needs to be defined, choose between discard NAL or
>Ignore bit.

Added "Receivers SHOULD discard NAL units with R equal to 0." and
"Receivers SHOULD discard NAL units with RR not equal to '11'." for the
two fields, respectively.

>4. Section 5.1.2:
>
>" RTP packet stream: A sequence of RTP packets with increasing
> sequence numbers, identical PT and SSRC, carried in one RTP session.
> Within the scope of this memo, one RTP packet stream is utilized to
> transport an integer number of SVC Layers."
>
>I don't agree that the PT needs to be identical. However, for your
>purposes I assume that it is a great simplification not having to think
>about people using multiple PTs configured for carrying SVC NAL units
>for the same video sequence. I think you are introducing a limitation
>that doesn't need to exist, but has little practical value.

OK, will enable PT multiplexing in the new version.

>5. Section 6.2:
>"Please see section 5.1 of [RFC3984]. The following applies in
> addition."
>
>I think you should remove the second sentence.

Already removed in the new version we are working on.

>6. Section 6.4, I think you need to extend a bit on what is meant with
>protecting in this section. I understand it that it might be any RTP or
>network transport mechanism that affects the probability of delivery of
>the packet, including network QoS, FEC, RTP retransmissions, even
>scheduling behavior if one has knowledge about a local link with such
>properties.

OK.

>7. Section 6.5, Is single nalu mode allowed for the base layer? As that
>is following 3984, it is not clear.

Already addressed in the new version we are working on. And yes, single
NAL unit mode is now clearly allowed for the base layer transported
using RFC 3984.

>8. Section 6.6:
> [Ed.Note(YkW): I think we need more thinking on the
> value of the parameters. For example, requiring the parameters be
> the same for all the RTP streams and clients might be overkill for
> receivers of only lower layers.]
>
>
> [Edt. Note (StW): In RFC3984, the aforementioned codepoints are
> optional. It appears that for SVC, when used in conjunction with
> session mux, they are mandatory. I don't know how to express this
> in the MIME registration; we'll cross that bridge once we are
> getting to it.]"
>
>
>Assuming that you have multiple layers in multiple RTP sessions. As I
>see it the only good way of getting the buffer handling to work will be
>max-don-diff. That needs to be the same for all layers. However
>deint-buf-req can increase and be the sum of all layers included and
>which there are dependencies from this session. That way one can have
>an increasing buffer requirement depending on the number of RTP
>sessions being received in a multi-layer structure.

Requiring max-don-diff to be the same for all sessions could have been
OK if the mandate for interleaved packetization in layered multicast
had not been relaxed at the Chicago meeting. However, since it has now
been agreed to allow any packetization mode for the base layer,
max-don-diff cannot be identical for all the sessions if the session
for the base layer does not use interleaved mode while other sessions
do. Personally, I think each session should have its own parameters,
optionally, as in RFC 3984. A receiver that subscribes to a certain set
of sessions considers only the parameters of the session of the highest
layer. In this case, for example, sprop-deint-buf-req would increase
for sessions from low to high layers, as you described above. But we
need more thinking after we have the NAL unit order recovery process
for layered multicast on the table, which we are currently working on.
We authors will work hard to find a viable solution to be included in
the new version. Suggestions are welcome and would be appreciated.

>
>9. Section 6.9, why is the F bit redefined here? It seems much better
>to leave this bit alone to not complicate processing by having specific
>processing for it for this NALU type. Isn't the reasonable usage the
>same as for the aggregation NALU types in RFC 3984?

The current definition of this F bit is the same as that for the
aggregation NALU bit, i.e. they always have the same value. Do you mean
that we should say something like "The value of this bit is
unspecified."?

>10. Section 6.9, definition of A bit is not understandable. Too many new
>concepts introduced in the section. I would recommend that the concepts
>are explained in the introductory part, rather than here in the
>specification part. "Layer Picture"
>also needs to be explained.

OK, the first point will be addressed, and the second point ("layer
picture") has already been addressed in the new version we are working
on.

>11. Section 6.9, what is "temporal scalable layer switching point"?

The definition is included at the end of the semantics specification of
the T bit. But to make it easier to read, I will move it to the
definition section, and also try to make the definition easier to
understand.

>12. Section 6.9, are the definitions, like "The P bit MUST be set to 1
>if all the layer pictures containing the target NAL units (as defined
>above) are redundant pictures." meant to apply only to NALUs within
>this packet or also to NALUs in any other packet?

Yes, it is expressed in a way that could apply to NAL units in some
other packets. Will change it to apply only to the NALUs within the
current packet, similarly to the I (idr_flag) bit, for example.

>13. Section 6.9, there is little concrete motivation why PACSI NALU
>and the additional signalling flags are needed.
>Making an observation it seems that the PACSI has only limited
>applicability. First of all the level of aggregation within a single
>RTP packet will not be that high that digging out the NALU headers will
>be a significant problem. Can we really expect more than a few VCL
>NALUs to be present in the same packet? In some cases some SEI and
>parameter set NALUs may also appear but that will not be that common
>that it will be a significant issue. It is after all read and jump
>operations based on the NALU field length.
>Thus making the jumping around operation on a per NALU become the
>following:
>
>1. Read NALU type,
>2. read NALU length field (dependent on NALU type)
>3. perform 1 at current + NALU size + C.

As you mentioned, the number of SEI and parameter set NALUs included in
one packet could be big. In SVC, there are more NALUs that are small.
The first is the prefix NALU, which is mandatory for each VCL NALU in
the base layer. The second is a coded slice NALU with slice_skip_flag
equal to 1, which contains only a slice header. In addition, in SVC,
SEI and parameter set NALUs can be specific to particular dependency
and layer representations, which can further increase the number of
NALUs in one packet.

>Are there any great benefit from grouping the SEI NALUs in
>this NALU type? To me it seems that they could just as well be
>placed in the normal position. And if this information is so
>important please clarify which SEI messages that should be
>included here.

Including SEI messages in PACSI NAL units allows SEI messages to be
transported with each coded slice of a picture for error resilience
purposes, to name just one example. When multiple copies of the same
SEI message are received, those contained in PACSI NAL units can be
easily identified and discarded to keep only one copy. This applies to
any SEI message. You could also do this by repeating the SEI messages
directly in aggregation packets. However, this method has two problems:
one is that it is not easy to identify whether the SEI message is a
repeated one, the other is that it is not compatible with RFC 3984.
Discarding of redundant copies is important because it cannot be taken
for granted that a bitstream containing repeated SEI messages conforms
to the SVC specification, as the implications for HRD conformance are
unpredictable, and there might be SEI message semantics that assume
only one message per access unit.
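
For reference, the read-and-jump walk quoted in comment #13 can be
sketched as follows for an RFC 3984 STAP-A aggregation packet. This is
only an illustration under stated assumptions: the function name
walk_stap_a and the sample payload are made up for this example, and
MTAPs, fragmentation units, and any SVC-specific packet types are not
handled.

```python
import struct
from typing import Iterator, Tuple

STAP_A = 24  # RFC 3984 single-time aggregation packet, type A


def walk_stap_a(payload: bytes) -> Iterator[Tuple[int, bytes]]:
    """Yield (nal_unit_type, nal_unit) for each NAL unit in a STAP-A payload.

    Hypothetical helper for illustration; not taken from the draft.
    """
    if not payload or payload[0] & 0x1F != STAP_A:
        raise ValueError("not a STAP-A payload")
    pos = 1  # skip the one-byte STAP-A NAL unit header
    while pos + 2 <= len(payload):
        # Read the 16-bit NALU size field (network byte order).
        (size,) = struct.unpack_from("!H", payload, pos)
        pos += 2
        if size == 0 or pos + size > len(payload):
            raise ValueError("empty or truncated NAL unit")
        nalu = payload[pos:pos + size]
        # The NAL unit type is the low 5 bits of the first header byte.
        yield nalu[0] & 0x1F, nalu
        pos += size  # jump to the next size field
    if pos != len(payload):
        raise ValueError("trailing bytes after last NAL unit")


# Made-up payload: STAP-A header, then a 2-byte NALU of type 6 (SEI)
# and a 3-byte NALU of type 14 (prefix NAL unit in SVC).
sample = bytes([STAP_A]) + b"\x00\x02\x06\xaa" + b"\x00\x03\x0e\x01\x02"
units = list(walk_stap_a(sample))
```

The cost of the walk grows with the number of aggregated NAL units, so
many small NAL units per packet mean more size-field reads per packet.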

>Also the additional signalling information, I am missing how
>hard (if at all possible to derive) it is to get from the NALUs
>themselves. Is this additional information that doesn't exist
>otherwise in the bit-stream?
>Also how useful is it?

The additional fields (bytes 5-6) are either not present in the
bitstream or difficult to derive.

The A bit tells whether you can do quality or spatial layer switching
at a non-IDR intra layer picture (i.e. layer representation). When a
coding pattern like IBBP is in use, non-IDR intra pictures are used for
random access. Compared to using only IDR pictures, higher coding
efficiency can be achieved. The H.264/AVC or SVC solution to indicate
the random accessibility of a non-IDR intra picture is the recovery
point SEI message. The A bit is much easier to parse than the recovery
point SEI message, which may even be buried deeply in an SEI NAL unit.
And the SEI message may not be present in the bitstream.

The T bit tells whether you can do temporal layer switching (basically,
changing of frame rate). SVC specifies the Temporal Layer Switching
Point SEI message for this signaling when needed. Again, parsing of SEI
messages is harder and they may not be present in the bitstream.

The P bit tells whether you may discard the packet because it contains
redundant slice NAL units. The information itself is buried in the
slice header, not in the fixed-length NAL unit header.

The C bit tells whether the packet contains intra slices, which may be
the only packets to be forwarded for fast-forward playback, e.g. when
the network condition is extremely bad.

The S or E bit tells whether the first or last slice of a layer picture
in decoding order is in the packet. This enables a MANE to quickly
detect slice loss and take proper action, such as requesting a
retransmission, and allows efficient playout buffer handling, similar
to the M bit in the RTP header.
The RTP header M bit in SVC still indicates the end of an access unit,
not of a layer picture.

The TL0PICIDX field indicates the index of a lowest temporal layer
picture. This enables detection of loss of a picture in the most
important temporal layer, by receivers as well as MANEs. Again, SVC
includes an SEI message solution, which is harder to parse and may not
be present in the bitstream.

>My big question is: Is the PACSI motivated, despite being
>optional to utilize, it complicates the specification and
>implementations to smaller or bigger degree.

See above. I hope the above motivations more or less resolve your
concern. And we will add more motivation text and usage examples to the
draft.

>14. Section 9. I don't think SVC should change anything with
>the media type for AVC. Use it as is, for the base layer. Some
>explanation on how it relates to the SVC layers is of course
>necessary.

Yes, it is specified that RFC 3984 is used for the base layer
transported in its own session, which of course includes using the AVC
media type. Will include some sentences to clarify the relationship of
the two media types.

>15. Section 9.1: sprop-interleaving-depth: Will usage of
>interleaving be allowed in combination with layering? They are
>after all to a certain degree orthogonal usage. You can after
>all have interleaving-depth 0 and still use layering with
>DONs. I think there needs to be a separation of the buffering
>required for inserting the layers and for interleaving.
>
>I would propose that this is thought over and possibly new
>parameters are defined to better indicate properties of the
>layered stream when it comes to buffering.

Yes, layering and interleaving can be applied at the same time. See my
response to comment #8 regarding the parameters.

>16. Section 9.1, sprop-scalability-info: Please include
>reference to Base64.

OK.

>17. Section 9.1, sprop-layer-ids: Why is it needed to base64
>encode the DID, QID and TID numbers? Isn't there a point in
>keeping these human readable?

OK, will make these human readable.

>18. Section 9.2.2: Here is some work. I think we need to
>rework the offer/answer of 3984 with the additional complexity
>of the layering.

Agreed. Will be addressed in the new version.

>
>19. Section 9.2.3: This also needs clarification.

Absolutely. The whole signaling part needs to be reworked in the new
version. See also comments #8 and #15.

>20. Section 10, I think you will need to elaborate on why
>there are no extra security issues due to the layering.

OK, will elaborate why, or describe what the extra security issues are,
if any.

>21. Section 13.3, Mentioning FGS when it is not yet supported
>by SVC. Is that not going a bit too far? I think MGS is good
>enough for motivating in this use case.

Yes, the FGS material will be completely removed, from here and also
from §13.4 and §13.6, as pointed out by Thomas Rusert (thanks!) in a
private email.

BR, YK

_______________________________________________
Audio/Video Transport Working Group
avt@ietf.org
https://www1.ietf.org/mailman/listinfo/avt