Re: [AVT] WGLC on draft-ietf-avt-rfc3016bis
"DRAGE, Keith (Keith)" <keith.drage@alcatel-lucent.com> Sun, 05 December 2010 23:58 UTC
From: "DRAGE, Keith (Keith)" <keith.drage@alcatel-lucent.com>
To: "avt@ietf.org" <avt@ietf.org>
Date: Mon, 06 Dec 2010 00:59:19 +0100
Cc: "draft-ietf-avt-rfc3016bis.all@tools.ietf.org" <draft-ietf-avt-rfc3016bis.all@tools.ietf.org>
Subject: Re: [AVT] WGLC on draft-ietf-avt-rfc3016bis

I've reviewed the document; comments as follows.

1) This document has inherited a considerable problem from the previous RFC 3016, in that much of that document is written with strange RFC 2119 usage, and pretty much entirely in the passive voice. How readers assess conformance to this document I don't know. However, given that anyone who is going to implement these parts of RFC 3016 will presumably already have done so, I have limited my comments on this part to areas where I think the requirements are totally unclear, and specifically tagged them as on the existing text. If anyone thinks it would be useful to go further, then I can certainly submit a large number of further comments on the existing text, but they are not included here.

2) (Editorial). Abstract.

    Comments are solicited and should be addressed to the working group's mailing list at avt@ietf.org and/or the author(s).

Delete this text. I don't know why people include this in an internet-draft in the first place, given that it states the obvious, and it is not needed in the RFC.

3) (Existing text). Section 1.1

    The fragmentation rule recommends not to map more than one VOP in an RTP packet so that the RTP timestamp uniquely indicates the VOP time framing. On the other hand, MPEG-4 video may generate VOPs of very small size, in cases with an empty VOP (vop_coded=0) containing only VOP header or an arbitrary shaped VOP with a small number of coding blocks. To reduce the overhead for such cases, the fragmentation rule permits concatenating multiple VOPs in an RTP packet. (See fragmentation rule (4) in section 3.2 and marker bit and timestamp in section 3.1.)

The use of "recommends" here is a pseudo quote from later in the document. It would be better if it were a real quote as in the previous paragraph; thus it should be modified as follows:

    The fragmentation rule "Different VOPs SHOULD be fragmented into different RTP packets" is made so that the RTP timestamp uniquely indicates the VOP time framing.
    On the other hand, MPEG-4 video may generate VOPs of very small size, in cases with an empty VOP (vop_coded=0) containing only VOP header or an arbitrary shaped VOP with a small number of coding blocks. To reduce the overhead for such cases, the fragmentation rule permits concatenating multiple VOPs in an RTP packet. (See fragmentation rule (4) in section 3.2 and marker bit and timestamp in section 3.1.)

4) (Existing text). Section 1.2

    in RTP transmission there is no need for the last two features. Therefore, these two features MUST NOT be used in applications based on RTP packetization specified by this document.

    Since LATM has been developed for only natural audio coding tools, i.e., not for synthesis tools, it seems difficult to transmit Structured Audio (SA) data and Text to Speech Interface (TTSI) data by LATM. Therefore, SA data and TTSI data MUST NOT be transported by the RTP packetization in this document.

Section 1.2 is an introductory section. Therefore it is inappropriate to start introducing RFC 2119 language at this stage. As far as I can see, these requirements are not elsewhere in the document, so we will need to create a place for them. I suggest a new section 3 immediately before the existing section 3, and renumber accordingly (given that the numbering of the existing RFC 3016 has not been preserved).

Similar considerations apply to the following two paragraphs:

    For transmission of scalable streams, audio data of each layer SHOULD be packetized onto different RTP streams allowing for the different layers to be treated differently at the IP level, for example via some means of differentiated service. On the other hand, all configuration data of the scalable streams are contained in one LATM configuration data "StreamMuxConfig" and every scalable layer shares the StreamMuxConfig. The mapping between each layer and its configuration data is achieved by LATM header information attached to the audio data.
    In order to indicate the dependency information of the scalable streams, the signaling mechanism as specified in [RFC5583] SHOULD be used (see section 4.2).

    For MPEG-4 Audio coding tools, as is true for other audio coders, if the payload is a single audio frame, packet loss will not impair the decodability of adjacent packets. Therefore, the additional media specific header for recovering errors will not be required for MPEG-4 Audio. Existing RTP protection mechanisms, such as Generic Forward Error Correction (RFC 5109 [RFC5109]) and Redundant Audio Data (RFC 2198 [RFC2198]), MAY be applied to improve error resiliency.

5) Section 1.3. I first looked for this at the end of the document, in some kind of appendix, because that is where I expected to find it, and I wanted to check this first. So my first comment is that while there is no rule on this, I would prefer it to appear at the end of the document.

6) Section 1.3. I do not think the list as currently drafted gives a very good indication to existing implementors of what they need to look at in order to update their implementations. This should be redrafted to identify the key things that a revised implementation will need to look at. For example:

    o The audio parameter "SBR-enabled" is not defined within RFC 3016 but used by 3GPP

should become something like:

    o The use of an audio parameter "SBR-enabled" is now defined in this document, which is used by 3GPP implementations [informative reference to 3GPP specification here].

6) Section 1.3. For the text:

    Furthermore some comments have been addressed and signaling support for MPEG surround [23003-1] was added. It should be noted that the audio payload format described here has some known limitations. For new system designs RFC 3640 [RFC3640] is recommended.

Remove the construct "It should be noted that the". It is mixing material that could be confused as normative with definitely informative material.
It is also unclear how RFC 3640 is recommended, as the reference is informative, and this is the only place it is referenced. I assume you mean that new implementations would not be using this document at all. But that surely should be called out in the Abstract of this document.

7) Section 2 (Editorial)

    This memo makes use of terms, specified in [14496-2], [14496-3], and [23003-1]. In addition, the following terms are used in this document and have specific meaning within the context of this document.

Change to:

    This document makes use of terms, specified in [14496-2], [14496-3], and [23003-1]. In addition, the following terms are used in this document and have specific meaning within the context of this document.

8) Section 3 (Existing text)

    This section specifies RTP packetization rules for MPEG-4 Visual content. An MPEG-4 Visual bitstream is mapped directly onto RTP packets without the addition of extra header fields or any removal of Visual syntax elements. The Combined Configuration/Elementary stream mode MUST be used so that configuration information will be carried to the same RTP port as the elementary stream. (see 6.2.1 "Start codes" of ISO/IEC 14496-2 [14496-2]) The configuration information MAY additionally be specified by some out-of-band means. If needed for an H.323 terminal, H.245 codepoint "decoderConfigurationInformation" MUST be used for this purpose. If needed by systems using Media Type parameters and SDP parameters, e.g., SIP and RTSP, the optional parameter "config" MUST be used to specify the configuration information (see 5.1 and 5.2).

The abbreviation "e.g." means "for example", and "MUST" can never be an example. I suspect the example here is only the string "e.g., SIP and RTSP", so I suggest that this bit is placed in parentheses, as follows: "(e.g., SIP and RTSP)"

9) Section 3.2 (Existing text). In the following text:

    (5) It is RECOMMENDED that a single video packet is sent as a single RTP packet.
    The size of a video packet SHOULD be adjusted in such a way that the resulting RTP packet is not larger than the path-MTU.

    Note: Rule (5) does not apply when the video packet is disabled by the coder configuration (by setting resync_marker_disable in the VOL header to 1), or in coding tools where the video packet is not supported. In this case, a VOP MAY be split at arbitrary byte-positions.

Remove the mix of the word "note" with normative text. I suggest:

    (5) It is RECOMMENDED that a single video packet is sent as a single RTP packet. The size of a video packet SHOULD be adjusted in such a way that the resulting RTP packet is not larger than the path-MTU. If the video packet is disabled by the coder configuration (by setting resync_marker_disable in the VOL header to 1), or in coding tools where the video packet is not supported, a VOP MAY be split at arbitrary byte-positions.

10) Section 3.3 (Existing text). In the following text:

    When concatenating more than one video packets into an RTP packet, VOP header or video_packet_header() shall not be placed in the middle of the RTP payload. The packetization as in (b) is not allowed by criterion (2) due to the aspect of the error resiliency. Comparing this example with Figure 2(d), although two video packets are mapped onto two RTP packets in both cases, the packet-loss resiliency is not identical. Namely, if the second RTP packet is lost, both video packets 1 and 2 are lost in the case of Figure 3(b) whereas only video packet 2 is lost in the case of Figure 2(d).

Is the "shall" here denoting a requirement (in which case it should be capitalised and replaced with "MUST", and moved to a section that is not titled "Example"), or is it denoting something else, e.g. an impossibility, in which case change the modal auxiliary to something else?

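To illustrate the case in comment 9 where the video packet is disabled and a VOP may be split at arbitrary byte positions, here is a minimal sketch of what a sender might do to keep each RTP packet within the path-MTU. The function name split_vop and the two overhead constants are my own assumptions for illustration (a fixed 12-byte RTP header and IPv4 plus UDP headers); nothing here is taken from the draft.

```python
RTP_HEADER_LEN = 12    # assumption: fixed RTP header, no CSRC list or extension
IP_UDP_OVERHEAD = 28   # assumption: IPv4 (20 bytes) + UDP (8 bytes)


def split_vop(vop: bytes, path_mtu: int) -> list[bytes]:
    """Split one VOP at arbitrary byte positions into MTU-sized payloads.

    Hypothetical helper: only applies when video packets (resync markers)
    are not available, so no syntax-aware split points exist.
    """
    max_payload = path_mtu - IP_UDP_OVERHEAD - RTP_HEADER_LEN
    if max_payload <= 0:
        raise ValueError("path MTU too small to carry any payload")
    # Each chunk becomes the payload of one RTP packet for the same VOP.
    return [vop[i:i + max_payload] for i in range(0, len(vop), max_payload)]


if __name__ == "__main__":
    chunks = split_vop(bytes(3000), path_mtu=1500)
    print([len(c) for c in chunks])
```

With a 1500-byte path-MTU the sketch leaves 1460 bytes of payload per packet, so a 3000-byte VOP is carried in three RTP packets.
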
11) Section 4.2, 1st paragraph:

    Payload Type (PT): The assignment of an RTP payload type for this new packet format is outside the scope of this document, and will not be specified here. It is expected that the RTP profile for a particular class of applications will assign a payload type for this encoding, or if that is not done then a payload type in the dynamic range shall be chosen by means of an out-of-band signaling protocol (e.g., H.245, SIP, etc). In the dynamic assignment of RTP payload types for scalable streams, a different value SHOULD be assigned to each layer. The dependency relationships between the enhance layer and the base layer SHOULD be signaled as specified in [RFC5583]. An example of the use of such signaling for scalable audio streams can be found in [RFC5691].

Write the above new text in the active, i.e. "In the dynamic assignment of RTP payload types for scalable streams, the server SHOULD assign a different value to each layer. The server SHOULD signal ...".

Under what circumstances does the "SHOULD" not apply? It is good practice when using "SHOULD" to give guidance to the reader on the conditions under which it is safe to ignore the requirement. Otherwise many implementors treat the word as a straight option and ignore it.

It is also noted that a requirement specifying the assignment of the payload type is inconsistent with the first sentence, which says "assignment of an RTP payload type for this new packet format is outside the scope of this document". This sentence will therefore need to be altered.

12) Section 4.2 (Existing text). Is the following meant to be normative?

    Timestamp: The timestamp indicates the sampling instance of the first audio frame contained in the RTP packet. Timestamps are recommended to start at a random value for security reasons.

It is currently drafted as such, but "recommended" is lower case.

13) Section 4.3 (Existing text).
What is the meaning of the final sentence:

    If it cannot, the audioMuxElement MAY be fragmented and spread across multiple packets.

As written it is telling me I have an option to fragment it and spread it across multiple packets. Is that what is meant?

14) Section 5.1. The following text:

    Note, any unspecified parameter MUST be ignored by the receiver to ensure that additional parameters can be added in any future revision of this specification.

has the word "MUST", a mandatory requirement, following the word "note", which many people interpret as informative. Assuming a mandatory requirement is meant, delete the word "Note". The text should also be redrafted in the active, e.g.:

    The receiver MUST ignore any unspecified parameter, to ensure that additional parameters can be added in any future revision of this specification.

This is also a registration section. Surely normative requirements on the implementation should be elsewhere in the document, so I suggest this text is moved to another more appropriate section.

15) Section 5.1.

    Published specification: The specifications for MPEG-4 Visual streams are presented in ISO/IEC 14469-2 [14496-2]. The RTP payload format is described in RFC XXXX.

Why not just say: "The RTP payload format is described in this document." Same change for the following two entries, and also in section 5.3.

16) Section 5.3, 5.4.1.3 and 5.4.1.4. This document does not define the coding of the SBR-enabled parameter. I get the impression from reading the document that it is defined elsewhere, but no reference is made to the point where it is defined. That would surely need to be a normative reference.

17) Section 5.3.

    Note, any unspecified parameter MUST be ignored by the receiver to ensure that additional parameters can be added in any future revision of this specification.

See comments on similar text at the start of section 4.1.

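To make the requirement in comments 14 and 17 concrete, here is a minimal sketch of a receiver that ignores any unspecified parameter. The function name parse_fmtp, the semicolon-separated "name=value" input form, and the exact-match comparison of parameter names are assumptions for illustration only; the example parameter "x-future" is invented.

```python
def parse_fmtp(params: str, known: set[str]) -> dict[str, str]:
    """Return only the parameters the receiver knows; silently skip the rest.

    Hypothetical helper: `params` is assumed to be a semicolon-separated
    list of name=value pairs, and `known` the set of parameter names this
    receiver implements.
    """
    accepted: dict[str, str] = {}
    for item in params.split(";"):
        item = item.strip()
        if not item:
            continue
        name, _, value = item.partition("=")
        name = name.strip()
        if name in known:
            accepted[name] = value.strip()
        # Unknown names are ignored rather than rejected, so that a later
        # revision of the format can add parameters without breaking us.
    return accepted


if __name__ == "__main__":
    print(parse_fmtp("SBR-enabled=1; config=40002410; x-future=7",
                     {"SBR-enabled", "config"}))
```

The point of the sketch is only that an unknown parameter is not an error; how parameter names are compared is a separate matter for the specification.
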
18) Section 5.3 (Existing text):

    Required parameters: rate: the rate parameter indicates the RTP time stamp clock rate. The default value is 90000. Other rates MAY be specified only if they are set to the same value as the audio sampling rate (number of samples per second).

Change "specified" to "indicated". The AVT working group is the specifier.

19) Section 5.3 (Existing text). There are a number of lower case "shall" here. Do these constitute requirements? How is the reader of this specification expected to understand and interpret them? If they are not requirements, then change them.

19) Section 5.3 (existing text)

    MPS-profile-level-id: a decimal representation of the MPEG Surround Profile Level indication as defined in ISO/IEC 14496-3 [14496-3]. This parameter indicates the MPEG Surround profile and level that the decoder must be capable in order to decode the stream.

What is the meaning of the final sentence?

20) Section 5.3 (existing text)

    ptime: RECOMMENDED duration of each packet in milliseconds.

Why is this a requirement? How does an implementation conform to it?

21) Section 5.3

    If this parameter is set to 0, a decoder SHALL expect that SBR is not used. If this parameter is set to 1, a decoder SHOULD upsample the audio data with the SBR tool, regardless whether SBR data is present in the stream or not.

First sentence is not written as a conformable requirement. What does the decoder need to do to conform? For the second sentence, see comments earlier on use of SHOULD.

22) Section 5.3

    If the presence of SBR can not be detected from out-of-band configuration and the SBR-enabled parameter is not present, the parameter defaults to 1 for an SBR-capable decoder. If the resulting output sampling rate or the computational complexity is not supported, the SBR tool may be disabled or run in downsampled mode.

Change "may" to "can" to indicate that it is definitely not a conformance requirement.

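As an illustration of how I read the SBR-enabled text quoted in comments 21 and 22, here is a minimal sketch of the decision a decoder would make. The function name use_sbr and its arguments are hypothetical; the sketch deliberately leaves out the case where the presence of SBR is detected from out-of-band configuration, and it treats "disabled" and "run in downsampled mode" as the same outcome.

```python
from typing import Optional


def use_sbr(sbr_enabled: Optional[int],
            decoder_sbr_capable: bool,
            output_supported: bool = True) -> bool:
    """Decide whether the decoder upsamples with the SBR tool.

    sbr_enabled: value of the SBR-enabled parameter, or None if absent.
    decoder_sbr_capable: whether this decoder implements SBR at all.
    output_supported: whether the resulting output sampling rate and
        complexity are acceptable (assumption: a single boolean).
    """
    if not decoder_sbr_capable:
        return False
    # Absent parameter: defaults to 1 for an SBR-capable decoder.
    effective = 1 if sbr_enabled is None else sbr_enabled
    if effective == 0:
        return False
    # Parameter is 1: upsample, unless the result is not supported.
    return output_supported


if __name__ == "__main__":
    print(use_sbr(None, decoder_sbr_capable=True))
    print(use_sbr(0, decoder_sbr_capable=True))
```

Writing it out this way shows why "SHALL expect" is hard to test: the only observable behaviour is whether the SBR tool is run.
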
23) Section 5.4.1.3

    In this example, the presence of SBR can not be determined by the SDP parameter set. The clock rate represents the core codec sampling rate. An SBR enabled decoder SHOULD use the SBR tool to upsample the audio data if complexity and resulting output sampling rate permits.

This section is entitled "Example". Such sections ideally should not include conformable requirements. So if this is a requirement then move it somewhere else. Also comments elsewhere about the use of SHOULD also apply here.

24) Section 5.4.1.4

    In this example, the clock rate is still 24000 and this information should be used for RTP timestamp calculation. The value of 24000 is used to support old AAC decoders. This makes the decoder supporting only AAC understand the HE AAC coded data, although only plain AAC is supported. A HE AAC decoder is able to generate output data with the SBR sampling rate.

Change "should be" to "is" to avoid confusion with a conformable requirement.

25) Section 6.

    This memo defines additional optional format parameters to the Media Type "audio" and its subtype "MP4A-LATM", as defined in RFC XXXX. The Media Type parameters are defined in sections 5.1 and 5.3 of RFC XXXX.

Just say "sections 5.1 and 5.3 of this document".

26) Section 6.1.

    This memo defines the following additional optional parameters which SHOULD be used if SBR or MPEG Surround data is present inside the payload of an AAC elementary stream.

We do not need a SHOULD (or any other RFC 2119 language) in relation to an IANA considerations section. I suspect this is just superfluous, and not a real conformable requirement on implementations.

27) Section 6

An IANA considerations section should basically consist of a set of instructions to IANA as to what they need to include in their tables. See http://www.iana.org/protocols/

I cannot ascertain from this section whether they need to do work. Are there any existing RFC 3016 registrations where the reference needs to be updated?
Are there any of these parameters that actually need registration?

regards

Keith

________________________________
From: Roni Even [mailto:Even.roni@huawei.com]
Sent: Tuesday, November 23, 2010 2:00 PM
To: avt@ietf.org
Cc: DRAGE, Keith (Keith); draft-ietf-avt-rfc3016bis.all@tools.ietf.org
Subject: WGLC on draft-ietf-avt-rfc3016bis

Hi,

I would like to start a working group last call on http://tools.ietf.org/html/draft-ietf-avt-rfc3016bis-01 RTP Payload Format for MPEG-4 Audio/Visual Streams

The WGLC will end on December 13th, 2010

Please review the draft and send comments to the list

Note that section 1.3 and 1.4 discuss the changes from RFC3016

Roni Even
AVT co-