Re: [AVT] WGLC on draft-ietf-avt-rfc3016bis

"DRAGE, Keith (Keith)" <keith.drage@alcatel-lucent.com> Sun, 05 December 2010 23:58 UTC

From: "DRAGE, Keith (Keith)" <keith.drage@alcatel-lucent.com>
To: "avt@ietf.org" <avt@ietf.org>
Date: Mon, 06 Dec 2010 00:59:19 +0100
Cc: "draft-ietf-avt-rfc3016bis.all@tools.ietf.org" <draft-ietf-avt-rfc3016bis.all@tools.ietf.org>
Subject: Re: [AVT] WGLC on draft-ietf-avt-rfc3016bis

I've reviewed the document, comments as follows

1)      This document has inherited a considerable problem from the previous RFC 3016, in that much of that document is written with strange RFC 2119 usage, and almost entirely in the passive voice. I don't know how readers are meant to assess conformance to this document. However, given that anyone who is going to implement these parts of RFC 3016 will presumably already have done so, I have limited my comments on this part to areas where I think the requirements are totally unclear, and have specifically tagged them as being on the existing text. If anyone thinks it would be useful to go further, then I can certainly submit a large number of further comments on the existing text, but they are not included here.

2)      (Editorial). Abstract.

   Comments are solicited and should be addressed to the working group's
   mailing list at avt@ietf.org and/or the author(s).

Delete this text. I don't know why people include this in an internet-draft in the first place, given that it states the obvious, and it is not needed in the RFC.

3)      (Existing text). Section 1.1

   The fragmentation rule recommends not to map more than one VOP in an
   RTP packet so that the RTP timestamp uniquely indicates the VOP time
   framing.  On the other hand, MPEG-4 video may generate VOPs of very
   small size, in cases with an empty VOP (vop_coded=0) containing only
   VOP header or an arbitrary shaped VOP with a small number of coding
   blocks.  To reduce the overhead for such cases, the fragmentation
   rule permits concatenating multiple VOPs in an RTP packet.  (See
   fragmentation rule (4) in section 3.2 and marker bit and timestamp in
   section 3.1.)

The use of "recommends" here is a pseudo-quote from later in the document. It would be better if it were a real quote, as in the previous paragraph, so it should be modified as follows:

   The fragmentation rule "Different VOPs SHOULD be fragmented into different RTP packets" is made
   so that the RTP timestamp uniquely indicates the VOP time
   framing.  On the other hand, MPEG-4 video may generate VOPs of very
   small size, in cases with an empty VOP (vop_coded=0) containing only
   VOP header or an arbitrary shaped VOP with a small number of coding
   blocks.  To reduce the overhead for such cases, the fragmentation
   rule permits concatenating multiple VOPs in an RTP packet.  (See
   fragmentation rule (4) in section 3.2 and marker bit and timestamp in
   section 3.1.)

4)      (Existing text). Section 1.2

   in RTP transmission there is no need for the last two features.
   Therefore, these two features MUST NOT be used in applications based
   on RTP packetization specified by this document.  Since LATM has been
   developed for only natural audio coding tools, i.e., not for
   synthesis tools, it seems difficult to transmit Structured Audio (SA)
   data and Text to Speech Interface (TTSI) data by LATM.  Therefore, SA
   data and TTSI data MUST NOT be transported by the RTP packetization
   in this document.

Section 1.2 is an introductory section, so it is inappropriate to start introducing RFC 2119 language at this stage. As far as I can see, these requirements do not appear elsewhere in the document, so we will need to create a place for them. I suggest a new section 3 immediately before the existing section 3, renumbering accordingly (given that the numbering of the existing RFC 3016 has not been preserved).

Similar considerations apply to the following two paragraphs:

   For transmission of scalable streams, audio data of each layer SHOULD
   be packetized onto different RTP streams allowing for the different
   layers to be treated differently at the IP level, for example via
   some means of differentiated service.  On the other hand, all
   configuration data of the scalable streams are contained in one LATM
   configuration data "StreamMuxConfig" and every scalable layer shares
   the StreamMuxConfig.  The mapping between each layer and its
   configuration data is achieved by LATM header information attached to
   the audio data.  In order to indicate the dependency information of
   the scalable streams, the signaling mechanism as specified in
   [RFC5583] SHOULD be used (see section 4.2).

   For MPEG-4 Audio coding tools, as is true for other audio coders, if
   the payload is a single audio frame, packet loss will not impair the
   decodability of adjacent packets.  Therefore, the additional media
   specific header for recovering errors will not be required for MPEG-4
   Audio.  Existing RTP protection mechanisms, such as Generic Forward
   Error Correction (RFC 5109 [RFC5109]) and Redundant Audio Data (RFC
   2198 [RFC2198]), MAY be applied to improve error resiliency.

5)      Section 1.3.

I first looked for this at the end of the document, in some kind of appendix, because that is where I expected to find it, and I wanted to check this first. So my first comment is that while there is no rule on this, I would prefer it to appear at the end of the document.

6)      Section 1.3.

I do not think the list as currently drafted gives existing implementors a very good indication of what they need to look at in order to update their implementations. This should be redrafted to identify the key things that a revised implementation will need to look at. For example:

   o  The audio parameter "SBR-enabled" is not defined within RFC 3016
      but used by 3GPP

should become something like:

   o  The use of an audio parameter "SBR-enabled" is now defined in this document, which is used by 3GPP implementations [informative reference to 3GPP specification here].

6)      Section 1.3. For the text:

   Furthermore some comments have been addressed and signaling support
   for MPEG surround [23003-1] was added.  It should be noted that the
   audio payload format described here has some known limitations.  For
   new system designs RFC 3640 [RFC3640] is recommended.

Remove the construct "It should be noted that the". It mixes material that could be read as normative with definitely informative material.

It is also unclear how RFC 3640 is recommended, as the reference is informative, and this is the only place it is referenced. I assume you mean that new implementations would not be using this document at all. But that surely should be called out in the Abstract of this document.

7)      Section 2 (Editorial)

   This memo makes use of terms, specified in [14496-2], [14496-3], and
   [23003-1].  In addition, the following terms are used in this
   document and have specific meaning within the context of this
   document.

Change to:

   This document makes use of terms, specified in [14496-2], [14496-3], and
   [23003-1].  In addition, the following terms are used in this
   document and have specific meaning within the context of this
   document.

8)      Section 3 (Existing text)

   This section specifies RTP packetization rules for MPEG-4 Visual
   content.  An MPEG-4 Visual bitstream is mapped directly onto RTP
   packets without the addition of extra header fields or any removal of
   Visual syntax elements.  The Combined Configuration/Elementary stream
   mode MUST be used so that configuration information will be carried
   to the same RTP port as the elementary stream. (see 6.2.1 "Start
   codes" of ISO/IEC 14496-2 [14496-2]) The configuration information
   MAY additionally be specified by some out-of-band means.  If needed
   for an H.323 terminal, H.245 codepoint
   "decoderConfigurationInformation" MUST be used for this purpose.  If
   needed by systems using Media Type parameters and SDP parameters,
   e.g., SIP and RTSP, the optional parameter "config" MUST be used to
   specify the configuration information (see 5.1 and 5.2).

The abbreviation "e.g." means "for example", and a "MUST" can never be an example. I suspect the example here is only the string "e.g., SIP and RTSP", so I suggest that this part is placed in parentheses, as follows: "(e.g., SIP and RTSP)"

9)      Section 3.2 (Existing text). In the following text:

  (5) It is RECOMMENDED that a single video packet is sent as a single
   RTP packet.  The size of a video packet SHOULD be adjusted in such a
   way that the resulting RTP packet is not larger than the path-MTU.
   Note: Rule (5) does not apply when the video packet is disabled by
   the coder configuration (by setting resync_marker_disable in the VOL
   header to 1), or in coding tools where the video packet is not
   supported.  In this case, a VOP MAY be split at arbitrary byte-
   positions.

Remove the mix of the word "note" with normative text. I suggest:

  (5) It is RECOMMENDED that a single video packet is sent as a single
   RTP packet.  The size of a video packet SHOULD be adjusted in such a
   way that the resulting RTP packet is not larger than the path-MTU.
   If the video packet is disabled by
   the coder configuration (by setting resync_marker_disable in the VOL
   header to 1), or in coding tools where the video packet is not
   supported, a VOP MAY be split at arbitrary byte-
   positions.
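
As a rough illustration of the last sentence of rule (5), here is a minimal Python sketch of splitting a VOP at arbitrary byte positions so that each RTP packet stays within the path-MTU. The function name split_vop and the 40-byte header overhead (IPv4 + UDP + fixed RTP header) are my assumptions for illustration, not something taken from the draft.

```python
# Assumed per-packet overhead: 20 (IPv4) + 8 (UDP) + 12 (fixed RTP header).
HEADER_OVERHEAD = 40

def split_vop(vop, path_mtu):
    """Split a VOP into payload chunks that fit within the path-MTU.

    Only meaningful when video packets are disabled
    (resync_marker_disable = 1) or unsupported by the coding tool.
    """
    max_payload = path_mtu - HEADER_OVERHEAD
    if max_payload <= 0:
        raise ValueError("path MTU too small for any payload")
    return [vop[i:i + max_payload] for i in range(0, len(vop), max_payload)]
```

When video packets are enabled, this splitting would not be used; the first part of rule (5) recommends one video packet per RTP packet instead.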

10)     Section 3.3 (Existing text). In the following text:

   When concatenating more than one video packets into an RTP packet,
   VOP header or video_packet_header() shall not be placed in the middle
   of the RTP payload.  The packetization as in (b) is not allowed by
   criterion (2) due to the aspect of the error resiliency.  Comparing
   this example with Figure 2(d), although two video packets are mapped
   onto two RTP packets in both cases, the packet-loss resiliency is not
   identical.  Namely, if the second RTP packet is lost, both video
   packets 1 and 2 are lost in the case of Figure 3(b) whereas only
   video packet 2 is lost in the case of Figure 2(d).

Is the "shall" here denoting a requirement (in which case it should be replaced with "MUST" and moved to a section that is not titled "Example"), or is it denoting something else, e.g. an impossibility, in which case the modal auxiliary should be changed to something else?

11)     Section 4.2, 1st paragraph:

   Payload Type (PT): The assignment of an RTP payload type for this new
   packet format is outside the scope of this document, and will not be
   specified here.  It is expected that the RTP profile for a particular
   class of applications will assign a payload type for this encoding,
   or if that is not done then a payload type in the dynamic range shall
   be chosen by means of an out-of-band signaling protocol (e.g., H.245,
   SIP, etc).  In the dynamic assignment of RTP payload types for
   scalable streams, a different value SHOULD be assigned to each layer.
   The dependency relationships between the enhance layer and the base
   layer SHOULD be signaled as specified in [RFC5583].  An example of
   the use of such signaling for scalable audio streams can be found in
   [RFC5691].

Write the above new text in the active, i.e. "In the dynamic assignment of RTP payload types for scalable streams, the server SHOULD assign a different value to each layer. The server SHOULD signal ...".

Under what circumstances does the "SHOULD" not apply? It is good practice when using "SHOULD" to give guidance to the reader on the conditions under which it is safe to ignore the requirement. Otherwise many implementors treat the word as a straight option and ignore it.

It is also noted that a requirement specifying the assignment of the payload type is inconsistent with the first sentence, which says "assignment of an RTP payload type for this new packet format is outside the scope of this document". This sentence will therefore need to be altered.

12)     Section 4.2 (Existing text). Is the following meant to be normative:

   Timestamp: The timestamp indicates the sampling instance of the first
   audio frame contained in the RTP packet.  Timestamps are recommended
   to start at a random value for security reasons.

It is currently drafted as such, but "recommended" is lower case.

13)     Section 4.3 (Existing text). What is the meaning of the final sentence:

   If it cannot, the audioMuxElement MAY
   be fragmented and spread across multiple packets.

As written, it tells me I have an option to fragment it and spread it across multiple packets. Is that what is meant?

14)     Section 5.1. The following text:

   Note, any unspecified parameter MUST be ignored by the receiver to
   ensure that additional parameters can be added in any future revision
   of this specification.

has the word "MUST", a mandatory requirement, following the word "Note", which many people interpret as informative. Assuming a mandatory requirement is meant, delete the word "Note". The text should also be redrafted in the active voice, e.g.:

   The receiver MUST ignore any unspecified parameter, to
   ensure that additional parameters can be added in any future revision
   of this specification.

This is also a registration section. Surely normative requirements on the implementation should be elsewhere in the document, so I suggest this text is moved to another more appropriate section.
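
To show what the receiver behaviour in this requirement might look like, here is a minimal Python sketch. The function name parse_format_params and the set of known parameter names are my assumptions for illustration; a real receiver would take the list from the registration itself.

```python
# Hypothetical set of parameters this receiver understands (illustrative only).
KNOWN_PARAMS = {"profile-level-id", "config"}

def parse_format_params(value, known=KNOWN_PARAMS):
    """Parse 'name=value; name=value' format parameters.

    Parameters whose names are not in `known` are skipped rather than
    treated as an error, so later revisions can add new ones.
    """
    params = {}
    for item in value.split(";"):
        item = item.strip()
        if not item:
            continue
        name, _, val = item.partition("=")
        name = name.strip()
        if name in known:
            params[name] = val.strip()
    return params
```

The point of the sketch is only that an unrecognised name does not cause the whole parameter set to be rejected.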

15)     Section 5.1.

   Published specification:

      The specifications for MPEG-4 Visual streams are presented in ISO/
      IEC 14469-2 [14496-2].  The RTP payload format is described in RFC
      XXXX.

Why not just say: "The RTP payload format is described in this document."

Same change for the following two entries, and also in section 5.3.

16)     Section 5.3, 5.4.1.3 and 5.4.1.4. This document does not define the coding of the SBR-enabled parameter. I get the impression from reading the document that it is defined elsewhere, but no reference is made to the point where it is defined. That would surely need to be a normative reference.

17)     Section 5.3.

   Note, any unspecified parameter MUST be ignored by the receiver to
   ensure that additional parameters can be added in any future revision
   of this specification.

See the comments on the similar text in section 5.1 (comment 14 above).

18)     Section 5.3 (Existing text):

  Required parameters:

      rate: the rate parameter indicates the RTP time stamp clock rate.
      The default value is 90000.  Other rates MAY be specified only if
      they are set to the same value as the audio sampling rate (number
      of samples per second).

Change "specified" to "indicated". The AVT working group is the specifier.
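
For what it is worth, my reading of the constraint in the quoted text can be sketched in Python as follows. The function name rate_is_acceptable is hypothetical, and this is only one interpretation of the paragraph.

```python
DEFAULT_RATE = 90000  # default RTP timestamp clock rate from the quoted text

def rate_is_acceptable(rate, audio_sampling_rate):
    """True if `rate` is the default, or equals the audio sampling rate."""
    return rate == DEFAULT_RATE or rate == audio_sampling_rate
```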

19)     Section 5.3 (Existing text). There are a number of lower case "shall" here. Do these constitute requirements? How is the reader of this specification expected to understand and interpret them?

If they are not requirements, then change them.

19)     Section 5.3 (existing text)

      MPS-profile-level-id: a decimal representation of the MPEG
      Surround Profile Level indication as defined in ISO/IEC 14496-3
      [14496-3].  This parameter indicates the MPEG Surround profile and
      level that the decoder must be capable in order to decode the
      stream.

What is the meaning of the final sentence?

20)     Section 5.3 (existing text)

     ptime: RECOMMENDED duration of each packet in milliseconds.

Why is this a requirement? How does an implementation conform to it?

21)     Section 5.3

     If this parameter is set to 0, a decoder SHALL expect that SBR is
      not used.  If this parameter is set to 1, a decoder SHOULD
      upsample the audio data with the SBR tool, regardless whether SBR
      data is present in the stream or not.

First sentence is not written as a conformable requirement. What does the decoder need to do to conform? For the second sentence, see comments earlier on use of SHOULD.

22)     Section 5.3

      If the presence of SBR can not be detected from out-of-band
      configuration and the SBR-enabled parameter is not present, the
      parameter defaults to 1 for an SBR-capable decoder.  If the
      resulting output sampling rate or the computational complexity is
      not supported, the SBR tool may be disabled or run in downsampled
      mode.

Change "may" to "can" to indicate that it is definitely not a conformance requirement.
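
To make the defaulting behaviour in comments 21 and 22 concrete, here is a Python sketch of how I read the quoted text. The function name and arguments are hypothetical, and the precedence between an explicit SBR-enabled value and out-of-band detection is my assumption, since the quoted text does not state it. The final sentence (disabling SBR or running it in downsampled mode when the output rate or complexity is not supported) is not modelled.

```python
def decoder_uses_sbr(sbr_enabled, detected_out_of_band, sbr_capable):
    """Decide whether the decoder applies the SBR tool.

    sbr_enabled: 0, 1, or None if the parameter is absent.
    detected_out_of_band: True/False if the out-of-band configuration
        reveals SBR presence or absence, None if it cannot be detected.
    sbr_capable: whether this decoder implements SBR at all.
    """
    if not sbr_capable:
        return False
    if sbr_enabled is not None:
        # Assumption: an explicit parameter value takes precedence.
        return sbr_enabled == 1
    if detected_out_of_band is not None:
        return detected_out_of_band
    # Parameter absent and nothing detected: defaults to 1.
    return True
```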

23)     Section 5.4.1.3

   In this example, the presence of SBR can not be determined by the SDP
   parameter set.  The clock rate represents the core codec sampling
   rate.  An SBR enabled decoder SHOULD use the SBR tool to upsample the
   audio data if complexity and resulting output sampling rate permits.

This section is entitled "Example". Such sections ideally should not include conformable requirements. So if this is a requirement then move it somewhere else. Also comments elsewhere about the use of SHOULD also apply here.

24)     Section 5.4.1.4

   In this example, the clock rate is still 24000 and this information
   should be used for RTP timestamp calculation.  The value of 24000 is
   used to support old AAC decoders.  This makes the decoder supporting
   only AAC understand the HE AAC coded data, although only plain AAC is
   supported.  A HE AAC decoder is able to generate output data with the
   SBR sampling rate.

Change "should be" to "is" to avoid confusion with a conformable requirement.
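
As an illustration of the timestamp calculation the example refers to, here is a short Python sketch. The frame length of 1024 samples is my assumption (a common AAC frame length) and is not taken from the quoted text; the function name is hypothetical.

```python
CLOCK_RATE = 24000        # clock rate from the quoted example
SAMPLES_PER_FRAME = 1024  # assumption: common AAC frame length

def rtp_timestamp(initial, frame_index, samples_per_frame=SAMPLES_PER_FRAME):
    """RTP timestamp of the packet whose first audio frame is `frame_index`.

    The timestamp advances in units of the core codec clock (24000 here),
    independent of the rate at which an HE AAC decoder produces output.
    RTP timestamps are 32 bits wide, so the value wraps modulo 2**32.
    """
    return (initial + frame_index * samples_per_frame) % 2**32
```

Under these assumptions each frame covers 1024/24000 of a second, whichever decoder is in use.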

25)     Section 6.

   This memo defines additional optional format parameters to the Media
   Type "audio" and its subtype "MP4A-LATM", as defined in RFC XXXX.
   The Media Type parameters are defined in sections 5.1 and 5.3 of RFC
   XXXX.

Just say "sections 5.1 and 5.3 of this document".

26)     Section 6.1.

   This memo defines the following additional optional parameters which
   SHOULD be used if SBR or MPEG Surround data is present inside the
   payload of an AAC elementary stream.

We do not need a SHOULD (or any other RFC 2119 language) in relation to an IANA considerations section.

I suspect this is just superfluous, and not a real conformable requirement on implementations.

27)     Section 6

An IANA considerations section should basically consist of a set of instructions to IANA as to what they need to include in their tables. See http://www.iana.org/protocols/

I cannot ascertain from this section whether they need to do work.

Are there any existing RFC 3016 registrations where the reference needs to be updated?

Are there any of these parameters that actually need registration?



regards

Keith


________________________________

        From: Roni Even [mailto:Even.roni@huawei.com]
        Sent: Tuesday, November 23, 2010 2:00 PM
        To: avt@ietf.org
        Cc: DRAGE, Keith (Keith); draft-ietf-avt-rfc3016bis.all@tools.ietf.org
        Subject: WGLC on draft-ietf-avt-rfc3016bis



        Hi,

        I would like to start a working group last call on http://tools.ietf.org/html/draft-ietf-avt-rfc3016bis-01



        RTP Payload Format for MPEG-4 Audio/Visual Streams



        The WGLC will end on December 13th, 2010

        Please review the draft and send comments to the list



        Note that section 1.3 and 1.4 discuss the changes from RFC3016



        Roni Even

        AVT co-chair