Re: [AVT] WGLC on draft-ietf-avt-rfc3016bis

"DRAGE, Keith (Keith)" <keith.drage@alcatel-lucent.com> Sun, 05 December 2010 23:58 UTC

From: "DRAGE, Keith (Keith)" <keith.drage@alcatel-lucent.com>
To: "avt@ietf.org" <avt@ietf.org>
Date: Mon, 06 Dec 2010 00:59:19 +0100
Thread-Topic: WGLC on draft-ietf-avt-rfc3016bis
Thread-Index: AcuLFsBeOQY1OiawTUaiQTw9W2XKpQJiYaWg
Message-ID: <EDC0A1AE77C57744B664A310A0B23AE21E363F94@FRMRSSXCHMBSC3.dc-m.alcatel-lucent.com>
References: <00e201cb8b16$c634edd0$529ec970$%roni@huawei.com>
In-Reply-To: <00e201cb8b16$c634edd0$529ec970$%roni@huawei.com>
Accept-Language: en-US
Content-Language: en-US
acceptlanguage: en-US
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: quoted-printable
MIME-Version: 1.0
Cc: "draft-ietf-avt-rfc3016bis.all@tools.ietf.org" <draft-ietf-avt-rfc3016bis.all@tools.ietf.org>
Subject: Re: [AVT] WGLC on draft-ietf-avt-rfc3016bis
Precedence: list

I've reviewed the document, comments as follows

1) This document has inherited a considerable problem from the previous RFC 3016, in that much of that document is written with strange RFC 2119 usage, and pretty much entirely in the passive voice. How readers assess conformance to this document I don't know. However, given that anyone who is going to implement these parts of RFC 3016 will presumably already have done so, I have limited my comments on this part to areas where I think the requirements are totally unclear, and have specifically tagged them as comments on the existing text. If anyone thinks it would be useful to go further, then I can certainly submit a large number of further comments on the existing text, but they are not included here.

2) (Editorial). Abstract.

Comments are solicited and should be addressed to the working group's
mailing list at avt@ietf.org and/or the author(s).

Delete this text. I don't know why people include this in an internet-draft in the first place, given that it states the obvious, and it is not needed in the RFC.

3) (Existing text). Section 1.1

The fragmentation rule recommends not to map more than one VOP in an
RTP packet so that the RTP timestamp uniquely indicates the VOP time
framing. On the other hand, MPEG-4 video may generate VOPs of very
small size, in cases with an empty VOP (vop_coded=0) containing only
VOP header or an arbitrary shaped VOP with a small number of coding
blocks. To reduce the overhead for such cases, the fragmentation
rule permits concatenating multiple VOPs in an RTP packet. (See
fragmentation rule (4) in section 3.2 and marker bit and timestamp in
section 3.1.)

The use of "recommends" here paraphrases a requirement stated later in the document. It would be better if it were a real quote, as in the previous paragraph, so it should be modified as follows:

The fragmentation rule "Different VOPs SHOULD be fragmented into different RTP packets" is made
so that the RTP timestamp uniquely indicates the VOP time
framing. On the other hand, MPEG-4 video may generate VOPs of very
small size, in cases with an empty VOP (vop_coded=0) containing only
VOP header or an arbitrary shaped VOP with a small number of coding
blocks. To reduce the overhead for such cases, the fragmentation
rule permits concatenating multiple VOPs in an RTP packet. (See
fragmentation rule (4) in section 3.2 and marker bit and timestamp in
section 3.1.)
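For illustration only, a minimal Python sketch of the mapping the quoted rule describes: each ordinary VOP gets its own RTP packet, and only runs of very small VOPs are concatenated. The `Vop` class, the `small_threshold` and `max_payload` values, and the choice of the first VOP's timestamp for a concatenated packet are assumptions made for this sketch, not text from the draft.

```python
from dataclasses import dataclass


@dataclass
class Vop:
    timestamp: int  # RTP timestamp of the VOP (90 kHz clock assumed)
    data: bytes     # one complete coded VOP


def group_vops(vops, small_threshold=64, max_payload=1400):
    """Map VOPs to RTP payloads as (timestamp, payload) pairs.

    Each ordinary VOP gets its own packet. Runs of VOPs no larger than the
    hypothetical small_threshold are concatenated, and the packet takes the
    timestamp of the first VOP in the run. VOPs larger than max_payload
    would still need fragmenting, which is not shown here.
    """
    packets = []
    pending = []  # consecutive small VOPs awaiting concatenation

    def flush():
        if pending:
            packets.append((pending[0].timestamp, b"".join(v.data for v in pending)))
            pending.clear()

    for vop in vops:
        if len(vop.data) <= small_threshold:
            # Start a new packet if adding this VOP would exceed max_payload.
            if sum(len(v.data) for v in pending) + len(vop.data) > max_payload:
                flush()
            pending.append(vop)
        else:
            flush()
            packets.append((vop.timestamp, vop.data))  # one VOP per packet
    flush()
    return packets
```

With this sketch, two empty (vop_coded=0) VOPs in a row end up in one packet, while a normally coded VOP between them would keep its own packet and its own timestamp.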

4) (Existing text). Section 1.2

in RTP transmission there is no need for the last two features.
Therefore, these two features MUST NOT be used in applications based
on RTP packetization specified by this document. Since LATM has been
developed for only natural audio coding tools, i.e., not for
synthesis tools, it seems difficult to transmit Structured Audio (SA)
data and Text to Speech Interface (TTSI) data by LATM. Therefore, SA
data and TTSI data MUST NOT be transported by the RTP packetization
in this document.

Section 1.2 is an introductory section. Therefore it is inappropriate to start introducing RFC 2119 language at this stage. As far as I can see, these requirements are not elsewhere in the document, so we will need to create a place for them. I suggest a new section 3 immediately before the existing section 3, and renumbering accordingly (given that the numbering of the existing RFC 3016 has not been preserved).

Similar considerations apply to the following two paragraphs:

For transmission of scalable streams, audio data of each layer SHOULD
be packetized onto different RTP streams allowing for the different
layers to be treated differently at the IP level, for example via
some means of differentiated service. On the other hand, all
configuration data of the scalable streams are contained in one LATM
configuration data "StreamMuxConfig" and every scalable layer shares
the StreamMuxConfig. The mapping between each layer and its
configuration data is achieved by LATM header information attached to
the audio data. In order to indicate the dependency information of
the scalable streams, the signaling mechanism as specified in
[RFC5583] SHOULD be used (see section 4.2).

For MPEG-4 Audio coding tools, as is true for other audio coders, if
the payload is a single audio frame, packet loss will not impair the
decodability of adjacent packets. Therefore, the additional media
specific header for recovering errors will not be required for MPEG-4
Audio. Existing RTP protection mechanisms, such as Generic Forward
Error Correction (RFC 5109 [RFC5109]) and Redundant Audio Data (RFC
2198 [RFC2198]), MAY be applied to improve error resiliency.

5) Section 1.3.

I first looked for this at the end of the document, in some kind of appendix, because that is where I expected to find it, and I wanted to check this first. So my first comment is that while there is no rule on this, I would prefer it to appear at the end of the document.

6) Section 1.3.

I do not think the list as currently drafted gives existing implementors a very good indication of what they need to look at in order to update their implementations. It should be redrafted to identify the key things that a revised implementation will need to look at. For example:

o The audio parameter "SBR-enabled" is not defined within RFC 3016
but used by 3GPP

should become something like:

o The use of an audio parameter "SBR-enabled" is now defined in this document, which is used by 3GPP implementations [informative reference to 3GPP specification here].

6) Section 1.3. For the text:

Furthermore some comments have been addressed and signaling support
for MPEG surround [23003-1] was added. It should be noted that the
audio payload format described here has some known limitations. For
new system designs RFC 3640 [RFC3640] is recommended.

Remove the construct "It should be noted that the". It mixes material that could be confused as normative with definitely informative material.

It is also unclear how RFC 3640 is recommended, as the reference is informative, and this is the only place it is referenced. I assume you mean that new implementations would not be using this document at all. But that surely should be called out in the Abstract of this document.

7) Section 2 (Editorial)

This memo makes use of terms, specified in [14496-2], [14496-3], and
[23003-1]. In addition, the following terms are used in this
document and have specific meaning within the context of this
document.

Change to:

This document makes use of terms, specified in [14496-2], [14496-3], and
[23003-1]. In addition, the following terms are used in this
document and have specific meaning within the context of this
document.

8) Section 3 (Existing text)

This section specifies RTP packetization rules for MPEG-4 Visual
content. An MPEG-4 Visual bitstream is mapped directly onto RTP
packets without the addition of extra header fields or any removal of
Visual syntax elements. The Combined Configuration/Elementary stream
mode MUST be used so that configuration information will be carried
to the same RTP port as the elementary stream. (see 6.2.1 "Start
codes" of ISO/IEC 14496-2 [14496-2]) The configuration information
MAY additionally be specified by some out-of-band means. If needed
for an H.323 terminal, H.245 codepoint
"decoderConfigurationInformation" MUST be used for this purpose. If
needed by systems using Media Type parameters and SDP parameters,
e.g., SIP and RTSP, the optional parameter "config" MUST be used to
specify the configuration information (see 5.1 and 5.2).

The abbreviation "e.g." means "for example", and a "MUST" can never be an example. I suspect the example here is only the string "e.g., SIP and RTSP", so I suggest that this bit is placed in parentheses, as follows: "(e.g., SIP and RTSP)".
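As an illustration of the SDP route mentioned in the quoted text, here is a minimal Python sketch that builds the SDP lines carrying the optional "config" parameter for MP4V-ES. The function name, the port, the default profile-level-id and the sample configuration bytes are placeholders chosen for this sketch; per the quoted text, the configuration is still also carried in-band.

```python
def mp4v_es_sdp(payload_type, config, profile_level_id=1, port=49170):
    """Build SDP lines carrying MPEG-4 Visual configuration out of band.

    `config` is the configuration bitstream (e.g. VOS/VO/VOL headers) as
    bytes; it is written as a hexadecimal string. The port and the
    profile-level-id default are placeholder values.
    """
    if not 96 <= payload_type <= 127:
        raise ValueError("expected a dynamic RTP payload type (96-127)")
    return [
        f"m=video {port} RTP/AVP {payload_type}",
        f"a=rtpmap:{payload_type} MP4V-ES/90000",
        f"a=fmtp:{payload_type} profile-level-id={profile_level_id};"
        f"config={config.hex().upper()}",
    ]
```

For example, `mp4v_es_sdp(98, bytes.fromhex("000001B001"))` yields a final line of `a=fmtp:98 profile-level-id=1;config=000001B001`.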

9) Section 3.2 (Existing text). In the following text:

(5) It is RECOMMENDED that a single video packet is sent as a single
RTP packet. The size of a video packet SHOULD be adjusted in such a
way that the resulting RTP packet is not larger than the path-MTU.
Note: Rule (5) does not apply when the video packet is disabled by
the coder configuration (by setting resync_marker_disable in the VOL
header to 1), or in coding tools where the video packet is not
supported. In this case, a VOP MAY be split at arbitrary byte-
positions.

remove the mix of the word "note" with normative text. I suggest:

(5) It is RECOMMENDED that a single video packet is sent as a single
RTP packet. The size of a video packet SHOULD be adjusted in such a
way that the resulting RTP packet is not larger than the path-MTU.
If the video packet is disabled by
the coder configuration (by setting resync_marker_disable in the VOL
header to 1), or in coding tools where the video packet is not
supported, a VOP MAY be split at arbitrary byte-
positions.
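To make the fallback case in rule (5) concrete, here is a minimal Python sketch that splits a VOP at arbitrary byte positions so that no RTP packet exceeds the path MTU. The header sizes assume IPv4 without options, UDP, and a fixed 12-byte RTP header with no CSRCs or extensions; the function names are hypothetical.

```python
def max_rtp_payload(path_mtu, ip_header=20, udp_header=8, rtp_header=12):
    """Payload bytes left after IPv4 (no options), UDP and a fixed RTP header."""
    return path_mtu - ip_header - udp_header - rtp_header


def split_vop(vop, max_payload):
    """Split a VOP at arbitrary byte positions into chunks of <= max_payload.

    Only appropriate when video packets are disabled (resync_marker_disable=1)
    or unsupported; otherwise each video packet should map to one RTP packet.
    """
    if max_payload <= 0:
        raise ValueError("max_payload must be positive")
    return [vop[i:i + max_payload] for i in range(0, len(vop), max_payload)]
```

With a 1500-byte path MTU this leaves 1460 payload bytes, so a 3000-byte VOP becomes chunks of 1460, 1460 and 80 bytes.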

10) Section 3.3 (Existing text). In the following text:

When concatenating more than one video packets into an RTP packet,
VOP header or video_packet_header() shall not be placed in the middle
of the RTP payload. The packetization as in (b) is not allowed by
criterion (2) due to the aspect of the error resiliency. Comparing
this example with Figure 2(d), although two video packets are mapped
onto two RTP packets in both cases, the packet-loss resiliency is not
identical. Namely, if the second RTP packet is lost, both video
packets 1 and 2 are lost in the case of Figure 3(b) whereas only
video packet 2 is lost in the case of Figure 2(d).

Is the "shall" here denoting a requirement (in which case it should be capitalised, replaced with "MUST", and moved to a section that is not titled "Example"), or is it denoting something else, e.g. an impossibility (in which case change the modal auxiliary to something else)?

11) Section 4.2, 1st paragraph:

Payload Type (PT): The assignment of an RTP payload type for this new
packet format is outside the scope of this document, and will not be
specified here. It is expected that the RTP profile for a particular
class of applications will assign a payload type for this encoding,
or if that is not done then a payload type in the dynamic range shall
be chosen by means of an out-of-band signaling protocol (e.g., H.245,
SIP, etc). In the dynamic assignment of RTP payload types for
scalable streams, a different value SHOULD be assigned to each layer.
The dependency relationships between the enhance layer and the base
layer SHOULD be signaled as specified in [RFC5583]. An example of
the use of such signaling for scalable audio streams can be found in
[RFC5691].

Write the above new text in the active voice, e.g. "In the dynamic assignment of RTP payload types for scalable streams, the server SHOULD assign a different value to each layer. The server SHOULD signal ...".

Under what circumstances does the "SHOULD" not apply? It is good practice when using "SHOULD" to give the reader guidance on the conditions under which it is safe to ignore the requirement. Otherwise many implementors treat the word as a straight option and ignore it.

It is also noted that a requirement specifying the assignment of the payload type is inconsistent with the first sentence, which says "assignment of an RTP payload type for this new packet format is outside the scope of this document". This sentence will therefore need to be altered.

12) Section 4.2 (Existing text). Is the following meant to be normative:

Timestamp: The timestamp indicates the sampling instance of the first
audio frame contained in the RTP packet. Timestamps are recommended
to start at a random value for security reasons.

It is currently drafted as such, but "recommended" is lower case.
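For reference, the behaviour the quoted sentence recommends amounts to something like the following minimal Python sketch: pick a random 32-bit starting value and add the sampling instant of the packet's first audio frame, wrapping modulo 2^32. The function names are hypothetical, and the sample offset is assumed to be counted in RTP clock ticks.

```python
import secrets


def make_timestamper():
    """Return a function mapping a sample offset to an RTP timestamp.

    The initial value is drawn at random, as the quoted text recommends;
    timestamps wrap modulo 2**32 because the RTP timestamp field is 32 bits.
    """
    base = secrets.randbits(32)

    def timestamp(sample_offset):
        # sample_offset: sampling instant of the packet's first audio frame,
        # counted in RTP clock ticks from the start of the stream.
        return (base + sample_offset) % 2**32

    return timestamp
```

Whatever the random base turns out to be, the difference between two timestamps (modulo 2^32) still equals the elapsed number of clock ticks, which is all a receiver relies on.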

13) Section 4.3 (Existing text). What is the meaning of the final sentence:

If it cannot, the audioMuxElement MAY
be fragmented and spread across multiple packets.

As written, it tells me I have the option to fragment it and spread it across multiple packets. Is that what is meant?
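If the intended reading is "fragment when the audioMuxElement does not fit in one packet", the sender and receiver sides would look roughly like this minimal Python sketch. It assumes the RFC 3016 marker-bit convention (marker set to 1 on the packet carrying a complete audioMuxElement or its last fragment); the function names are hypothetical, and loss handling via sequence numbers is omitted.

```python
def packetize_audio_mux_element(element, max_payload):
    """Split one audioMuxElement into (marker, payload) pairs.

    Assumes the RFC 3016 convention that the marker bit is 1 on the packet
    holding a complete audioMuxElement or its last fragment, 0 otherwise.
    """
    chunks = [element[i:i + max_payload]
              for i in range(0, len(element), max_payload)] or [element]
    last = len(chunks) - 1
    return [(int(i == last), chunk) for i, chunk in enumerate(chunks)]


def reassemble(packets):
    """Yield complete audioMuxElements from in-order (marker, payload) pairs."""
    buf = bytearray()
    for marker, payload in packets:
        buf += payload
        if marker:
            yield bytes(buf)
            buf.clear()
```

A 3000-byte element with 1460 payload bytes available goes out as three packets with marker bits 0, 0, 1, and the receiver rebuilds it when it sees the final marker.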

14) Section 5.1. The following text:

Note, any unspecified parameter MUST be ignored by the receiver to
ensure that additional parameters can be added in any future revision
of this specification.

has a "MUST" (a mandatory requirement) following the word "Note", which many people interpret as marking informative text. Assuming a mandatory requirement is meant, delete the word "Note". The text should also be redrafted in the active voice, e.g.:

The receiver MUST ignore any unspecified parameter, to
ensure that additional parameters can be added in any future revision
of this specification.

This is also a registration section. Surely normative requirements on the implementation should be elsewhere in the document, so I suggest this text is moved to another more appropriate section.
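Wherever the requirement ends up, the receiver behaviour it asks for is simple. A minimal Python sketch, assuming the parameters arrive as the semicolon-separated "name=value" list of an SDP a=fmtp line and that parameter names are compared case-insensitively; the function name and the example set of known parameters are hypothetical.

```python
def parse_fmtp(params, known):
    """Parse 'a=fmtp' parameters such as 'config=...;SBR-enabled=1'.

    Parameters whose names are not in `known` (lower-case names) are
    silently dropped, so that a receiver keeps working when a later
    revision adds parameters.
    """
    result = {}
    for item in params.split(";"):
        item = item.strip()
        if not item:
            continue
        name, _, value = item.partition("=")
        name = name.strip().lower()
        if name in known:
            result[name] = value.strip()
    return result
```

For example, `parse_fmtp("config=40002320; SBR-enabled=1; future-param=7", {"config", "sbr-enabled"})` keeps the first two parameters and ignores `future-param`.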

15) Section 5.1.

Published specification:

The specifications for MPEG-4 Visual streams are presented in ISO/
IEC 14469-2 [14496-2]. The RTP payload format is described in RFC
XXXX.

Why not just say: "The RTP payload format is described in this document."

Same change for the following two entries, and also in section 5.3.

16) Section 5.3, 5.4.1.3 and 5.4.1.4. This document does not define the coding of the SBR-enabled parameter. I get the impression from reading the document that it is defined elsewhere, but no reference is made to the point where it is defined. That would surely need to be a normative reference.

17) Section 5.3.

Note, any unspecified parameter MUST be ignored by the receiver to
ensure that additional parameters can be added in any future revision
of this specification.

See my comments on similar text in section 5.1 (comment 14).

18) Section 5.3 (Existing text):

Required parameters:

rate: the rate parameter indicates the RTP time stamp clock rate.
The default value is 90000. Other rates MAY be specified only if
they are set to the same value as the audio sampling rate (number
of samples per second).

Change "specified" to "indicated". The AVT working group is the specifier.
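The constraint in the quoted text reduces to a one-line check. A minimal Python sketch, with hypothetical names, treating an absent rate as the stated default of 90000:

```python
DEFAULT_RTP_CLOCK_RATE = 90000


def rate_is_valid(rate, audio_sampling_rate):
    """True if `rate` is 90000 or equal to the audio sampling rate."""
    return rate in (DEFAULT_RTP_CLOCK_RATE, audio_sampling_rate)


def effective_rate(rate, audio_sampling_rate):
    """Return the RTP clock rate to use, applying the 90000 default."""
    if rate is None:
        return DEFAULT_RTP_CLOCK_RATE
    if not rate_is_valid(rate, audio_sampling_rate):
        raise ValueError(f"rate {rate} is neither 90000 nor the sampling rate")
    return rate
```

So for 48 kHz audio both 90000 and 48000 are acceptable, while 44100 is not.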

19) Section 5.3 (Existing text). There are a number of lower case "shall" here. Do these constitute requirements? How is the reader of this specification expected to understand and interpret them?

If they are not requirements, then change them.

19) Section 5.3 (existing text)

MPS-profile-level-id: a decimal representation of the MPEG
Surround Profile Level indication as defined in ISO/IEC 14496-3
[14496-3]. This parameter indicates the MPEG Surround profile and
level that the decoder must be capable in order to decode the
stream.

What is the meaning of the final sentence?

20) Section 5.3 (existing text)

ptime: RECOMMENDED duration of each packet in milliseconds.

Why is this a requirement? How does an implementation conform to it?

21) Section 5.3

If this parameter is set to 0, a decoder SHALL expect that SBR is
not used. If this parameter is set to 1, a decoder SHOULD
upsample the audio data with the SBR tool, regardless whether SBR
data is present in the stream or not.

First sentence is not written as a conformable requirement. What does the decoder need to do to conform? For the second sentence, see comments earlier on use of SHOULD.

22) Section 5.3

If the presence of SBR can not be detected from out-of-band
configuration and the SBR-enabled parameter is not present, the
parameter defaults to 1 for an SBR-capable decoder. If the
resulting output sampling rate or the computational complexity is
not supported, the SBR tool may be disabled or run in downsampled
mode.

Change "may" to "can" to indicate that it is definitely not a conformance requirement.
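To show how the texts quoted in comments 21 and 22 combine, here is a minimal Python sketch of a decoder's choice of SBR mode. The enum, the function and all argument names are hypothetical. The fallback policy (downsampled mode when only the output rate is unsupported, SBR off when the complexity is too high) is an assumption; the quoted text allows either option.

```python
from enum import Enum


class SbrMode(Enum):
    OFF = "off"
    UPSAMPLED = "upsampled"      # SBR output at twice the core rate
    DOWNSAMPLED = "downsampled"  # SBR tool run in downsampled mode


def choose_sbr_mode(sbr_enabled, sbr_in_config, sbr_capable,
                    core_rate, max_output_rate, cpu_ok):
    """sbr_enabled: 0, 1 or None (parameter absent).
    sbr_in_config: True/False if out-of-band configuration shows whether
    SBR is present, None if it cannot be detected from it.
    """
    if not sbr_capable:
        return SbrMode.OFF
    if sbr_enabled is None:
        # Absent parameter: use out-of-band configuration if it tells us,
        # otherwise default to 1 for an SBR-capable decoder.
        sbr_enabled = 1 if sbr_in_config is None else int(sbr_in_config)
    if sbr_enabled == 0:
        return SbrMode.OFF
    if not cpu_ok:
        return SbrMode.OFF
    if 2 * core_rate > max_output_rate:
        return SbrMode.DOWNSAMPLED
    return SbrMode.UPSAMPLED
```

Writing it out this way also shows why the "SHALL expect" wording is hard to conform to: the first branch that consults `sbr_enabled == 0` is the only observable effect, and it is simply "do not use SBR".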

23) Section 5.4.1.3

In this example, the presence of SBR can not be determined by the SDP
parameter set. The clock rate represents the core codec sampling
rate. An SBR enabled decoder SHOULD use the SBR tool to upsample the
audio data if complexity and resulting output sampling rate permits.

This section is entitled "Example". Such sections ideally should not include conformable requirements, so if this is a requirement, move it somewhere else. My earlier comments about the use of SHOULD also apply here.

24) Section 5.4.1.4

In this example, the clock rate is still 24000 and this information
should be used for RTP timestamp calculation. The value of 24000 is
used to support old AAC decoders. This makes the decoder supporting
only AAC understand the HE AAC coded data, although only plain AAC is
supported. A HE AAC decoder is able to generate output data with the
SBR sampling rate.

Change "should be" to "is" to avoid confusion with a conformable requirement.
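The arithmetic behind the quoted example can be spelled out in a minimal Python sketch. It assumes an AAC core frame length of 1024 samples (960 is also possible) and hypothetical function names: the RTP timestamp advances by core samples at the 24000 clock, and an SBR decoder producing 2048 samples at 48000 Hz covers exactly the same duration.

```python
from fractions import Fraction

CORE_FRAME_SAMPLES = 1024  # assumed AAC core frame length (960 is also possible)


def timestamp_increment(frames_per_packet, frame_samples=CORE_FRAME_SAMPLES):
    """RTP ticks per packet when the clock runs at the core (AAC) sampling rate."""
    return frames_per_packet * frame_samples


def packet_duration(frames_per_packet, clock_rate, frame_samples=CORE_FRAME_SAMPLES):
    """Packet duration in seconds, as an exact fraction."""
    return Fraction(timestamp_increment(frames_per_packet, frame_samples), clock_rate)
```

With one frame per packet at a clock rate of 24000, the timestamp advances by 1024 and each packet lasts 16/375 s (about 42.67 ms), which equals 2048 samples at 48000 Hz.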

25) Section 6.

This memo defines additional optional format parameters to the Media
Type "audio" and its subtype "MP4A-LATM", as defined in RFC XXXX.
The Media Type parameters are defined in sections 5.1 and 5.3 of RFC
XXXX.

Just say "sections 5.1 and 5.3 of this document".

26) Section 6.1.

This memo defines the following additional optional parameters which
SHOULD be used if SBR or MPEG Surround data is present inside the
payload of an AAC elementary stream.

We do not need a SHOULD (or any other RFC 2119 language) in relation to an IANA considerations section.

I suspect this is just superfluous, and not a real conformable requirement on implementations.

27) Section 6

An IANA considerations section should basically consist of a set of instructions to IANA as to what they need to include in their tables. See http://www.iana.org/protocols/

I cannot ascertain from this section whether they need to do work.

Are there any existing RFC 3016 registrations where the reference needs to be updated?

Are there any of these parameters that actually need registration?

regards

Keith

________________________________

From: Roni Even [mailto:Even.roni@huawei.com]
Sent: Tuesday, November 23, 2010 2:00 PM
To: avt@ietf.org
Cc: DRAGE, Keith (Keith); draft-ietf-avt-rfc3016bis.all@tools.ietf.org
Subject: WGLC on draft-ietf-avt-rfc3016bis

Hi,

I would like to start a working group last call on http://tools.ietf.org/html/draft-ietf-avt-rfc3016bis-01

RTP Payload Format for MPEG-4 Audio/Visual Streams

The WGLC will end on December 13th, 2010

Please review the draft and send comments to the list

Note that sections 1.3 and 1.4 discuss the changes from RFC3016

Roni Even

AVT co-chair
