RE: [AVT] draft-ietf-avt-rtp-vc1-05.txt

Hello Anders, all,

Thanks for addressing our comments carefully. In my review of the -05
version I found a couple of points that require minor technical
clarification, or where informative text would improve the readability
of the specification. Please see below.

T1.
4.3 Time stamp considerations
...
   If B-pictures may be present in the coded bit stream, then the decode
   times of frames are determined as follows: 

   - Non-B frames:  The decode time SHALL be equal to the presentation 
     time of the previous non-B frame in the coded order. 

   - B-frames:  The decode time SHALL be equal to the presentation time 
     of the B-frame. 

Comment: The very first frame of a stream is naturally a non-B frame,
so it has no previous non-B frame in coded order. The draft lacks a rule
for how the decode time of the very first frame SHALL be set.
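
To illustrate the gap, here is a minimal sketch of the two rules quoted
above. The function name and the (frame_type, presentation_time) tuples
are hypothetical and do not come from the draft. Because the draft gives
no rule for the first frame, the sketch returns None for it instead of
guessing a value.

```python
from typing import List, Optional, Tuple

# Hypothetical representation: (frame_type, presentation_time), where
# frame_type "B" stands for B-frames (and BI-frames) and anything else
# is a non-B frame.
Frame = Tuple[str, int]


def decode_times(coded_order: List[Frame]) -> List[Optional[int]]:
    """Apply the section 4.3 rules to frames listed in coded order."""
    result: List[Optional[int]] = []
    prev_non_b_pts: Optional[int] = None
    for frame_type, pts in coded_order:
        if frame_type == "B":
            # B-frames: decode time equals the frame's presentation time.
            result.append(pts)
        else:
            # Non-B frames: decode time equals the presentation time of
            # the previous non-B frame in coded order. For the very
            # first frame there is none, so this stays None.
            result.append(prev_non_b_pts)
            prev_non_b_pts = pts
    return result


coded = [("I", 0), ("P", 3), ("B", 1), ("B", 2), ("P", 6), ("B", 4), ("B", 5)]
times = decode_times(coded)
```

In this example every frame except the first gets a defined decode time;
the first entry remains undefined, which is the case the draft needs to
cover.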

T2.
4.3 Time stamp considerations 

Comment: As far as we can see, the only use for decode time is specified
in the Hypothetical Reference Decoder section of the VC-1 spec. The HRD
is something that decoders need not implement. Moreover, as specified in
section 4.3, the decode time can be derived from presentation times.
Thus, we are puzzled as to why the decode time is present in the AU
header at all. It might also be good to have some informative text on
what receivers are supposed to do with the values of decode time. For
example, does the receiver gain any benefit from strictly following the
decode times, compared to initially buffering until the desired buffer
occupancy level is reached and then decoding each AU as soon as it
becomes available in the receiver buffer?

T3.
6.1 Media type Registration
...
         height: 
           The value is an integer greater than zero, specifying the 
           maximum vertical size of the coded picture in luma samples 
           (pixels in the luma picture.)   

           For Simple and Main profiles, the value SHALL be identical to
           the actual vertical size of the coded picture. 
           For Advanced profile, the value SHALL be greater than, or 
           equal to, the largest vertical size of the coded picture. 
...
         max-height: 
           The value is an integer greater than zero, specifying a 
           vertical size for the coded picture, in luma samples (pixels 
           in the luma picture.)  If the value is less than the maximum 

Comment: Section 4.12 of the VC-1 spec defines coded picture as follows:

coded picture: A coded picture is made of a picture header, the optional
extensions immediately following it, and the following picture data. A
coded picture may be a coded frame or a coded field.

It is therefore ambiguous whether the values of height and max-height
refer to the height of fields or of frames in the Advanced profile. We
suggest replacing "coded picture" with "frame" in the specification of
height and max-height.

T4.
8. Congestion Control
...
   - The existence of non-reference frames (e.g., B-frames) in the bit 
     stream.  Non-reference frames can be discarded by the transmitter 
     prior to encapsulation in RTP.  If the frames contain the TFCNTR 
     (Temporal Reference Frame Counter) syntax element, it will require 
     updating the TFCNTR fields of other frames to ensure that the 
     field remains continuous.  Because TFCNTR counts the frames in the 
     display order, which is different from the order in which they are 
     transmitted (the coded order), it will require the transmitter to 
     "look ahead", or buffer, of some number of frames. 

Comment: A similar, though more straightforward, procedure also applies
to the FRMCNT field (which is present in Main profile streams, whereas
TFCNTR is conditionally present in Advanced profile streams). It
might be good to add a discussion of FRMCNT to this paragraph.
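
As an illustration of why the quoted paragraph requires look-ahead,
here is a minimal sketch of renumbering a display-order counter after
some B-frames have been discarded. The function name and the
(frame_type, display_index) tuples are hypothetical. The sketch also
ignores the wrap-around of the real field, whose width is not discussed
here, so it is a simplification rather than a description of TFCNTR
handling in SMPTE 421M.

```python
from typing import List, Set, Tuple

# Hypothetical representation: (frame_type, display_index) per frame.
Frame = Tuple[str, int]


def drop_and_renumber(coded_order: List[Frame], drop: Set[int]) -> List[Frame]:
    """Discard the B-frames whose display index is in `drop`, then give
    each surviving frame a new counter equal to its rank in display
    order, so that the counter remains continuous."""
    for frame_type, display_index in coded_order:
        if display_index in drop and frame_type != "B":
            raise ValueError("only non-reference (B) frames may be discarded")

    kept = [frame for frame in coded_order if frame[1] not in drop]

    # The new value of a frame depends on which frames that precede it
    # in display order survive, and those may come later in coded
    # order. This is the look-ahead: the whole window must be known.
    new_counter = {
        display_index: rank
        for rank, display_index in enumerate(sorted(d for _, d in kept))
    }
    return [(frame_type, new_counter[d]) for frame_type, d in kept]


coded = [("I", 0), ("P", 3), ("B", 1), ("B", 2), ("P", 6), ("B", 4), ("B", 5)]
renumbered = drop_and_renumber(coded, drop={2, 5})
```

In this example the first P-frame is transmitted second but receives
the new value 2, because the B-frame with display index 1, which
follows it in coded order, is kept.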

T5.
8. Congestion Control 

   Congestion control for RTP SHALL be used in accordance with RFC 3550 
   [3], and with any applicable RTP profile; e.g., RFC 3551 [15].   

   If best-effort service is being used, users of this payload format 
   MUST monitor packet loss to ensure that the packet loss rate is 
   within acceptable parameters.  Packet loss is considered acceptable 
   if a TCP flow across the same network path, and experiencing the same
   network conditions, would achieve an average throughput, measured on 
   a reasonable timescale, that is not less than the RTP flow is 
   achieving.  This condition can be satisfied by implementing 
   congestion control mechanisms to adapt the transmission rate (or the 
   number of layers subscribed for a layered multicast session), or by 
   arranging for a receiver to leave the session if the loss rate is 
   unacceptably high. 

Comment: The phrase "(or the number of layers subscribed for a layered
multicast session)" suggests that this payload specification can be used
for layered multicast. We don't think this payload specification is
capable of layered multicast (the reason is explained under alternative
3 below), so the phrase is somewhat misleading and should be clarified.
As far as we can see, there are three alternatives for the
clarification:

1) Remove "(or the number of layers subscribed for a layered multicast
session)"

2) Make it clear to the reader that the payload specification cannot be
used for layered multicast and therefore the option to prune multicast
layers can only be used hypothetically with some future extension. For
example, one should remove "(or the number of layers subscribed for a
layered multicast session)" and add: "Note that this payload
specification cannot be used as such for layered multicast.  However, it
may be possible to use this payload specification for the lowest
multicast group (i.e. the base layer) and another payload specification
or protocol extension for higher multicast groups of the same layered
multicast session.  In such a case, the congestion control mechanism can
be implemented by adjusting the number of subscribed layers in a layered
multicast session."

3) Specify how the payload specification is used with layered multicast.

Note that if a receiver receives the base layer only, the base layer
must be compliant with the VC-1 spec and with the VC-1 RTP payload spec.
In particular, when it comes to the frame counters FRMCNT and TFCNTR,
this causes some complications. In the base layer stream, FRMCNT and
TFCNTR must be standard-compliant, i.e. incremented by 1 for each frame.
However, it is then unclear how the transmitter should set the value of
these fields in the other layers and how the receiver should aggregate
and modify these fields before passing the aggregated and
standard-compliant stream to the decoder. To clarify the issue further,
I quickly wrote a section that could be inserted into the RTP payload
specification. 

[section start]

X.Y Use with Layered Multicast

When this payload specification is used with layered multicast, the
transmitter MUST ensure that the lowest multicast group (i.e. the base
layer) contains a valid VC-1 bitstream. In particular, the values of
FRMCNT field in the Simple and Main profile streams and the values of
TFCNTR field in the Advanced profile streams MUST be continuous as
specified in SMPTE 421M. Moreover, the TFCNTR field MUST be present in
Advanced profile streams (i.e. TFCNTRFLAG in the sequence header MUST be
set to 1). The transmitter MUST set the values of the FRMCNT field and
the TFCNTR field of each frame in other multicast groups than the lowest
multicast group (i.e. in enhancement layers) equal to the values of the
FRMCNT field and the TFCNTR field of the base layer frame that
immediately precedes the enhancement layer frame in coded order.

If a receiver receives only the base layer of a layered multicast, no
additional receiver operations are required. If a receiver receives
multiple layers of a layered multicast, then it MUST arrange coded
frames of different layers into coded order according to the following
algorithm: 
1. The first frame in coded order is the base layer frame having FRMCNT
or TFCNTR equal to 0. 
2. Variable prevFrmCnt is set to 0.
3. If there are frames in enhancement layers having the value of FRMCNT
or TFCNTR equal to prevFrmCnt, these frames are next in coded order in
ascending order of multicast groups. If there are multiple frames within
a multicast group having the value of FRMCNT or TFCNTR equal to
prevFrmCnt, the coded order of these frames is equal to the RTP sequence
number order.
4. The next frame in coded order is the base layer frame next in coded
order (i.e. next in sequence number order). Variable prevFrmCnt is set
to the value of FRMCNT or TFCNTR of the base layer frame.
5. Steps 3 and 4 are repeated until the RTP sessions are terminated.

When the receiver has arranged a coded frame of a layered multicast into
coded order, the receiver MUST rewrite the value of the FRMCNT field or
the TFCNTR field, whichever is present in the frame, such that it
complies with SMPTE 421M before passing the frame to the decoder.

[section end]
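
To make the receiver-side ordering in the proposed section concrete,
here is a minimal sketch of steps 1 to 5. The Frame class, its field
names, and the function name are hypothetical. The sketch assumes that
the base layer list starts at the frame whose counter is 0, that RTP
sequence numbers do not wrap within the window, and that group 0 is the
base layer. The final rewriting of FRMCNT or TFCNTR is not shown.

```python
from collections import deque
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Frame:
    group: int    # multicast group; 0 is the base layer (assumption)
    seq: int      # RTP sequence number within that group's session
    counter: int  # value of FRMCNT or TFCNTR carried by the frame


def arrange_coded_order(base: List[Frame],
                        enhancement: Dict[int, List[Frame]]) -> List[Frame]:
    """Interleave enhancement layer frames after the base layer frame
    whose counter value they carry."""
    queues = {
        group: deque(sorted(frames, key=lambda f: f.seq))
        for group, frames in enhancement.items()
    }
    ordered: List[Frame] = []
    # Steps 1 and 4: base layer frames are taken in sequence number order.
    for base_frame in sorted(base, key=lambda f: f.seq):
        ordered.append(base_frame)
        prev_frm_cnt = base_frame.counter  # steps 2 and 4
        # Step 3: matching enhancement frames follow, in ascending order
        # of multicast group, and in sequence number order per group.
        for group in sorted(queues):
            queue = queues[group]
            while queue and queue[0].counter == prev_frm_cnt:
                ordered.append(queue.popleft())
    return ordered


base = [Frame(0, 10, 0), Frame(0, 11, 1), Frame(0, 12, 2)]
enhancement = {
    1: [Frame(1, 100, 0), Frame(1, 101, 0), Frame(1, 102, 2)],
    2: [Frame(2, 200, 0), Frame(2, 201, 1)],
}
order = [(f.group, f.seq) for f in arrange_coded_order(base, enhancement)]
```

Consuming each group from the head of a queue, rather than searching
the whole group for a matching value, keeps the sketch usable once the
counter values repeat.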

Best Regards,
Miska

________________________________

	From: avt-bounces@ietf.org [mailto:avt-bounces@ietf.org] On Behalf Of ext Anders Klemets
	Sent: 03 January, 2006 17:16
	To: IETF AVT WG
	Subject: [AVT] draft-ietf-avt-rtp-vc1-05.txt

	I just submitted draft-ietf-avt-rtp-vc1-05.txt.  It should be
available on the IETF web site in a couple of days.  Colin suggested
that I CC the AVT list with my submission to give the AVT members some
extra time to review version -05.  But I don't know if it will go
through as the attachment was a little bit large.  (If it does go
through, I want to apologize in advance for the large attachment.)

	In version -05, I have incorporated the changes suggested by
Nokia:

	The biggest change is the addition of a Congestion Control
section.  I placed it directly after the Security Considerations
section, and added a paragraph to the Security Considerations section
that links the two sections together.

	I added "BI-picture" to the definitions section.  This is a
special kind of B-picture, but to make sure there is no confusion, in
sections that talk about B-pictures (such as 3.4, 4.3) I also mention
that the rules also apply to BI-pictures.

	In sections 4.1 and 4.3, I added a paragraph explaining how
interlaced video frames are handled.

	I added a sentence to the Introduction section to explain how
Main profile differs from Simple profile.

	In the Media Type parameter definition of "width" and "height",
I added a paragraph calling out that for Simple/Main these parameters
must be identical to the actual width and height used, while for
Advanced profile the actual width and height may be lower than
specified.

	Anders
