Re: [rtcweb] [avtext] Fwd: New Version Notification for draft-fineberg-avtext-temporal-layer-ext-00.txt

Stephan,

This seems like a reasonable approach.  I like your point about 
designing the RTP stack to generically handle scalability features.  My 
concern is complexity and overhead for what should be a very lightweight 
process.  However, generalizing seems to be the consensus, so that should 
be the direction forward.

Regards,
Adam

On 7/25/13 12:44 AM, Stephan Wenger wrote:
> Hi,
> I thought about this a bit more.  I think that one sensible document 
> structure may be as follows:
>
> (1) generic structure of scalability/pruning information to an RTP 
> header extension (to be worked on in avtext).
> (2) mapping of codec-specific features to the generic structure.  As a 
> location, I would suggest that (for new codecs) the RTP payload specs 
> are the right document.  For codecs with existing RTP payload specs, a 
> short doc that updates the payload spec would be the right place, or a 
> revision of the payload spec itself if that needs to be done anyway. 
>  This would bundle all the codec-specifics to one document.
>
> This document structure, and the philosophy behind it, would allow the 
> design of a generic RTP stack that could take advantage of scalability 
> features independently of the scalable codec actually in use.  I 
> believe that such an advantage outweighs the possible advantage of 
> quick standardization of a more specific approach as proposed in 
> Adam's draft.  That said, we need to get the design of the generic 
> structure right such that it is indeed useful, ideally for all present 
> and hopefully for a large subset of future codecs.
>
> As for the specs involved:
> -H.264 SVC, H.265, and VP8 have payload format RFCs or reasonably 
> stable drafts.
> -No one uses H.263 and MPEG-2 scalability, so we can ignore those two.
> -Like Adam, I'm also not aware of a VP9 payload spec draft.  However, my 
> understanding is that VP9 will include a number of scalability 
> features enabled by multiple reference pictures and reference picture 
> resampling.  The latter is a somewhat different approach compared to 
> the traditional scalability implementations.  I think that Google 
> planned to freeze the VP9 bitstream a few weeks ago, but I don't know 
> whether that has happened.  Without a stable spec and/or input from 
> Google/WebM folks it will indeed be hard to address VP9's specific 
> needs, if any.
> -There are a bunch of audio codecs that claim to be scalable.  We 
> should have a scope discussion on whether those need to be included here.
>
> Stephan
>
>
>
> From: Adam Fineberg <fineberg@vline.me <mailto:fineberg@vline.me>>
> Date: Thursday, 25 July, 2013 00:54
> To: "Wang, Ye-Kui" <yekuiw@qti.qualcomm.com 
> <mailto:yekuiw@qti.qualcomm.com>>
> Cc: Bernard Aboba <bernard_aboba@hotmail.com 
> <mailto:bernard_aboba@hotmail.com>>, "avtext@ietf.org 
> <mailto:avtext@ietf.org>" <avtext@ietf.org <mailto:avtext@ietf.org>>, 
> "rtcweb@ietf.org <mailto:rtcweb@ietf.org>" <rtcweb@ietf.org 
> <mailto:rtcweb@ietf.org>>, Justin Uberti <juberti@google.com 
> <mailto:juberti@google.com>>
> Subject: Re: [avtext] [rtcweb] Fwd: New Version Notification for 
> draft-fineberg-avtext-temporal-layer-ext-00.txt
>
> YK,
>
> I would appreciate your collaboration.  Which codecs are you referring 
> to when you say "all existing standard scalable video codecs"?
>
> Regards,
> Adam
>
> On 7/24/13 3:32 PM, Wang, Ye-Kui wrote:
>>
>> If the group is to specify something generic, naturally it should be 
>> generic enough to cover at least all existing standard scalable video 
>> codecs if possible. And I personally think that is possible and not 
>> difficult at all. Thus, why limit to only a few scalable video codecs?
>>
>> I could provide some help here too if needed.
>>
>> BR, YK
>>
>> *From:*avtext-bounces@ietf.org [mailto:avtext-bounces@ietf.org] *On 
>> Behalf Of *Adam Fineberg
>> *Sent:* Wednesday, July 24, 2013 10:08 AM
>> *To:* Bernard Aboba
>> *Cc:* avtext@ietf.org; rtcweb@ietf.org; Justin Uberti
>> *Subject:* Re: [avtext] [rtcweb] Fwd: New Version Notification for 
>> draft-fineberg-avtext-temporal-layer-ext-00.txt
>>
>> Bernard,
>>
>> I apologize if I come across as being difficult here, but I still am 
>> not seeing the benefits.  Since the fields are not the same for the 
>> codecs, we will be multiplexing the bits and that seems to me to add 
>> complexity rather than add clarity.  Also, I can't find an IETF VP9 
>> document for the payload format to reference.  If the group thinks 
>> generalization is the right approach I would appreciate some 
>> collaboration on getting the right bit definitions for the other codecs.
>>
>> Regards,
>> Adam
>>
>> On 7/23/13 12:07 PM, Bernard Aboba wrote:
>>
>>     I do not think it is necessary to "support all forms of
>>     scalability for all codecs".   In fact, I would make that an
>>     explicit "non-goal".  All that was suggested is to try to create
>>     a single extension that supports a few common cases.   If it is
>>     possible to handle VP8, VP9 and H.264/SVC in a single extension
>>     that would be sufficient. So why not limit it to that?
>>
>>     ------------------------------------------------------------------------
>>
>>     Date: Tue, 23 Jul 2013 08:53:45 -0700
>>     From: fineberg@vline.me <mailto:fineberg@vline.me>
>>     To: stewe@stewe.org <mailto:stewe@stewe.org>
>>     CC: juberti@google.com <mailto:juberti@google.com>;
>>     bernard_aboba@hotmail.com <mailto:bernard_aboba@hotmail.com>;
>>     avtext@ietf.org <mailto:avtext@ietf.org>; rtcweb@ietf.org
>>     <mailto:rtcweb@ietf.org>
>>     Subject: Re: [avtext] [rtcweb] Fwd: New Version Notification for
>>     draft-fineberg-avtext-temporal-layer-ext-00.txt
>>
>>     I've been thinking about this.  Given the ease with which RFC5285
>>     allows for the specification of a header extension, and the
>>     complexity introduced by trying to generalize the header
>>     extension to support all forms of scalability for all codecs,
>>     the generalization might not be the best approach.  I'm not sure
>>     what we really gain by trying to capture all this in a single
>>     header extension rather than one per codec that can succinctly
>>     explain the fields without the complexity of multiplexing the bits.
>>
>>     Thoughts?
>>
>>     Regards,
>>     Adam
>>
>>     On 7/19/13 3:44 PM, Stephan Wenger wrote:
>>
>>         Hi,
>>
>>         *From: *Adam Fineberg <fineberg@vline.me
>>         <mailto:fineberg@vline.me>>
>>         *Date: *Friday, 19 July, 2013 15:12
>>         *To: *Stephan Wenger <stewe@stewe.org <mailto:stewe@stewe.org>>
>>         *Cc: *Justin Uberti <juberti@google.com
>>         <mailto:juberti@google.com>>, Bernard Aboba
>>         <bernard_aboba@hotmail.com
>>         <mailto:bernard_aboba@hotmail.com>>, "avtext@ietf.org
>>         <mailto:avtext@ietf.org>" <avtext@ietf.org
>>         <mailto:avtext@ietf.org>>, "rtcweb@ietf.org
>>         <mailto:rtcweb@ietf.org>" <rtcweb@ietf.org
>>         <mailto:rtcweb@ietf.org>>
>>         *Subject: *Re: [avtext] [rtcweb] Fwd: New Version
>>         Notification for draft-fineberg-avtext-temporal-layer-ext-00.txt
>>
>>         Stephan,
>>
>>         Thanks for the info and the reference.  I'm not sure I follow
>>         as I'm not at all familiar with H.265. I'll review the
>>         reference and see what I can figure.
>>
>>         StW: Good luck :-)
>>
>>         It seems to me, though, that you are suggesting that, except in
>>         the simple case, the data for H.265 would not be well
>>         suited to a header extension.  Am I understanding you
>>         correctly?  There is no reason the middlebox couldn't get out
>>         of band signaling of the VPS as you mention but that would
>>         not be within the scope of this header extension.
>>
>>         StW: well, if you would copy the layer_id into your header
>>         extension (just as you need to do for the simple case), a
>>         really smart middle box could use this information just as a
>>         decoder uses it, assuming that it intercepted the VPS in the
>>         first place.  Insofar, I wouldn't rule out the second option
>>         on technical grounds.  Whether any of the actual products
>>         would bother to do that, ever, is another question.  I think
>>         the case ought to be documented, though.  I can help draft
>>         text.
>>
>>         While we are at it: doing this right could mean that you need
>>         multiple specs.  First, a generic header extension mechanism
>>         dedicated to side information required for pruning of RTP
>>         packet streams—ideally not only for scalable video, although
>>         that is the main customer today.  And second, for each
>>         "payload" (at present we are talking about H.264/SVC, H.265v1
>>         (HEVC), H.265v2 (including scalable and 3D extensions, which
>>         are not yet finalized), VP8, and VP9 (I know little about the
>>         latter), plus Daala, and whatnot) a mapping of the bits
>>         available in the generic header extension to the bits in the
>>         payload itself (NAL header and VPS in case of H.265, NALU
>>         header in SVC, and the fields you mention for VP8).
>>
>>         Stephan
>>
>>
>>
>>         Any insights are appreciated.
>>
>>         Regards,
>>         Adam
>>
>>         On 7/19/13 8:33 AM, Stephan Wenger wrote:
>>
>>             Hi,
>>
>>             I also believe that 16 bits should be enough.  For H.264
>>             and VP8 that has already been demonstrated.  For H.265,
>>             some initial thoughts below.  Apologies for the word-count.
>>
>>             The scalable version of H.265 (called SHVC) is currently
>>             under development.  The current working draft can be
>>             found here:
>>             http://phenix.int-evry.fr/jct/doc_end_user/documents/13_Incheon/wg11/JCTVC-M1008-v3.zip.
>>              Therein, the options for defining layering structures
>>             are considerably more complex.  To start, we have 3 bits
>>             for the temporal ID in the NAL unit header of the H.265
>>             version 1 (HEVC) base specification (temporal scalability
>>             is already nicely supported in version 1).  Just like in
>>             SVC.  In the scalable extension, the NAL unit header
>>             contains a six bit field that points into a data
>>             structure known as "Video Parameter Set" (VPS).  Inside
>>             the VPS, those six bits are mapped to a position in a
>>             directed graph (specified through "dimension_id[][]"),
>>             which tells you about the reference relationship of the
>>             layer in question and its parent layer.  One can
>>             recursively follow the graph to determine what used to be
>>             called dependency_id, quality_id, view_id, and whatnot.
>>              The six bit pointer field can be (or: is to be, when
>>             possible) organized by the encoder such that it is
>>             prudent for a middle box to throw away NAL units
>>             (belonging to layers) with higher values of the six bit
>>             field first, before throwing away NAL units with lower
>>             values.  Relying on this feature, 3+6 bits == 9 bits
>>             should be fine for the header extension.
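A minimal sketch of the "3+6 bits" idea above, offered as an illustration and not taken from any draft. The field positions follow the two-byte H.265 NAL unit header (1 forbidden bit, 6-bit NAL unit type, 6-bit layer identifier, 3-bit temporal ID plus one). The 9-bit packing and the names `pack_layer_bits` and `should_forward` are assumptions made for this example, and the pruning rule only makes sense if the encoder orders the six bit field as described above.

```python
def parse_hevc_nal_header(header: bytes) -> dict:
    """Split the 2-byte H.265 NAL unit header into its fields."""
    if len(header) < 2:
        raise ValueError("H.265 NAL unit header is 2 bytes long")
    value = (header[0] << 8) | header[1]
    temporal_id_plus1 = value & 0x07          # lowest 3 bits
    if temporal_id_plus1 == 0:
        raise ValueError("temporal ID plus one must not be zero")
    return {
        "forbidden_zero_bit": (value >> 15) & 0x01,
        "nal_unit_type": (value >> 9) & 0x3F,
        "layer_id": (value >> 3) & 0x3F,      # the six bit field
        "temporal_id": temporal_id_plus1 - 1,
    }


def pack_layer_bits(layer_id: int, temporal_id: int) -> int:
    """Hypothetical 9-bit layout: 6 bits of layer ID, then 3 bits of temporal ID."""
    if not 0 <= layer_id < 64 or not 0 <= temporal_id < 8:
        raise ValueError("field out of range")
    return (layer_id << 3) | temporal_id


def should_forward(packed: int, max_layer_id: int, max_temporal_id: int) -> bool:
    """Drop higher layer IDs and higher temporal IDs first (assumed pruning rule)."""
    layer_id = (packed >> 3) & 0x3F
    temporal_id = packed & 0x07
    return layer_id <= max_layer_id and temporal_id <= max_temporal_id


fields = parse_hevc_nal_header(bytes([0x02, 0x13]))
packed = pack_layer_bits(fields["layer_id"], fields["temporal_id"])
```

A middle box using this rule never needs the VPS, which is exactly why it cannot handle the two-branch case discussed next.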
>>
>>             That said, the ordering by the encoder is just a
>>             recommendation, and there may well be cases where
>>             different pruning strategies may be advisable.  For
>>             example, a layering structure could be constructed that
>>             expands into two branches, one using 2D scalable tools
>>             only, the other including view_id for multi view coding.
>>              By looking at the six bit field alone, a middle box will
>>             not be able to meaningfully remove NAL units belonging to
>>             one of the branches completely while pruning the other
>>             branch.  In order to meaningfully deal with that
>>             scenario, there would be two options: one to represent
>>             the dimension_id[][] (and associated control info) in the
>>             header extension, or require the middle box to have
>>             access to the VPS and be able to interpret its content.
>>              The former could take considerably more than 16 bits
>>             and we would be talking about a variable length data
>>             structure.  The latter requires the middle box to have
>>             state and a mechanism to intercept the VPS (through
>>             signaling—as the encrypted in-band VPS would not be
>>             useful under the assumption that the middle box does not
>>             have the key to the media—which is the motivation of the
>>             draft in the first place).  I personally don't mind at
>>             all the second mechanism, as I'm a big fan of out-of-band
>>             parameter set transmission and any middle box must be in
>>             the signaling path anyway to meaningfully manipulate RTP.
>>              I do not like the first option due to its variable, and
>>             possibly substantial, overhead.
>>
>>             Stephan
>>
>>             *From: *Justin Uberti <juberti@google.com
>>             <mailto:juberti@google.com>>
>>             *Date: *Friday, 19 July, 2013 06:32
>>             *To: *Bernard Aboba <bernard_aboba@hotmail.com
>>             <mailto:bernard_aboba@hotmail.com>>
>>             *Cc: *"avtext@ietf.org <mailto:avtext@ietf.org>"
>>             <avtext@ietf.org <mailto:avtext@ietf.org>>,
>>             "rtcweb@ietf.org <mailto:rtcweb@ietf.org>"
>>             <rtcweb@ietf.org <mailto:rtcweb@ietf.org>>
>>             *Subject: *Re: [rtcweb] Fwd: New Version Notification for
>>             draft-fineberg-avtext-temporal-layer-ext-00.txt
>>
>>             Agree those are the right codecs to design for. Since in
>>             each case there are fairly low limits on the number of
>>             supported layers (i.e. 3 spatial layers for SVC), I think
>>             it should be possible to pack the temporal, spatial,
>>             quality layer ids into 16 bits.
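One way the packing suggested above could look, as a sketch under stated assumptions. The field widths (3 bits temporal, 3 bits spatial/dependency, 4 bits quality) mirror the widths of temporal_id, dependency_id and quality_id in the H.264/SVC NAL unit header extension. The bit positions, the 6 reserved bits and the function names are hypothetical and do not come from any draft.

```python
import struct

# Hypothetical 16-bit layout, most significant bit first:
#   temporal (3 bits) | spatial (3 bits) | quality (4 bits) | reserved (6 bits)


def pack_layer_ids(temporal: int, spatial: int, quality: int) -> bytes:
    """Pack the three layer IDs into two bytes in network byte order."""
    if not (0 <= temporal < 8 and 0 <= spatial < 8 and 0 <= quality < 16):
        raise ValueError("layer ID out of range")
    word = (temporal << 13) | (spatial << 10) | (quality << 6)
    return struct.pack("!H", word)


def unpack_layer_ids(data: bytes) -> tuple:
    """Recover (temporal, spatial, quality) from the two packed bytes."""
    (word,) = struct.unpack("!H", data[:2])
    return (word >> 13) & 0x07, (word >> 10) & 0x07, (word >> 6) & 0x0F


packed = pack_layer_ids(temporal=2, spatial=1, quality=3)
```

The three IDs use 10 of the 16 bits in this layout, which leaves some room for codec-specific additions.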
>>
>>             On Fri, Jul 19, 2013 at 1:56 AM, Bernard Aboba
>>             <bernard_aboba@hotmail.com
>>             <mailto:bernard_aboba@hotmail.com>> wrote:
>>
>>             If we can support VP8/9 as well as H.264/5 SVC
>>             that would be a start. It seems doable to me.
>>
>>
>>             On Jul 18, 2013, at 8:34 PM, "Adam Fineberg"
>>             <fineberg@vline.me <mailto:fineberg@vline.me>> wrote:
>>
>>                 Bernard,
>>
>>                 Are there other codecs you are thinking should be
>>                 supported?  If it's generalized I would think we want
>>                 to be able to cover all known scalable codecs. I'll
>>                 look into the H264/SVC fields to see how to encode
>>                 them in a generalized header.
>>
>>                 Regards,
>>                 Adam
>>
>>                 On 7/18/13 7:40 PM, Bernard Aboba wrote:
>>
>>                     I think it may be possible to generalize this. 
>>                     For example, for H.264/SVC which can support
>>                     temporal, spatial and quality scalability, you
>>                     would need the quality_id and dependency_id in
>>                     addition to the temporal_id (what you call the
>>                     temporal layer index).
>>
>>                     ------------------------------------------------------------------------
>>
>>                     Date: Thu, 18 Jul 2013 08:45:38 -0700
>>                     From: fineberg@vline.me <mailto:fineberg@vline.me>
>>                     To: bernard_aboba@hotmail.com
>>                     <mailto:bernard_aboba@hotmail.com>
>>                     CC: rtcweb@ietf.org <mailto:rtcweb@ietf.org>;
>>                     avtext@ietf.org <mailto:avtext@ietf.org>
>>                     Subject: Re: [rtcweb] Fwd: New Version
>>                     Notification for
>>                     draft-fineberg-avtext-temporal-layer-ext-00.txt
>>
>>                     Bernard,
>>
>>                     Good question.  I'm not familiar enough with the
>>                     parameter requirements of all other scalable
>>                     codecs to be able to generalize. If you'd like to
>>                     help specify them, I'd be fine revising the draft
>>                     to generalize.
>>
>>                     Regards,
>>                     Adam
>>
>>                     On 7/17/13 8:26 PM, Bernard Aboba wrote:
>>
>>                         Since the need is not codec specific (e.g. it
>>                         arises with any codec supporting temporal,
>>                         spatial and quality scalability), why
>>                          a VP8-specific RTP extension?
>>
>>                         ------------------------------------------------------------------------
>>
>>                         Date: Wed, 17 Jul 2013 17:09:46 -0700
>>                         From: fineberg@vline.me
>>                         <mailto:fineberg@vline.me>
>>                         To: rtcweb@ietf.org <mailto:rtcweb@ietf.org>
>>                         Subject: [rtcweb] Fwd: New Version
>>                         Notification for
>>                         draft-fineberg-avtext-temporal-layer-ext-00.txt
>>
>>                         Hi,
>>
>>                         I'm working on WebRTC services and have found
>>                         that, when developing services that forward
>>                         VP8 video streams, if we want to take
>>                         advantage of the VP8 temporal scaling we must
>>                         get the temporal layer information from the
>>                         RTP header, which requires us to decrypt the
>>                         SRTP packets. This is undesirable both
>>                         because the middle-box needs to have access
>>                         to the keys and because of the
>>                         added overhead of the decrypt/encrypt cycle.
>>                         This draft proposes an RTP header extension
>>                         that will allow us to use the VP8 temporal
>>                         layer information included in the header
>>                         extension and therefore do forwarding without
>>                         SRTP decryption. Comments welcome.
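A rough sketch of the forwarding decision described above, assuming the RFC 5285 one-byte header form (each element starts with a 4-bit ID and a 4-bit length-minus-one, a zero byte is padding, ID 15 stops parsing). The element layout is an assumption for illustration only: the sketch pretends the temporal layer index sits in the low 2 bits of the first data byte, which is not necessarily what the draft specifies. The function names and the extension ID are hypothetical.

```python
def find_extension_element(ext_data: bytes, wanted_id: int):
    """Return the data bytes of the element with the given ID, or None."""
    i = 0
    while i < len(ext_data):
        first = ext_data[i]
        if first == 0:                 # padding byte, skip it
            i += 1
            continue
        elem_id = first >> 4
        length = (first & 0x0F) + 1    # length field stores length minus one
        if elem_id == 15:              # reserved ID: stop parsing
            break
        data = ext_data[i + 1 : i + 1 + length]
        if len(data) < length:         # truncated element
            break
        if elem_id == wanted_id:
            return data
        i += 1 + length
    return None


def forward_packet(ext_data: bytes, ext_id: int, max_temporal_layer: int) -> bool:
    """Decide whether to forward, using only the unencrypted header extension."""
    data = find_extension_element(ext_data, ext_id)
    if data is None:
        return True                    # no layer info: forward unchanged
    temporal_layer = data[0] & 0x03    # assumed position of the layer index
    return temporal_layer <= max_temporal_layer


# Element ID 1 (1 byte), element ID 3 carrying temporal layer 2, then padding.
example = bytes([0x10, 0xAA, 0x30, 0x02, 0x00, 0x00])
```

Because SRTP leaves the RTP header and its extensions unencrypted, a check like this needs no access to the media keys.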
>>
>>                         Regards,
>>                         Adam Fineberg
>>
>>                         fineberg at vline.com
>>                         <mailto:fineberg%20at%20vline.com>
>>
>>                         -------- Original Message --------
>>
>>                         *Subject:* New Version Notification for
>>                         draft-fineberg-avtext-temporal-layer-ext-00.txt
>>                         *Date:* Tue, 09 Jul 2013 10:02:05 -0700
>>                         *From:* internet-drafts at ietf.org
>>                         <mailto:internet-drafts%20at%20ietf.org>
>>                         *To:* Adam Fineberg <fineberg at vline.com>
>>                         <mailto:fineberg%20at%20vline.com>
>>
>>
>>
>>
>>                         A new version of I-D, draft-fineberg-avtext-temporal-layer-ext-00.txt
>>                         has been successfully submitted by Adam Fineberg and posted to the
>>                         IETF repository.
>>
>>                         Filename:         draft-fineberg-avtext-temporal-layer-ext
>>                         Revision:         00
>>                         Title:            A Real-Time Transport Protocol (RTP) Header Extension for VP8 Temporal Layer Information
>>                         Creation date:    2013-07-08
>>                         Group:            Individual Submission
>>                         Number of pages:  6
>>                         URL:              http://www.ietf.org/internet-drafts/draft-fineberg-avtext-temporal-layer-ext-00.txt
>>                         Status:           http://datatracker.ietf.org/doc/draft-fineberg-avtext-temporal-layer-ext
>>                         Htmlized:         http://tools.ietf.org/html/draft-fineberg-avtext-temporal-layer-ext-00
>>
>>                         Abstract:
>>                             This document defines a mechanism by which packets of Real-Time
>>                             Transport Protocol (RTP) video streams encoded with the VP8 codec can
>>                             indicate, in an RTP header extension, the temporal layer information
>>                             about the frame encoded in the RTP packet.  This information can be
>>                             used in a middlebox performing bandwidth management of streams
>>                             without requiring it to decrypt the streams.
>>
>>
>>                         _______________________________________________
>>                         rtcweb mailing list rtcweb@ietf.org
>>                         <mailto:rtcweb@ietf.org>
>>                         https://www.ietf.org/mailman/listinfo/rtcweb
>>
>>                     -- 
>>
>>                     Regards,
>>
>>                     Adam
>>
>>
>>
>>             _______________________________________________
>>             avtext mailing list
>>             avtext@ietf.org <mailto:avtext@ietf.org>
>>             https://www.ietf.org/mailman/listinfo/avtext
>>
>>