Re: [rtcweb] [avtext] Fwd: New Version Notification for draft-fineberg-avtext-temporal-layer-ext-00.txt

Bernard Aboba <> Tue, 23 July 2013 19:07 UTC


I do not think it is necessary to "support all forms of scalability for all codecs".   In fact, I would make that an explicit "non-goal".  All that was suggested is to try to create a single extension that supports a few common cases.   If it is possible to handle VP8, VP9 and H.264/SVC in a single extension that would be sufficient.  So why not limit it to that? 

Date: Tue, 23 Jul 2013 08:53:45 -0700
Subject: Re: [avtext] [rtcweb] Fwd: New Version Notification for draft-fineberg-avtext-temporal-layer-ext-00.txt

    I've been thinking about this.  Given the ease with which RFC 5285
    allows a header extension to be specified, and the complexity
    introduced by trying to generalize the header extension to support
    all forms of scalability for all codecs, generalization might not
    be the best approach.  I'm not sure what we really gain by trying
    to capture all this in a single header extension rather than one
    per codec that can succinctly explain the fields without the
    complexity of multiplexing the bits.
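
For readers unfamiliar with RFC 5285, here is a minimal sketch of how little framing a one-byte-header extension element needs. The extension ID 1 and the single data byte are hypothetical; in practice the ID is negotiated in signaling, and the payload layout would be whatever the draft defines.

```python
import struct

def build_one_byte_extension(elements):
    """Build an RFC 5285 one-byte-header extension block.

    elements: list of (ext_id, data) pairs, ext_id in 1..14, 1..16 data bytes.
    """
    body = b""
    for ext_id, data in elements:
        if not 1 <= ext_id <= 14:
            raise ValueError("one-byte header IDs must be in 1..14")
        if not 1 <= len(data) <= 16:
            raise ValueError("element data must be 1..16 bytes")
        # 4-bit ID, then a 4-bit length field holding (data bytes - 1)
        body += bytes([(ext_id << 4) | (len(data) - 1)]) + data
    # Pad with zero bytes to a multiple of 32 bits
    body += b"\x00" * (-len(body) % 4)
    # 0xBEDE marks the one-byte form; the length counts 32-bit words
    return struct.pack("!HH", 0xBEDE, len(body) // 4) + body

# Hypothetical element: ID 1 carrying a one-byte layer index of 2
ext = build_one_byte_extension([(1, bytes([0x02]))])
print(ext.hex())  # bede000110020000
```

Each additional codec-specific extension would cost one more ID and one more element of this shape, which is the trade-off against a single multiplexed field.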







    On 7/19/13 3:44 PM, Stephan Wenger wrote:


          From: Adam Fineberg

          Date: Friday, 19 July,
          2013 15:12 

          To: Stephan Wenger <>

          Cc: Justin Uberti <>,
          Bernard Aboba <>,

          Subject: Re: [avtext]
          [rtcweb] Fwd: New Version Notification for




            Thanks for the info and the reference.  I'm not sure I
            follow as I'm not at all familiar with H.265.  I'll review
            the reference and see what I can figure. 

      StW: Good luck :-)  

          It seems to me, though,
            that you are suggesting that, except in the simple case,
            the data for H.265 would not be well suited to a header
            extension.  Am I understanding you correctly?  There is no
            reason the middlebox couldn't get out-of-band signaling of
            the VPS, as you mention, but that would not be within the
            scope of this header extension.

      StW: Well, if you would copy the layer_id into your header
            extension (just as you need to do for the simple case), a
            really smart middle box could use this information just as a
            decoder uses it, assuming that it intercepted the VPS in the
            first place.  Insofar, I wouldn't rule out the second option
            on technical grounds.  Whether any of the actual products
            would bother to do that, ever, is another question.  I think
            the case ought to be documented, though.  I can help
            drafting text.
            While we are at it: doing this right could mean that you need
            multiple specs.  First, a generic header extension mechanism
            dedicated to side information required for pruning of RTP
            packet streams—ideally not only for scalable video, although
            that is the main customer today.  And second, for each
            "payload" (at present we are talking about H.264/SVC,
            H.265v1 (HEVC), H.265v2 (including scalable and 3D
            extensions, which are not yet finalized), VP8, and VP9 (I
            know little about the latter), plus Daala, and whatnot) a
            mapping of the bits available in the generic header
            extension to the bits in the payload itself (NAL header and
            VPS in case of H.265, NALU header in SVC, and the fields you
            mention for VP8).



            Any insights are appreciated.





            On 7/19/13 8:33 AM, Stephan
              Wenger wrote:

              I also believe that 16 bits should be enough.  For
                H.264 and VP8 that has already been demonstrated.  For
                H.265, some initial thoughts below.  Apologies for the

              The scalable version of H.265 (called SHVC) is
                currently under development.  The current working draft
                can be found here:
                 Therein, the options for defining layering structures
                are considerably more complex.  To start, we have 3 bits
                for the temporal ID in the NAL unit header of the H.265
                version 1 (HEVC) base specification (temporal
                scalability is already nicely supported in version 1).
                 Just like in SVC.  In the scalable extension, the NAL
                unit header contains a six bit field that points into a
                data structure known as "Video Parameter Set" (VPS).
                 Inside the VPS, those six bits are mapped to a
                position in a directed graph (specified through
                "dimension_id[][]"), which tells you about the reference
                relationship of the layer in question and its parent
                layer.  One can recursively follow the graph to
                determine what used to be called dependency_id,
                quality_id, view_id, and whatnot.  The six bit pointer
                field can be (or: is to be, when possible) organized by
                the encoder such that it is prudent for a middle box to
                throw away NAL units (belonging to layers) with higher
                values of the six bit field first, before throwing away
                NAL units with lower values.  Relying on this feature,
                3+6 bits == 9 bits should be fine for the header
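
A rough sketch of the 3+6 bit idea, assuming the two-byte NAL unit header layout of the published H.265 specification (1 forbidden bit, 6-bit type, 6-bit layer identifier, 3-bit temporal ID plus one); the working draft discussed here may differ in detail. The 9-bit packing and the threshold rule are illustrative assumptions, not something taken from any draft.

```python
def parse_hevc_nal_header(nal):
    """Return (nal_unit_type, layer_id, temporal_id) from the first two bytes."""
    value = int.from_bytes(nal[:2], "big")
    nal_unit_type = (value >> 9) & 0x3F
    layer_id = (value >> 3) & 0x3F      # the six bit field
    temporal_id = (value & 0x07) - 1    # header stores temporal_id + 1
    return nal_unit_type, layer_id, temporal_id

def pack_9_bits(temporal_id, layer_id):
    """Hypothetical packing: 3 bits of temporal ID above 6 bits of layer ID."""
    return ((temporal_id & 0x07) << 6) | (layer_id & 0x3F)

def should_forward(packed, max_temporal_id, max_layer_id):
    """Drop higher values first, relying on the encoder's recommended ordering."""
    temporal_id, layer_id = packed >> 6, packed & 0x3F
    return temporal_id <= max_temporal_id and layer_id <= max_layer_id

# Header bytes 0x02 0x0B: type 1, layer 1, temporal ID 2
_, layer, tid = parse_hevc_nal_header(bytes([0x02, 0x0B]))
packed = pack_9_bits(tid, layer)
print(packed, should_forward(packed, max_temporal_id=1, max_layer_id=1))  # 129 False
```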

              That said, the ordering by the encoder is just a
                recommendation, and there may well be cases where
                different pruning strategies may be advisable.  For
                example, a layering structure could be constructed that
                expands into two branches, one using 2D scalable tools
                only, the other including view_id for multi view coding.
                 By looking at the six bit field alone, a middle box
                will not be able to meaningfully remove NAL units
                belonging to one of the branches completely while
                pruning the other branch.  In order to meaningfully deal
                with that scenario, there would be two options: one to
                represent the dimension_id[][] (and associated control
                info) in the header extension, or require the middle box
                to have access to the VPS and be able to interpret its
                content.  The former could take considerably more than
                16 bits and we would be talking about a variable length
                data structure.  The latter requires the middle box to
                have state and a mechanism to intercept the VPS (through
                signaling—as the encrypted in-band VPS would not be
                useful under the assumption that the middle box does not
                have the key to the media—which is the motivation of the
                draft in the first place).  I personally don't mind at
                all the second mechanism, as I'm a big fan of
                out-of-band parameter set transmission and any middle
                box must be in the signaling path anyway to meaningfully
                manipulate RTP.  I do not like the first option due to
                its variable, and possibly substantial, overhead.
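
To make the two-branch problem concrete, here is a small sketch under simplifying assumptions: the layer graph below is hypothetical, each layer has a single parent (real layering structures can be richer), and the middle box is assumed to have obtained the graph out of band.

```python
# Hypothetical layer graph: layer id -> the layer it depends on (None = base).
# Layers 1 and 2 form a 2D scalable branch; layers 3 and 4 a multiview branch.
PARENT = {0: None, 1: 0, 2: 1, 3: 0, 4: 3}

def layers_needed(target, parent=PARENT):
    """Walk from the target layer back to the base layer."""
    needed = set()
    layer = target
    while layer is not None:
        needed.add(layer)
        layer = parent[layer]
    return needed

def layers_kept_by_threshold(max_layer, parent=PARENT):
    """What a middle box keeps if it only compares the six bit field."""
    return {layer for layer in parent if layer <= max_layer}

print(sorted(layers_needed(4)))             # [0, 3, 4]
print(sorted(layers_kept_by_threshold(4)))  # [0, 1, 2, 3, 4]
```

With the graph available, a receiver that wants layer 4 gets only layers 0, 3 and 4; with the threshold alone, layers 1 and 2 are forwarded even though nothing on the requested branch depends on them.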



                  From: Justin
                  Uberti <>

                  Date: Friday,
                  19 July, 2013 06:32 

                  To: Bernard
                  Aboba <>

                  Cc: ""

                  Subject: Re:
                  [rtcweb] Fwd: New Version Notification for


                    Agree those are the right codecs to
                      design for. Since in each case there are fairly
                      low limits on the number of supported layers (i.e.
                      3 spatial layers for SVC), I think it should be
                      possible to pack the temporal, spatial, quality
                      layer ids into 16 bits.
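
As a sketch of that packing: the field widths below (3 bits temporal, 3 bits spatial, 4 bits quality, 6 bits left spare) are an assumption chosen for illustration, not a layout proposed in this thread.

```python
# Hypothetical layout of the 16-bit word, most significant bits first:
#   temporal (3) | spatial (3) | quality (4) | spare (6)
FIELDS = (("temporal", 3, 13), ("spatial", 3, 10), ("quality", 4, 6))

def pack_layer_ids(temporal, spatial, quality):
    word = 0
    values = {"temporal": temporal, "spatial": spatial, "quality": quality}
    for name, bits, shift in FIELDS:
        value = values[name]
        if not 0 <= value < (1 << bits):
            raise ValueError(f"{name} id {value} does not fit in {bits} bits")
        word |= value << shift
    return word

def unpack_layer_ids(word):
    return tuple((word >> shift) & ((1 << bits) - 1) for _, bits, shift in FIELDS)

word = pack_layer_ids(temporal=2, spatial=1, quality=3)
print(hex(word), unpack_layer_ids(word))  # 0x44c0 (2, 1, 3)
```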


                        On Fri, Jul 19, 2013 at
                          1:56 AM, Bernard Aboba 

                              If we can support VP8/9 as well as
                                H.264/5 SVC
                              that would be a start. It seems
                                doable to me.

                                    On Jul 18, 2013, at 8:34 PM, "Adam
                                    Fineberg" <>




                                        Are there other codecs you are
                                        thinking should be supported? 
                                        If it's generalized I would
                                        think we want to be able to
                                        cover all known scalable codecs.
                                        I'll look into the H264/SVC
                                        fields to see how to encode them
                                        in a generalized header.





                                        On 7/18/13 7:40 PM, Bernard
                                        Aboba wrote:

                                        I think it may be
                                          possible to generalize this. 
                                          For example, for H.264/SVC
                                          which can support temporal,
                                          spatial and quality
                                          scalability, you would need
                                          the quality_id and
                                          dependency_id in addition to
                                          the temporal_id (what you call
                                          the temporal layer index).   
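
For reference, a minimal sketch of where those three fields sit, assuming the three-byte SVC NAL unit header extension as laid out in RFC 6190 (the bytes that follow the one-byte NAL header for NAL unit types 14 and 20). The sample bytes are made up.

```python
def parse_svc_extension(ext):
    """Return (dependency_id, quality_id, temporal_id) from the 3 extension bytes.

    Byte 0: R(1) I(1) PRID(6)
    Byte 1: N(1) DID(3) QID(4)
    Byte 2: TID(3) U(1) D(1) O(1) RR(2)
    """
    if len(ext) < 3:
        raise ValueError("SVC NAL header extension is 3 bytes long")
    dependency_id = (ext[1] >> 4) & 0x07
    quality_id = ext[1] & 0x0F
    temporal_id = (ext[2] >> 5) & 0x07
    return dependency_id, quality_id, temporal_id

# Made-up extension bytes: DID=1, QID=2, TID=2
print(parse_svc_extension(bytes([0x80, 0x12, 0x47])))  # (1, 2, 2)
```

The three fields total 3 + 4 + 3 = 10 bits.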


                                            Date: Thu, 18 Jul 2013
                                            08:45:38 -0700




                                            Subject: Re: [rtcweb] Fwd:
                                            New Version Notification for




                                            Good question.  I'm not
                                            familiar enough with the
                                            parameter requirements of
                                            all other scalable codecs to
                                            be able to generalize.  If
                                            you'd like to help specify
                                            them, I'd be fine revising
                                            the draft to generalize.





                                            On 7/17/13 8:26 PM,
                                              Bernard Aboba wrote:

                                              Since the
                                                need is not codec
                                                specific (e.g. it arises
                                                with any codec
                                                supporting temporal,
                                                spatial and quality
                                                scalability), why

                                                 a VP8-specific RTP


                                                  Date: Wed, 17 Jul 2013
                                                  17:09:46 -0700



                                                  Subject: [rtcweb] Fwd:
                                                  New Version
                                                  Notification for


                                                    working on WebRTC
                                                    services and have
                                                    found that while
                                                    developing services
                                                    that forward VP8
                                                    video streams if we
                                                    want to take
                                                    advantage of the VP8
                                                    temporal scaling we
                                                    must get the
                                                    temporal layer
                                                    information from the
                                                    RTP header which
                                                    requires us to
                                                    decrypt the SRTP
                                                    packets. This is
                                                    undesirable both
                                                    because the
                                                    middle-box needs to
                                                    have access to the
                                                    keys as well as
                                                    because of the added
                                                    overhead of the
                                                    cycle. This draft
                                                    proposes an RTP
                                                    header extension
                                                    that will allow us
                                                    to use the VP8
                                                    temporal layer
                                                    information included
                                                    in the header
                                                    extension and
                                                    therefore do
                                                    forwarding without
                                                    SRTP decryption.
                                                    Comments welcome.
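
A sketch of the forwarding decision this enables. With SRTP as defined in RFC 3711, the RTP header and header extension are authenticated but not encrypted by default, so a middle box can read them without the keys. The extension ID and the one-byte payload carrying the temporal layer index are assumptions for illustration; the draft's actual field layout is not reproduced here.

```python
import struct

def find_extension(packet, wanted_id):
    """Return the data of a one-byte-header extension element, or None."""
    if len(packet) < 12:
        return None
    has_extension = bool(packet[0] & 0x10)
    offset = 12 + 4 * (packet[0] & 0x0F)       # skip fixed header and CSRCs
    if not has_extension or len(packet) < offset + 4:
        return None
    profile, words = struct.unpack_from("!HH", packet, offset)
    pos, end = offset + 4, offset + 4 + 4 * words
    if profile != 0xBEDE or end > len(packet):
        return None
    while pos < end:
        if packet[pos] == 0:                   # padding byte
            pos += 1
            continue
        ext_id, length = packet[pos] >> 4, (packet[pos] & 0x0F) + 1
        if ext_id == 15 or pos + 1 + length > end:
            break                              # reserved ID or truncated element
        if ext_id == wanted_id:
            return packet[pos + 1:pos + 1 + length]
        pos += 1 + length
    return None

def should_forward(packet, ext_id, max_temporal_layer):
    data = find_extension(packet, ext_id)
    if data is None:
        return True                            # no layer info: forward unchanged
    return data[0] <= max_temporal_layer       # hypothetical one-byte layer index

# RTP header with X bit set, extension ID 1 carrying layer 2, opaque payload
packet = (bytes([0x90, 96]) + struct.pack("!HII", 1, 0, 0x1234)
          + bytes.fromhex("bede000110020000") + b"\xff" * 4)
print(should_forward(packet, 1, 1), should_forward(packet, 1, 2))  # False True
```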


                                                    -------- Original
                                                    Message --------
                                                    Date: Tue, 09 Jul 2013 10:02:05 -0700
                                                          Fineberg <fineberg at>


                                                    A new version of I-D, draft-fineberg-avtext-temporal-layer-ext-00.txt
has been successfully submitted by Adam Fineberg and posted to the
IETF repository.

Filename:	 draft-fineberg-avtext-temporal-layer-ext
Revision:	 00
Title:		 A Real-Time Transport Protocol (RTP) Header Extension for VP8 Temporal Layer Information
Creation date:	 2013-07-08
Group:		 Individual Submission
Number of pages: 6

   This document defines a mechanism by which packets of Real-Time
   Transport Protocol (RTP) video streams encoded with the VP8 codec can
   indicate, in an RTP header extension, the temporal layer information
   about the frame encoded in the RTP packet.  This information can be
   used in a middlebox performing bandwidth management of streams
   without requiring it to decrypt the streams.


                                                  rtcweb mailing list 




avtext mailing list