Re: [rtcweb] Alternative Proposal for Dynamic Codec Parameter Change (Was: Re: Resolution negotiation - a contribution)

Justin Uberti <> Wed, 25 April 2012 03:20 UTC

From: Justin Uberti <>
Date: Tue, 24 Apr 2012 23:20:19 -0400
Message-ID: <>
To: Magnus Westerlund <>

On Tue, Apr 24, 2012 at 10:25 AM, Magnus Westerlund <> wrote:

> Harald, WG,
> This is posted as an individual. When it comes to this topic I will not
> engage in any chair activities, because I do have an alternative proposal
> based on
> .
> My proposal has some significant differences in how it works compared to
> Harald's. I will start with a discussion of requirements, in addition to
> Harald's, then give an overview of my proposal, and end with a discussion
> of the differences between the proposals. This is quite long, but I do
> hope you will read it to the end.
> Requirements
> ------------
> Let's start with the requirements that Harald has written. I think they
> should be formulated differently. First of all, I think the requirements
> are negotiated, indicated etc. in the context of a peer connection.
> Secondly, when it comes to maximum resolution and maximum frame rate,
> there are actually three different limits that I think are important:
> highest spatial resolution, highest frame rate, and maximum complexity
> for a given video stream. The maximum complexity is often expressed as
> the number of macroblocks a video codec can process per second. This is
> a well-established complexity measure, used as part of standardized
> video codecs' "level" definitions since the introduction of H.263 Annex
> X in 2004. As this is basically a joint limit on the maximum number of
> pixels per frame times the maximum frame rate, there exist cases where
> this complexity figure is actually the constraining factor, forcing a
> sender or receiver to request either a higher frame rate and lower
> resolution, or a higher resolution and a lower frame rate.
> The requirements should also be clearer in that one needs to handle
> multiple SSRCs per RTP session, including multiple cameras or audio
> streams (microphones) from a single end-point, where each stream can be
> encoded using different parameter values.
> I also think it is important that we consider what additional encoding
> property parameters it would make sense for WebRTC to be able to
> dynamically negotiate, request and indicate.
> Some additional use cases that should be considered in the requirements:
> 1) We have use cases including multi-party communication. One way to
> realize these is to use a central node and I believe everyone agrees
> that we should have good support for getting this usage to work well.
> Thus, in a basic star topology of different participants, there are
> going to be different path characteristics between the central node and
> each participant.
> 1A) This makes it necessary to consider how one can ensure that the
> central node can deliver appropriate rates. One way is to de-couple the
> links by having the central node perform individual transcodings to
> every participant. A simpler non-transcoding central node, only
> forwarding streams between end-points, would have to enforce the lowest
> path characteristics on everyone. If one doesn't want to transcode at
> the central node, nor use the lowest path characteristics for all, one
> needs to consider either simulcast or scalable video coding. Both
> simulcast and scalable video coding mean that, at least in the
> direction from a participant to the central node, one needs to use
> multiple codec operation points: either one per peer connection, which
> is how I see simulcast being realized with today's API and
> functionality, or an encoding format supporting scalable coding
> within a single peer connection.
> 1B) In cases where the central node has minimal functionality and
> basically is a relay and an ICE plus DTLS-SRTP termination point (I
> assume EKT to avoid having to do re-encryption), there is a need to be
> able to handle sources from different participants. This puts extra
> requirements on how to successfully negotiate the parameters. For
> example changing the values for one media source should not force one to
> renegotiate with everyone.
> 2) The non-centralized multiparty use case appears to stress the need
> for timely dynamic control equally or even more. If each sender has a
> number of peer connections to its peers, it may use local audio levels
> to determine whether its media stream is to be sent or not. Thus the
> amount of bit-rate and screen real estate needed for display will change
> rapidly as different users speak, so minimal delay when changing
> preferences is important.
> 3) We also have speech and audio, including audio-only, use cases. For
> audio there could also be a desire to request or indicate changes in
> the required audio bandwidth, or the use of multi-channel audio.
> 4) Adaptation to a legacy node from a central node in a multi-party
> conference. In some use cases legacy nodes might have special needs that
> are within the profiles a WebRTC end-point is capable of producing. Thus
> the central node might request the nodes to constrain themselves to
> particular payload types, audio bandwidth etc. to accommodate a joining
> session participant.
> 5) There appears to be a need for expressing dynamic requests for
> target bit-rate as one parameter. This can be supported by TMMBR
> (RFC 5104), but there exist additional transport-related parameters that
> could help with the adaptation. These include MTU, limits on packet
> rate, and the amount of aggregation of audio frames in the payload.
> Overview
> --------
> The basic idea in this proposal is to use JSEP to establish the outer
> limits for behavior and then use Codec Operation Point (COP) proposal as
> detailed in draft-westerlund-avtext-codec-operation-point to handle
> dynamic changes during the session.
> So highest resolution, frame rate and maximum complexity are expressed
> in JSEP SDP. In several video codecs, complexity is expressed by profile
> and level. I know that VP8 currently doesn't have this, but it is under
> discussion when it comes to these parameters.
> During the session the browser implementation detects when there is a
> need to use COP to do any of the following things:
> A) Request new target values for codec operation, for example because
> the GUI element displaying a video has been resized.
> B) Indicate when the end-point, in its role as sender, changes
> parameters.
> In addition to just spatial resolution and video frame rate, I propose
> that the following parameters be considered for dynamic indication and
> request:
> Spatial resolution (as x and y resolution), Frame-rate, Picture Aspect
> Ratio, Sample Aspect Ratio, Payload Type, Bit-rate, Token Bucket Size
> (to control burstiness of the sender), Channels, Sampling Rate, Maximum
> RTP Packet Size, Maximum RTP Packet Rate, and Application Data Unit
> Aggregation (to control the number of audio frames in the same RTP
> packet).
> Differences
> -----------
> A) Using COP and using SDP based signaling for the dynamic changes are
> two quite different models in relation to how the interaction happens.
> For COP this all happens in the browser, normally initiated by the
> browser's own determination that a COP request or notification is
> needed. Harald's proposal appears to require that the JS initiate a
> renegotiation. This puts a requirement on the implementation to listen
> to the correct callbacks to know when changes happen, such as window
> resize. To my knowledge there are not yet any proposals for how the
> browser can initiate a JSEP renegotiation.
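As an aside, the macroblock-per-second budget described earlier can be sketched as follows. This is only a rough illustration; the cap used in the example is the H.264 Level 3.1 MaxMBPS figure, chosen purely for concreteness.

```javascript
// Sketch of the joint complexity constraint: pixels per frame times
// frame rate, counted in 16x16-pixel macroblocks per second. A codec
// "level" caps how many macroblocks per second a decoder must handle.
function macroblocksPerSecond(width, height, fps) {
  // Round partial macroblocks up, since codecs pad frames to whole blocks.
  const mbPerFrame = Math.ceil(width / 16) * Math.ceil(height / 16);
  return mbPerFrame * fps;
}

// True if the given operation point fits under a complexity cap.
function fitsLevel(width, height, fps, maxMbPerSec) {
  return macroblocksPerSecond(width, height, fps) <= maxMbPerSec;
}

// With a 108000 MB/s cap (H.264 Level 3.1), 1280x720 fits at 30 fps but
// not at 60 fps, so the sender must trade resolution against frame rate.
```

This is why a single macroblock-rate number can force the "higher frame rate at lower resolution, or higher resolution at lower frame rate" choice the text describes.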

Maybe I misunderstand what you are saying, but the application will
definitely know when the browser size changes, and can then trigger a JSEP
renegotiation by calling createOffer and shipping it off.

I have the opposite concern - how can the browser know when the application
makes a change, for instance to display a participant in a large view
versus a small view? This may be possible if using a <video/> for display,
but I don't think it will be possible when using WebGL for rendering, where
the size of the view will be dictated only by the geometry of the scene.
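The app-driven flow described here (observe a size change, then create and ship a new offer) could look roughly like the sketch below. The threshold heuristic and all names are invented for illustration; the browser wiring is shown only in comments.

```javascript
// Decide whether a rendered-size change is large enough to justify a
// JSEP renegotiation, rather than renegotiating on every pixel of drag.
// The 1.5x area-ratio threshold is an arbitrary illustrative choice.
function needsRenegotiation(oldSize, newSize, ratioThreshold = 1.5) {
  const oldArea = oldSize.width * oldSize.height;
  const newArea = newSize.width * newSize.height;
  if (oldArea === 0 || newArea === 0) return oldArea !== newArea;
  const ratio = Math.max(oldArea, newArea) / Math.min(oldArea, newArea);
  return ratio >= ratioThreshold;
}

// In a browser this would be wired up roughly as (not runnable here):
//   window.addEventListener("resize", async () => {
//     if (needsRenegotiation(lastSize, currentSize())) {
//       const offer = await pc.createOffer();
//       await pc.setLocalDescription(offer);
//       signalingChannel.send(offer);  // "shipping it off"
//       lastSize = currentSize();
//     }
//   });
```

The same helper could be driven by application state (large view vs. small view) instead of window size, which is the WebGL case where no `<video/>` element exists to observe.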

> Thus COP has the advantage that there are no API changes needed to get
> browser-triggered parameter changes. W3C can choose, but is not
> required, to add API methods to allow JS to make codec parameter
> requests.
> The next thing is that COP does not require the JS application to have
> code to detect and handle re-negotiation. This makes it simpler for a
> basic application to get good behavior; it is not interrupted, nor does
> it need to handle JSEP & Offer/Answer state machine lock-out effects
> due to dynamic changes.
> How big an impact these API issues have is unclear, as W3C currently
> appears not to have included any discussion of how the browser can
> initiate an offer/answer exchange towards the JS when it determines a
> need to change parameters.
> But I am worried about using SDP with an API that requires the
> application to listen for triggers that could benefit from a codec
> parameter renegotiation. This will likely result in good behavior only
> for the JS application implementors that are really good and find out
> what listeners and what signaling tricks are needed with JSEP to get
> good performance. I would much prefer good behavior by default in
> simple applications, i.e. using the default behavior that the browser
> implementor has put in.

The issue here is that this behavior needs to be sufficiently specified, or
else we will have many different behaviors, none of which can be fully
overridden by the app. I worry this will make life hard for people trying
to develop sophisticated apps.

> B) Using the media plane, i.e. RTCP, for this signaling lets it in most
> cases go directly between the encoding and the decoding entity in the
> code. There is no need to involve the JS or the signaling server. One
> issue with using JSEP and SDP is the state machine lock-out effect that
> can occur if one has sent an Offer: that browser may not be able to
> send a new updated Offer reflecting the latest change until the answer
> has been properly processed. COP doesn't have these limitations. It can
> send a new parameter request immediately, limited only by RTCP
> bandwidth restrictions. Using the media plane, in my view, guarantees
> that COP is never worse than what the signaling plane can perform at
> its best.
> C) As the general restrictions are determined in the initial
> negotiation, COP doesn't have the issue that in-flight media streams can
> become out of bounds. Thus no need for a two phase change of signaling
> parameters.
> D) Relying on draft-lennox-mmusic-sdp-source-selection-04 has several
> additional implications that should be discussed separately. The draft
> currently includes the following functionalities.
>  D1) It contains a media stream pause proposal. This makes it subject
> to the architectural debate currently ongoing in dispatch around
> draft-westerlund-avtext-rtp-stream-pause-00 which is a competing
> proposal for the same functionality.
>  D2) The inclusion of max desired frame rate in an SSRC-specific way.
>  D3) Extending the image attribute to an SSRC-specific expression of
> desired resolutions.
>  D4) Expressing relative priority in receiving different SSRCs.
>  D5) Providing "information" on a sent SSRC.
>  D6) Indicating whether the media sender is actively sending media
> using the given SSRC.
> Is it a correct observation that only D2 and D3 are required for the
> functionality of Resolution negotiation?
> E) The standardization situation is similar for both proposals. They are
> both relying on Internet drafts that are currently individual
> submissions. Both are partially caught in the architectural discussion
> which was held in Paris in the DISPATCH WG around Media pause/resume
> (draft-westerlund-avtext-rtp-stream-pause-00) and Media Stream Selection
> (draft-westerlund-dispatch-stream-selection-00) on what the most
> appropriate level of discussion is. This discussion will continue on
> the RAI-Area mailing list.
> F) As seen in the discussion on the mailing list, the imageattr
> definitions may not be a 100% match for what is desired in Harald's
> proposal. I believe that COP's definitions are more appropriate,
> especially the possibility of "target" values. In addition, these are
> still open for adjustment if they don't match WebRTC's requirements.
> I would also like to point out that I believe this functionality is also
> highly desirable for CLUE and that their requirements should be taken
> into account. I do think that this is one of the aspects where having
> matching functionality will make WebRTC-to-CLUE interworking much
> easier.
> Thanks for reading all the way here!
> Cheers
> Magnus Westerlund
> ----------------------------------------------------------------------
> Multimedia Technologies, Ericsson Research EAB/TVM
> ----------------------------------------------------------------------
> Ericsson AB                | Phone  +46 10 7148287
> Färögatan 6                | Mobile +46 73 0949079
> SE-164 80 Stockholm, Sweden| mailto:
> ----------------------------------------------------------------------
> _______________________________________________
> rtcweb mailing list
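To make the lock-out described in point B above concrete, here is a rough sketch of the Offer/Answer restriction as a tiny state machine. The class and method names are invented; the states loosely mirror the signaling states in offer/answer, and this is a simplification (real sessions have more states).

```javascript
// Sketch of the Offer/Answer lock-out: once an offer is in flight, the
// endpoint cannot send another offer until the answer is processed.
class OfferAnswerSession {
  constructor() {
    this.state = "stable";
  }
  canSendOffer() {
    return this.state === "stable";
  }
  sendOffer() {
    if (!this.canSendOffer()) throw new Error("offer already in flight");
    this.state = "have-local-offer"; // locked out until answer arrives
  }
  receiveAnswer() {
    if (this.state !== "have-local-offer") throw new Error("no offer pending");
    this.state = "stable"; // lock-out released; a new offer may be sent
  }
}
```

A COP request, by contrast, rides on RTCP and is limited only by RTCP bandwidth, so it has no equivalent lock-out window; that is the asymmetry Magnus's point B argues from.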