Re: [rtcweb] Alternative Proposal for Dynamic Codec Parameter Change (Was: Re: Resolution negotiation - a contribution)

Justin Uberti <juberti@google.com> Wed, 25 April 2012 03:20 UTC

From: Justin Uberti <juberti@google.com>
Date: Tue, 24 Apr 2012 23:20:19 -0400
To: Magnus Westerlund <magnus.westerlund@ericsson.com>
Cc: "rtcweb@ietf.org" <rtcweb@ietf.org>
Subject: Re: [rtcweb] Alternative Proposal for Dynamic Codec Parameter Change (Was: Re: Resolution negotiation - a contribution)
In-Reply-To: <4F96B7C9.1030609@ericsson.com>
Message-ID: <CAOJ7v-0ZPxxPEpr6-r8tUyiWJ0MJz47s59kD2tiUPo7Tkj9qcA@mail.gmail.com>

On Tue, Apr 24, 2012 at 10:25 AM, Magnus Westerlund <magnus.westerlund@ericsson.com> wrote:

>
> Harald, WG,
>
> This is posted as an individual. When it comes to this topic I will not
> engage in any chair activities, because I do have an alternative proposal
> based on
>
> https://datatracker.ietf.org/doc/draft-westerlund-avtext-codec-operation-point/
> .
>
>
> My proposal has some significant differences in how it works compared
> to Harald's. I will start with a discussion of requirements, in
> addition to Harald's, then give an overview of my proposal, and end
> with a discussion of the differences between the proposals. This is
> quite long, but I do hope you will read it to the end.
>
> Requirements
> ------------
>
> Let's start with the requirements that Harald has written. I think they
> should be formulated differently. First of all, I think the requirements
> are negotiated, indicated, etc. in the context of a peer connection.
>
> Secondly, when it comes to maximum resolution and maximum frame rate,
> there are actually three different limits that I think are important:
> highest spatial resolution, highest frame rate, and maximum complexity
> for a given video stream. The maximum complexity is often expressed as
> the number of macroblocks a video codec can process per second. This is
> a well-established complexity measure, used as part of standardized
> video codecs' "level" definitions since the introduction of H.263 Annex
> X in 2004. Since this is basically a joint limit on the maximum number
> of pixels per frame times the maximum frame rate, there are cases where
> this complexity figure is the actual constraining factor, forcing a
> sender or receiver to request either a higher frame rate with a lower
> resolution, or a higher resolution with a lower frame rate.
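
As a rough illustration of the complexity tradeoff just described, here is a
minimal sketch (TypeScript); the 108,000 MB/s budget happens to match the
H.264 Level 3.1 limit, and the other numbers are only examples:

    // Complexity measured as 16x16 macroblocks processed per second.
    function macroblocksPerSecond(width: number, height: number, fps: number): number {
      const mbPerFrame = Math.ceil(width / 16) * Math.ceil(height / 16);
      return mbPerFrame * fps;
    }

    const budget = 108_000; // MB/s, e.g. the H.264 Level 3.1 limit

    // 1280x720 @ 30 fps consumes the whole budget ...
    console.log(macroblocksPerSecond(1280, 720, 30));        // 108000

    // ... so the same budget only allows roughly 13 fps at 1920x1080,
    // forcing the resolution-versus-frame-rate tradeoff described above.
    const mbPerFrame1080 = Math.ceil(1920 / 16) * Math.ceil(1080 / 16); // 8160
    console.log(Math.floor(budget / mbPerFrame1080));         // 13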
>
> The requirements should also be clearer about the need to handle
> multiple SSRCs per RTP session, including multiple cameras or audio
> streams (microphones) from a single end-point, where each stream can be
> encoded using different parameter values.
>
> I also think it is important that we consider what additional encoding
> property parameters it would make sense for WebRTC to be able to
> dynamically negotiate, request and indicate.
>
> Some additional use cases that should be considered in the requirements:
>
> 1) We have use cases including multi-party communication. One way to
> realize these is to use a central node, and I believe everyone agrees
> that we should have good support for making this usage work well. Thus,
> in a basic star topology of different participants, there are going to
> be different path characteristics between the central node and each
> participant.
>
> 1A) This makes it necessary to consider how one can ensure that the
> central node can deliver appropriate rates. One way is to de-couple the
> links by having the central node perform individual transcoding towards
> every participant. A simpler non-transcoding central node, only
> forwarding streams between end-points, would have to enforce the lowest
> path characteristics on everyone. If one doesn't want to transcode at
> the central node, nor apply the lowest path characteristics to all, one
> needs to consider either simulcast or scalable video coding. Both
> simulcast and scalable video coding mean that, at least in the
> direction from a participant to the central node, one needs to use
> multiple codec operation points: either one per peer connection, which
> is how I see simulcast being realized with today's API and
> functionality, or an encoding format supporting scalable coding within
> a single peer connection.
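
A minimal sketch of the "one operation point per peer connection" flavour of
simulcast mentioned above, using the current WebRTC JavaScript API; signalHigh
and signalLow are placeholders for the application's signaling channel:

    // Simulcast toward a central node realized as two peer connections,
    // each carrying a different codec operation point.
    async function startSimulcast(
        signalHigh: (sdp: RTCSessionDescriptionInit) => void,
        signalLow: (sdp: RTCSessionDescriptionInit) => void): Promise<void> {
      const stream = await navigator.mediaDevices.getUserMedia({
        video: { width: 1280, height: 720 },
      });
      const highTrack = stream.getVideoTracks()[0];

      // Lower operation point: a cloned track constrained to a smaller
      // resolution and frame rate.
      const lowTrack = highTrack.clone();
      await lowTrack.applyConstraints({ width: 320, height: 180, frameRate: 15 });

      const pcHigh = new RTCPeerConnection();
      const pcLow = new RTCPeerConnection();
      pcHigh.addTrack(highTrack, stream);
      pcLow.addTrack(lowTrack, stream);

      const offerHigh = await pcHigh.createOffer();
      await pcHigh.setLocalDescription(offerHigh);
      signalHigh(offerHigh);

      const offerLow = await pcLow.createOffer();
      await pcLow.setLocalDescription(offerLow);
      signalLow(offerLow);
    }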
>
> 1B) In cases where the central node has minimal functionality and is
> basically a relay and an ICE plus DTLS-SRTP termination point (I
> assume EKT to avoid having to do re-encryption), there is a need to be
> able to handle sources from different participants. This puts extra
> requirements on how to successfully negotiate the parameters. For
> example, changing the values for one media source should not force one
> to renegotiate with everyone.
>
> 2) The non-centralized multi-party use case appears to stress the need
> for timely dynamic control equally or even more. If each sender has a
> number of peer connections to its peers, it may use local audio levels
> to determine whether its media stream is to be sent or not. Thus the
> amount of bit-rate and screen estate needed for display will change
> rapidly as different users speak, and minimal delay when changing
> preferences becomes important.
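
One way an application might realize the audio-level gating described above,
sketched with the Web Audio API; the 0.05 threshold and 200 ms polling
interval are arbitrary illustrative values:

    // In a full-mesh conference, gate the outgoing video on local audio
    // activity so that only (rough) active speakers consume bit-rate.
    function gateVideoOnAudioLevel(micStream: MediaStream,
                                   videoTrack: MediaStreamTrack): void {
      const ctx = new AudioContext();
      const analyser = ctx.createAnalyser();
      ctx.createMediaStreamSource(micStream).connect(analyser);
      const samples = new Float32Array(analyser.fftSize);

      setInterval(() => {
        analyser.getFloatTimeDomainData(samples);
        const rms = Math.sqrt(
          samples.reduce((sum, s) => sum + s * s, 0) / samples.length);
        videoTrack.enabled = rms > 0.05;   // send only while "speaking"
      }, 200);
    }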
>
> 3) We also have speech and audio use cases, including audio-only ones.
> For audio there could also be a desire to request or indicate changes
> in the required audio bandwidth, or in the use of multiple channels.
>
> 4) Adaptation to a legacy node from a central node in a multi-party
> conference. In some use cases legacy nodes might have special needs
> that are still within the profiles a WebRTC end-point is capable of
> producing. Thus the central node might request the end-points to
> constrain themselves to particular payload types, audio bandwidth, etc.
> to accommodate a joining session participant.
>
> 5) There appears to be a need for expressing dynamic requests for
> target bit-rate as one parameter. This can be supported by TMMBR
> (RFC 5104), but there are additional transport-related parameters that
> could help with the adaptation. These include MTU, limits on packet
> rate, and the amount of aggregation of audio frames in the payload.
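
For reference, a sketch of how a requested bit-rate is packed into the TMMBR
FCI field per RFC 5104 (6-bit exponent, 17-bit mantissa); the browser's RTCP
stack would do this internally, and the function name here is only
illustrative:

    // TMMBR (RFC 5104) expresses the maximum total media bit-rate as
    // mantissa * 2^exp, with a 6-bit exponent and a 17-bit mantissa.
    function encodeTmmbrRate(bitsPerSecond: number): { exp: number; mantissa: number } {
      let exp = 0;
      let mantissa = bitsPerSecond;
      while (mantissa > 0x1ffff) {     // mantissa must fit in 17 bits
        mantissa = Math.floor(mantissa / 2);
        exp++;
      }
      return { exp, mantissa };
    }

    // e.g. 1 Mbit/s -> { exp: 3, mantissa: 125000 }, since 125000 * 2^3 = 1,000,000
    console.log(encodeTmmbrRate(1_000_000));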
>
> Overview
> --------
>
> The basic idea in this proposal is to use JSEP to establish the outer
> limits for behavior and then use the Codec Operation Point (COP)
> proposal, as detailed in draft-westerlund-avtext-codec-operation-point,
> to handle dynamic changes during the session.
>
> So the highest resolution, frame-rate and maximum complexity are
> expressed in the JSEP SDP. In several video codecs, complexity is
> expressed by profile and level. I know that VP8 currently doesn't have
> this, but these parameters are under discussion.
>
> During the session the browser implementation detects when there is a
> need to use COP to do any of the following things:
>
> A) Request new target values for codec operation, for example because
> the GUI element displaying a video has changed size due to a window
> resize.
>
> B) Indicate when the end-point, in its role as sender, changes
> parameters.
>
> In addition to just spatial resolution and video frame rate, I propose
> that the following parameters be considered for dynamic indication and
> request:
>
> Spatial resolution (as x and y resolution), Frame-rate, Picture Aspect
> ratio, Sample Aspect Ratio, Payload Type, Bit-rate, Token Bucket Size
> (To control burstiness of sender), Channels, Sampling Rate, Maximum RTP
> Packet Size, Maximum RTP Packet Rate, and Application Data Unit
> Aggregation (to control amount of audio frames in the same RTP packet).
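
Collected as a data structure, the proposed parameter set might look roughly
like the sketch below; the field names are illustrative and are not taken
from draft-westerlund-avtext-codec-operation-point itself:

    // Sketch of a dynamically adjustable codec operation point.
    interface CodecOperationPoint {
      // Video
      width?: number;                // spatial resolution (x)
      height?: number;               // spatial resolution (y)
      frameRate?: number;
      pictureAspectRatio?: string;   // e.g. "16:9"
      sampleAspectRatio?: string;    // e.g. "1:1"
      // Common
      payloadType?: number;
      bitrate?: number;              // target bit-rate, bits per second
      tokenBucketSize?: number;      // bytes; controls sender burstiness
      maxRtpPacketSize?: number;     // bytes
      maxRtpPacketRate?: number;     // packets per second
      // Audio
      channels?: number;
      samplingRate?: number;         // Hz
      aduAggregation?: number;       // audio frames per RTP packet
    }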
>
>
> Differences
> -----------
>
> A) Using COP and using SDP-based signaling for the dynamic changes are
> two quite different models in terms of how the interaction happens.
>
> For COP this all happens in the browser, normally initiated by the
> browser's own determination that a COP request or notification is
> needed. Harald's proposal appears to require that the JS initiate a
> renegotiation. This puts a requirement on the implementation to listen
> to the correct callbacks to know when changes happen, such as a window
> resize. To my knowledge there are not yet any proposals for how the
> browser can initiate a JSEP renegotiation.
>

Maybe I misunderstand what you are saying, but the application will
definitely know when the browser size changes, and can then trigger a JSEP
renegotiation by calling createOffer and shipping it off.

I have the opposite concern - how can the browser know when the application
makes a change, for instance to display a participant in a large view
versus a small view? This may be possible if using a <video/> for display,
but I don't think it will be possible when using WebGL for rendering, where
the size of the view will be dictated only by the geometry of the scene.
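
A minimal sketch of this application-driven flow, watching the rendering
element with a ResizeObserver; sendToPeer stands in for whatever signaling
channel the application uses:

    // React to the rendered size changing, then renegotiate via JSEP.
    function renegotiateOnResize(
        pc: RTCPeerConnection,
        videoElement: HTMLVideoElement,
        sendToPeer: (sdp: RTCSessionDescriptionInit) => void): void {
      new ResizeObserver(async () => {
        // The application could also adjust the SDP here (e.g. imageattr)
        // to reflect videoElement.clientWidth / clientHeight.
        const offer = await pc.createOffer();
        await pc.setLocalDescription(offer);
        sendToPeer(offer);
      }).observe(videoElement);
    }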



> Thus COP has the advantage that no API changes are needed to get
> browser-triggered parameter changes. W3C can choose to, but is not
> required to, add API methods that allow JS to make codec parameter
> requests.
>
> The next thing is that COP does not require the JS application to have
> code to detect and handle re-negotiation. This makes it simpler for a
> basic application to get good behavior: it is not interrupted, nor does
> it need to handle JSEP and Offer/Answer state machine lock-out effects
> due to dynamic changes.
>
> How big an impact these API issues have is unclear, as W3C currently
> does not appear to have included any discussion of how the browser can
> initiate an offer/answer exchange towards the JS when it determines a
> need to change parameters.
>
> But I am worried about using SDP with an API that requires the
> application to listen for the triggers that could benefit from a codec
> parameter renegotiation. This will likely only result in good behavior
> for the JS application implementors who are really good and who find
> out which listeners and which signaling tricks are needed with JSEP to
> get good performance. I would much prefer good behavior by default in
> simple applications, i.e. using the default behavior that the browser
> implementor has put in.
>

The issue here is that this behavior needs to be sufficiently specified, or
else we will have many different behaviors, none of which can be fully
overridden by the app. I worry this will make life hard for people trying
to develop sophisticated apps.


> B) Using the media plane, i.e. RTCP, for this signaling lets it in most
> cases go directly between the encoding and the decoding entity in the
> code. There is no need to involve the JS or the signaling server. One
> issue with using JSEP and SDP is the state machine lock-out effect that
> can occur if one has sent an Offer: that browser may not be able to
> send a new, updated Offer reflecting the latest change until the answer
> has been properly processed. COP doesn't have these limitations. It can
> send a new parameter request immediately, limited only by RTCP
> bandwidth restrictions. Using the media plane in my view guarantees
> that COP is never worse than what the signaling plane can do at its
> best.
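
A minimal sketch of how that lock-out looks to an application using the JSEP
API (sendToPeer is again a placeholder): a new offer cannot be created while
one is outstanding, so the change is queued until the answer is applied,
whereas an RTCP COP message could be sent immediately.

    // Renegotiation helper that respects the offer/answer state machine.
    function installRenegotiation(
        pc: RTCPeerConnection,
        sendToPeer: (sdp: RTCSessionDescriptionInit) => void): () => Promise<void> {
      let pending = false;

      async function maybeRenegotiate(): Promise<void> {
        if (pc.signalingState !== "stable") {
          pending = true;              // locked out until the answer is applied
          return;
        }
        pending = false;
        const offer = await pc.createOffer();
        await pc.setLocalDescription(offer);
        sendToPeer(offer);
      }

      pc.addEventListener("signalingstatechange", () => {
        if (pc.signalingState === "stable" && pending) {
          void maybeRenegotiate();     // flush the queued change
        }
      });

      return maybeRenegotiate;
    }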
>
> C) As the general restrictions are determined in the initial
> negotiation, COP doesn't have the issue that in-flight media streams
> can become out of bounds. Thus there is no need for a two-phase change
> of signaling parameters.
>
> D) Relying on draft-lennox-mmusic-sdp-source-selection-04 has several
> additional implications that should be discussed separately. The draft
> currently includes the following functionalities:
>
>  D1) It contains a media stream pause proposal. This makes it subject
> to the architectural debate currently ongoing in DISPATCH around
> draft-westerlund-avtext-rtp-stream-pause-00, which is a competing
> proposal for the same functionality.
>
>  D2) The inclusion of a maximum desired frame rate in an SSRC-specific
> way.
>
>  D3) Extending the image attribute to a per-SSRC expression of desired
> resolutions.
>
>  D4) Expressing relative priority in receiving different SSRCs.
>
>  D5) Providing "information" on a sent SSRC.
>
>  D6) An indication of whether the media sender is actively sending
> media using the given SSRC.
>
> Is it a correct observation that only D2 and D3 are required for the
> functionality of Resolution negotiation?
>
> E) The standardization situation is similar for both proposals. They
> both rely on Internet drafts that are currently individual submissions.
> Both are partially caught up in the architectural discussion held in
> Paris in the DISPATCH WG around media pause/resume
> (draft-westerlund-avtext-rtp-stream-pause-00) and Media Stream Selection
> (draft-westerlund-dispatch-stream-selection-00), on what the most
> appropriate level of discussion is. This discussion will continue on
> the RAI-Area mailing list.
>
> F) As seen from the discussion on the mailing list, the imageattr
> definitions may not be a 100% match for what is desired in Harald's
> proposal. I believe that COP's are more appropriate, especially the
> possibility to express "target" values. In addition, these are still
> open for adjustment if they don't match WebRTC's requirements.
>
> I would also like to point out that I believe this functionality is
> also highly desirable for CLUE, and that their requirements should be
> taken into account. I do think that this is one of the aspects where
> having matching functionality will make WebRTC-to-CLUE interworking
> much easier.
>
> Thanks for reading all the way here!
>
> Cheers
>
> Magnus Westerlund
>
> ----------------------------------------------------------------------
> Multimedia Technologies, Ericsson Research EAB/TVM
> ----------------------------------------------------------------------
> Ericsson AB                | Phone  +46 10 7148287
> Färögatan 6                | Mobile +46 73 0949079
> SE-164 80 Stockholm, Sweden| mailto: magnus.westerlund@ericsson.com
> ----------------------------------------------------------------------
>
>