Re: [rtcweb] Alternative Proposal for Dynamic Codec Parameter Change (Was: Re: Resolution negotiation - a contribution)
Harald Alvestrand <harald@alvestrand.no> Tue, 24 April 2012 16:43 UTC
Return-Path: <harald@alvestrand.no>
X-Original-To: rtcweb@ietfa.amsl.com
Delivered-To: rtcweb@ietfa.amsl.com
Received: from localhost (localhost [127.0.0.1]) by ietfa.amsl.com (Postfix) with ESMTP id AD92821F8709 for <rtcweb@ietfa.amsl.com>; Tue, 24 Apr 2012 09:43:48 -0700 (PDT)
X-Virus-Scanned: amavisd-new at amsl.com
X-Spam-Flag: NO
X-Spam-Score: -110.499
X-Spam-Level:
X-Spam-Status: No, score=-110.499 tagged_above=-999 required=5 tests=[AWL=0.100, BAYES_00=-2.599, RCVD_IN_DNSWL_HI=-8, USER_IN_WHITELIST=-100]
Received: from mail.ietf.org ([12.22.58.30]) by localhost (ietfa.amsl.com [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id kGwkL0JCf3Hd for <rtcweb@ietfa.amsl.com>; Tue, 24 Apr 2012 09:43:47 -0700 (PDT)
Received: from eikenes.alvestrand.no (eikenes.alvestrand.no [158.38.152.233]) by ietfa.amsl.com (Postfix) with ESMTP id EE39C21F8702 for <rtcweb@ietf.org>; Tue, 24 Apr 2012 09:43:46 -0700 (PDT)
Received: from localhost (localhost [127.0.0.1]) by eikenes.alvestrand.no (Postfix) with ESMTP id C525B39E0CD; Tue, 24 Apr 2012 18:43:45 +0200 (CEST)
X-Virus-Scanned: Debian amavisd-new at eikenes.alvestrand.no
Received: from eikenes.alvestrand.no ([127.0.0.1]) by localhost (eikenes.alvestrand.no [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id PqdWbrt4WxhO; Tue, 24 Apr 2012 18:43:44 +0200 (CEST)
Received: from hta-dell.lul.corp.google.com (62-20-124-50.customer.telia.com [62.20.124.50]) by eikenes.alvestrand.no (Postfix) with ESMTPSA id 3F4AE39E089; Tue, 24 Apr 2012 18:43:44 +0200 (CEST)
Message-ID: <4F96D83F.8020705@alvestrand.no>
Date: Tue, 24 Apr 2012 18:43:43 +0200
From: Harald Alvestrand <harald@alvestrand.no>
User-Agent: Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.2.28) Gecko/20120313 Thunderbird/3.1.20
MIME-Version: 1.0
To: Magnus Westerlund <magnus.westerlund@ericsson.com>
References: <4F869648.2020605@alvestrand.no> <4F96B7C9.1030609@ericsson.com>
In-Reply-To: <4F96B7C9.1030609@ericsson.com>
Content-Type: text/plain; charset="ISO-8859-1"; format="flowed"
Content-Transfer-Encoding: 8bit
Cc: "rtcweb@ietf.org" <rtcweb@ietf.org>
Subject: Re: [rtcweb] Alternative Proposal for Dynamic Codec Parameter Change (Was: Re: Resolution negotiation - a contribution)
X-BeenThere: rtcweb@ietf.org
X-Mailman-Version: 2.1.12
Precedence: list
List-Id: Real-Time Communication in WEB-browsers working group list <rtcweb.ietf.org>
List-Unsubscribe: <https://www.ietf.org/mailman/options/rtcweb>, <mailto:rtcweb-request@ietf.org?subject=unsubscribe>
List-Archive: <http://www.ietf.org/mail-archive/web/rtcweb>
List-Post: <mailto:rtcweb@ietf.org>
List-Help: <mailto:rtcweb-request@ietf.org?subject=help>
List-Subscribe: <https://www.ietf.org/mailman/listinfo/rtcweb>, <mailto:rtcweb-request@ietf.org?subject=subscribe>
X-List-Received-Date: Tue, 24 Apr 2012 16:43:48 -0000
I'm happy to see this - there are a number of quibbles I want to make,
but overall, I like it.

My big worry is the IPR issue - I think that negotiation of video size
during a call is likely to be "MUST implement", and I don't want to
require implementation of a protocol against which people have filed
non-RF IPR claims if there are viable alternatives.

               Harald

On 04/24/2012 04:25 PM, Magnus Westerlund wrote:
> Harald, WG,
>
> This is posted as an individual. When it comes to this topic I will
> not engage in any chair activities, because I have an alternative
> proposal based on
> https://datatracker.ietf.org/doc/draft-westerlund-avtext-codec-operation-point/.
>
> My proposal has some significant differences in how it works compared
> to Harald's. I will start with a discussion of requirements, in
> addition to Harald's, then give an overview of my proposal, and end
> with a discussion of the differences between the proposals. This is
> quite long, but I do hope you will read it to the end.
>
> Requirements
> ------------
>
> Let's start with the requirements that Harald has written. I think
> they should be formulated differently. First of all, I think the
> requirements are negotiated, indicated etc. in the context of a peer
> connection.
>
> Secondly, when it comes to maximum resolution and maximum frame rate,
> there are actually three different limits that I think are important:
> highest spatial resolution, highest frame rate, and maximum complexity
> for a given video stream. The maximum complexity is often expressed as
> the number of macroblocks a video codec can process per second. This
> is a well-established complexity measure, used as part of standardized
> video codecs' "level" definitions since the introduction of H.263
> Annex X in 2004.
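(As a rough illustration of this macroblocks-per-second measure, not part of either proposal: a macroblock covers a 16x16 pixel area, and the 108000 MB/s budget below is the figure H.264 uses for Level 3.1, chosen here purely as an example.)

```python
def macroblocks_per_frame(width, height):
    """A macroblock covers a 16x16 pixel area; partial blocks round up."""
    return -(-width // 16) * -(-height // 16)

def fits_complexity_budget(width, height, fps, max_mbps):
    """True if the stream stays within a decoder's macroblock/second limit."""
    return macroblocks_per_frame(width, height) * fps <= max_mbps

BUDGET = 108_000  # example budget (the MaxMBPS figure of H.264 Level 3.1)

print(fits_complexity_budget(1280, 720, 30, BUDGET))  # True:  3600 MB/frame * 30 = 108000
print(fits_complexity_budget(1280, 720, 60, BUDGET))  # False: would need 216000 MB/s
print(fits_complexity_budget(864, 480, 60, BUDGET))   # True:  1620 MB/frame * 60 = 97200
```

With a fixed budget, a receiver that wants 60 fps has to accept a smaller picture: this is the trade-off between resolution and frame rate that a joint complexity limit forces.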
> As this is basically a joint maximum requirement on the maximum number
> of pixels per frame times the maximum frame rate, there exist cases
> where this complexity figure is actually the constraining factor,
> forcing a sender or receiver to request either a higher frame rate
> with a lower resolution, or a higher resolution with a lower frame
> rate.
>
> The requirements should also be clearer in that one needs to handle
> multiple SSRCs per RTP session, including multiple cameras or audio
> streams (microphones) from a single end-point, where each stream can
> be encoded using different parameter values.
>
> I also think it is important that we consider what additional encoding
> property parameters it would make sense for WebRTC to be able to
> dynamically negotiate, request and indicate.
>
> Some additional use cases that should be considered in the
> requirements:
>
> 1) We have use cases including multi-party communication. One way to
> realize these is to use a central node, and I believe everyone agrees
> that we should have good support for getting this usage to work well.
> Thus, in a basic star topology of different participants, there are
> going to be different path characteristics between the central node
> and each participant.
>
> 1A) This makes it necessary to consider how one can ensure that the
> central node can deliver appropriate rates. One way is to de-couple
> the links by having the central node perform individual transcodings
> to every participant. A simpler non-transcoding central node, only
> forwarding streams between end-points, would have to enforce the
> lowest path characteristics on everyone. If one doesn't want to
> transcode at the central node, nor use the lowest path characteristics
> for all, one needs to consider either simulcast or scalable video
> coding. Both simulcast and scalable video coding mean that, at least
> in the direction from a participant to the central node, one needs to
> use multiple codec operation points.
> Either one per peer connection, which is how I see simulcast being
> realized with today's API and functionality, or an encoding format
> supporting scalable coding within a single peer connection.
>
> 1B) In cases where the central node has minimal functionality and
> basically is a relay and an ICE plus DTLS-SRTP termination point (I
> assume EKT to avoid having to do re-encryption), there is a need to be
> able to handle sources from different participants. This puts extra
> requirements on how to successfully negotiate the parameters. For
> example, changing the values for one media source should not force one
> to renegotiate with everyone.
>
> 2) The non-centralized multiparty use case appears to stress the need
> for timely dynamic control equally or even more. If each sender has a
> number of peer connections to its peers, it may use local audio levels
> to determine whether its media stream is to be sent or not. Thus the
> amount of bit-rate and screen real estate needed for display will
> change rapidly as different users speak, so the need for minimal delay
> when changing preferences is important.
>
> 3) We also have speech and audio, including audio-only, use cases. For
> audio there could also be a desire to request or indicate changes in
> the audio bandwidth required, or in the usage of multi-channel audio.
>
> 4) Adaptation to a legacy node from a central node in a multi-party
> conference. In some use cases legacy nodes might have special needs
> that are within the profiles a WebRTC end-point is capable of
> producing. Thus the central node might request the nodes to constrain
> themselves to particular payload types, audio bandwidth etc. to meet
> a joining session participant.
>
> 5) There appears to be a need for expressing dynamic requests for
> target bit-rate as one parameter. This can be supported by TMMBR
> (RFC 5104), but there exist additional transport-related parameters
> that could help with the adaptation.
> These include MTU, limits on packet rate, and the amount of
> aggregation of audio frames in the payload.
>
> Overview
> --------
>
> The basic idea in this proposal is to use JSEP to establish the outer
> limits for behavior, and then use the Codec Operation Point (COP)
> proposal as detailed in draft-westerlund-avtext-codec-operation-point
> to handle dynamic changes during the session.
>
> So highest resolution, frame rate and maximum complexity are expressed
> in JSEP SDP. In several video codecs, complexity is expressed by
> profile and level. I know that VP8 currently doesn't have this, but
> these parameters are under discussion.
>
> During the session the browser implementation detects when there is a
> need to use COP to do any of the following things:
>
> A) Request new target values for codec operation, for example because
> the GUI element displaying a video has changed due to a window resize.
>
> B) Indicate when the end-point, in its role as sender, changes
> parameters.
>
> In addition to spatial resolution and video frame rate, I propose that
> the following parameters are considered as candidates for dynamic
> indication and request:
>
> Spatial resolution (as x and y resolution), frame rate, picture aspect
> ratio, sample aspect ratio, payload type, bit-rate, token bucket size
> (to control burstiness of the sender), channels, sampling rate,
> maximum RTP packet size, maximum RTP packet rate, and Application Data
> Unit aggregation (to control the number of audio frames in the same
> RTP packet).
>
> Differences
> -----------
>
> A) Using COP and using SDP-based signaling for the dynamic changes are
> two quite different models in relation to how the interaction happens.
>
> For COP this all happens in the browser, normally initiated by the
> browser's own determination that a COP request or notification is
> needed. Harald's proposal appears to require that the JS initiate a
> renegotiation.
> This puts a requirement on the implementation to listen to the correct
> callbacks to know when changes happen, such as window resize. To my
> knowledge there are not yet any proposals for how the browser can
> initiate a JSEP renegotiation.
>
> Thus COP has the advantage that no API changes are needed to get
> browser-triggered parameter changes. W3C can choose, but is not
> required, to add API methods that allow JS to make codec parameter
> requests.
>
> The next thing is that COP does not require the JS application to have
> code to detect and handle renegotiation. This makes it simpler for a
> basic application to get good behavior: applications are not
> interrupted, nor do they need to handle the JSEP & Offer/Answer state
> machine lock-out effects due to dynamic changes.
>
> How big an impact these API issues have is unclear, as W3C currently
> appears not to have included any discussion of how the browser can
> initiate an offer/answer exchange towards the JS when it determines a
> need to change parameters.
>
> But I am worried about using SDP with an API that requires the
> application to listen for triggers that could benefit from a codec
> parameter renegotiation. This will likely result in good behavior only
> for those JS application implementors who are really good and find out
> what listeners and what signaling tricks are needed with JSEP to get
> good performance. I would much prefer good behavior by default in
> simple applications, i.e. using the default behavior that the browser
> implementor has put in.
>
> B) Using the media plane, i.e. RTCP, for this signaling lets it in
> most cases go directly between the encoding and the decoding entity in
> the code. There is no need to involve the JS nor the signaling server.
> One issue with using JSEP and SDP is the state machine lock-out effect
> that can occur if one has sent an Offer.
> Then that browser may not be able to send a new updated Offer
> reflecting the latest change until the answer has been properly
> processed. COP doesn't have these limitations. It can send a new
> parameter request immediately, limited only by RTCP bandwidth
> restrictions. Using the media plane in my view guarantees that COP is
> never worse than what the signaling plane can perform at its best.
>
> C) As the general restrictions are determined in the initial
> negotiation, COP doesn't have the issue that in-flight media streams
> can become out of bounds. Thus there is no need for a two-phase change
> of signaling parameters.
>
> D) Relying on draft-lennox-mmusic-sdp-source-selection-04 has several
> additional implications that should be discussed separately. The draft
> currently includes the following functionalities:
>
> D1) It contains a media stream pause proposal. This makes it subject
> to the architectural debate currently ongoing in DISPATCH around
> draft-westerlund-avtext-rtp-stream-pause-00, which is a competing
> proposal for the same functionality.
>
> D2) The inclusion of max desired frame rate in an SSRC-specific way.
>
> D3) Extending the image attribute to be a per-SSRC-specific expression
> of desired resolutions.
>
> D4) Expressing relative priority in receiving different SSRCs.
>
> D5) Providing an "information" on a sent SSRC.
>
> D6) Indicating whether the media sender is actively sending media
> using the given SSRC.
>
> Is it a correct observation that only D2 and D3 are required for the
> functionality of resolution negotiation?
>
> E) The standardization situation is similar for both proposals. They
> are both relying on Internet drafts that are currently individual
> submissions.
> Both are partially caught in the architectural discussion which was
> held in Paris in the DISPATCH WG around media pause/resume
> (draft-westerlund-avtext-rtp-stream-pause-00) and Media Stream
> Selection (draft-westerlund-dispatch-stream-selection-00) about what
> the most appropriate level of discussion is. This discussion will
> continue on the RAI-Area mailing list.
>
> F) As seen from the discussion on the mailing list, the imageattr
> definitions may not be a 100% match to what is desired with Harald's
> proposal. I believe that COP's are more appropriate, especially the
> possibility of "target" values. In addition, these are still open for
> adjustment if they don't match WebRTC's requirements.
>
> I would also like to point out that I believe this functionality is
> also highly desirable for CLUE, and that their requirements should be
> taken into account. I do think that this is one of the aspects where
> having matching functionality will make it much easier to have
> WebRTC-to-CLUE interworking.
>
> Thanks for reading all the way here!
>
> Cheers
>
> Magnus Westerlund
>
> ----------------------------------------------------------------------
> Multimedia Technologies, Ericsson Research EAB/TVM
> ----------------------------------------------------------------------
> Ericsson AB                | Phone +46 10 7148287
> Färögatan 6                | Mobile +46 73 0949079
> SE-164 80 Stockholm, Sweden| mailto: magnus.westerlund@ericsson.com
> ----------------------------------------------------------------------
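(A minimal sketch of the split argued for in the quoted overview: JSEP fixes the outer limits once, and a COP request moves a target within them. All names here are illustrative assumptions, not taken from the draft.)

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Bounds:
    """Outer limits fixed once in the JSEP offer/answer."""
    max_width: int
    max_height: int
    max_fps: int

@dataclass(frozen=True)
class CopTarget:
    """A dynamic target carried over RTCP during the session."""
    width: int
    height: int
    fps: int

def apply_request(target: CopTarget, bounds: Bounds) -> CopTarget:
    """Clip a COP request to the negotiated bounds, so in-flight media
    can never go out of bounds and no renegotiation is needed."""
    return CopTarget(min(target.width, bounds.max_width),
                     min(target.height, bounds.max_height),
                     min(target.fps, bounds.max_fps))

bounds = Bounds(1280, 720, 30)
# The receiver's window shrank: request a smaller picture, no SDP round trip.
print(apply_request(CopTarget(640, 360, 30), bounds))    # honored as-is
print(apply_request(CopTarget(1920, 1080, 60), bounds))  # clipped to the bounds
```

This mirrors difference C in the quoted mail: because the bounds are agreed up front, a request can simply be clamped instead of triggering a two-phase signaling change.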