[rtcweb] Alternative Proposal for Dynamic Codec Parameter Change (Was: Re: Resolution negotiation - a contribution)
Magnus Westerlund <magnus.westerlund@ericsson.com> Tue, 24 April 2012 14:25 UTC
Message-ID: <4F96B7C9.1030609@ericsson.com>
Date: Tue, 24 Apr 2012 16:25:13 +0200
From: Magnus Westerlund <magnus.westerlund@ericsson.com>
User-Agent: Mozilla/5.0 (Windows NT 6.1; rv:11.0) Gecko/20120327 Thunderbird/11.0.1
MIME-Version: 1.0
To: Harald Alvestrand <harald@alvestrand.no>
References: <4F869648.2020605@alvestrand.no>
In-Reply-To: <4F869648.2020605@alvestrand.no>
X-Enigmail-Version: 1.4
Content-Type: text/plain; charset="ISO-8859-1"
Content-Transfer-Encoding: 8bit
X-Brightmail-Tracker: AAAAAA==
Cc: "rtcweb@ietf.org" <rtcweb@ietf.org>
Subject: [rtcweb] Alternative Proposal for Dynamic Codec Parameter Change (Was: Re: Resolution negotiation - a contribution)
Harald, WG,

This is posted as an individual. When it comes to this topic I will not engage in any chair activities, because I do have an alternative proposal based on https://datatracker.ietf.org/doc/draft-westerlund-avtext-codec-operation-point/.

My proposal has some significant differences in how it works compared to Harald's. I will start with a discussion of requirements, in addition to Harald's, then give an overview of my proposal, and end with a discussion of the differences between the proposals. This is quite long, but I do hope you will read it to the end.

Requirements
------------

Let's start with the requirements that Harald has written. I think they should be formulated differently. First of all, I think the requirements are negotiated, indicated, etc. in the context of a peer connection.

Secondly, when it comes to maximum resolution and maximum frame-rate, there are actually three different limits that I think are important: highest spatial resolution, highest frame-rate, and maximum complexity for a given video stream. The maximum complexity is often expressed as the number of macroblocks a video codec can process per second. This is a well-established complexity measure, used as part of standardized video codecs' "level" definitions since the introduction of H.263 Annex X in 2004. As this is basically a joint limit on the number of pixels per frame times the frame-rate, there exist cases where this complexity figure actually is the constraining factor, forcing a sender or receiver to request either a higher frame-rate and a lower resolution, or a higher resolution and a lower frame-rate.

The requirements should also be clearer in that one needs to handle multiple SSRCs per RTP session, including multiple cameras or audio streams (microphones) from a single end-point, where each stream can be encoded using different parameter values.
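
To make the macroblock-rate limit above concrete, here is a minimal sketch in Python. It is not taken from any codec specification or from the COP draft: the function names are invented for illustration, the 16x16 macroblock size is an assumption that fits codecs such as H.263 and H.264, and the budget of 108000 macroblocks per second is just an example figure.

```python
import math

MB_SIZE = 16  # assumed macroblock edge in pixels (16x16 macroblocks)


def macroblocks_per_frame(width, height):
    """Number of macroblocks needed to cover one frame (partial blocks count)."""
    return math.ceil(width / MB_SIZE) * math.ceil(height / MB_SIZE)


def macroblocks_per_second(width, height, fps):
    """Complexity of a stream expressed as macroblocks per second."""
    return macroblocks_per_frame(width, height) * fps


def max_frame_rate(width, height, mb_per_second_budget):
    """Highest frame-rate that fits the budget at the given resolution."""
    return mb_per_second_budget / macroblocks_per_frame(width, height)


# Hypothetical complexity budget, chosen only for the example.
budget = 108000

# The same budget allows either a larger picture at a lower frame-rate
# or a smaller picture at a higher frame-rate.
for width, height in [(1280, 720), (640, 480)]:
    print(width, height, max_frame_rate(width, height, budget))
```

With this budget, 1280x720 fits at up to 30 frames per second, while 640x480 would fit at up to 90, which is the kind of trade-off between resolution and frame-rate that a pure "maximum resolution plus maximum frame-rate" pair cannot express.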

I also think it is important that we consider which additional encoding property parameters it would make sense for WebRTC to be able to dynamically negotiate, request and indicate.

Some additional use cases that should be considered in the requirements:

1) We have use cases including multi-party communication. One way to realize these is to use a central node, and I believe everyone agrees that we should have good support for getting this usage to work well. Thus, in a basic star topology of different participants, there are going to be different path characteristics between the central node and each participant.

1A) This makes it necessary to consider how one can ensure that the central node can deliver appropriate rates. One way is to de-couple the links by having the central node perform individual transcodings to every participant. A simpler non-transcoding central node, only forwarding streams between end-points, would have to enforce the lowest path characteristics on all. If one doesn't want to transcode at the central node, and doesn't want to apply the lowest path characteristics to all, one needs to consider either simulcast or scalable video coding. Both simulcast and scalable video coding mean that, at least in the direction from a participant to the central node, one needs to use multiple codec operation points: either one per peer connection, which is how I see simulcast being realized with today's API and functionality, or by using an encoding format supporting scalable coding within a single peer connection.

1B) In cases where the central node has minimal functionality and basically is a relay and an ICE plus DTLS-SRTP termination point (I assume EKT to avoid having to do re-encryption), there is a need to be able to handle sources from different participants. This puts extra requirements on how to successfully negotiate the parameters. For example, changing the values for one media source should not force one to renegotiate with everyone.

2) The non-centralized multiparty use case appears to stress the need for timely dynamic control equally or more. If each sender has a number of peer connections to its peers, it may use local audio levels to determine whether its media stream is to be sent or not. Thus the bit-rate and the screen real estate needed for display will change rapidly as different users speak. Minimal delay when changing preferences is therefore important.

3) We also have speech and audio, including audio-only use cases. For audio there could also exist a desire to request or indicate changes in the audio bandwidth required, or in the usage of multi-channel.

4) Adaptation to a legacy node from a central node in a multi-party conference. In some use cases legacy nodes might have special needs that are within the profiles a WebRTC end-point is capable of producing. Thus the central node might request the nodes to constrain themselves to particular payload types, audio bandwidths, etc. to accommodate a joining session participant.

5) There appears to exist a need for expressing dynamic requests for target bit-rate as one parameter. This can be supported by TMMBR (RFC5104), but there exist additional transport-related parameters that could help with the adaptation. These include MTU, limits on packet rate, and the amount of aggregation of audio frames in the payload.

Overview
--------

The basic idea in this proposal is to use JSEP to establish the outer limits for behavior, and then use the Codec Operation Point (COP) proposal, as detailed in draft-westerlund-avtext-codec-operation-point, to handle dynamic changes during the session.

So the highest resolution, frame-rate and maximum complexity are expressed in the JSEP SDP. Complexity is in several video codecs expressed by profile and level. I know that VP8 currently doesn't have this, but it is under discussion when it comes to these parameters.

During the session the browser implementation detects when there is a need to use COP to do any of the following things.

A) Request new target values for codec operation, for example because the GUI element displaying a video has changed due to a window resize.

B) Indicate when the end-point, in its role as sender, changes parameters.

In addition to just spatial resolution and video frame-rate, I propose that the following parameters are considered as parameters that could be dynamically indicated and requested: Spatial resolution (as x and y resolution), Frame-rate, Picture Aspect Ratio, Sample Aspect Ratio, Payload Type, Bit-rate, Token Bucket Size (to control the burstiness of the sender), Channels, Sampling Rate, Maximum RTP Packet Size, Maximum RTP Packet Rate, and Application Data Unit Aggregation (to control the number of audio frames in the same RTP packet).

Differences
-----------

A) Using COP and using SDP-based signaling for the dynamic changes are two quite different models in relation to how the interaction happens. For COP this all happens in the browser, normally initiated by the browser's own determination that a COP request or notification is needed. Harald's proposal appears to require that the JS initiate a renegotiation. This puts a requirement on the implementation to listen to the correct callbacks to know when changes happen, such as a window resize. To my knowledge there are not yet any proposals for how the browser can initiate a JSEP renegotiation. Thus COP has the advantage that no API changes are needed to get browser-triggered parameter changes. W3C can choose to, but is not required to, add API methods that allow JS to make codec parameter requests.

The next thing is that COP does not require the JS application to have code to detect and handle renegotiation. This makes it simpler for the basic application to get good behavior: it is not interrupted, nor does it need to handle JSEP and Offer/Answer state machine lock-out effects due to dynamic changes.
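
As a rough illustration of the split described in the overview, where the initial negotiation fixes the outer limits and later requests stay inside them, here is a minimal sketch in Python. It does not reflect the COP wire format or any browser API: the `OuterLimits` and `TargetRequest` types, the `clamp_request` helper, and the idea that a request is clamped rather than rejected are all assumptions made for the example, as is the 16x16 macroblock size.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class OuterLimits:
    """Hypothetical limits agreed once in the initial JSEP/SDP exchange."""
    max_width: int
    max_height: int
    max_fps: float
    max_mb_per_second: int


@dataclass(frozen=True)
class TargetRequest:
    """Hypothetical in-session request, e.g. triggered by a window resize."""
    width: int
    height: int
    fps: float


def clamp_request(req, limits):
    """Return a request that respects every negotiated outer limit."""
    if req.width <= 0 or req.height <= 0 or req.fps <= 0:
        raise ValueError("width, height and fps must be positive")
    width = min(req.width, limits.max_width)
    height = min(req.height, limits.max_height)
    fps = min(req.fps, limits.max_fps)
    # Assumed 16x16 macroblocks; the complexity limit may lower fps further.
    mbs = math.ceil(width / 16) * math.ceil(height / 16)
    fps = min(fps, limits.max_mb_per_second / mbs)
    return TargetRequest(width, height, fps)


limits = OuterLimits(max_width=1280, max_height=720,
                     max_fps=30.0, max_mb_per_second=108000)

# A request beyond the limits is pulled back inside them ...
print(clamp_request(TargetRequest(1920, 1080, 60.0), limits))
# ... while a small-window request passes through unchanged.
print(clamp_request(TargetRequest(320, 240, 15.0), limits))
```

Because every request produced this way already lies within what was negotiated, the media in flight never falls outside the agreed bounds, and no new offer/answer round is needed for the change.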

How big an impact these API issues have is unclear, as W3C currently appears not to have included any discussion of how the browser can initiate an offer/answer exchange towards the JS when it determines a need to change parameters. But I am worried about using SDP with an API that requires the application to listen for the triggers that could benefit from a codec parameter renegotiation. This will likely only result in good behavior for those JS application implementors who are really good and find out which listeners and which signaling tricks are needed with JSEP to get good performance. I would much rather have good behavior by default in simple applications, i.e. using the default behavior that the browser implementor has put in.

B) Using the media plane, i.e. RTCP, for this signaling lets it in most cases go directly between the encoding and the decoding entity in the code. There is no need to involve the JS or the signaling server. One issue with using JSEP and SDP is the state machine lock-out effect that can occur once one has sent an Offer: that browser may not be able to send a new, updated Offer reflecting the latest change until the Answer has been properly processed. COP doesn't have these limitations. It can send a new parameter request immediately, limited only by RTCP bandwidth restrictions. Using the media plane in my view guarantees that COP is never worse than what the signaling plane can perform at its best.

C) As the general restrictions are determined in the initial negotiation, COP doesn't have the issue that in-flight media streams can become out of bounds. Thus there is no need for a two-phase change of signaling parameters.

D) Relying on draft-lennox-mmusic-sdp-source-selection-04 has several additional implications that should be discussed separately. The draft currently includes the following functionalities.

D1) It contains a media stream pause proposal.
This makes it subject to the architectural debate currently ongoing in DISPATCH around draft-westerlund-avtext-rtp-stream-pause-00, which is a competing proposal for the same functionality.

D2) The inclusion of a maximum desired frame-rate in an SSRC-specific way.

D3) Extending the image attribute to be an SSRC-specific expression of desired resolutions.

D4) Expressing relative priority in receiving different SSRCs.

D5) Providing an "information" on a sent SSRC.

D6) Indication of whether the media sender is actively sending media using the given SSRC.

Is it a correct observation that only D2 and D3 are required for the functionality of Resolution negotiation?

E) The standardization situation is similar for both proposals. They both rely on Internet drafts that are currently individual submissions. Both are partially caught in the architectural discussion which was held in Paris in the DISPATCH WG around Media pause/resume (draft-westerlund-avtext-rtp-stream-pause-00) and Media Stream Selection (draft-westerlund-dispatch-stream-selection-00) on what the most appropriate level of discussion is. This discussion will continue on the RAI-Area mailing list.

F) As seen from the discussion on the mailing list, the imageattr definitions may not be a 100% match to what is desired with Harald's proposal. I believe that COP's definitions are more appropriate, especially the possibility of "target" values. In addition, these are still open for adjustment if they don't match WebRTC's requirements.

I would also like to point out that I believe this functionality is also highly desirable for CLUE and that their requirements should be taken into account. I do think that this is one of the aspects where having matching functionality will make it much easier to have WebRTC to CLUE interworking.

Thanks for reading all the way here!

Cheers

Magnus Westerlund

----------------------------------------------------------------------
Multimedia Technologies, Ericsson Research EAB/TVM
----------------------------------------------------------------------
Ericsson AB                | Phone  +46 10 7148287
Färögatan 6                | Mobile +46 73 0949079
SE-164 80 Stockholm, Sweden| mailto: magnus.westerlund@ericsson.com
----------------------------------------------------------------------