Re: [rtcweb] Alternative Proposal for Dynamic Codec Parameter Change (Was: Re: Resolution negotiation - a contribution)

Magnus' proposal for only using SDP to set the limits of behavior makes
sense to me.  Clearly SDP is not viable for rapid adjustment.

The question is really about what information it is useful
for a receiver or sender to provide.  For example, I would question 
whether an SVC sender necessarily needs to send a message saying 
it is adjusting its sending rate, assuming that the new rate is within 
the range negotiated.  

In any case, this really is not an RTCWEB-specific problem, and coming
up with different solutions in different IETF WGs seems highly undesirable.

> > This is posted as an individual. When it comes to this topic I will not
> > engage in any chair activities because I do have an alternative proposal
> > based on
> > https://datatracker.ietf.org/doc/draft-westerlund-avtext-codec-operation-point/.
> >
> >
> > My proposal has some significant differences in how it works compared to
> > Harald's. I will start with a discussion of requirements, in addition to
> > Harald's, then an overview of my proposal, and ending with a discussion
> > of the differences between the proposals. This is quite long but I do
> > hope you will read it to the end.
> >
> > Requirements
> > ------------
> >
> > Let's start with the requirements that Harald has written. I think they
> > should be formulated differently. First of all, I think the requirements
> > are negotiated, indicated, etc. in the context of a peer connection.
> >
> > Secondly, when it comes to maximum resolution and maximum frame-rate,
> > there are actually three different limits that I think are important;
> > Highest spatial resolution, highest frame-rate, and maximum complexity
> > for a given video stream. The maximum complexity is often expressed as
> > the number of macroblocks a video codec can process per second. This is
> > a well-established complexity measure, used as part of standardized
> > video codecs' "level" definitions since the introduction of H.263 Annex
> > X in 2004. As this is basically a joint maximum requirement on the
> > maximum number of pixels per frame times the maximum frame-rate, there
> > exist cases where this complexity figure actually is the constraining
> > factor, forcing a sender or receiver to request either a higher frame
> > rate and lower resolution, or a higher resolution and lower frame rate.
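The trade-off imposed by a macroblock-rate limit can be shown with a small calculation. This is a minimal sketch assuming H.264-style 16x16 macroblocks; it uses the H.264 Level 3.1 limits (MaxMBPS 108000, MaxFS 3600) purely as an example, and the function names are illustrative rather than taken from any codec API.

```python
import math


def macroblocks_per_frame(width: int, height: int) -> int:
    """Number of 16x16 macroblocks needed to cover a width x height picture."""
    return math.ceil(width / 16) * math.ceil(height / 16)


def max_frame_rate(width: int, height: int, max_mbps: int, max_fs: int) -> float:
    """Highest frame rate allowed by a joint macroblock-rate (complexity) limit.

    max_mbps: maximum macroblocks per second.
    max_fs:   maximum frame size in macroblocks.
    Returns 0.0 if the resolution alone exceeds the frame-size limit.
    """
    mbs = macroblocks_per_frame(width, height)
    if mbs > max_fs:
        return 0.0
    return max_mbps / mbs


# Example limits: H.264 Level 3.1 (MaxMBPS, MaxFS).
LEVEL_3_1 = (108_000, 3_600)

for w, h in [(1280, 720), (960, 540), (640, 360)]:
    print(f"{w}x{h}: {max_frame_rate(w, h, *LEVEL_3_1):.1f} fps")
```

Under these limits 1280x720 is capped at 30 fps, while lowering the resolution frees macroblock budget for a higher frame rate, which is exactly the resolution/frame-rate trade-off described above.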
> >
> > The requirements should also be clearer in that one needs to handle
> > multiple SSRCs per RTP session, including multiple cameras or audio
> > streams (microphones) from a single end-point, where each stream can be
> > encoded using different parameter values.
> >
> > I also think it is important that we consider which additional encoding
> > property parameters it would make sense for WebRTC to be able to
> > dynamically negotiate, request and indicate.
> >
> > Some additional use cases that should be considered in the requirements:
> >
> > 1) We have use cases including multi-party communication. One way to
> > realize these is to use a central node and I believe everyone agrees
> > that we should have good support for getting this usage to work well.
> > Thus in a basic star topology of different participants there are going
> > to be different path characteristics between the central node and each
> > participant.
> >
> > 1A) This makes it necessary to consider how one can ensure that the
> > central node can deliver appropriate rates. One way is to de-couple the
> > links by having the central node perform individual transcodings to
> > every participant. A simpler non-transcoding central node, only
> > forwarding streams between end-points, would have to enforce the lowest
> > path characteristics on all. If one doesn't want to transcode at the
> > central node nor impose the lowest path characteristics on all, one
> > needs to consider either simulcast or scalable video coding. Both
> > simulcast and scalable video coding mean that, at least in the
> > direction from a participant to the central node, one needs to use
> > multiple codec operation points: either one per peer connection, which
> > is how I see simulcast being realized with today's API and
> > functionality, or an encoding format supporting scalable coding
> > within a single peer connection.
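To illustrate the non-transcoding simulcast case: a central node that receives several operation points from one sender can forward to each receiver the best one that fits that receiver's path. This is a minimal sketch assuming the central node already has a per-receiver bandwidth estimate; the `Layer` model, layer values and receiver names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Layer:
    """One simulcast stream (codec operation point) from a sender; hypothetical model."""
    name: str
    width: int
    height: int
    bitrate_kbps: int


def select_layer(layers: List[Layer], available_kbps: int) -> Optional[Layer]:
    """Pick the highest-bitrate layer that fits a receiver's path.

    Returns None if not even the lowest layer fits.
    """
    fitting = [layer for layer in layers if layer.bitrate_kbps <= available_kbps]
    return max(fitting, key=lambda layer: layer.bitrate_kbps, default=None)


layers = [
    Layer("low", 320, 180, 150),
    Layer("mid", 640, 360, 500),
    Layer("high", 1280, 720, 1500),
]

# Hypothetical per-receiver path estimates (kbit/s) as seen by the central node.
receivers = {"alice": 2000, "bob": 600, "carol": 100}
for who, kbps in receivers.items():
    chosen = select_layer(layers, kbps)
    print(who, chosen.name if chosen else "none")
```

The sender's uplink must carry all layers, which is why the participant-to-central-node direction needs multiple codec operation points.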
> >
> > 1B) In cases where the central node has minimal functionality and
> > basically is a relay and an ICE plus DTLS-SRTP termination point (I
> > assume EKT to avoid having to do re-encryption), there is a need to be
> > able to handle sources from different participants. This puts extra
> > requirements on how to successfully negotiate the parameters. For
> > example changing the values for one media source should not force one to
> > renegotiate with everyone.
> >
> > 2) The non-centralized multiparty use case appears to stress the need
> > for timely dynamic control equally or even more. If each sender has a
> > number of peer connections to its peers, it may use local audio levels
> > to determine whether its media stream is to be sent or not. Thus the
> > bit-rate and screen estate needed for display will change rapidly as
> > different users speak, which makes minimal delay when changing
> > preferences important.
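A minimal sketch of such a local send/no-send decision, assuming per-frame audio levels in dBov and a simple threshold with a hangover so the stream is not toggled on every short pause. The class name, threshold and hangover values are illustrative assumptions, not recommendations.

```python
class SpeechGate:
    """Decide whether to send a local media stream based on audio level.

    Levels are in dBov (0 = loudest, more negative = quieter).
    After the last frame above the threshold, sending continues for
    `hangover_frames` further frames before the gate closes.
    """

    def __init__(self, threshold_dbov: float = -40.0, hangover_frames: int = 25):
        self.threshold = threshold_dbov
        self.hangover = hangover_frames
        self.remaining = 0  # hangover frames still allowed

    def update(self, level_dbov: float) -> bool:
        """Feed one audio frame's level; return True if the stream should be sent."""
        if level_dbov >= self.threshold:
            self.remaining = self.hangover
            return True
        if self.remaining > 0:
            self.remaining -= 1
            return True
        return False


gate = SpeechGate(threshold_dbov=-40.0, hangover_frames=3)
levels = [-60, -30, -55, -55, -55, -55, -20]
print([gate.update(level) for level in levels])
```

Every time this gate opens or closes at one participant, the receivers' layout and bit-rate needs change, so any preference signaling that follows adds directly to the perceived switching delay.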
> >
> > 3) We also have speech and audio, including audio-only, use cases. For
> > audio there could also be a desire to request or indicate changes in
> > the required audio bandwidth, or the use of multiple channels.
> >
> > 4) Adaptation to legacy nodes by a central node in a multi-party
> > conference. In some use cases legacy nodes might have special needs that
> > are within the profiles a WebRTC end-point is capable of producing. Thus
> > the central node might request the nodes to constrain themselves to
> > particular payload types, audio bandwidth, etc. to accommodate a joining
> > session participant.
> >
> > 5) There appears to be a need for expressing dynamic requests for
> > target bit-rate as one parameter. This can be supported by TMMBR
> > (RFC 5104), but there are additional transport-related parameters that
> > could help with the adaptation. These include MTU, limits on packet
> > rate, and the amount of aggregation of audio frames in the payload.
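For reference, the following sketch shows how a bit-rate is packed into one TMMBR FCI entry, following the field layout in RFC 5104 (SSRC, 6-bit exponent, 17-bit mantissa, 9-bit measured overhead). The helper names are illustrative and not taken from any particular RTP stack. Only the bit-rate and per-packet overhead fit in this entry. MTU, packet-rate limits and aggregation have no field.

```python
import struct

MAX_MANTISSA = (1 << 17) - 1  # 17-bit MxTBR mantissa
MAX_EXP = (1 << 6) - 1        # 6-bit MxTBR exponent
MAX_OVERHEAD = (1 << 9) - 1   # 9-bit measured overhead (bytes)


def encode_tmmbr_fci(ssrc: int, bitrate_bps: int, overhead: int) -> bytes:
    """Build one 8-byte TMMBR FCI entry.

    The bit-rate is expressed as mantissa * 2**exp; low bits are truncated,
    so the encoded value never exceeds the requested bit-rate.
    """
    if not 0 <= overhead <= MAX_OVERHEAD:
        raise ValueError("overhead must fit in 9 bits")
    exp, mantissa = 0, bitrate_bps
    while mantissa > MAX_MANTISSA:
        mantissa >>= 1
        exp += 1
    if exp > MAX_EXP:
        raise ValueError("bit-rate too large to encode")
    word = (exp << 26) | (mantissa << 9) | overhead
    return struct.pack("!II", ssrc, word)


def decode_tmmbr_fci(data: bytes) -> tuple:
    """Inverse of encode_tmmbr_fci: return (ssrc, bitrate_bps, overhead)."""
    ssrc, word = struct.unpack("!II", data)
    exp = word >> 26
    mantissa = (word >> 9) & MAX_MANTISSA
    overhead = word & MAX_OVERHEAD
    return ssrc, mantissa << exp, overhead


print(decode_tmmbr_fci(encode_tmmbr_fci(0x1234, 1_000_000, 40)))
```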
> >
> > Overview
> > --------
> >
> > The basic idea in this proposal is to use JSEP to establish the outer
> > limits for behavior and then use the Codec Operation Point (COP)
> > proposal as detailed in draft-westerlund-avtext-codec-operation-point
> > to handle dynamic changes during the session.
> >
> > So highest resolution, frame-rate and maximum complexity are expressed
> > in JSEP SDP. In several video codecs, complexity is expressed by
> > profile and level. I know that VP8 currently doesn't have this, but
> > such parameters are under discussion for it.
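As an illustration of such outer limits for H.264: the level in profile-level-id (RFC 6184) bounds complexity, a=imageattr (RFC 6236) can bound resolution, and a=framerate (RFC 4566) the frame rate. This is a sketch only. The payload type 97 and the chosen values are assumptions, not a proposed WebRTC default.

```python
from typing import List


def outer_limit_lines(pt: int, profile_level_id: str, max_x: int, max_y: int,
                      max_fps: int) -> List[str]:
    """Build example SDP media-level lines expressing outer limits for H.264.

    "42e01f" is Constrained Baseline profile, level 3.1 (level_idc 0x1f = 31).
    """
    return [
        f"a=rtpmap:{pt} H264/90000",
        f"a=fmtp:{pt} profile-level-id={profile_level_id};packetization-mode=1",
        # Highest resolution this end-point sends and wants to receive.
        f"a=imageattr:{pt} send [x={max_x},y={max_y}] recv [x={max_x},y={max_y}]",
        f"a=framerate:{max_fps}",
    ]


def level_idc(profile_level_id: str) -> int:
    """Extract level_idc (the last byte) from an H.264 profile-level-id."""
    return int(profile_level_id[4:6], 16)


for line in outer_limit_lines(97, "42e01f", 1280, 720, 30):
    print(line)
```

Everything inside these bounds would then be adjusted dynamically via COP rather than by renegotiating these lines.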
> >
> > During the session the browser implementation detects when there is a
> > need to use COP to do any of the following things:
> >
> > A) Request new target values for codec operation, for example because
> > the GUI element displaying a video has changed size due to a window resize.
> >
> > B) Indicate when the end-point, in its role as sender, changes parameters.
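A minimal sketch of case A, assuming the browser hooks element resizes internally: it derives a new target from the element size, clamps it to the limits from the initial negotiation so it can never be out of bounds, and suppresses duplicate requests. `Limits`, `TargetRequest` and `on_video_element_resized` are hypothetical names, not part of COP or any browser API.

```python
from dataclasses import dataclass
from fractions import Fraction
from typing import Optional


@dataclass(frozen=True)
class Limits:
    """Outer limits agreed in the initial SDP negotiation (illustrative)."""
    max_width: int
    max_height: int
    max_mbps: int  # maximum macroblocks per second


@dataclass(frozen=True)
class TargetRequest:
    """Hypothetical COP-style request carrying new target values."""
    width: int
    height: int
    frame_rate: float


def on_video_element_resized(width: int, height: int, fps: float, limits: Limits,
                             current: Optional[TargetRequest]) -> Optional[TargetRequest]:
    """Compute a new target when the display element changes size.

    Returns None if nothing needs to be sent.
    """
    if width <= 0 or height <= 0:
        return None
    # Scale down (never up) to fit the maximum resolution, keeping aspect ratio.
    scale = min(Fraction(1), Fraction(limits.max_width, width),
                Fraction(limits.max_height, height))
    target_w = max(1, int(width * scale))
    target_h = max(1, int(height * scale))
    # Frame rate is additionally bounded by the macroblock-rate limit.
    mbs_per_frame = -(-target_w // 16) * -(-target_h // 16)  # ceiling division
    target_fps = min(fps, limits.max_mbps / mbs_per_frame)
    request = TargetRequest(target_w, target_h, target_fps)
    return None if request == current else request


limits = Limits(1280, 720, 108_000)
print(on_video_element_resized(1920, 1080, 30, limits, None))
```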
> >
> > In addition to just spatial resolution and video frame rate, I propose
> > that the following parameters be considered as candidates for dynamic
> > indication and request:
> >
> > Spatial resolution (as x and y resolution), Frame-rate, Picture Aspect
> > ratio, Sample Aspect Ratio, Payload Type, Bit-rate, Token Bucket Size
> > (To control burstiness of sender), Channels, Sampling Rate, Maximum RTP
> > Packet Size, Maximum RTP Packet Rate, and Application Data Unit
> > Aggregation (to control amount of audio frames in the same RTP packet).
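To show what the bit-rate and Token Bucket Size pair would constrain, here is a minimal token bucket sketch. How COP expresses these values on the wire is up to the draft. The class below is illustrative only, and the rates and packet sizes in the example are assumed.

```python
class TokenBucket:
    """Token bucket with average rate `rate_bps` and burst size `bucket_bits`.

    A larger bucket lets a sender emit bigger bursts (e.g. an I-frame) at
    once; a smaller bucket forces it to spread packets out over time.
    """

    def __init__(self, rate_bps: float, bucket_bits: float):
        self.rate = rate_bps
        self.capacity = bucket_bits
        self.tokens = bucket_bits  # start with a full bucket
        self.last = 0.0            # time of last update, in seconds

    def try_send(self, packet_bits: int, now: float) -> bool:
        """Return True and consume tokens if the packet may be sent at `now`."""
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if packet_bits <= self.tokens:
            self.tokens -= packet_bits
            return True
        return False


# 100 kbit/s with room for a burst of two and a half 1200-byte packets.
bucket = TokenBucket(rate_bps=100_000, bucket_bits=24_000)
print([bucket.try_send(9_600, 0.0) for _ in range(3)])
print(bucket.try_send(9_600, 0.05))
```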
> >
> >
> > Differences
> > -----------
> >
> > A) Using COP and using SDP based signaling for the dynamic changes are
> > two quite different models in relation to how the interaction happens.
> >
> > For COP this all happens in the browser, normally initiated by the
> > browser's own determination that a COP request or notification is
> > needed. Harald's proposal appears to require that the JS initiate a
> > renegotiation. This puts a requirement on the implementation to listen
> > to the correct callbacks to know when changes happen, such as window
> > resize. To my knowledge there are not yet any proposals for how the
> > browser can initiate a JSEP renegotiation.
> >
> > Thus COP has the advantage that no API changes are needed to get
> > browser-triggered parameter changes. W3C can choose to add API methods
> > to allow JS to make codec parameter requests, but is not required to.
> >
> > The next thing is that COP does not require the JS application to have
> > code to detect and handle re-negotiation. This makes it simpler for
> > basic applications to get good behavior: they are not interrupted, nor
> > do they need to handle the JSEP and Offer/Answer state machine lock-out
> > effects caused by dynamic changes.
> >
> > How big an impact these API issues have is unclear, as W3C currently
> > appears not to have included any discussion of how the browser can
> > initiate an offer/answer exchange towards the JS when it determines a
> > need to change parameters.
> >
> > But I am worried about using SDP with an API that requires the
> > application to listen for triggers that could benefit from a codec
> > parameter renegotiation. This will likely result in good behavior only
> > for the JS application implementors that are really good and find out
> > what listeners and what signaling tricks are needed with JSEP to get
> > good performance. I would much rather have good behavior by default in
> > simple applications, i.e. using the default behavior that the browser
> > implementor has put in.
> >
> > B) Using the media plane, i.e. RTCP, for this signaling lets it in most
> > cases go directly between the encoding and the decoding entity in the
> > code. There is no need to involve the JS or the signaling server. One
> > issue with using JSEP and SDP is the state machine lock-out effect that
> > can occur once one has sent an Offer: that browser may not be able to
> > send a new, updated Offer reflecting the latest change until the
> > answer has been properly processed. COP doesn't have these limitations.
> > It can send a new parameter request immediately, only limited by RTCP
> > bandwidth restrictions. Using the media plane in my view guarantees that
> > COP is never worse than what the signaling plane can perform at its best.
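The lock-out effect can be modelled with a toy simulation. It assumes the RFC 3264 rule that an agent must not send a new offer while a previous one is unanswered. Changes arriving in the meantime are coalesced into one pending offer, whereas COP-style requests go out immediately (RTCP bandwidth limits are ignored here). Both classes are hypothetical simplifications, not models of any real JSEP implementation.

```python
from typing import List, Optional


class OfferAnswerChannel:
    """No new offer may be sent while a previous offer is unanswered."""

    def __init__(self) -> None:
        self.awaiting_answer = False
        self.pending: Optional[str] = None  # latest change not yet offered
        self.sent: List[str] = []

    def request_change(self, change: str) -> None:
        if self.awaiting_answer:
            self.pending = change  # locked out; later changes overwrite earlier ones
        else:
            self._send_offer(change)

    def on_answer(self) -> None:
        self.awaiting_answer = False
        if self.pending is not None:
            change, self.pending = self.pending, None
            self._send_offer(change)

    def _send_offer(self, change: str) -> None:
        self.sent.append(change)
        self.awaiting_answer = True


class CopChannel:
    """COP-style requests are sent as soon as the change is detected."""

    def __init__(self) -> None:
        self.sent: List[str] = []

    def request_change(self, change: str) -> None:
        self.sent.append(change)


oa, cop = OfferAnswerChannel(), CopChannel()
for change in ["640x360", "800x450", "1280x720"]:  # three quick window resizes
    oa.request_change(change)
    cop.request_change(change)
print(oa.sent, cop.sent)
```

Until the first answer arrives, only one of the three changes has been signaled over offer/answer, while all three have already reached the peer over the media plane.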
> >
> > C) As the general restrictions are determined in the initial
> > negotiation, COP doesn't have the issue that in-flight media streams can
> > become out of bounds. Thus there is no need for a two-phase change of
> > signaling parameters.
> >
> > D) Relying on draft-lennox-mmusic-sdp-source-selection-04 has several
> > additional implications that should be discussed separately. The draft
> > currently includes the following functionalities.
> >
> >    D1) It contains a media stream pause proposal. This makes it subject
> > to the architectural debate currently ongoing in dispatch around
> > draft-westerlund-avtext-rtp-stream-pause-00 which is a competing
> > proposal for the same functionality.
> >
> >    D2) The inclusion of max desired frame rate in an SSRC-specific way
> >
> >    D3) Extending the image attribute to allow a per-SSRC expression
> > of desired resolutions.
> >
> >    D4) Expressing relative priority in receiving different SSRCs.
> >
> >    D5) Providing an "information" on a sent SSRC
> >
> >    D6) Indication if the media sender is actively sending media using the
> > given SSRC.
> >
> > Is it a correct observation that only D2 and D3 are required for the
> > functionality of Resolution negotiation?
> >
> > E) The standardization situation is similar for both proposals. They are
> > both relying on Internet drafts that are currently individual
> > submissions. Both are partially caught in the architectural discussion
> > which was held in Paris in the DISPATCH WG around Media pause/resume
> > (draft-westerlund-avtext-rtp-stream-pause-00) and Media Stream Selection
> > (draft-westerlund-dispatch-stream-selection-00) on what the most
> > appropriate level of discussion is. This discussion will continue on
> > the RAI-Area mailing list.
> >
> > F) As seen in the discussion on the mailing list, the imageattr
> > definitions may not be a 100% match to what is desired with Harald's
> > proposal. I believe that COP's definitions are more appropriate,
> > especially the possibility to express "target" values. In addition,
> > these are still open for adjustment if they don't match WebRTC's
> > requirements.
> >
> > I would also like to point out that I believe this functionality is also
> > highly desirable for CLUE and that their requirements should be taken
> > into account. I do think that this is one of the aspects where having
> > matching functionality will make WebRTC-to-CLUE interworking much
> > easier.
> >
> > Thanks for reading all the way here!
> >
> > Cheers
> >
> > Magnus Westerlund
> >
> > ----------------------------------------------------------------------
> > Multimedia Technologies, Ericsson Research EAB/TVM
> > ----------------------------------------------------------------------
> > Ericsson AB                | Phone  +46 10 7148287
> > Färögatan 6                | Mobile +46 73 0949079
> > SE-164 80 Stockholm, Sweden| mailto: magnus.westerlund@ericsson.com
> > ----------------------------------------------------------------------
> >
> >
> >
> 
> _______________________________________________
> rtcweb mailing list
> rtcweb@ietf.org
> https://www.ietf.org/mailman/listinfo/rtcweb
