Re: [rtcweb] Use of offer / answer semantics

(sorry about the length of the email)

Well, you asked for rough consensus, so I'll give my 2 cents: 

I think building-in/requiring full SDP, and in particular the offer/answer model, is a big mistake. As I said at the mic at IETF-81, by doing this you are restricting/limiting the application of Rtcweb. You are imposing a particular model of usage on the JavaScript API, which is contrary to the very notion of having a JavaScript API to begin with.

Note that here I am talking about full SDP and offer/answer, not the tokens assigned for media info that we happen to encode in SDP, which make up only a portion of SDP. We'll need standardized tokens or enum values for those in the API itself, so those are fine as IANA-registered strings.

In particular, here are my arguments:
1) It should be recognized that for anything but the simplest case of SIP-based calling from the rtcweb server to SIP peers, the domain/scope of this SDP offer/answer will only be between the browser and its server. In other words, the SDP state, sess-version number, media lines, etc., will only be meaningful between the browser and the server, and a separate set of those will exist between the server and any peer(s) it talks to using SIP. That's because if anything more complicated happens in SIP, such as forked calls, then either the browser's SDP code has to understand such scenarios as well, or else the server will handle them itself, and I don't think you really want to complicate the browser to handle all possible SDP offer/answer cases. Furthermore, while we assume the server will speak SIP to peers, it may well speak XMPP, H.323, IAX, or whatever. So again, this SDP exchange will only be happening (from an SDP-layer perspective) between the browser and the server. I only point this out because people may think they're getting something for free here, i.e., minimizing the complexity of the JavaScript or the server; that's not the case.
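To make point 1 concrete, here's a rough sketch (Python just for brevity; the class and method names are made up for illustration, not taken from any draft) of a server that owns the SIP-side offer/answer. Each forked answer, distinguished by its To-tag, is stored on the SIP leg, but the browser leg only ever sees the one answer the server decides to relay:

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Leg:
    """One SDP offer/answer context; its state means nothing on any other leg."""
    name: str
    answers: Dict[str, str] = field(default_factory=dict)  # answerer id -> SDP body


class ForkingServer:
    """Hypothetical rtcweb server: one offer/answer with the browser,
    an entirely separate one toward SIP."""

    def __init__(self) -> None:
        self.browser_leg = Leg("browser")
        self.sip_leg = Leg("sip")
        self.selected: Optional[str] = None  # To-tag of the fork we relayed

    def on_sip_answer(self, to_tag: str, sdp: str) -> Optional[str]:
        """Record every forked answer; relay only the first one to the browser."""
        self.sip_leg.answers[to_tag] = sdp
        if self.selected is None:
            self.selected = to_tag
            # Relayed as-is for brevity; a real server would rewrite o= etc.
            self.browser_leg.answers["server"] = sdp
            return sdp  # the single answer the browser ever sees
        return None  # later forks are the server's problem, not the browser's


srv = ForkingServer()
srv.on_sip_answer("tag-a", "v=0 ... (answer from fork A)")
srv.on_sip_answer("tag-b", "v=0 ... (answer from fork B)")
print(len(srv.sip_leg.answers), len(srv.browser_leg.answers))  # prints "2 1"
```

The point isn't the code itself; it's that fork handling lives in the server either way, so doing offer/answer in the browser doesn't save the server any work.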

2) One of the advantages of an rtcweb model is that the clients (i.e., browsers) don't need to be upgraded for new functionality; only the JavaScript does, unless something needs changing in the API itself. But if the browser is generating the full SDP and handling offer/answer, as your proposal has it, then whenever we need to change SDP handling we need to upgrade the browsers. For example, there's this lovely sdp-cap-neg mechanism. If we don't specify on day one that the browser has to handle receiving sdp-cap-neg offers, then the extra attribute lines will be ignored and the SDP answer won't use them. It would be possible for the JavaScript to be upgraded to handle sdp-cap-neg offers itself and choose alternatives from within them without upgrading the browser... except now it can't, because the browser is doing SDP, so the browser needs to be updated. Or I suppose the JavaScript could munge the SDP it gives to the browser, and munge back what it gets from the browser, but that just shows how silly it is to have SDP in the browser to begin with.
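For point 2, here's roughly what "the JavaScript handles sdp-cap-neg itself" could look like, sketched in Python. It's deliberately simplified and assumes RFC 5939 syntax: it only looks at the t= (transport) part of a=pcfg lines, treats a lower configuration number as more preferred, and ignores a=acap and everything else. The function name is hypothetical:

```python
from typing import List, Optional, Set, Tuple


def pick_potential_config(media: str, supported: Set[str]) -> Optional[Tuple[int, int, str]]:
    """Pick an RFC 5939 potential configuration the app can handle.

    Simplified sketch: only the t= (transport) part of a=pcfg is examined,
    a=acap and other parameters are ignored, and a lower configuration number
    is treated as more preferred. Returns (config number, tcap number, proto).
    """
    tcaps = {}
    pcfgs: List[Tuple[int, List[str]]] = []
    for raw in media.splitlines():
        line = raw.strip()
        if line.startswith("a=tcap:"):
            num, *protos = line[len("a=tcap:"):].split()
            # One a=tcap line may list several protocols, numbered consecutively.
            for offset, proto in enumerate(protos):
                tcaps[int(num) + offset] = proto
        elif line.startswith("a=pcfg:"):
            num, *params = line[len("a=pcfg:"):].split()
            pcfgs.append((int(num), params))
    for cfg, params in sorted(pcfgs):
        for param in params:
            if not param.startswith("t="):
                continue
            for alt in param[2:].split("|"):  # t=1|2 lists alternatives
                proto = tcaps.get(int(alt))
                if proto in supported:
                    return cfg, int(alt), proto
    return None


offer = ("m=audio 53456 RTP/AVP 0 18\r\n"
         "a=tcap:1 RTP/SAVPF RTP/SAVP\r\n"
         "a=pcfg:1 t=1\r\n"
         "a=pcfg:2 t=2\r\n")
print(pick_potential_config(offer, {"RTP/SAVP"}))  # (2, 2, 'RTP/SAVP')
```

As I read RFC 5939, the application would then answer with the chosen transport on the m= line plus an a=acfg line naming the configuration it picked, all without touching the browser.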

3) The converse problem of #2: browsers upgrading with new SDP functionality that the JavaScript doesn't know about but needs to. For example, suppose you've built an rtcweb app for trading floor communications. I won't go into the details of how trading floor calls work, but one of their requirements is zero lag: when traders press a specific line on their phone, they are instantly talking (no ringing/handset pickup). So what your JavaScript does is create a voice channel to every line at the beginning, and put them all on "hold" until the user presses a specific line. To avoid the delay of signaling when the user presses to talk, your JavaScript never tells the other side about the media being put on/off hold; as far as the other side is concerned it's always active bidirectional RTP media, and you don't pass the SDP back and forth unless something really changes, like addressing info. Internet Explorer v24 doesn't seem to have a problem with this, because it happens not to know/support/care about the SDP direction attributes either. So you deploy this thing and all seems well. Unbeknownst to you, Internet Explorer v25 makes a small change to its SDP engine, adding support for the direction attributes such that whenever its media is on hold it sets the SDP to a=inactive, and if it receives a=inactive it doesn't send RTP. Your Rtcweb app will now break, because the call starts out with the media at a=inactive and no SDP gets exchanged to update it back to a=sendrecv.
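A toy model of the point 3 failure, in Python. The two "engine versions" are hypothetical, same as the IE versions above, and the function names are made up; the only real SDP rules encoded are that sendrecv is the default when no direction attribute is present, and that an endpoint shouldn't send RTP when the remote side says a=inactive or a=sendonly:

```python
DIRECTIONS = ("sendrecv", "sendonly", "recvonly", "inactive")


def remote_direction(sdp: str) -> str:
    """Direction attribute in an SDP body; sendrecv if none is present.

    Simplified: takes the first direction attribute found and does not
    distinguish session-level from media-level attributes.
    """
    for raw in sdp.splitlines():
        line = raw.strip()
        if line.startswith("a=") and line[2:] in DIRECTIONS:
            return line[2:]
    return "sendrecv"


def sends_rtp(remote_sdp: str, honors_direction: bool) -> bool:
    """Hypothetical engine; honors_direction=False models "v24", True models "v25"."""
    if not honors_direction:
        return True  # old engine ignores a=inactive etc., so it always sends
    # We may send only if the remote side is willing to receive.
    return remote_direction(remote_sdp) in ("sendrecv", "recvonly")


# The held line's SDP was exchanged once, at setup, and never updated.
held_line = "v=0\r\nm=audio 49170 RTP/AVP 0\r\na=inactive\r\n"
print(sends_rtp(held_line, honors_direction=False))  # True: "v24", works
print(sends_rtp(held_line, honors_direction=True))   # False: "v25", silence
```

Same app, same SDP, different browser build, and the media silently stops.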

4) You may feel like SDP belongs in the browser because it's about "media stuff", but that's misleading: its descriptions relate to media sessions, but it includes information from the application layer as well, not purely from a media library. The *browser* only knows a portion of the details. For example, the following IANA-registered SDP attributes would be unknown to a media library in the browser and known only to the JavaScript: cat, keywds, tool, type, charset, lang, setup, connection, confid, userid, floorid, and probably a bunch more that I can't be bothered to look up. And here's a use-case example: suppose I create an Rtcweb app that can make media-loopback calls for testing. It can't answer such calls, since the browser media library doesn't support loopback, but it can *generate* them, since the generator side can simply let the user speak into the mic and hear himself/herself on the speakers to verify that media works to the loopback end (see http://tools.ietf.org/html/draft-ietf-mmusic-media-loopback-15). All the *browser* needs to do is support normal audio RTP, whereas the JavaScript or server needs to generate an SDP offer with the loopback attribute, if it uses SIP. That won't work if the browser is the one creating the SDP and handling the offer/answer.
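And for point 4, a sketch of the app-side offer for the loopback case. The BrowserAudioInfo shape and the build_loopback_offer name are made-up stand-ins for whatever a media-only API would expose, and the loopback attribute spellings are my reading of the loopback draft, so check them against it before relying on them:

```python
from dataclasses import dataclass
from typing import List


@dataclass
class BrowserAudioInfo:
    """Hypothetical: what a media-only browser API might report for plain audio RTP."""
    port: int
    payload_types: List[int]
    rtpmaps: List[str]  # e.g. "0 PCMU/8000"


def build_loopback_offer(info: BrowserAudioInfo, addr: str) -> str:
    """App-side (JavaScript or server) SDP offer for a media-loopback test call.

    The browser contributes only ordinary audio RTP details; the loopback
    attributes (spelling assumed from the loopback draft) are added here,
    by the layer that actually knows what the call is for.
    """
    pts = " ".join(str(pt) for pt in info.payload_types)
    lines = [
        "v=0",
        f"o=- 1 1 IN IP4 {addr}",
        "s=loopback test",
        f"c=IN IP4 {addr}",
        "t=0 0",
        f"m=audio {info.port} RTP/AVP {pts}",
    ]
    lines += [f"a=rtpmap:{rtpmap}" for rtpmap in info.rtpmaps]
    lines += ["a=loopback:rtp-media-loopback", "a=loopback-source"]
    return "\r\n".join(lines) + "\r\n"


offer = build_loopback_offer(BrowserAudioInfo(49170, [0], ["0 PCMU/8000"]), "192.0.2.10")
print(offer)
```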

5) Architecturally, I don't understand why the *browser* needs to know about sessions. I'm not talking about individual "media sessions", but rather the SIP concept of "sessions". SDP offer/answer requires this knowledge: it needs to know when a session begins, when it ends, and its context/state. At a minimum it needs this to group media info together (i.e., to know that the audio and video are tied to the same session), to handle the sess-version number in the origin line, and to handle session-level attributes. Why should the _browser_ need to know that? Why should it care whether the audio and video being used by a JavaScript app are tied to one call vs. two separate calls?? Why should it even know there is such a thing as a "call"? It's a media library. Keep it simple, stupid.
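For reference on point 5, this is roughly the per-session bookkeeping offer/answer forces on whoever implements it, sketched in Python. The SessionState class is hypothetical and only tracks m= lines (media attributes omitted); the rule it encodes is the RFC 3264 one, namely that the o= line's sess-version goes up by one whenever a new offer changes the session and stays put otherwise:

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class SessionState:
    """Hypothetical per-session state an SDP offer/answer engine has to keep."""
    sess_id: int
    sess_version: int
    address: str
    media: List[str] = field(default_factory=list)  # m= lines grouped into this session
    last_body: Optional[str] = None  # everything after the o= line, as last sent

    def next_offer(self) -> str:
        body = "\r\n".join(["s=-", f"c=IN IP4 {self.address}", "t=0 0", *self.media])
        if self.last_body is not None and body != self.last_body:
            self.sess_version += 1  # RFC 3264: a modified session bumps the version by one
        self.last_body = body
        origin = f"o=- {self.sess_id} {self.sess_version} IN IP4 {self.address}"
        return "\r\n".join(["v=0", origin, body]) + "\r\n"


call = SessionState(2890844526, 1, "192.0.2.10", ["m=audio 49170 RTP/AVP 0"])
call.next_offer()                    # initial offer: version 1
call.next_offer()                    # nothing changed: still version 1
call.media.append("m=video 51372 RTP/AVP 31")
print(call.next_offer().splitlines()[1])  # o=- 2890844526 2 IN IP4 192.0.2.10
```

Whoever owns this object has to know what "the session" is, which is exactly what I'd rather the browser not have to know.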

6) You ask below in your email what the alternative to using SDP offer/answer between the browser and server would be. The answer seems obvious to me: NOTHING. There is no need for a standardized protocol to convey media info between the browser and server. The only need is for a standard to convey the media/ICE/SRTP capabilities and command/setting primitives in an API between the JavaScript and the browser. That's it. The JavaScript developer can convey that information to the server however he/she sees fit, in HTTP, if it even needs conveying to the server. That's kinda the whole POINT of using JavaScript! Let the developer develop. If they want to use an SDP offer/answer model, they can code JavaScript to do so. If they want to send a binary blob instead, even to a separate server, they can. The HTTP interface is their job, not ours.
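And to illustrate point 6: given the same API primitives, two apps can encode them completely differently. The LocalMediaInfo shape and both helper functions below are hypothetical (this is not any proposed API), but the idea is that one app posts a JSON blob to its own server while another builds SDP because its server gateways to SIP:

```python
import json
from dataclasses import asdict, dataclass
from typing import List


@dataclass
class Codec:
    name: str          # IANA-registered media subtype name
    clock_rate: int
    payload_type: int


@dataclass
class LocalMediaInfo:
    """Hypothetical result of a browser 'describe local media' primitive."""
    codecs: List[Codec]
    ice_ufrag: str
    ice_pwd: str
    candidates: List[str]  # ICE candidate strings, without the "a=candidate:" prefix


def to_app_json(info: LocalMediaInfo) -> str:
    """App A: a JSON blob POSTed to its own server, in its own format."""
    return json.dumps(asdict(info), sort_keys=True)


def to_sdp_media(info: LocalMediaInfo, port: int) -> str:
    """App B: an SDP media section, because its server gateways to SIP."""
    pts = " ".join(str(c.payload_type) for c in info.codecs)
    lines = [f"m=audio {port} RTP/AVP {pts}"]
    lines += [f"a=rtpmap:{c.payload_type} {c.name}/{c.clock_rate}" for c in info.codecs]
    lines += [f"a=ice-ufrag:{info.ice_ufrag}", f"a=ice-pwd:{info.ice_pwd}"]
    lines += [f"a=candidate:{cand}" for cand in info.candidates]
    return "\r\n".join(lines) + "\r\n"


info = LocalMediaInfo(
    codecs=[Codec("PCMU", 8000, 0)],
    ice_ufrag="8hhY",
    ice_pwd="asd88fgpdd777uzjYhagZg",
    candidates=["1 1 UDP 2130706431 192.0.2.10 49170 typ host"],
)
print(to_app_json(info))
print(to_sdp_media(info, 49170))
```

Neither encoding needs to be standardized; only the primitives feeding them do.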

-hadriel

On Sep 6, 2011, at 10:46 AM, Cullen Jennings wrote:

> 
> In my role as an individual contributor, I want to propose some text that I think we can get rough consensus on, which helps specify which parts of the signaling issues we agree on and which we don't.
> 
> At the last meeting, my read of the room was that there was a fair amount of agreement that offer/answer semantics with SDP are what we want to use. I don't think there was broad agreement on whether one should use SIP or not, or for that matter Jingle. If we can nail down this decision as the direction the WG is going, it will really help make progress. What I would like to do is propose the following principles in the text below. If we have agreement on these, then they would go into the overview document and help guide the design of other documents. I want to highlight that none of the principles below imply that we would need to use SIP in the browsers; the principles would all work fine if there were a signaling gateway in the web server that converted SIP to whatever proprietary HTML/JS/HTTP the applications wanted to use between the browser and the web server.
> 
> 
> 
> 1) The media negotiations will be done using the same SDP offer/answer semantics that are used in SIP. 
> 
> 2) It will be possible to gateway to legacy SIP devices that support ICE and appropriate RTP/SDP mechanisms and codecs without using a media gateway. A signaling gateway to convert between the signaling on the web side and the SIP signaling may be needed.
> 
> 3) When a new codec is specified, and the SDP for the new codec is specified in the MMUSIC WG, no other standardization should be required for it to be possible to use that codec in the web browsers. Adding new codecs, which might have new SDP parameters, should not change the APIs between the browser and the JavaScript application. As soon as the browsers support the new codec, old applications written before the codec was specified should automatically be able to use the new codec where appropriate, with no changes to the JS applications.
> 
> 
> People have looked at alternatives to all of these in a fair amount of detail. For example, we have considered alternatives to SDP offer/answer such as the advertisement proposal draft (draft-peterson-sipcore-advprop-01) and discussed that several times in the WG. The primary issue identified with it was concern over mapping it to legacy SDP. Similarly, people have considered a replacement for SDP in the SDPng work, which was eventually abandoned due to the difficulty of creating an incentive for implementations to migrate from SDP to SDPng.
> 
> We have also considered just sending audio and video directly over something like DTLS and not using RTP. The WG has clearly rejected this for a variety of reasons: avoiding the operating expense of a media gateway and the resulting reduction in quality of experience is obviously a high-priority goal for the design of the RTP multiplexing draft.
> 
> The JS API is being developed by W3C, but when APIs that would violate the third principle are proposed, such as the API in section 4 of draft-jennings-rtcweb-api-00, it is clear that many people who come more from the browser and web application world do not want such an API.
> 
> _______________________________________________
> rtcweb mailing list
> rtcweb@ietf.org
> https://www.ietf.org/mailman/listinfo/rtcweb