Re: [rtcweb] codec and connection negotiation

Colin Perkins <csp@csperkins.org> Mon, 08 August 2011 10:28 UTC

On 5 Aug 2011, at 19:33, Matthew Kaufman wrote:
> I put this together to try to help myself, and perhaps others, understand the various ways in which codec and connection negotiation might be decomposed and then put together for RTCWEB applications. It is a bit of a stream of consciousness, but hopefully we can get some good discussion provoked.
> 
> A. How to encode codec capabilities or choices
> 
> A1 - Don't provide a canonical way of encoding codec settings. The choice of codec and parameters is encoded in an ad-hoc way that requires no standardization.
> 
> A2 - Encode codec settings using SDP. The choice of codec and parameters is encoded using the syntax and semantics of SDP, reusing the SDP standardization work.
> 
> A3 - Encode codec settings using a JSON encoding of SDP. The choice of codec and parameters is encoded using the semantics of SDP but a JSON syntax. This reuses the SDP standardization work, plus new standards for how to encode SDP in JSON.

The codec name and parameters are not defined in terms of SDP. Rather, SDP uses the MIME type of the codec and the registered parameters of that MIME type. I don't much care whether the syntax used to convey codec MIME types and parameters is SDP, JSON, XML, or whatever, but I strongly suggest reusing the existing database of codec MIME types (registered following RFC 3555, since superseded by RFC 4855) rather than trying to reinvent it.

Colin
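A minimal Python sketch of this suggestion, assuming a codec is described by its registered media type plus parameters, with SDP and JSON treated as two renderings of the same description. The `Codec` class, payload type 96, and the JSON field names are assumptions for illustration; the `video/H264` parameters `profile-level-id` and `packetization-mode` are those registered in RFC 6184, and the mapping onto `a=rtpmap`/`a=fmtp` follows RFC 4855.

```python
import json
from dataclasses import dataclass, field


@dataclass
class Codec:
    """Hypothetical record: a codec named by its registered media type."""
    media_type: str                              # e.g. "video/H264"
    clock_rate: int                              # RTP clock rate
    params: dict = field(default_factory=dict)   # registered parameters


def to_sdp(codec: Codec, payload_type: int) -> list:
    """Render as SDP lines: subtype -> rtpmap encoding name, params -> fmtp."""
    _, subtype = codec.media_type.split("/", 1)
    lines = [f"a=rtpmap:{payload_type} {subtype}/{codec.clock_rate}"]
    if codec.params:
        fmtp = ";".join(f"{k}={v}" for k, v in codec.params.items())
        lines.append(f"a=fmtp:{payload_type} {fmtp}")
    return lines


def to_json(codec: Codec) -> str:
    """Render the same description as JSON (field names are illustrative)."""
    return json.dumps({"type": codec.media_type,
                       "rate": codec.clock_rate,
                       "params": codec.params}, sort_keys=True)


h264 = Codec("video/H264", 90000,
             {"profile-level-id": "42e01f", "packetization-mode": "1"})
print(to_sdp(h264, 96))
print(to_json(h264))
```

Either rendering carries the same registry-defined names, so choosing SDP versus JSON syntax does not require redefining the codecs themselves.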

> B. How to expose codec capabilities and controls
> 
> B1. Expose codec capabilities and settings via individual Javascript APIs. For example one might say "camera.encodeMode = "H.264"; camera.encodeWidth = 640;" or "if (camera.canEncode("H.264")) ..."
> 
> B2. Expose codec capabilities and settings via a combined Javascript API that concatenates all the capabilities and settings into a single object. Calling "camera.encodeCapabilities()" would return a large object that is a full enumeration of what it can do.
> 
> B3. Expose codec capabilities and settings via a combined Javascript API that produces and consumes SDP strings.
> 
> B4. Don't expose codec capabilities and settings via Javascript; instead, handle this outside of Javascript.
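To make the B1/B2 distinction concrete, here is a Python sketch of a hypothetical `Camera` object. The class, its fields, and the width limits are invented for illustration, and media type names stand in for strings like "H.264".

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Camera:
    """Hypothetical capture device; not an actual browser API."""
    # Media types this device can encode, with illustrative limits.
    supported: dict = field(default_factory=lambda: {
        "video/H264": {"max_width": 1280},
        "video/H263-1998": {"max_width": 704},
    })
    encode_mode: Optional[str] = None
    encode_width: Optional[int] = None

    def can_encode(self, media_type: str) -> bool:
        """B1 style: answer one narrow question per call."""
        return media_type in self.supported

    def encode_capabilities(self) -> dict:
        """B2 style: return the full enumeration in a single object."""
        return {t: dict(limits) for t, limits in self.supported.items()}


cam = Camera()
# B1: individual queries and settings, as in the quoted example.
if cam.can_encode("video/H264"):
    cam.encode_mode = "video/H264"
    cam.encode_width = 640
# B2: one combined object the application (or a server) can inspect.
caps = cam.encode_capabilities()
```

B1 keeps each call small but needs many round trips through the API to discover everything; B2 hands the whole picture over at once, which suits a server that makes the decision (C1).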
> 
> C. How to agree on which codec and settings to use
> 
> C1. The server receives information about the capabilities, decides what settings should be used, and communicates them to the endpoints
> 
> C2. The endpoints agree using offer-answer, using the server as a communication channel
> 
> C3. The endpoints agree using offer-answer, directly between the two endpoints
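A simplified Python sketch of C1 versus C2/C3, assuming each endpoint's capabilities reduce to a list of media type names in preference order. `server_choose`, `answer`, and the policy list are hypothetical; real RFC 3264 offer-answer also negotiates payload types, parameters, and stream directions, and the answer may list formats in the answerer's own preference order.

```python
def server_choose(caps_a, caps_b, policy):
    """C1: the server sees both capability lists and applies its own policy."""
    for codec in policy:  # the server's preference order
        if codec in caps_a and codec in caps_b:
            return codec
    return None  # no codec in common that the policy allows


def answer(offer, local_caps):
    """C2/C3: the answerer keeps only offered codecs it also supports.

    Simplification: the offerer's order is preserved.
    """
    return [codec for codec in offer if codec in local_caps]


caller = ["audio/G722", "audio/PCMU"]
callee = ["audio/PCMU", "audio/PCMA", "audio/G722"]
print(server_choose(caller, callee, policy=["audio/PCMU", "audio/G722"]))
# audio/PCMU
print(answer(caller, callee))
# ['audio/G722', 'audio/PCMU']
```

The same `answer` logic serves both C2 and C3; the only difference is whether the offer and answer travel via the server or directly between the endpoints.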
> 
> 
> (There are obviously a few additional nuances, such as whether to use something like RFC 5939 SDP capability negotiation, that aren't well captured above.)
> 
> Now, with the above, we find that some of the models that have been discussed are combinations of the above choices.
> 
> X1 - Use A2, B3, C2 - The endpoints generate SDP for offer-answer by firing an SDP event, having it sent via a server to the far end, where it is injected as SDP into the Javascript.
> 
> X2 - Use A1, B1, C1 - The endpoints run Javascript that inspects their capabilities, sends that to the server which decides what is best, and sends back information that the Javascript uses to set up the encoders.
> 
> X3 - Use A2, B4, C3 - The endpoints establish a communication channel using ICE and an out-of-band channel; they then use SDP inside SIP to negotiate directly using offer-answer. The server has no information about what was chosen and does not participate.
> 
> We also find that almost all the models can map, though with varying degrees of pain, to others.
> 
> For instance:
> 
> Y1 - Use A2, B1, C2 - The endpoints run Javascript that inspects their capabilities and encodes them into an SDP string. This is sent via the server to the far end, where the SDP is parsed in Javascript and the individual APIs are called to implement SDP offer-answer.
> 
> Y2 - Use A1, B3, C1 - The endpoints generate SDP for offer-answer by firing an SDP event. The endpoint then runs Javascript that extracts from the SDP all the individual parameters and re-encodes them using an ad-hoc scheme. The server determines what each end should do and sends ad-hoc messages to the endpoints which then generate the SDP answer that causes the desired outcome.
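The first step of Y2, pulling individual parameters back out of browser-generated SDP before re-encoding them ad hoc, might look like the following Python sketch. It assumes one `a=rtpmap` line per dynamic payload type, appearing before its `a=fmtp` line, and builds media type names as `<m= media>/<encoding name>` following the RFC 4855 mapping; `extract_codecs`, `to_ad_hoc`, and the ad-hoc message shape are hypothetical.

```python
SDP = """v=0
o=- 0 0 IN IP4 192.0.2.1
s=-
c=IN IP4 192.0.2.1
t=0 0
m=audio 49170 RTP/AVP 9
a=rtpmap:9 G722/8000
m=video 51372 RTP/AVP 96
a=rtpmap:96 H264/90000
a=fmtp:96 profile-level-id=42e01f;packetization-mode=1
"""


def extract_codecs(sdp: str) -> dict:
    """Collect rtpmap/fmtp data keyed by (media, payload type).

    Static payload types without an a=rtpmap line are ignored.
    """
    codecs, media = {}, None
    for line in sdp.splitlines():
        line = line.strip()
        if line.startswith("m="):
            media = line[2:].split(" ", 1)[0]  # "audio", "video", ...
        elif line.startswith("a=rtpmap:") and media:
            pt, encoding = line[len("a=rtpmap:"):].split(" ", 1)
            name, rate = encoding.split("/")[:2]
            codecs[(media, int(pt))] = {"type": f"{media}/{name}",
                                        "rate": int(rate), "params": {}}
        elif line.startswith("a=fmtp:") and media:
            pt, fmtp = line[len("a=fmtp:"):].split(" ", 1)
            entry = codecs.get((media, int(pt)))
            if entry is not None:
                for item in fmtp.split(";"):
                    key, _, value = item.strip().partition("=")
                    entry["params"][key] = value
    return codecs


def to_ad_hoc(codecs: dict) -> list:
    """Re-encode as a hypothetical ad-hoc message for the server."""
    return [{"type": c["type"], "params": c["params"]} for c in codecs.values()]


print(to_ad_hoc(extract_codecs(SDP)))
```

The reverse direction in Y2 (turning the server's ad-hoc decision back into an SDP answer) needs the same mapping in the other direction, which is where much of the "pain" of this model lies.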
> 
> It should be noted that extracting a full set of capabilities when only offer-answer is available via the API is particularly painful, as it might be necessary to explore a very wide space of fake offers to get answers that clarify things like "can do G.722 audio but only if not doing H.264 video".
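The cost described above can be sketched in Python. Assuming a hypothetical answerer whose only observable behavior is its answer to an offer, discovering a constraint such as "G.722 only without H.264" requires trying combinations; exhaustive probing of n codecs takes 2^n - 1 non-empty offers (7 for three codecs, 1023 for ten). `fake_answerer` is a toy model, not real endpoint behavior.

```python
from itertools import combinations

KNOWN = {"audio/G722", "audio/PCMU", "video/H264"}


def fake_answerer(offer):
    """Toy endpoint: supports KNOWN, but G.722 only when H.264 is not in use."""
    accepted = set(offer) & KNOWN
    if {"audio/G722", "video/H264"} <= accepted:
        accepted.discard("audio/G722")
    return frozenset(accepted)


def probe(codecs, answerer):
    """Send one fake offer per non-empty subset of codecs; record each answer."""
    results = {}
    for size in range(1, len(codecs) + 1):
        for subset in combinations(codecs, size):
            offer = frozenset(subset)
            results[offer] = answerer(offer)
    return results


results = probe(["audio/G722", "audio/PCMU", "video/H264"], fake_answerer)
print(len(results))  # 7 offers for 3 codecs (2**3 - 1)
# The constraint only shows up in offers that contain both codecs:
print(sorted(results[frozenset({"audio/G722", "video/H264"})]))  # ['video/H264']
```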
> 
> We might be able to learn from some previous experiences:
> 
> 1. Web-based email. Web browsers run Javascript that uses ad-hoc signaling to/from the web site in order to send and receive email. The browser does not have SMTP, POP, or IMAP implementations, or even know how to parse RFC822 headers on its own. This argues for A1, B1 (or maybe B2), C1.
> 
> 2. Web-based image display. Web browsers send an "Accept" header via HTTP indicating which MIME types they can handle. Knowing whether a browser can do JPEG is as simple as looking for "image/jpeg" in the header. This is probably at the wrong layer for sophisticated web applications (as this isn't about what comes back over HTTP, but what can be supported over a separate channel), but might be appropriate if we could agree on a small number of supported codecs and simply have the browser tell the server in a similar manner whether or not it can do RTCWEB.
> 
> 3. SIP. SIP endpoints use SDP directly between the endpoints (though possibly with intermediaries that rarely change the SDP) to do offer-answer. This is most like A2, B3 (or B4), C3 (or C2).
> 
> 4. MGCP. MGCP uses SDP for both capabilities and settings, but the server is in control. This is most like A2, B3, C1.
> 
> 5. H.323. H.323 (via H.245) uses a protocol other than SDP for both capabilities and settings, but the server is in control. This is like A3 (or some other encoding), B2, C1.
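Item 2 above mentions looking for "image/jpeg" in the Accept header. A plain substring search is slightly too simple, because HTTP media ranges such as `image/*` and q-values (where `q=0` means "not acceptable") also matter, and the most specific matching range takes precedence. A minimal Python sketch, which ignores media type parameters other than q; `accepts` is a hypothetical helper:

```python
def accepts(accept_header: str, media_type: str) -> bool:
    """Return True if accept_header admits media_type with q > 0.

    The most specific matching range wins (exact type, then "type/*",
    then "*/*"). Media type parameters other than q are ignored.
    """
    media_type = media_type.lower()
    main_type = media_type.split("/", 1)[0]
    best = None  # (specificity, q) of the most specific match so far
    for item in accept_header.split(","):
        parts = [p.strip() for p in item.split(";")]
        media_range = parts[0].lower()
        q = 1.0
        for param in parts[1:]:
            key, _, value = param.partition("=")
            if key.strip().lower() == "q":
                q = float(value)
        if media_range == media_type:
            specificity = 2
        elif media_range == main_type + "/*":
            specificity = 1
        elif media_range == "*/*":
            specificity = 0
        else:
            continue
        if best is None or specificity > best[0]:
            best = (specificity, q)
    return best is not None and best[1] > 0


print(accepts("image/png, image/*;q=0.8, */*;q=0.5", "image/jpeg"))  # True
print(accepts("image/jpeg;q=0, */*", "image/jpeg"))  # False
```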
> 
> As for connection negotiation, most of the decomposition above also applies: again, whether to put the ICE candidates into SDP or not, whether to have APIs that take blobs or individual calls, etc.
> 
> But the other issue that arises when we start talking about connection negotiation is that if you choose SDP offer-answer for codec negotiation, and SDP for ICE candidates, there is the temptation to combine the SDP for the two, thus conflating the process of establishing a logical channel over which various types of media might flow, and establishing just which media should be sent over that channel.
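The conflation described above is visible in the SDP itself: ICE attributes sit in the same blob as the codec description. A Python sketch that separates the two and parses the fixed fields of an RFC 5245 `a=candidate` line; `split_sdp` and `parse_candidate` are hypothetical helpers, and the addresses come from documentation ranges. It only moves the ICE attributes: the `c=` line and the `m=` port also carry transport information, which is part of why the two concerns are hard to pull apart in SDP.

```python
SDP_WITH_ICE = """v=0
o=- 0 0 IN IP4 192.0.2.1
s=-
c=IN IP4 192.0.2.1
t=0 0
m=audio 49170 RTP/AVP 0
a=ice-ufrag:8hhY
a=ice-pwd:asd88fgpdd777uzjYhagZg
a=candidate:1 1 UDP 2130706431 192.0.2.1 49170 typ host
a=candidate:2 1 UDP 1694498815 198.51.100.7 62001 typ srflx raddr 192.0.2.1 rport 49170
"""

ICE_PREFIXES = ("a=candidate:", "a=ice-ufrag:", "a=ice-pwd:")


def split_sdp(sdp: str):
    """Separate ICE (transport) attributes from the remaining SDP lines."""
    transport, media = [], []
    for line in sdp.splitlines():
        line = line.strip()
        if line.startswith(ICE_PREFIXES):
            transport.append(line)
        elif line:
            media.append(line)
    return transport, media


def parse_candidate(line: str) -> dict:
    """Parse the fixed fields of an RFC 5245 a=candidate attribute."""
    fields = line[len("a=candidate:"):].split()
    if len(fields) < 8 or fields[6] != "typ":
        raise ValueError(f"not an RFC 5245 candidate: {line!r}")
    return {"foundation": fields[0], "component": int(fields[1]),
            "transport": fields[2], "priority": int(fields[3]),
            "address": fields[4], "port": int(fields[5]),
            "type": fields[7]}


transport, media = split_sdp(SDP_WITH_ICE)
candidates = [parse_candidate(l) for l in transport if l.startswith("a=candidate:")]
print([(c["type"], c["address"], c["port"]) for c in candidates])
```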
> 
> Matthew Kaufman



-- 
Colin Perkins
http://csperkins.org/