[rtcweb] codec and connection negotiation

Matthew Kaufman <matthew.kaufman@skype.net> Fri, 05 August 2011 18:33 UTC

Message-ID: <4E3C377A.5090105@skype.net>
Date: Fri, 05 Aug 2011 11:33:30 -0700
From: Matthew Kaufman <matthew.kaufman@skype.net>
To: "rtcweb@ietf.org" <rtcweb@ietf.org>
Subject: [rtcweb] codec and connection negotiation

I put this together to try to help myself, and perhaps others, 
understand the various ways in which codec and connection negotiation 
might be decomposed and then put together for RTCWEB applications. It is 
a bit of a stream of consciousness, but hopefully it will provoke some 
good discussion.

A. How to encode codec capabilities or choices

A1 - Don't provide a canonical way of encoding codec settings. The 
choice of which codec and what parameters is encoded in an ad-hoc way 
that requires no standardization.

A2 - Encode codec settings using SDP. The choice of which codec and what 
parameters is encoded using the syntax and semantics of SDP and reuses 
the SDP standardization work.

A3 - Encode codec settings using a JSON encoding of SDP. The choice of 
which codec and what parameters is encoded using the semantics of SDP 
but a JSON syntax. This reuses the SDP standardization work, plus new 
standards for how to encode SDP in JSON.
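To make A2 versus A3 concrete, here is a minimal TypeScript sketch that 
renders one audio codec choice (PCMU and G.722 over RTP/AVP) both as SDP 
lines and as one possible JSON encoding of the same SDP semantics. The 
`CodecChoice` type and the JSON field names are assumptions for 
illustration only; agreeing on such a mapping is exactly the extra 
standardization A3 would require.

```typescript
// Hypothetical structured form of a codec choice (not a standard schema).
interface CodecChoice {
  payloadType: number; // RTP payload type number
  encodingName: string; // RTP encoding name, e.g. "PCMU" or "G722"
  clockRate: number; // RTP clock rate in Hz
  channels?: number; // optional encoding parameter (number of channels)
}

// A2: SDP syntax and semantics, an m= line plus one a=rtpmap line per codec.
function toSdp(media: string, port: number, codecs: CodecChoice[]): string {
  const pts = codecs.map((c) => c.payloadType).join(" ");
  const lines = [`m=${media} ${port} RTP/AVP ${pts}`];
  for (const c of codecs) {
    const extra = c.channels !== undefined ? `/${c.channels}` : "";
    lines.push(`a=rtpmap:${c.payloadType} ${c.encodingName}/${c.clockRate}${extra}`);
  }
  return lines.join("\r\n");
}

// A3: the same SDP semantics in JSON syntax. The field names are an assumption.
function toJson(media: string, port: number, codecs: CodecChoice[]): string {
  return JSON.stringify({ media, port, proto: "RTP/AVP", codecs });
}

const audio: CodecChoice[] = [
  { payloadType: 0, encodingName: "PCMU", clockRate: 8000 },
  // G.722 is signaled with an 8000 Hz RTP clock rate for historical reasons (RFC 3551).
  { payloadType: 9, encodingName: "G722", clockRate: 8000 },
];

console.log(toSdp("audio", 49170, audio));
console.log(toJson("audio", 49170, audio));
```

Both outputs carry the same information; A3 changes only the syntax, which 
is why it still inherits everything from SDP's semantics.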

B. How to expose codec capabilities and controls

B1. Expose codec capabilities and settings via individual Javascript 
APIs. For example, one might write 'camera.encodeMode = "H.264"; 
camera.encodeWidth = 640;' or 'if (camera.canEncode("H.264")) ...'

B2. Expose codec capabilities and settings via a combined Javascript API 
that concatenates all the capabilities and settings into a single 
object. Calling "camera.encodeCapabilities()" would return a large 
object that is a full enumeration of what it can do.

B3. Expose codec capabilities and settings via a combined Javascript API 
that produces and consumes SDP strings.

B4. Don't expose codec capabilities and settings via Javascript; instead, 
handle them outside of Javascript.
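A minimal sketch of B1, B2, and B3 side by side, on a hypothetical 
`MockCamera` object. The names `canEncode`, `encodeCapabilities`, 
`encodeMode`, and `encodeWidth` come from the examples above; 
`capabilitiesAsSdp`, the capability fields, and the use of the RTP 
encoding name "H264" rather than "H.264" are assumptions. B4 has no 
Javascript surface to show.

```typescript
// Hypothetical per-codec capability record; not an existing browser API.
interface EncodeCapability {
  encodingName: string; // RTP encoding name, e.g. "H264"
  maxWidth: number; // largest supported frame width in pixels
  maxHeight: number; // largest supported frame height in pixels
}

// Hypothetical camera object exposing the same capabilities three ways.
class MockCamera {
  // B1-style individual settings, mirroring the example above.
  encodeMode = "";
  encodeWidth = 0;

  constructor(private readonly caps: EncodeCapability[]) {}

  // B1: one narrow question per call.
  canEncode(encodingName: string): boolean {
    return this.caps.some((c) => c.encodingName === encodingName);
  }

  // B2: the full enumeration as one object (copied so callers cannot mutate it).
  encodeCapabilities(): EncodeCapability[] {
    return this.caps.map((c) => ({ ...c }));
  }

  // B3: an SDP blob. Dynamic payload types starting at 96 are an arbitrary choice,
  // and the resolution limits are dropped, since SDP would need codec-specific a=fmtp lines.
  capabilitiesAsSdp(port: number): string {
    const pts = this.caps.map((_, i) => 96 + i);
    const lines = [`m=video ${port} RTP/AVP ${pts.join(" ")}`];
    this.caps.forEach((c, i) => {
      lines.push(`a=rtpmap:${pts[i]} ${c.encodingName}/90000`);
    });
    return lines.join("\r\n");
  }
}

const cam = new MockCamera([
  { encodingName: "H264", maxWidth: 1280, maxHeight: 720 },
  { encodingName: "H263-1998", maxWidth: 704, maxHeight: 576 },
]);

// B1 usage, following the pattern in the text.
if (cam.canEncode("H264")) {
  cam.encodeMode = "H264";
  cam.encodeWidth = 640;
}
```

In this sketch the B3 form silently loses the resolution limits that B2 
carries, which hints at how much of the capability space ends up in 
codec-specific SDP parameters.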

C. How to agree on which codec and settings to use

C1. The server receives information about the capabilities, decides what 
settings should be used, and communicates them to the endpoints.

C2. The endpoints agree using offer-answer, using the server as a 
communication channel.

C3. The endpoints agree using offer-answer, directly between the two 
endpoints.
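The difference between C1 and C2/C3 can be sketched in a few lines. This 
is a simplification that treats codecs as bare encoding names with no 
parameters; the function names are hypothetical.

```typescript
// RTP encoding names such as "PCMU" or "G722"; parameters are ignored here.
type Codec = string;

// C1: the server sees both capability lists and dictates a single codec,
// here the first entry in the caller's preference order that the callee supports.
function serverDecide(callerPrefs: Codec[], calleeCaps: Codec[]): Codec | undefined {
  const common = callerPrefs.filter((c) => calleeCaps.indexOf(c) !== -1);
  return common.length > 0 ? common[0] : undefined;
}

// C2/C3: the answerer keeps the offered codecs it supports. RFC 3264 recommends
// listing them in the same relative order as the offer unless there is a reason not to.
function answerOffer(offer: Codec[], localCaps: Codec[]): Codec[] {
  return offer.filter((c) => localCaps.indexOf(c) !== -1);
}

const chosen = serverDecide(["G722", "PCMU"], ["PCMU", "G722"]);
const answered = answerOffer(["G722", "PCMU", "PCMA"], ["PCMA", "PCMU"]);
```

In C1 the decision needs both full capability lists in one place, which 
is why it pairs naturally with B1 or B2. `answerOffer` needs only the 
offer and local knowledge, and it is the same code whether the offer was 
relayed by the server (C2) or arrived directly (C3).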


(There are obviously a few additional nuances, such as whether to use 
something like RFC 5939 SDP capability negotiation, that aren't 
well captured above.)

With these choices laid out, we find that some of the models that have 
been discussed are combinations of them.

X1 - Use A2, B3, C2 - The endpoints generate SDP for offer-answer by 
firing an SDP event; the SDP is sent via a server to the far end, where 
it is passed into the Javascript as SDP.

X2 - Use A1, B1, C1 - The endpoints run Javascript that inspects their 
capabilities, sends that to the server which decides what is best, and 
sends back information that the Javascript uses to set up the encoders.

X3 - Use A2, B4, C3 - The endpoints establish a communication channel 
using ICE and an out-of-band channel. They then use SDP inside of SIP to 
negotiate directly using offer-answer; the server has no information 
about what was chosen and does not participate.

We also find that almost all the models can be mapped onto the others, 
though with varying degrees of pain.

For instance:

Y1 - Use A2, B1, C2 - The endpoints run Javascript that inspects their 
capabilities and encodes them into an SDP string. This is sent via the 
server to the far end, where the SDP is parsed in Javascript and the 
individual APIs are called to implement SDP offer-answer.

Y2 - Use A1, B3, C1 - The endpoints generate SDP for offer-answer by 
firing an SDP event. The endpoint then runs Javascript that extracts 
from the SDP all the individual parameters and re-encodes them using an 
ad-hoc scheme. The server determines what each end should do and sends 
ad-hoc messages to the endpoints which then generate the SDP answer that 
causes the desired outcome.
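Both Y1 and Y2 involve parsing SDP in Javascript. As a rough illustration 
of the Y2 step that pulls parameters out of SDP and re-encodes them in an 
ad-hoc form, here is a sketch that extracts only the a=rtpmap attributes; 
a real parser would also need m= lines, a=fmtp, and session-level 
attributes. The `RtpMapEntry` shape is hypothetical.

```typescript
// Hypothetical ad-hoc structure for one a=rtpmap attribute.
interface RtpMapEntry {
  payloadType: number;
  encodingName: string;
  clockRate: number;
  channels?: number;
}

// Extract a=rtpmap:<pt> <name>/<rate>[/<channels>] lines from an SDP blob.
// Every other line is ignored in this sketch.
function parseRtpmap(sdp: string): RtpMapEntry[] {
  const entries: RtpMapEntry[] = [];
  for (const line of sdp.split(/\r?\n/)) {
    const m = /^a=rtpmap:(\d+) ([^\/\s]+)\/(\d+)(?:\/(\d+))?$/.exec(line.trim());
    if (m === null) {
      continue;
    }
    const entry: RtpMapEntry = {
      payloadType: parseInt(m[1], 10),
      encodingName: m[2],
      clockRate: parseInt(m[3], 10),
    };
    if (m[4] !== undefined) {
      entry.channels = parseInt(m[4], 10);
    }
    entries.push(entry);
  }
  return entries;
}

const sample =
  "v=0\r\nm=audio 49170 RTP/AVP 0 9 10\r\n" +
  "a=rtpmap:0 PCMU/8000\r\na=rtpmap:9 G722/8000\r\na=rtpmap:10 L16/44100/2\r\n";

// The ad-hoc re-encoding that Y2 would send to the server.
const adHoc = JSON.stringify(parseRtpmap(sample));
```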

It should be noted that extracting a full set of capabilities when only 
offer-answer is available via the API is particularly painful, as it 
might be necessary to explore a very wide space of fake offers to get 
answers that clarify things like "can do G.722 audio but only if not 
doing H.264 video".
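To give a feel for how fast that space grows, here is a sketch that 
probes a hypothetical answerer (`mockAnswer`, which can do G.722 only 
when H.264 is not offered) with one fake offer per non-empty subset of 
candidate codecs. Audio and video codecs are lumped into one list for 
brevity, and the answerer is assumed to return a subset of the offer. 
With n candidate codecs this already takes 2^n - 1 offers, before any 
codec parameters are varied.

```typescript
// RTP encoding names; audio and video are lumped together for brevity.
type Codec = string;

// Hypothetical answerer: supports PCMU, G722 and H264, but drops G722
// whenever H264 is offered alongside it. Its answer is always a subset of the offer.
function mockAnswer(offer: Codec[]): Codec[] {
  const hasH264 = offer.indexOf("H264") !== -1;
  return offer.filter((c) => (c === "G722" ? !hasH264 : c === "PCMU" || c === "H264"));
}

// Send one fake offer per non-empty subset of the candidates and record which
// combinations are answered in full. For n candidates this is 2^n - 1 offers.
function probe(candidates: Codec[], answer: (offer: Codec[]) => Codec[]): { offers: number; accepted: Codec[][] } {
  const accepted: Codec[][] = [];
  let offers = 0;
  for (let mask = 1; mask < 1 << candidates.length; mask++) {
    const offer = candidates.filter((_, i) => (mask & (1 << i)) !== 0);
    offers++;
    // The answer is a subset of the offer, so equal length means nothing was dropped.
    if (answer(offer).length === offer.length) {
      accepted.push(offer);
    }
  }
  return { offers, accepted };
}

const result = probe(["PCMU", "G722", "H264"], mockAnswer);
```

With three candidates this sends 7 offers and finds 5 fully accepted 
combinations; the G.722/H.264 constraint only shows up as the two 
combinations that are missing.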

We might be able to learn from some previous experiences:

1. Web-based email. Web browsers run Javascript that uses ad-hoc 
signaling to/from the web site in order to send and receive email. The 
browser has no SMTP, POP, or IMAP implementation, and does not even know 
how to parse RFC 822 headers on its own. This argues for A1, B1 (or maybe 
B2), C1.

2. Web-based image display. Web browsers send an "Accept" header via 
HTTP indicating which MIME types they can handle. Knowing whether a 
browser can do JPEG is as simple as looking for "image/jpeg" in the 
header. This is probably at the wrong layer for sophisticated web 
applications, since the question isn't what comes back over HTTP but 
what can be supported over a separate channel. It might still be 
appropriate if we could agree on a small number of supported codecs and 
simply have the browser tell the server, in a similar manner, whether or 
not it can do RTCWEB.

3. SIP. SIP endpoints use SDP directly between the endpoints (though 
possibly with intermediaries that rarely change the SDP) to do 
offer-answer. This is most like A2, B3 (or B4), C3 (or C2).

4. MGCP. MGCP uses SDP for both capabilities and settings, but the 
server is in control. This is most like A2, B3, C1.

5. H.323 (H.245) uses a protocol other than SDP for both capabilities 
and settings, but the server is in control. This is like A3 (or some 
other encoding), B2, C1.
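As a sketch of what item 2 might look like if a browser advertised RTP 
media types (such as audio/G722 or video/H264) in an Accept-style list, 
here is how a server might parse such a value and check for a codec. How 
the list would actually be sent is left open; wildcards such as audio/* 
are not handled, and q=0 is treated as "not acceptable", as in HTTP.

```typescript
// Parse an Accept-style value such as "audio/PCMU, audio/G722;q=0.5"
// into a map from lower-cased media type to quality value (default 1).
function parseAcceptLike(value: string): { [mediaType: string]: number } {
  const result: { [mediaType: string]: number } = {};
  for (const part of value.split(",")) {
    const params = part.split(";").map((s) => s.trim());
    const mediaType = params[0].toLowerCase();
    if (mediaType === "") {
      continue;
    }
    let q = 1;
    for (const p of params.slice(1)) {
      const m = /^q=([0-9.]+)$/i.exec(p);
      if (m !== null) {
        q = parseFloat(m[1]);
      }
    }
    result[mediaType] = q;
  }
  return result;
}

// A media type is usable when it is listed with a non-zero quality value.
function canHandle(value: string, mediaType: string): boolean {
  const q = parseAcceptLike(value)[mediaType.toLowerCase()];
  return q !== undefined && q > 0;
}

const advertised = "audio/PCMU, audio/G722;q=0.5, video/H264;q=0";
```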

As for connection negotiation, most of the distinctions above also 
apply: again, whether to put the ICE candidates into SDP or not, 
whether to have APIs that take blobs or make individual calls, etc.

But the other issue that arises when we start talking about connection 
negotiation is that if you choose SDP offer-answer for codec 
negotiation, and SDP for ICE candidates, there is the temptation to 
combine the SDP for the two, thus conflating the process of establishing 
a logical channel over which various types of media might flow, and 
establishing just which media should be sent over that channel.
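One way to keep the two processes apart is to negotiate them in separate 
messages that share only a channel identifier; rendering both into one 
SDP media section then shows exactly what gets conflated. The message 
shapes below are hypothetical, only host candidates are modeled, and 
using the first candidate's port in the m= line is a simplification. The 
a=candidate syntax follows RFC 5245.

```typescript
// Hypothetical host-only ICE candidate; reflexive and relayed candidates
// would also carry raddr/rport.
interface HostCandidate {
  foundation: string;
  component: number; // 1 = RTP, 2 = RTCP
  priority: number;
  address: string;
  port: number;
}

// Establishes the logical channel, independent of what media flows over it.
interface TransportMessage {
  kind: "transport";
  channelId: string;
  candidates: HostCandidate[];
}

// Describes which media to send over an already-identified channel.
interface MediaMessage {
  kind: "media";
  channelId: string;
  formats: { payloadType: number; encoding: string }[]; // encoding like "PCMU/8000"
}

// SDP a=candidate line (RFC 5245 syntax) for a host candidate over UDP.
function candidateLine(c: HostCandidate): string {
  return `a=candidate:${c.foundation} ${c.component} UDP ${c.priority} ${c.address} ${c.port} typ host`;
}

// The conflated form: one media section carries codecs and candidates together,
// so changing the codec list also means re-sending the transport details.
function combinedSdp(media: MediaMessage, transport: TransportMessage): string {
  if (media.channelId !== transport.channelId) {
    throw new Error("media and transport refer to different channels");
  }
  if (transport.candidates.length === 0) {
    throw new Error("at least one candidate is required");
  }
  const port = transport.candidates[0].port; // first candidate used as the default
  const pts = media.formats.map((f) => f.payloadType).join(" ");
  return [
    `m=audio ${port} RTP/AVP ${pts}`,
    ...media.formats.map((f) => `a=rtpmap:${f.payloadType} ${f.encoding}`),
    ...transport.candidates.map(candidateLine),
  ].join("\r\n");
}

const transport: TransportMessage = {
  kind: "transport",
  channelId: "c1",
  candidates: [{ foundation: "1", component: 1, priority: 2130706431, address: "10.0.1.1", port: 8998 }],
};
const media: MediaMessage = { kind: "media", channelId: "c1", formats: [{ payloadType: 0, encoding: "PCMU/8000" }] };
```

Kept separate, a new `MediaMessage` can be exchanged for the same channel 
without touching the `TransportMessage` at all.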

Matthew Kaufman