Re: [rtcweb] My Opinion: Why I think a negotiating protocol is a Good Thing

Tim Panton <> Tue, 18 October 2011 12:53 UTC

From: Tim Panton <>
Date: Tue, 18 Oct 2011 13:53:36 +0100
To: Harald Alvestrand <>
Subject: Re: [rtcweb] My Opinion: Why I think a negotiating protocol is a Good Thing
List-Id: Real-Time Communication in WEB-browsers working group list <>

On 18 Oct 2011, at 12:43, Harald Alvestrand wrote:

> (Apologies for the length of this note - but I wanted to follow this argument a bit deeply)
> In the discussions about RTCWEB signalling, I’ve had a change of heart.
> Or perhaps it’s just a change of terminology.
> The high bit is this:
> I believe we need to standardize a negotiation protocol as part of RTCWEB.
> This needs to be described as a protocol, and cannot usefully be described as an API.
> This note is to explain to my fellow WG members what led me to this conclusion, and - in some detail - what I think the words in the above paragraph mean.
> Those who don't want to read a long message can stop here.
> ----------------------------------------------------------------------------
> The context of RTCWEB is the well known trapezoid:
>              +-----------+             +-----------+
>              |   Web     |             |   Web     |
>              |           |  Signalling |           |
>              |           |-------------|           |
>              |  Server   |   path      |  Server   |
>              |           |             |           |
>              +-----------+             +-----------+
>                   /                           \
>                  /                             \   Proprietary over
>                 /                               \  HTTP/Websockets
>                /                                 \
>               /  Proprietary over                 \
>              /   HTTP/Websockets                   \
>             /                                       \
>       +-----------+                           +-----------+
>       +-----------+                           +-----------+
>       +-----------+                           +-----------+
>       |           |                           |           |
>       |           |                           |           |
>       |  Browser  | ------------------------- |  Browser  |
>       |           |          Media path       |           |
>       |           |                           |           |
>       +-----------+                           +-----------+
> or even the triangle, where the triangle is formed when the two Web servers on top are collapsed into one.
> A design criterion for RTCWEB has been that it should be possible to write applications on top of RTCWEB simply - that is, without deep knowledge about the world of codecs, RTP sessions and the like.
> Another design criterion is that interworking should be possible - which means that SOMEWHERE in the system, deep knowledge about the world of codecs, RTP sessions and the like must be embedded; we can’t just simplify our options until everything’s simple.

I've generally found that "simplify our options until everything’s simple" is a very good policy for version one of anything.
But instead we have a complex set of PSTN-interop-centric requirements, so it is no surprise that we will come up with a complex PSTN-centric solution.

Sadly this will be such a straitjacket that no other new uses will flourish.

> There’s one place in the ecosystem where this knowledge HAS to be - and that is within the browser, the component that takes care of implementing the codecs, the RTP sessions and the related features. If we can avoid requiring embedding it twice, that’s a feature.

Actually it isn't a universally desired feature. The web is full of multiple ways of manipulating the same object.
An HTML table can be specified in HTML, manipulated in the DOM via JavaScript, and styled in CSS.
The writer of each of those bits of code needs to understand, at least partially, what a table is or does.

> It used to be that I was a believer in APIs - that we should make the API the “king”, and describe the way you generate an RTP session as “you turn this knob, and get this effect”.
> After looking at the problem of Web applications that don’t have domain knowledge for a while, I’ve concluded that this doesn’t work. There’s the need for one browser to communicate with the other browser, and if the intermediate components are to have the ability to ignore the details, what’s communicated has to be treated like a blob - something passed out of one browser’s API and into another browser’s API, unchanged by the application - because the application must be able to ignore the details.

Extend that argument a bit further: the codecs at each end can agree about their mutual compatibility very simply, without involving the rest of
the stack. All they have to do is expose a method on the decoder that can check the self-description sent by the encoder.

We are making a monolithic codec/network/negotiation/(signalling soon) hunk of code instead of splitting out the parts and (for
example) exposing the codecs as objects with behaviours.
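
To make that concrete, here is a rough TypeScript sketch of a decoder object that checks an encoder's self-description on its own. Every name in it (EncoderDescription, Decoder, canDecode, pickDecoder and the fields) is hypothetical and made up for illustration; it is not an existing browser API, and a real self-description would surely carry more than three fields.

```typescript
// Hypothetical self-description sent by the remote encoder (illustrative fields only).
interface EncoderDescription {
  codec: string;        // codec name, e.g. "opus"
  sampleRateHz: number; // sample rate the encoder will produce
  channels: number;     // number of audio channels
}

// Hypothetical decoder object: it knows its own limits and can judge
// a remote encoder's description without involving the rest of the stack.
class Decoder {
  private readonly codec: string;
  private readonly sampleRatesHz: number[];
  private readonly maxChannels: number;

  constructor(codec: string, sampleRatesHz: number[], maxChannels: number) {
    this.codec = codec;
    this.sampleRatesHz = sampleRatesHz;
    this.maxChannels = maxChannels;
  }

  // True when this decoder could decode what the described encoder produces.
  canDecode(desc: EncoderDescription): boolean {
    return (
      desc.codec.toLowerCase() === this.codec.toLowerCase() &&
      this.sampleRatesHz.indexOf(desc.sampleRateHz) !== -1 &&
      desc.channels >= 1 &&
      desc.channels <= this.maxChannels
    );
  }
}

// Return the first local decoder that accepts the remote description, if any.
function pickDecoder(
  decoders: Decoder[],
  desc: EncoderDescription,
): Decoder | undefined {
  for (const d of decoders) {
    if (d.canDecode(desc)) {
      return d;
    }
  }
  return undefined;
}
```

With something like this the application only has to carry the description from one end to the other and call pickDecoder; it never needs to understand what is inside it.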

> OK, now we have the API with blobs. We also have to make some assumptions about how those blobs are transported, who’s responsible for acting on them, and so on. And we have to make sure different browsers implement the blob the same way - that is, it has to be standardized.
> What’s more - we DO want to enable applications that are NOT simple. Including gateways, which are not browsers. So applications must be free to look inside the blob - break the blob boundary - when they need to. So this pulls in the same direction as the need for interoperability - the format, semantics and handling rules for these blobs has to be specified. In some detail.

We have an API with blobs only because we chose to stick with the ugliness that is SDP. Once you take that decision, the rest follows.

> So we have:
> - a data format
> - a transmission path
> - a set of handling rules, in the form of states, processes and timers
> This doesn’t look like an API any more. This looks like a protocol. We’ve got experience in describing protocols - but it becomes much easier to apply that experience when we call it a protocol.

OK, let's design a protocol (if we must), but let's base it on the real capabilities of browsers, web coders' abilities and the real needs of web users,
not constantly looking over our shoulders at a legacy protocol that barely works in the environment it was designed for.

And will this new protocol have a standard API?