Re: [rtcweb] Additional requirement - audio-only communication

Matthew Kaufman <> Thu, 25 August 2011 17:04 UTC

On 8/25/2011 9:24 AM, Randell Jesup wrote:
> On 8/25/2011 11:55 AM, Matthew Kaufman wrote:
>> On 8/24/2011 4:36 AM, Harald Alvestrand wrote:
> Re: negotiate offer/answer separately in each direction
>>> I think that:
>>> a) this doesn't make sense - it's a completely new SDP/RTP practice,
>>> and we should not depart from established practice without a good
>>> reason; it also flies against the "keep the number of RTP sessions as
>>> low as we can" conclusion that came out of all the discussions about 
>>> ICE.
>>> b) it's not consistent with section 4.1.2 step 7.
>>> I think step 16 of section 4:
>>> "If connection's ICE started flag is still false, start the
>>> PeerConnection ICE Agent and send the initial offer. The initial offer
>>> must include a media description for the PeerConnection data UDP media
>>> stream, marked as "sendrecv", and for all the streams in localStreams
>>> (marked as "sendonly")."
>>> is neither correct nor complete.
>> I agree that "this doesn't make sense" and it is just yet another reason
>> that I think SDP offer-answer is entirely inappropriate for WEBRTC.
>> The web user agent should do what the web site's HTML and cooperating
>> Javascript tell it to do. It should not be engaged in direct negotiation
>> with the far end such that the outcome is either indeterminate or even
>> unexpected, except where direct negotiation is explicitly required to
>> meet a security requirement (the initial ICE handshake to determine that
>> it is permitted to send data to that endpoint).
>> Note that any perceived gains by doing this negotiation (like "what if
>> my browser is on a slow connection and only wants to receive audio") are
>> immediately negated the moment the site changes the SDP enroute to add
>> "wants HD-resolution video" for you.
> Ok, so that's a bad web-app - don't use it.
> Are you really suggesting "send video to someone who doesn't want it 
> anyways"?

No, I don't think that's a good idea. But one of the arguments I've
heard *for* doing O-A between the two ends is that it somehow ensures
that the sender won't send things that the receiver isn't prepared to
receive.

I was simply pointing out that this isn't true at all. Because the O-A 
can easily be modified in-transit, either end can be trivially convinced 
to do things it otherwise shouldn't be doing.

> "perceived gains" - sending me video at (say) 500K or more plus audio
> at <50K, when I'm on a 128K link will kill my connection until I can
> somehow get the other side to back off.  Horrid user experience.

Yes. And it will be possible for a "bad web site" to do this whether we 
use O-A or something else.

> Remember people without broadband or with limited broadband will be
> using this.  What if I have 768 down, 128 up - but Johnny in his
> bedroom is watching youtube videos or downloading torrents, etc?

In theory the sender will back off once they get reports of massive 
loss, assuming we do congestion control for the media. If not, then
you're both out of luck. No different really than anything else
though... I can easily build a web site that lets you connect over HTTP 
and then runs a modified TCP that doesn't slow down when there's loss.
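
A minimal sketch of the "sender backs off on loss reports" behaviour, in Python. The function name, the 10% and 2% thresholds, and the rate limits are all assumptions chosen for illustration; nothing in this thread specifies a congestion control algorithm.

```python
def next_bitrate(current_kbps, loss_fraction, floor_kbps=16.0, ceiling_kbps=2000.0):
    """Pick the next send rate from the loss fraction in a receiver report.

    Hypothetical policy: cut the rate in proportion to heavy loss, probe
    upward slowly when loss is negligible, otherwise hold steady.
    """
    if not 0.0 <= loss_fraction <= 1.0:
        raise ValueError("loss_fraction must be between 0 and 1")
    if loss_fraction > 0.10:
        target = current_kbps * (1.0 - 0.5 * loss_fraction)  # back off
    elif loss_fraction < 0.02:
        target = current_kbps * 1.05                          # probe upward
    else:
        target = current_kbps                                 # hold
    # Never leave the configured operating range.
    return max(floor_kbps, min(ceiling_kbps, target))


# A 500 kbit/s sender hammering a 128 kbit/s link sees heavy loss and
# steps down on each report until the loss subsides.
rate = 500.0
for reported_loss in (0.5, 0.5, 0.4, 0.3):
    rate = next_bitrate(rate, reported_loss)
```

A sender that ignores the reports, like the modified TCP mentioned above, simply never calls anything of this kind, which is why the choice of signalling does not protect the receiver.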

>> In addition, because the spec is currently written with offer-answer, a
>> wide array of use cases that would be possible if capabilities were
>> instead exposed via Javascript become impossible. As an example, it
>> should be possible for me as a web site developer to create a page that
>> can determine, without prompting you to use your camera, whether or not
>> a camera is available and if so what codecs it supports. That way I can
>> put "I see you have a high-resolution camera and can encode H.264
>> video... click here to call a live agent who can help you find the exact
>> replacement part" for users who have that, and not if they don't have a
>> codec that works with my call center or a camera with sufficient
>> resolution to examine the parts I sell.
> In what way is that use-case blocked by offer-answer?  Access to the
> video needs user confirmation; access to capabilities info shouldn't.

I'm reading

I cannot see how to get it to generate an SDP offer before I try to open 
a media connection.

I also cannot see how one could possibly know *what* SDP to offer until 
you call "getUserMedia", which prompts the user. (As an example, I have 
two cameras attached to this computer over USB. One of them has an 
on-board H.264 encoder. The other does not. If my browser can do 
pass-through of the encoded H.264 but doesn't have its own H.264 
encoder, what offer do I generate before the camera is selected? SDP 
doesn't have a good way to encode "maybe".)

What we need is a set of Javascript APIs that let us enumerate the 
available audio input devices and encoders, video input devices and 
encoders, audio output devices and decoders, video output devices and 
decoders. I hate to say it but even H.245 is probably a better model for 
how to collect the capabilities than SDP O-A.
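
To make the two-camera example concrete, here is a rough sketch in Python of the data an enumeration API of this kind might return, and of the "is there an acceptable camera and encoder" check a page could run without prompting the user. Every name here (`VideoInput`, `Capabilities`, `acceptable_cameras`) and the 720-line threshold are hypothetical; no browser exposes this API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VideoInput:
    """One camera, as a hypothetical enumeration API might report it."""
    name: str
    max_height: int  # tallest frame the camera can deliver, in pixels
    onboard_encoders: frozenset = frozenset()  # codecs encoded on the device


@dataclass(frozen=True)
class Capabilities:
    """What the page can learn without asking for camera access."""
    video_inputs: tuple = ()
    browser_video_encoders: frozenset = frozenset()  # encoders in the browser


def acceptable_cameras(caps, codec, min_height):
    """Return the cameras that can deliver `codec` at `min_height` or better.

    A camera qualifies if the browser can encode the codec itself or the
    camera encodes it on board (the pass-through case).
    """
    return [
        cam for cam in caps.video_inputs
        if cam.max_height >= min_height
        and (codec in caps.browser_video_encoders
             or codec in cam.onboard_encoders)
    ]


# Two USB cameras; only one has an on-board H.264 encoder, and the
# browser has no H.264 encoder of its own.
caps = Capabilities(
    video_inputs=(
        VideoInput("camera A", 1080, frozenset({"H264"})),
        VideoInput("camera B", 480),
    ),
    browser_video_encoders=frozenset({"VP8"}),
)
show_call_button = bool(acceptable_cameras(caps, "H264", 720))
```

The per-device list is what expresses the "maybe": the answer depends on which camera is eventually selected, and the page can see that before any prompt.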

> And so the page can generate an offer with audio and video, or just
> audio if they have no camera.  A user declining to send video doesn't
> necessarily mean you don't negotiate it - they advertise receive-only
> for video (if the web app wants to).

I don't understand what you mean by "(if the web app wants to)"... the 
current proposed specification doesn't have a way for the web app to 
control whether the SDP offer is receive-only for video or not.

> You state above there are major possibilities blocked by offer-answer
> (I can think of some minor issues that with O-A require a second O-A
> pass) - can you detail some use-cases so we can see the benefit and
> not just the assertion?

It is certainly possible that a repeated set of fake offer-answer 
exchanges can be used by either Javascript or the server to determine 
what the capabilities are, and that a set of rich Javascript-exposed 
capabilities may be used to generate SDP offers and answers, so in that 
sense they are mappable from one to another.

But I would argue that turning capabilities into SDP is easier than the 
reverse. This is really a question of whether we are trying to turn the 
browser into a *platform for building applications* (in which case we 
should be exposing, as an operating system does, APIs for determining 
what is possible and APIs for control) or turning the browser into *a 
phone*, in which case sure, SIP and SDP are both fine choices.
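
As an illustration of the "capabilities into SDP" direction, the sketch below (Python) builds a single SDP media description from a plain codec list. The function name, the placeholder port 9, and the choice to number payload types from 96 are assumptions made for the example; a real offer also needs session-level lines and ICE attributes that are omitted here.

```python
DIRECTIONS = {"sendrecv", "sendonly", "recvonly", "inactive"}


def media_section(kind, codecs, direction="sendrecv", first_payload_type=96):
    """Build one SDP media description.

    `codecs` is a list of (encoding_name, clock_rate) pairs, for example
    ("H264", 90000). Payload types are assigned from the dynamic range
    (96-127) in list order.
    """
    if direction not in DIRECTIONS:
        raise ValueError(f"unknown direction: {direction}")
    if not codecs:
        raise ValueError("at least one codec is required")
    payload_types = range(first_payload_type, first_payload_type + len(codecs))
    if payload_types[-1] > 127:
        raise ValueError("ran out of dynamic payload types")

    # Port 9 is only a placeholder; the real transport comes from ICE.
    lines = [f"m={kind} 9 RTP/AVP " + " ".join(str(pt) for pt in payload_types)]
    for pt, (name, rate) in zip(payload_types, codecs):
        lines.append(f"a=rtpmap:{pt} {name}/{rate}")
    lines.append(f"a={direction}")
    return "\r\n".join(lines) + "\r\n"


# The application, not the browser, decides that video is receive-only.
video = media_section("video", [("H264", 90000)], direction="recvonly")
```

Going the other way means parsing text like this back into structured capabilities, and an offer only ever describes one proposed configuration rather than everything the endpoint could do.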

I think there are a whole lot of potential applications beyond what we're
currently thinking of if we provide a platform and not just a phone.

I've already outlined one case (offering the user the ability to place a 
video call *if* there is an acceptable camera and encoder on the system).

Another obvious one is where there's a dozen people already in a video 
call and one more wishes to join... if the server knows what 
capabilities exist, the server can tell the new joiner "your browser 
can't support video in a format that the other users need" or can tell 
the existing user browsers to switch to a different codec that is now 
compatible with everyone or whatever, rather than having to re-run 
offer-answer with every party.
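
A small sketch of the server-side decision in that conference case, in Python. It assumes each browser has already reported a set of video codecs it can both send and receive; the function name, the participant names, and the preference order are hypothetical.

```python
def pick_conference_codec(codecs_by_participant, preference):
    """Return the most preferred codec every participant supports, or None.

    `codecs_by_participant` maps a participant id to the set of codecs
    that participant can both encode and decode.
    """
    if not codecs_by_participant:
        return None
    shared = set.intersection(*(set(c) for c in codecs_by_participant.values()))
    for codec in preference:
        if codec in shared:
            return codec
    return None


preference = ["H264", "VP8"]
in_call = {"alice": {"H264", "VP8"}, "bob": {"H264", "VP8"}}

# The existing call runs on the preferred codec.
current = pick_conference_codec(in_call, preference)

# A joiner without H.264 means everyone can be told to switch codecs...
with_carol = pick_conference_codec({**in_call, "carol": {"VP8"}}, preference)

# ...while a joiner with nothing in common can be told so immediately.
with_dave = pick_conference_codec({**in_call, "dave": {"Theora"}}, preference)
```

With the capability sets held on the server, both outcomes come from one lookup, with no signalling round to the parties already in the call.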

> Then we could balance that against the advantage of O-A being
> well-speced and known and implemented.

But it *isn't* well-speced for use cases other than one-to-one calling, 
really. It works poorly for recording, it works poorly for large 
multi-party conferences, etc.

And on top of that, it is missing important attributes that we'll want 
to control (like whether the Opus codec is being forced to "music" mode 
or not).

> (And the issues surrounding how well and how
> interoperably web-app developers can implement their own capability
> negotiations/etc)

For interoperability, the browser (in Javascript) or the server (in any 
language you wish) can generate SDP, if that's how they wish to 

Turning capability and control APIs into SDP is exactly the same problem 
as turning operating system APIs into SDP... the browser should be the 
new operating system, not just a hardcoded phone.

Matthew Kaufman