Re: [rtcweb] Forking & Early Media - Proposal

Randell Jesup <> Wed, 21 September 2011 07:08 UTC

NOTE: Attached below is a proposed set of forking/early-media
and clipping-avoidance rules, so don't glance and delete!  :-)

Also note: I started writing this earlier today, so it was largely
done before much of today's discussion on forking.  I'll note that
I include in this a method to minimize chances of answer-time
clipping.  (For any who don't know (if there are any), this is
where the first fraction of a second after pickup is lost while
answering, starting codecs, doing ICE, etc.)

On 9/20/2011 9:40 AM, Olle E. Johansson wrote:
>  20 sep 2011 kl. 15:15 skrev Christer Holmberg:
>>>>  Once we start requiring that the PeerConnection know the
>>>>  difference between "early" media and "late" media, it seems
>>>>  to me we're slipping down a slippery slope.
>>>  The difference between early and late media is purely a
>>>  billing decision in PSTN. I don't think we should separate
>>>  these on the rtcweb side. It's a PSTN gateway issue, not
>>>  something to be bothered with in rtcweb.
>>  It's not about knowing the difference between "early" and "late" media - it's about whether the API and browser need to support multiple SIMULTANEOUS SDP answers - or whether we assume that the JS SIP app will always, at any given time, only provide ONE SDP answer to the API and browser.
>  I just wanted to get rid of the early/late media discussion. As you state, the forking issue with getting multiple responses is a separate issue.
>  Do we have any use cases using forking? Is forking a desired feature or something that SIP brought in?

No, this is something inherent in a person you want to converse with
possibly being in different places.  Different phones in a home,
different computers in a home or out of it (your desktop, your laptop,
your tablet, your work computer, your Android phone) - when someone
wants to talk to you on Skype or what have you, often the service will
want to offer the connection to any and all devices you're logged into
the service from.  So, it forks the request.  We'd have this issue
even if we totally disallow SIP and disallow PSTN connectivity.  If
you require that the website/server handle this and only provide one
answer, you're much more likely to clip the answer (lose audio right
after accept while the channels are being opened).

Two things in particular appear here.  One is early media (I want to
send media to you but no one has accepted).  I do not propose that
rtcweb generate early media; some sort of "alerting" notification is
enough (equivalent to 180).  (Realize that means no custom callback
tones or video, or weird cases like sitting on hold or in an IVR while
not actually "in" a call).  If so, we only have to worry about interop
cases - calling out to legacy, or *maybe* a call forked in rtcweb
where one of the forks goes to a legacy device or gateway that sends
early media.

The other is choosing which answer to accept if multiple arrive; that
can be up to the application, I think (though 99% likely the app will
want to use the first answer).  I don't think we have to *mandate*
that the first answer is the one we use.  (I can't think of any cases
where we wouldn't, but I'm pretty sure they exist, and I wouldn't want
to outlaw them for no reason.)  If it makes any use-cases easier to
mandate the first answer, that may change my opinion.  If you're
using SIP (JS or not) that might affect the answer, of course.

While waiting for an acceptance, it makes *lots* of sense to "warm up"
the connection(s) so that when the call is accepted there's minimal
delay or pickup loss.  "warm up" means to do an ICE exchange and
possibly even instantiate codecs, etc.  This is complicated by not
knowing the final answer until the user decides how to answer, but you
could warm up the likely streams/codecs in most cases, and drop some
if needed on ACCEPT.  In the forking case, you could warm up
connections to some or all possible answers.  (Pacing may be an issue
here, but often there are 5-20 seconds to do it in.)

Implicit in this is separating ANSWERs from "acceptance", and
verifying on "acceptance" that the correct ANSWER is used (for
example, we warm up audio and video, and the person answers
audio-only, or for some reason chooses a different codec).
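
To make the ANSWER/"acceptance" split concrete, here is a minimal sketch
of the check done at acceptance time.  The Answer type, the reconcile()
helper and the codec names are all hypothetical, invented purely for
illustration; none of them come from any existing API.

```python
from dataclasses import dataclass, field


@dataclass
class Answer:
    """Hypothetical ANSWER: a connection address plus media kind -> codec."""
    address: str
    streams: dict = field(default_factory=dict)  # e.g. {"audio": "pcmu"}


def reconcile(warmed: Answer, final: Answer):
    """Compare the warmed-up ANSWER with the one in force at acceptance.

    Returns (keep_connection, to_stop, to_start), where to_stop and
    to_start are sorted lists of media kinds.
    """
    if warmed.address != final.address:
        # Different address: nothing warmed up can be reused.
        return False, sorted(warmed.streams), sorted(final.streams)
    # Same address: keep the connection, adjust only streams that changed.
    to_stop = sorted(k for k, codec in warmed.streams.items()
                     if final.streams.get(k) != codec)
    to_start = sorted(k for k, codec in final.streams.items()
                      if warmed.streams.get(k) != codec)
    return True, to_stop, to_start
```

In the audio-only example, warming up audio and video and then accepting
audio alone would keep the connection and stop only the video stream.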

So, to summarize in pseudo-spec language:

0)   I'm assuming an Offer-Answer model here, though not assuming SDP.
      If you want, read "SDP ANSWER" for "ANSWER", etc to map to Harald's
      proposals.  Note that I add "ACCEPT".
0.1) Rough mapping to SIP:
      a) INVITE ->  OFFER
      b) 183 ->  ANSWER
      c) 180 ->  ANSWER-with-no-media-streams
      d) 200 ->  ANSWER (may be suppressed) + ACCEPT
0.2) I'm assuming OFFERs and ANSWERs and ACCEPTs are delivered on
      a reliable, in-order channel.
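
The rough mapping in 0.1) could be sketched as a small translation
function, for instance in a gateway or JS-SIP shim.  The function name
and the answer_already_sent flag are hypothetical, and reading "may be
suppressed" as "skip the ANSWER if an earlier 183 already carried one"
is an assumption of this sketch.

```python
def map_sip(message: str, answer_already_sent: bool = False) -> list:
    """Translate a SIP request/response name into OFFER/ANSWER/ACCEPT events.

    `message` is "INVITE" or a status code given as a string.
    Anything outside the rough mapping yields an empty list.
    """
    if message == "INVITE":
        return ["OFFER"]
    if message == "183":
        return ["ANSWER"]
    if message == "180":
        return ["ANSWER-with-no-media-streams"]
    if message == "200":
        # Assumption: suppress the ANSWER when one was already delivered.
        if answer_already_sent:
            return ["ACCEPT"]
        return ["ANSWER", "ACCEPT"]
    return []
```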

1) webrtc clients WILL NOT send early media
    [See below; I see no real need for webrtc<->webrtc client connections
     to send early media, but SIP/PSTN interop cases may require it, so
     I have an alternative below]
2) when a webrtc client receives an OFFER, it MAY generate a speculative
    ANSWER in order to allow pre-starting the PeerConnection in a disabled
    state.  If pre-started, NO media shall be sent until the call has been
    ACCEPTED.  Note that the OFFERer may receive data before seeing
    the ACCEPT.
3) if the ANSWERer generated a speculative ANSWER, it may replace that
    with an alternative ANSWER before sending ACCEPT.  This alternative
    SHOULD use the same connection address as the original, and if so
    the existing PeerConnection established or being established SHOULD
    be retained, but the mediastream configuration changed to match
    the new ANSWER.
4) the OFFERer SHOULD pre-start PeerConnections on a speculative ANSWER, or
    they MAY wait until an ACCEPT and then start the last ANSWER from that
    source.  If multiple sources supply speculative ANSWERs, the OFFERer
    MAY pre-start some, none or all of them as it wishes.
    [Open question: do we pre-start MediaStreams in each pre-starting
     PeerConnection, or do (can) we defer this until ACCEPT?]
5) when the OFFERer receives an ACCEPT, it MAY close other PeerConnections
    opened speculatively.
6) when an ANSWERer sends an ACCEPT, it MAY begin sending media immediately
    if the PeerConnection was pre-started.  It SHOULD be ready to receive
    media before sending the ACCEPT.
7) servers handling signalling for webrtc clients MAY fork a call offer
    to multiple webrtc clients
8) if a call is forked, the webrtc client MAY receive either a single
    ANSWER and ACCEPT, or MAY receive multiple ANSWERs with one or more
    ACCEPTs, depending on how the server works.

This provides a way to minimize the chances of start-of-call clipping,
and handles forking with minimal clipping (with cooperation of the
app).  Note that there may be an implementation limit on the number of
PeerConnections that can be "warmed up" before an ACCEPT.
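
As an illustration of rules 4), 5) and 8) on the OFFERer side, here is a
minimal bookkeeping sketch.  The Offerer class, its method names and the
max_prestarted limit are hypothetical; "first ACCEPT wins" is one
possible application policy, not something the rules mandate.

```python
class Offerer:
    """Hypothetical OFFERer-side tracking of forked, speculative ANSWERs."""

    def __init__(self, max_prestarted: int = 3):
        self.max_prestarted = max_prestarted  # assumed implementation limit
        self.answers = {}        # source -> last ANSWER seen from that source
        self.prestarted = set()  # sources with a warmed-up PeerConnection
        self.active = None       # source whose ACCEPT was taken

    def on_answer(self, source, answer):
        if self.active is not None:
            return  # call already accepted; this sketch ignores late ANSWERs
        self.answers[source] = answer  # a later ANSWER replaces an earlier one
        if len(self.prestarted) < self.max_prestarted:
            self.prestarted.add(source)  # rule 4: pre-start if there is room

    def on_accept(self, source):
        """Return the sources whose speculative connections get closed."""
        if self.active is not None:
            return []  # policy choice: the first ACCEPT wins
        if source not in self.answers:
            raise ValueError("ACCEPT without a prior ANSWER from " + repr(source))
        self.active = source
        closed = sorted(self.prestarted - {source})  # rule 5
        self.prestarted = {source}  # started now if it was not pre-started
        return closed
```

With a limit of two, ANSWERs from three devices would warm up only the
first two; an ACCEPT from the third would start that one late and close
the other two.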

If we remove 1) and replace it with one of the following (probably lower down)

N) webrtc clients MAY send "early media" on a pre-started PeerConnection
    but MUST NOT send any media without explicit action or consent of the
    user.  webrtc clients MAY play the early media.


N) webrtc media gateways MAY send "early media" on a pre-started PeerConnection,
    and webrtc clients receiving "early media" MAY play it, and MAY send
    media (such as DTMF) but MUST NOT send any media without explicit
    action or consent of the user.

   (and you have to change 2) above)

you get something that is pretty interoperable with legacy SIP devices
and especially PSTN gateways or border controllers, including the infamous
American Airlines DTMF trick.  This assumes a WebRTC<->legacy media gateway
is in use (note that all the above is about PeerConnections).  I have not
tried to figure out how non-gatewayed legacy would work into this, but it
should be doable.
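
The sending side of the early-media variants boils down to a small gate.
A minimal sketch, where the function and parameter names are
hypothetical and accepting the call is assumed to count as explicit
user action:

```python
def may_send_media(accepted: bool,
                   early_media_allowed: bool,
                   user_consented: bool) -> bool:
    """Decide whether a client may send media on a pre-started connection.

    accepted            -- the call has been ACCEPTed
    early_media_allowed -- rule 1) has been replaced by one of the N) variants
    user_consented      -- explicit user action or consent (e.g. pressing
                           a DTMF key while listening to early media)
    """
    if accepted:
        return True
    # Before ACCEPT: only under an early-media rule, and only with consent.
    return early_media_allowed and user_consented
```

With rule 1) left in place, early_media_allowed is simply always False.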

Randell Jesup