Re: [rtcweb] Forking & Early Media - Proposal

Roman Shpount <> Wed, 21 September 2011 14:57 UTC

Date: Wed, 21 Sep 2011 11:00:11 -0400
From: Roman Shpount <>
To: Randell Jesup <>
Subject: Re: [rtcweb] Forking & Early Media - Proposal


Are we going to create multiple PeerConnections in case of forking when
multiple ANSWER + ACCEPT are received?
Roman Shpount

On Wed, Sep 21, 2011 at 3:07 AM, Randell Jesup <> wrote:

> NOTE: Attached below is a proposed set of forking/early-media
> and clipping-avoidance rules, so don't glance and delete!  :-)
> Also note: I started writing this earlier today, so it was largely
> done before much of today's discussion on forking.  I'll note that
> I include in this a method to minimize chances of answer-time
> clipping.  (For any who don't know (if there are any), this is
> where the first fraction of a second after pickup is lost while
> answering, starting codecs, doing ICE, etc.)
> On 9/20/2011 9:40 AM, Olle E. Johansson wrote:
>>  On 20 Sep 2011, at 15:15, Christer Holmberg wrote:
>>>>>  Once we start requiring that the PeerConnection know the
>>>>>  difference between "early" media and "late" media, it seems
>>>>>  to me we're slipping down a slippery slope.
>>>>  The difference between early and late media is purely a
>>>>  billing decision in PSTN. I don't think we should separate
>>>>  these on the rtcweb side. It's a PSTN gateway issue, not
>>>>  something to be bothered with in rtcweb.
>>>  It's not about knowing the difference between "early" and "late" media -
>>> it's about whether the API and browser need to support multiple SIMULTANEOUS
>>> SDP answers - or whether we assume that the JS SIP app will always, at any
>>> given time, only provide ONE SDP answer to the API and browser.
>>  I just wanted to get rid of the early/late media discussion. As you
>> state, the forking issue with getting multiple responses is a separate
>> issue.
>>  Do we have any use cases using forking? Is forking a desired feature or
>> something that SIP brought in?
> No, this is something inherent in a person you want to converse with
> possibly being in different places.  Different phones in a home,
> different computers in a home or out of it (your desktop, your laptop,
> your tablet, your work computer, your Android phone) - when someone
> wants to talk to you on Skype or what have you, often the service will
> want to offer the connection to any and all devices you're logged into
> the service from.  So, it forks the request.  We'd have this issue
> even if we totally disallow SIP and disallow PSTN connectivity.  If
> you require that the website/server handle this and only provide one
> answer, you're much more likely to clip the answer (lose audio right
> after accept while the channels are being opened).
> Two things in particular appear here.  One is early media (I want to
> send media to you but no one has accepted).  I do not propose that
> rtcweb generate early media; some sort of "alerting" notification is
> enough (equivalent to 180).  (Realize that means no custom callback
> tones or video, or weird cases like sitting on hold or in an IVR while
> not actually "in" a call).  If so, we only have to worry about interop
> cases - calling out to legacy, or *maybe* a call forked in rtcweb
> where one of the forks goes to a legacy device or gateway that sends
> early media.
> The other is choosing which answer to accept if multiple arrive; that
> can be up to the application I think (though 99% likely the app will
> want to use the first answer).  I don't think we have to *mandate*
> that the first answer is the one we use (I can't think of any
> cases where we wouldn't, but I'm pretty sure they exist and I wouldn't
> want to outlaw them for no reason).  If it makes any use-cases easier
> to mandate the first answer, that may change my opinion.  If you're
> using SIP (JS or not) that might affect the answer, of course.
> While waiting for an acceptance, it makes *lots* of sense to "warm up"
> the connection(s) so that when the call is accepted there's minimal
> delay or pickup loss.  "warm up" means to do an ICE exchange and
> possibly even instantiate codecs, etc.  This is complicated by not
> knowing the final answer until the user decides how to answer, but you
> could warm up the likely streams/codecs in most cases, and drop some
> if needed on ACCEPT.  In the forking case, you could warm up
> connections to some or all possible answers.  (Pacing may be an issue
> here, but often there are 5-20 seconds to do it in.)
> Implicit in this is separating ANSWERs from "acceptance", and
> verifying on "acceptance" that the correct ANSWER is used (for
> example, we warm up audio and video, and the person answers
> audio-only, or for some reason chooses a different codec).
> So, to summarize in pseudo-spec language:
> 0)   I'm assuming an Offer-Answer model here, though not assuming SDP.
>     If you want, read "SDP ANSWER" for "ANSWER", etc to map to Harald's
>     proposals.  Note that I add "ACCEPT".
> 0.1) Rough mapping to SIP:
>     a) INVITE ->  OFFER
>     b) 183 ->  ANSWER
>     c) 180 ->  ANSWER-with-no-media-streams
>     d) 200 ->  ANSWER (may be suppressed) + ACCEPT
> 0.2) I'm assuming OFFERs and ANSWERs and ACCEPTs are delivered on
>     a reliable, in-order channel.
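
A minimal sketch of the rough SIP mapping in 0.1, written as a lookup
function. The name `sip_to_signals` and the `answer_already_sent` flag are
hypothetical, introduced only for illustration; the flag is one possible
reading of the "may be suppressed" note on the 200 case.

```python
from typing import List


def sip_to_signals(message: str, answer_already_sent: bool = False) -> List[str]:
    """Map a SIP method or status code to the signals named in 0.1.

    `answer_already_sent` is an assumption: it models the case where an
    earlier ANSWER (for example from a 183) makes the ANSWER on 200 redundant.
    """
    if message == "INVITE":
        return ["OFFER"]
    if message == "183":
        return ["ANSWER"]
    if message == "180":
        # Alerting only: an ANSWER that carries no media streams.
        return ["ANSWER-with-no-media-streams"]
    if message == "200":
        if answer_already_sent:
            return ["ACCEPT"]
        return ["ANSWER", "ACCEPT"]
    raise ValueError(f"no mapping given for SIP message {message!r}")
```

Anything outside the four listed cases raises an error rather than guessing,
since the mapping above is explicitly described as rough.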
> 1) webrtc clients WILL NOT send early media
>   [See below; I see no real need for webrtc<->webrtc client connections
>    to send early media, but SIP/PSTN interop cases may require it, so
>    I have an alternative below]
> 2) when a webrtc client receives an OFFER, it MAY generate a speculative
>   ANSWER in order to allow pre-starting the PeerConnection in a disabled
>   state.  If pre-started, NO media shall be sent until the call has been
>   ACCEPTED.  Note that the OFFERer may receive data before seeing
>   the ACCEPT.
> 3) if the ANSWERer generated a speculative ANSWER, it may replace that
>   with an alternative ANSWER before sending ACCEPT.  This alternative
>   SHOULD use the same connection address as the original, and if so
>   the existing PeerConnection established or being established SHOULD
>   be retained, but the mediastream configuration changed to match
>   the new ANSWER.
> 4) the OFFERer SHOULD pre-start PeerConnections on a speculative ANSWER, or
>   they MAY wait until an ACCEPT and then start the last ANSWER from that
>   source.  If multiple sources supply speculative ANSWERs, the OFFERer
>   MAY pre-start some, none or all of them as it wishes.
>   [Open question: do we pre-start MediaStreams in each pre-starting
>    PeerConnection, or do (can) we defer this until ACCEPT?]
> 5) when the OFFERer receives an ACCEPT, it MAY close other PeerConnections
>   opened speculatively.
> 6) when an ANSWERer sends an ACCEPT, it MAY begin sending media immediately
>   if the PeerConnection was pre-started.  It SHOULD be ready to receive
>   media before sending the ACCEPT.
> 7) servers handling signalling for webrtc clients MAY fork a call offer
>   to multiple webrtc clients
> 8) if a call is forked, the webrtc client MAY receive either a single
>   ANSWER and ACCEPT, or MAY receive multiple ANSWERs with one or more
>   ACCEPTs, depending on how the server works.
> This provides a way to minimize the chances of start-of-call clipping,
> and handles forking with minimal clipping (with cooperation of the
> app).  Note that there may be an implementation limit on the number of
> PeerConnections that can be "warmed up" before an ACCEPT.
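
One way to read rules 3, 4, 5 and 8 from the OFFERer's side, as a sketch
only. `SpeculativeConnection` is a stand-in for a pre-started PeerConnection,
not a browser API; `Offerer`, `max_warm` and the "first ACCEPT wins" policy
are assumptions made for this example (the proposal leaves that choice to
the application).

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class SpeculativeConnection:
    """Hypothetical stand-in for a pre-started PeerConnection."""
    source: str
    answer: str
    media_enabled: bool = False
    closed: bool = False


@dataclass
class Offerer:
    max_warm: int = 4  # assumed implementation limit on warmed-up connections
    connections: Dict[str, SpeculativeConnection] = field(default_factory=dict)
    latest_answer: Dict[str, str] = field(default_factory=dict)
    accepted: Optional[str] = None

    def on_answer(self, source: str, answer: str) -> None:
        self.latest_answer[source] = answer
        conn = self.connections.get(source)
        if conn is not None:
            # Rule 3: keep the existing connection, swap in the new ANSWER.
            conn.answer = answer
        elif self.accepted is None and len(self.connections) < self.max_warm:
            # Rule 4: pre-start on a speculative ANSWER, media still disabled.
            self.connections[source] = SpeculativeConnection(source, answer)

    def on_accept(self, source: str) -> bool:
        if self.accepted is not None:
            return False  # assumed app policy: the first ACCEPT wins
        if source not in self.latest_answer:
            raise ValueError(f"ACCEPT from {source!r} without a prior ANSWER")
        conn = self.connections.get(source)
        if conn is None:
            # Rule 4 alternative: start the last ANSWER only on ACCEPT.
            conn = SpeculativeConnection(source, self.latest_answer[source])
            self.connections[source] = conn
        conn.media_enabled = True
        self.accepted = source
        # Rule 5: close the other speculatively opened connections.
        for other in self.connections.values():
            if other.source != source:
                other.closed = True
        return True
```

In this sketch a fork produces one connection object per answering source,
and at most one of them ever has media enabled.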
> Yes, if we remove 1) and replace it with (probably lower down)
> N) webrtc clients MAY send "early media" on a pre-started PeerConnection
>   but MUST NOT send any media without explicit action or consent of the
>   user.  webrtc clients MAY play the early media.
> or
> N) webrtc media gateways MAY send "early media" on a pre-started
>   PeerConnection, and webrtc clients receiving "early media" MAY play
>   it, and MAY send
>   media (such as DTMF) but MUST NOT send any media without explicit
>   action or consent of the user.
>  (and you have to change 2) above)
> you get something that is pretty interoperable with legacy SIP devices
> and especially PSTN gateways or border controllers, including the infamous
> American Airlines DTMF trick.  This assumes a WebRTC<->legacy media gateway
> is in use (note that all the above is about PeerConnections).  I have not
> tried to figure out how non-gatewayed legacy would work into this, but it
> should be doable.
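
The difference between rule 1 and the alternative rule N can be sketched as
a single send-gating check. The function and parameter names are
hypothetical, and treating a sent or received ACCEPT as sufficient on its
own is an assumption of this example, not something the rules state.

```python
def may_send_media(
    pre_started: bool,
    accepted: bool,
    user_consented: bool,
    allow_early_media: bool = False,
) -> bool:
    """Decide whether media may be sent on a PeerConnection.

    allow_early_media=False corresponds to rule 1 (no early media);
    allow_early_media=True corresponds to the alternative rule N.
    """
    if not pre_started:
        # Nothing can be sent until the connection has been started.
        return False
    if accepted:
        # Assumption: acceptance of the call counts as the user's consent.
        return True
    # Before ACCEPT: only under rule N, and only with explicit user action.
    return allow_early_media and user_consented
```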
> --
> Randell Jesup