[rtcweb] WebRTC offer/answer design and corresponding SIP gateway design

worley@ariadne.com (Dale R. Worley) Wed, 17 October 2012 20:05 UTC

Date: Wed, 17 Oct 2012 16:04:05 -0400
Message-Id: <201210172004.q9HK45QU4781362@shell01.TheWorld.com>
From: worley@ariadne.com
Sender: worley@ariadne.com
To: rtcweb@ietf.org
In-reply-to: <C5E08FE080ACFD4DAE31E4BDBF944EB111886565@xmb-aln-x02.cisco.com> (fluffy@cisco.com)

> From: "Cullen Jennings (fluffy)" <fluffy@cisco.com>
> But we all need to get this thread to specific examples about real
> world use cases that people can understand.

This is an analysis of how SIP/WebRTC gateway design depends upon and
reflects the offer/answer architecture that is chosen for WebRTC.  The
unexpected result is that a SIP/WebRTC gateway can work well with a
*simpler* WebRTC offer/answer architecture than is being discussed;
indeed, with an O/A architecture resembling a single caller/callee pair.

* Background

The overall task that must be performed by a SIP/WebRTC-server gateway
combined with the WebRTC-client is the same as what a SIP telephone
does.  A detailed discussion is in my previous message.
The important point for this discussion is:  Quality handling of
outgoing SIP calls requires integrating (mixing) information from both
the signaling and media for all early dialogs to generate the media
stream to be presented to the user.

Let me now outline three architectures and their characteristics.  I
still haven't checked the consequences of requiring ICE, but I believe
that it does not fundamentally change the analysis.

* cloning

In the cloning architecture, the WebRTC protocol reveals the multiple
dialogs of the SIP call.  When the client Javascript initiates the
call, a "call object" is created.  When the SIP signaling reveals to
the gateway the existence of a new early dialog, the gateway uses
WebRTC to tell the client of the new dialog, and a child "dialog
object" is created.  The dialog objects are continually updated by
information from the WebRTC server to tell the Javascript the status
of the early dialogs.

The WebRTC client is the source/target of the media streams.  (If it
is possible for an early dialog's media stream to arrive at the client
before corresponding SIP signaling, the client stack creates
additional dialog objects for those early dialogs.)
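
The object model described above can be sketched roughly as follows.
This is only an illustration under stated assumptions: the names
CallObject, DialogObject, on_signaling and on_media are hypothetical
and do not come from any WebRTC API, and the sketch assumes the client
stack can somehow map arriving media to a dialog identifier.

```python
from dataclasses import dataclass, field


@dataclass
class DialogObject:
    """One early dialog of the call (hypothetical shape)."""
    dialog_id: str
    state: str = "early"               # e.g. "early", "ringing", "confirmed"
    known_from_signaling: bool = False
    media_seen: bool = False


@dataclass
class CallObject:
    """Parent object created when the Javascript initiates the call."""
    dialogs: dict = field(default_factory=dict)

    def _get_or_create(self, dialog_id):
        # Create the child dialog object on first sight, whatever the source.
        return self.dialogs.setdefault(dialog_id, DialogObject(dialog_id))

    def on_signaling(self, dialog_id, state):
        """The gateway reports a new or updated early dialog."""
        dialog = self._get_or_create(dialog_id)
        dialog.known_from_signaling = True
        dialog.state = state
        return dialog

    def on_media(self, dialog_id):
        """Media arrived; the dialog may not have been signaled yet."""
        dialog = self._get_or_create(dialog_id)
        dialog.media_seen = True
        return dialog


call = CallObject()
call.on_media("d1")                  # media races ahead of signaling
call.on_signaling("d1", "ringing")   # signaling catches up, same object
call.on_signaling("d2", "early")     # a second fork, signaling only
```

The point of the sketch is that a dialog object can come into
existence from either direction, so the client has to reconcile the
two sources of information itself.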

The client stack or Javascript must integrate the information in the
signaling and media to produce the audio stream to be presented to the
user.

This architecture is the most complicated form of WebRTC.  It also
requires that the client stack and/or Javascript implement the full
processing required by a SIP phone in all cases, because in general,
the client-side code never knows whether a communication session is
directed to a SIP gateway or not.

* PRANSWER

In the PRANSWER architecture, the WebRTC protocol carries SDP in a
manner that is similar to SDP offer/answer between two endpoints.  The
exception is that the offer provided with call initiation may receive
a sequence of answers, which inform the client of the origins of
successive early dialogs that it should listen to.

The WebRTC client is the source/target of the media streams.

This architecture presupposes that there will be only one early dialog
at a time that is producing media that the user needs to hear, and
that the gateway can determine from the SIP signaling what this
sequence of dialogs is.
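
A minimal sketch of what this looks like from the client side,
assuming a hypothetical PranswerSession class (not a real WebRTC API)
and treating the SDP bodies as opaque strings:

```python
class PranswerSession:
    """Client-side view: one offer, then a sequence of answers."""

    def __init__(self, offer_sdp):
        self.offer_sdp = offer_sdp
        self.current_answer = None   # the single dialog being listened to
        self.final = False
        self.history = []

    def apply_answer(self, answer_sdp, final=False):
        """Apply a provisional answer, or the final one if final=True."""
        if self.final:
            raise RuntimeError("final answer already applied")
        self.history.append(answer_sdp)
        self.current_answer = answer_sdp   # replaces the previous dialog
        self.final = final


session = PranswerSession("offer")
session.apply_answer("answer-from-dialog-A")
session.apply_answer("answer-from-dialog-B")       # A is no longer heard
session.apply_answer("answer-from-dialog-B", final=True)
```

Note that there is exactly one current_answer at any moment, which is
the presupposition stated above.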

The difficulty is that neither of these presuppositions is correct:
Given parallel forking, the call may have reached destinations that
are both producing important feedback at the same time.  Also, the SIP
signaling does not completely inform the SIP caller which destinations
are producing media streams that contain information -- that decision
requires determining if RTP is arriving from each destination and
whether the RTP is non-silent.  In addition, if the signaling shows
that a destination is ringing but is not providing media, synthetic
ringback media should be provided.  This architecture requires that
the gateway make these decisions despite not being the endpoint of the
media streams.
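
The per-destination decision described above might be sketched as
follows.  The function name and its three boolean inputs are
hypothetical, and the sketch assumes that RTP which is arriving but
silent counts as "not providing media".  Two of the three inputs come
from the media path, which is exactly what the gateway cannot see in
this architecture.

```python
def early_media_source(rtp_arriving, rtp_non_silent, signaled_ringing):
    """Return what the user should hear from one destination."""
    if rtp_arriving and rtp_non_silent:
        return "media"                 # real early media carries information
    if signaled_ringing:
        return "synthetic-ringback"    # ringing, but nothing useful to play
    return "nothing"


# (rtp_arriving, rtp_non_silent, signaled_ringing) per forked destination
observations = {
    "A": (True, True, True),
    "B": (True, True, False),
    "C": (False, False, True),
    "D": (True, False, False),
}
decisions = {name: early_media_source(*obs)
             for name, obs in observations.items()}
```

With parallel forking, destinations A and B both yield "media" at the
same time, so no single sequence of answers can describe what the user
should hear.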

* simple

In the "simple" architecture, the WebRTC protocol models a *single*
offer/answer dialog, and all of the media processing required for a
SIP early dialog is placed in the gateway:  Initially, the gateway
provides an answer giving *itself* as the server end of the media
streams.  During the early phase of the call, the gateway decodes,
synthesizes, and mixes all of the early media according to the
principles for SIP telephones, and sends a single RTP stream to the
WebRTC client that is the audio the user should hear.  This is
relatively easy because the gateway has direct access to both the
signaling and the media from the SIP destinations.
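
As a rough illustration of the mixing step only, assuming each early
dialog's audio has already been decoded into equal-length frames of
16-bit linear PCM samples (the function name and the 160-sample frame
length are assumptions; jitter buffering, codecs and synthetic
ringback generation are left out):

```python
def mix_frames(frames, frame_len=160):
    """Mix equal-length frames of 16-bit PCM samples into one frame."""
    mixed = [0] * frame_len
    for frame in frames:
        if len(frame) != frame_len:
            raise ValueError("all frames must have frame_len samples")
        for i, sample in enumerate(frame):
            mixed[i] += sample
    # Clip the sums back into the signed 16-bit range.
    return [max(-32768, min(32767, s)) for s in mixed]


dialog_a = [1000, -2000]
dialog_b = [30000, -32000]
to_client = mix_frames([dialog_a, dialog_b], frame_len=2)
```

The result is the single stream that would be encoded and sent to the
WebRTC client, however many early dialogs contributed to it.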

At some point (probably when one dialog becomes established), the
gateway decides to remove itself from the media processing, and does a
"Z operation" to pass a new offer/answer pair between the now-unique
UAS and the WebRTC client.

(A "Z operation" is named for its appearance on protocol diagrams:  An
intermediate device sends a request to the endpoint in one direction
demanding that it produce an offer; the intermediate device passes the
offer through to the other endpoint, which produces an answer; the
intermediate device passes the answer through back to the first
endpoint.)
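
The message flow of a "Z operation" can be sketched as below.  The
Endpoint class and z_operation function are hypothetical, the SDP
bodies are placeholder strings, and which endpoint is asked for the
offer first is an assumption made for the example.

```python
class Endpoint:
    """Stand-in for the UAS or the WebRTC client."""

    def __init__(self, name):
        self.name = name
        self.remote_sdp = None

    def make_offer(self):
        return f"offer-from-{self.name}"

    def make_answer(self, offer):
        self.remote_sdp = offer
        return f"answer-from-{self.name}"

    def accept_answer(self, answer):
        self.remote_sdp = answer


def z_operation(first, second, log):
    """The intermediate device relays one offer/answer pair unchanged."""
    log.append(f"gateway -> {first.name}: request offer")
    offer = first.make_offer()
    log.append(f"{first.name} -> gateway -> {second.name}: offer")
    answer = second.make_answer(offer)
    log.append(f"{second.name} -> gateway -> {first.name}: answer")
    first.accept_answer(answer)


uas = Endpoint("uas")
client = Endpoint("client")
log = []
z_operation(uas, client, log)
```

After the exchange each endpoint holds the other's SDP, so media can
flow between them without passing through the gateway.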

This architecture has several advantages:

(1) The WebRTC protocol implements only a single offer/answer
dialog.

(2) The gateway is responsible for combining the media and signaling
information of the early dialogs; the multiple early dialogs are not
revealed in WebRTC.  This processing is relatively easy to implement,
as it can be implemented in about the same way that SIP phones now
implement it.

(3) Improving the handling of multiple early dialogs can be done
entirely by updating the gateway; the WebRTC protocol does not
commit the processing to any particular model.

(4) WebRTC client stacks and application Javascript provide no special
support for SIP forking; all connections appear to them to be "a direct
connection to a media server".