[rtcweb] The SIP gateway use case

worley@ariadne.com (Dale R. Worley) Wed, 10 October 2012 20:27 UTC

I'm new to this discussion.  Being from the SIP community, my "default
use case" for WebRTC is when the web server routes the WebRTC media to
a SIP gateway, which I expect will be a significant fraction of early
usage of WebRTC.  Within this context, the WebRTC architecture needs
to be able to support the media stream operations that a SIP endpoint
("user agent", UA) must implement.  Those operations are not
immediately clear from a reading of the standards (as many a SIP
implementer has discovered), partly due to the complexity of the
situation and partly because a number of critical problems are not
manifested in the protocol operations, and hence the RFCs do not
address them.  The proper, indeed necessary, operations only become
clear from an extended discussion of various cases and what strategies
will and won't work in practice.

My apologies if any of this duplicates what others have already
described.  I've tried to leave out irrelevant parts of the SIP
signaling.  I may have made some mistakes.  In particular, I'm not
familiar with the details of ICE.  Please correct any mistakes.

I've divided the discussion into three sections:
- Background
- The situation the SIP caller is faced with
- Architectural questions

* Background

Let me describe the actions by a SIP endpoint that initiates a call (a
"user agent client", UAC).  The operation of a SIP endpoint that
receives a call (a "user agent server", UAS) is similar but much
simpler, and I will leave it to the reader to construct that
discussion.  Similarly, I will assume that the UAC sends an initial
offer; when the UAC does not send an initial offer, the operations are
similar but simpler.

The UAC sends the INVITE containing the initial offer.  Over time, the
INVITE reaches various UASs via request forwarding, serial forking,
and parallel forking.  Between the UAC and each UAS, SIP constructs a
"dialog", which is the stream of SIP requests and responses between
them through intermediary SIP entities, and the associated dialog
state.

Because of network delays, because some SIP responses are not reliably
transmitted, and because some SIP responses are absorbed by
intermediate SIP entities, the UAC does not have complete knowledge of
the dialogs it is participating in.

Connected with each dialog is a "session", which is a collection of
media streams.  Each session is described by the two SDPs which have
been sent from each UA to the other, and that description can change
over time due to SIP messages carrying SDP offers and answers.  The
sending and receiving of SDP between the UAC and each UAS is
separately governed by the offer/answer rules and has its own
offer/answer state.  (The exception is that all of these offer/answer
exchanges necessarily start with the same initial offer.)
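
To make the per-dialog bookkeeping concrete, here is a minimal Python
sketch, not taken from any real SIP stack.  It keys each dialog by
Call-ID, local tag and remote tag (the dialog identifier of RFC 3261)
and gives each dialog its own, much simplified offer/answer state, all
seeded with the same initial offer.  The names (Call, Dialog, OAState,
on_response) are hypothetical, and only the answer to the initial offer
is modeled; later offer/answer cycles are left out.

```python
from dataclasses import dataclass, field
from enum import Enum, auto
from typing import Dict, Optional, Tuple


class OAState(Enum):
    """Much simplified offer/answer state of one session (hypothetical)."""
    OFFER_SENT = auto()  # the initial offer is outstanding on this dialog
    STABLE = auto()      # an answer has been received


@dataclass
class Dialog:
    remote_tag: str
    local_sdp: str                    # last SDP this UA sent
    remote_sdp: Optional[str] = None  # last SDP received from this UAS
    state: OAState = OAState.OFFER_SENT


# RFC 3261 identifies a dialog by Call-ID, local tag and remote tag.
DialogId = Tuple[str, str, str]


@dataclass
class Call:
    call_id: str
    local_tag: str      # From-tag chosen by the UAC
    initial_offer: str
    dialogs: Dict[DialogId, Dialog] = field(default_factory=dict)

    def on_response(self, remote_tag: str, answer_sdp: Optional[str]) -> Dialog:
        """Find or create the dialog for a response carrying a given To-tag."""
        key = (self.call_id, self.local_tag, remote_tag)
        dialog = self.dialogs.get(key)
        if dialog is None:
            # Every forked dialog starts from the same initial offer.
            dialog = Dialog(remote_tag=remote_tag, local_sdp=self.initial_offer)
            self.dialogs[key] = dialog
        if answer_sdp is not None and dialog.state is OAState.OFFER_SENT:
            # Each dialog's answer is recorded independently of the others.
            dialog.remote_sdp = answer_sdp
            dialog.state = OAState.STABLE
        return dialog
```

With two forks responding, call.dialogs holds two entries that share
the same local_sdp but have independent remote_sdp and state values.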

ICE negotiation is performed within any session for which SDP has been
sent and received.

In practice, the UAC uses a separate port for each media stream
(m-line) in its offer, but for any media stream, uses the same port to
communicate the clone of that media stream to/from each UAS.  In
practice, each UAS uses a different port from every other UAS, and a
different port for each media stream.  The RTP of a media stream is
not labeled to match it with the governing SDP for its session; the
matching is done heuristically between the port used by the UAS and
the port listed in the SDP.

Because SIP responses may not reach the UAC promptly, the UAC can be
participating in sessions that it has received no SDP for.
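
The port-matching heuristic, together with the sessions for which the
UAC receives RTP but has no SDP yet, might be tracked as below.  This
is a hypothetical sketch for a single local port (one m-line of the
offer).  It assumes symmetric RTP, i.e. that each UAS sends from the
address and port it advertised in its SDP, an assumption that NAT can
break.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

Addr = Tuple[str, int]  # (IP address, UDP port)


@dataclass
class MediaTracker:
    """Hypothetical helper for one local RTP port (one m-line).

    Matches each packet's source address against the address and port
    advertised in each UAS's SDP (assumes symmetric RTP).
    """
    # (c= address, m= port) from each received SDP -> dialog remote tag
    known: Dict[Addr, str] = field(default_factory=dict)
    # sources that RTP arrives from but for which no SDP has been seen
    unknown: List[Addr] = field(default_factory=list)

    def add_sdp(self, remote_tag: str, addr: Addr) -> None:
        self.known[addr] = remote_tag
        # SDP may arrive after the RTP did; the source is no longer unknown.
        if addr in self.unknown:
            self.unknown.remove(addr)

    def classify(self, src: Addr) -> Optional[str]:
        """Return the dialog tag for a packet's source, or None if unknown."""
        tag = self.known.get(src)
        if tag is None and src not in self.unknown:
            self.unknown.append(src)
        return tag
```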

A dialog and its session can be terminated in various ways:
- Final termination of the call by sending/receiving a BYE
- Termination of an early dialog by the UAC sending a BYE (uncommon,
but allowed)
- Termination by the UAC, by sending BYEs, of all but one of the
dialogs created by several 200 responses received in a race situation
- The UAS sending a failure final response.  (But some such responses
are received by intermediate proxies and not passed on, and trigger
serial forking.)
- The UAS sending a 199 response (or a proxy sending a 199 response on
behalf of the UAS), which may reach the UAC

* The situation the SIP caller is faced with

Thus, the SIP caller (UAC) is participating in a dynamically changing
collection of sessions.

The UAC has complete knowledge of some of the sessions and maintains
an offer/answer state for each of them.  The current SDP for such a
session can be
updated by either UA.  The corresponding RTP streams are identified by
their UAS-end addresses.

Of other sessions, the UAC knows nothing except that it is receiving
RTP for them.  The UAC may later receive SDP for
such a session.

The situation will be simplified to one dialog and session when:
- the dialog is ended by the UAC receiving a failure final response to
the initial INVITE
- the call is established by the UAC receiving a 200 final response
(but the UAC must be prepared to receive additional 200 final
responses and terminate those dialogs with BYE)
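
The collapse to a single dialog might look like the following sketch.
RFC 3261 requires the UAC to ACK every 2xx it receives, including
unwanted ones, before sending BYE on those dialogs.  The ForkedCall
class and its action strings are hypothetical.  Dropping the remaining
early dialogs on the first 200 is a simplification: forking proxies
normally cancel the other branches, and those early dialogs then lapse.

```python
from typing import List, Optional, Set


class ForkedCall:
    """Hypothetical sketch: how a UAC narrows a forked call to one dialog.

    Dialogs are identified by their remote (To) tags; `actions` records the
    requests the UAC would send rather than sending anything.
    """

    def __init__(self) -> None:
        self.early: List[str] = []
        self.established: Optional[str] = None
        self.rejected: Set[str] = set()
        self.actions: List[str] = []

    def on_provisional(self, tag: str) -> None:
        """A provisional response with a To-tag creates an early dialog."""
        if self.established is None and tag not in self.early:
            self.early.append(tag)

    def on_2xx(self, tag: str) -> None:
        # Every 2xx must be ACKed, even one for an unwanted dialog.
        self.actions.append(f"ACK {tag}")
        if self.established is None:
            self.established = tag
            # Simplification: other early dialogs stop being media candidates.
            self.early.clear()
        elif tag != self.established and tag not in self.rejected:
            # A later 200 from a parallel fork: tear that dialog down.
            self.rejected.add(tag)
            self.actions.append(f"BYE {tag}")

    def on_final_failure(self) -> None:
        """A non-2xx final response to the INVITE ends all early dialogs."""
        self.early.clear()
```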

Regarding the media received in each particular session, there are
several alternatives:
- The media are being received, and are not silence.  This stream
should be mixed into what the user hears, as it contains information
from a UAS.
- The media are absent or silent, but the SIP signaling has indicated
that the far end is ringing (a 180 has been received on this dialog).
For this stream, a ringback signal should be mixed into what the user
hears.
- The media are absent or silent, and the SIP signaling has not
indicated that the far end is ringing.  For this stream, nothing
should be mixed into what the user hears.
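
The three alternatives above amount to a small decision function per
session.  In this sketch, "audible" means RTP is arriving and is not
silence; how silence is detected is left open and is an assumption of
the sketch, as are all the names.

```python
from enum import Enum, auto


class Playout(Enum):
    MEDIA = auto()     # mix the received stream: it carries information from a UAS
    RINGBACK = auto()  # mix a locally generated ringback signal
    NOTHING = auto()   # this session contributes nothing to what the user hears


def playout_for(audible: bool, got_180: bool) -> Playout:
    """Decide what one session contributes to the user's audio.

    audible: RTP is arriving for the session and is not silence
             (how silence is detected is outside this sketch).
    got_180: a 180 (Ringing) has been received on the session's dialog.
    """
    if audible:
        return Playout.MEDIA
    if got_180:
        return Playout.RINGBACK
    return Playout.NOTHING
```

Applying this to every session and mixing the results gives what the
user hears; received non-silent media takes precedence over a 180,
matching the first alternative.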

Regarding the media sent to each particular session:
- If a session is for an established dialog, the user's voice should
be sent to the (necessarily unique) UAS.
- If a session is the only early dialog, the user's voice should be
sent to the unique UAS.  (This is because some legitimate callees
delay providing a 200 response and carry on an extended exchange of
information during the early dialog.)
- If there are multiple early dialogs, practice seems not to be
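
The sending rules can be sketched the same way.  Because the text
leaves the multiple-early-dialog case open, the sketch does not choose
a behavior for it; the caller must supply a policy.  The names are
hypothetical, and dialogs are identified by their remote tags.

```python
from typing import Callable, List, Optional, Sequence


def send_targets(
    established: Optional[str],
    early: Sequence[str],
    multi_early_policy: Callable[[Sequence[str]], List[str]],
) -> List[str]:
    """Return the dialogs (by remote tag) that should receive the user's voice."""
    if established is not None:
        # An established dialog is necessarily unique.
        return [established]
    if len(early) == 1:
        # A lone early dialog may be a callee exchanging information
        # before answering, so it gets the user's voice.
        return [early[0]]
    if not early:
        return []
    # Several early dialogs: practice is not settled, so defer to the caller.
    return multi_early_policy(early)
```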

* Architectural questions

In this context, the major architectural questions seem to concern
the degree to which this messy reality is revealed to the WebRTC
browser-side code.  The choices on these questions are reflected in
both the wire protocol and the JavaScript API.

Does WebRTC reveal the multiple sessions to the multiple UASs?  (It
seems that any session for which outgoing RTP can be generated at the
browser must be individually revealed to the browser, as each UAS can
provide answer SDP that has no codec in common with other answer
SDPs.)
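
To illustrate why each answer may need separate handling, the sketch
below extracts audio codecs from SDP answers and intersects them.  It
is deliberately minimal and hypothetical: it reads only the first audio
m-line, recognizes a few static payload types from RFC 3551 plus any
a=rtpmap entries, and ignores fmtp parameters.  Two valid answers can
produce an empty intersection.

```python
from typing import Dict, List, Set

# A few static RTP/AVP payload types (RFC 3551) that may lack a=rtpmap.
STATIC_PT = {"0": "PCMU/8000", "8": "PCMA/8000", "9": "G722/8000", "18": "G729/8000"}


def audio_codecs(sdp: str) -> Set[str]:
    """Codecs ('NAME/clock-rate') offered in the first audio m-line of an SDP."""
    payloads: List[str] = []
    names: Dict[str, str] = {}
    in_audio = False
    seen_audio = False
    for raw in sdp.splitlines():
        line = raw.strip()
        if line.startswith("m="):
            in_audio = line.startswith("m=audio ") and not seen_audio
            if in_audio:
                seen_audio = True
                fields = line.split()
                # Port 0 means the stream was rejected: no codec in use.
                payloads = fields[3:] if fields[1] != "0" else []
        elif in_audio and line.startswith("a=rtpmap:"):
            pt, encoding = line[len("a=rtpmap:"):].split(" ", 1)
            parts = encoding.strip().split("/")
            # Encoding names are case-insensitive; keep NAME/clock-rate.
            names[pt] = "/".join([parts[0].upper()] + parts[1:2])
    found: Set[str] = set()
    for pt in payloads:
        if pt in names:
            found.add(names[pt])
        elif pt in STATIC_PT:
            found.add(STATIC_PT[pt])
    return found


def common_codecs(answers: Dict[str, str]) -> Set[str]:
    """Codecs usable with every answering UAS at once (empty if none)."""
    sets = [audio_codecs(sdp) for sdp in answers.values()]
    return set.intersection(*sets) if sets else set()
```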

Does WebRTC reveal/permit later offer/answer cycles for a session?
(Note that the SIP UA can prevent later offer/answer cycles by not
sending re-INVITEs/UPDATEs and giving failure responses to any
received re-INVITEs/UPDATEs.  But this is unlikely to be tolerated in
practice, as later offer/answer cycles are used in many SIP systems.)

Does WebRTC transport signaling-level ring and call failure
information (e.g., the SIP call progress and termination response
codes), or just the initiation and termination events of sessions?

Does WebRTC transport or reveal in the API the complete SDP or just
selected information from it?

Does WebRTC provide facilities for the JavaScript to affect the
handling of multiple sessions (e.g., forcibly terminating an early
dialog with undesirable properties)?  (Note that in SIP phones, the
user is usually not provided with any such facility.)