[rtcweb] The SIP gateway use case
worley@ariadne.com (Dale R. Worley) Wed, 10 October 2012 20:27 UTC
I'm new to this discussion. Being from the SIP community, my "default use case" for WebRTC is when the web server routes the WebRTC media to a SIP gateway, which I expect will be a significant fraction of early usage of WebRTC.

Within this context, the WebRTC architecture needs to be able to support the media stream operations that a SIP endpoint ("user agent", UA) must implement. Those operations are not immediately clear from a reading of the standards (as many a SIP implementer has discovered), partly due to the complexity of the situation and partly because a number of critical problems are not manifested in the protocol operations, and hence the RFCs do not address them. The proper, indeed necessary, operations only become clear from an extended discussion of various cases and what strategies will and won't work in practice.

My apologies if any of this duplicates what others have already described. I've tried to leave out irrelevant parts of the SIP signaling. I may have made some mistakes. In particular, I'm not familiar with the details of ICE. Please correct any mistakes.

I've divided the discussion into three sections:

- Background
- The situation the SIP caller is faced with
- Architectural questions

* Background

Let me describe the actions by a SIP endpoint that initiates a call (a "user agent client", UAC). The operation of a SIP endpoint that receives a call (a "user agent server", UAS) is similar but much simpler, and I will leave it to the reader to construct that discussion. Similarly, I will assume that the UAC sends an initial offer; when the UAC does not send an initial offer, the operations are similar but simpler.

The UAC sends the INVITE containing the initial offer. Over time, this INVITE reaches various UASs via request forwarding, serial forking, and parallel forking.

Between the UAC and each UAS, SIP constructs a "dialog", which is the stream of SIP requests and responses between them through intermediary SIP entities, and the associated dialog state. Because of network delays, because some SIP responses are not reliably transmitted, and because some SIP responses are absorbed by intermediate SIP entities, the UAC does not have complete knowledge of the dialogs it is participating in.

Connected with each dialog is a "session", which is a collection of media streams. Each session is described by the two SDPs which have been sent from each UA to the other, and that description can change over time due to SIP messages carrying SDP offers and answers.

The sending and receiving of SDP between the UAC and each UAS is separately governed by the offer/answer rules and has its own offer/answer state. (Excepting that all offer/answer states necessarily start with the same initial offer.)

ICE negotiation is performed within any session for which SDP has been sent and received.

In practice, the UAC uses a separate port for each media stream (m-line) in its offer, but for any media stream, uses the same port to communicate the clone of that media stream to/from each UAS. In practice, each UAS uses different ports from every other UAS, and a different port for each media stream.

The RTP of a media stream is not labeled to match it with the governing SDP for its session; the matching is done heuristically between the port used by the UAS and the port listed in the SDP.

Because SIP responses may not reach the UAC promptly, the UAC can be participating in sessions that it has received no SDP for.

A dialog and its session can be terminated in various ways:

- Final termination of the call by sending/receiving a BYE
- Termination of an early dialog by the UAC sending a BYE (uncommon, but allowed)
- Termination of all but one of several 200 responses received in a race situation by the UAC sending BYEs
- The UAS sending a failure final response.
  (But some such responses are received by intermediate proxies and not passed on, and trigger serial forking.)
- The UAS sending a 199 response (or a proxy sending a 199 response on behalf of the UAS), which may reach the UAC

* The situation the SIP caller is faced with

Thus, the SIP caller (UAC) is participating in a dynamically changing collection of sessions.

Some of the sessions the UAC has complete knowledge of, and maintains an offer/answer state for. The current SDP for a session can be updated by either UA. The corresponding RTP streams are identified by their UAS-end addresses.

Some of the sessions the UAC has no knowledge of, other than that it is receiving RTP for the session. The UAC may later receive SDP for such a session.

The situation will be simplified to one dialog and session when:

- the dialog is ended by the UAC receiving a failure final response to the initial INVITE
- the call is established by the UAC receiving a 200 final response (but the UAC must be prepared to receive additional 200 final responses and terminate those dialogs with BYE)

Regarding the media received in each particular session, there are several alternatives:

- The media are being received, and are not silence. This stream should be mixed into what the user hears, as it contains information from a UAS.
- The media are absent or silent, but the SIP signaling has indicated that the far end is ringing (a 180 has been received on this dialog). For this stream, a ringback signal should be mixed into what the user hears.
- The media are absent or silent, and the SIP signaling has not indicated that the far end is ringing. For this stream, nothing should be mixed into what the user hears.

Regarding the media sent to each particular session:

- If a session is for an established dialog, the user's voice should be sent to the (necessarily unique) UAS.
- If a session is the only early dialog, the user's voice should be sent to the unique UAS.
  (This is because some legitimate callees delay providing a 200 response and have extended information interchange during the early dialog.)
- If there are multiple early dialogs, practice seems not to be standardized.

* Architectural questions

In this context, the major architectural questions seem to concern the degree to which this messy reality is revealed to the WebRTC browser-side code. The choices on these questions are reflected in both the wire protocol and the JavaScript API.

Does WebRTC reveal the multiple sessions to the multiple UASs? (It seems that any sessions for which outgoing RTP can be generated (at the browser) must be individually revealed to the browser, as each UAS can provide answer SDP that has no codec in common with other answer SDPs.)

Does WebRTC reveal/permit later offer/answer cycles for a session? (Note that the SIP UA can prevent later offer/answer cycles by not sending re-INVITEs/UPDATEs and giving failure responses to any received re-INVITEs/UPDATEs. But this is unlikely to be tolerated in practice, as later offer/answer cycles are used in many SIP systems.)

Does WebRTC transport signaling-level ring and call failure information (e.g., the SIP call progress and termination response codes), or just the initiation and termination events of sessions?

Does WebRTC transport or reveal in the API the complete SDP or just selected information from it?

Does WebRTC provide facilities for the JavaScript to affect the handling of multiple sessions (e.g., forcibly terminating an early dialog with undesirable properties)? (Note that in SIP phones, the user is usually not provided with any such facility.)

Dale
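
The per-session media rules listed under "The situation the SIP caller is faced with" can be written down as a small decision procedure. The following is only a minimal sketch in Python, not taken from any SIP or WebRTC implementation: the names Session, Mix, receive_mix and send_targets are hypothetical, and returning no targets when there are multiple early dialogs is an assumption of this sketch, since the message says practice there is not standardized.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List


class Mix(Enum):
    MEDIA = "media"        # mix the received stream into what the user hears
    RINGBACK = "ringback"  # mix a locally generated ringback signal
    NOTHING = "nothing"    # this session contributes nothing


@dataclass
class Session:
    # All field names are hypothetical, chosen for this sketch only.
    media_present: bool        # media are being received and are not silence
    ringing: bool              # a 180 has been received on this dialog
    established: bool = False  # a 200 final response has been received


def receive_mix(s: Session) -> Mix:
    """What one session contributes to what the user hears."""
    if s.media_present:
        return Mix.MEDIA
    if s.ringing:
        return Mix.RINGBACK
    return Mix.NOTHING


def send_targets(sessions: List[Session]) -> List[Session]:
    """Sessions that should be sent the user's voice."""
    established = [s for s in sessions if s.established]
    if established:
        return established
    if len(sessions) == 1:
        return list(sessions)  # the only early dialog
    # Multiple early dialogs: not standardized. Sending to none is an
    # assumption of this sketch, not a recommendation.
    return []
```

For example, a session with silent media on which a 180 has arrived yields Mix.RINGBACK, and once one dialog is established send_targets returns only that session.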