[Sip] Requested Expert Review Request: draft-ietf-mediactrl-vxml-00

Paul Kyzivat <pkyzivat@cisco.com> Mon, 17 December 2007 20:32 UTC

To: Dean Willis <dean.willis@softarmor.com>
Cc: sip-chairs@tools.ietf.org, SIP IETF <sip@ietf.org>, Mark Scott <mscott@voicegenie.com>, David Burke <daveburke@google.com>

I was asked by Dean to provide a SIP Expert Review of this draft, with
special attention to section 2.3. In general I like this draft. It begins to
exploit the power of SIP, and to unify it with other protocols. I did
find some issues, all of which should be fairly straightforward to address.

I've broken my comments down by section. In many cases I have included
snips from the draft for reference and followed those with pertinent
comments.


Section 2.1:

The ABNF of the URI seems excessively rigid in the ordering of
parameters - first the init-parameters, then the vxml-parameters, then
any other uri-parameters. It also doesn't note that
'uri-parameters' is defined in RFC 3261.

I would suggest something like:

   dialog-parameters = *(";" dialog-parameter)

   dialog-parameter  = init-param /
                       vxml-param /
                       uri-parameter ; defined in RFC 3261

   init-param        = (dialog-param /
                        maxage-param /
                        maxstale-param /
                        method-param /

   vxml-param        = vxml-keyword "=" vxml-value

That achieves the same effect but allows all the params in any order. 
This is almost a nit, but the rigid ordering is likely to be a source of 
interop problems.
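
To illustrate why order independence is cheap for a receiver, here is a
minimal Python sketch (not from the draft). The parameter names in
INIT_NAMES are assumptions based only on names that appear in this
review, and classify_params is a hypothetical helper:

```python
# Assumed init-param names; the draft's actual list may differ.
INIT_NAMES = {"voicexml", "maxage", "maxstale", "method", "postbody"}

def classify_params(param_string):
    """Split ';name=value' URI parameters into (init, other) dicts.

    The result does not depend on the order in which parameters appear.
    """
    init, other = {}, {}
    for item in param_string.split(";"):
        if not item:
            continue  # skip the empty piece before the leading ';'
        name, _, value = item.partition("=")
        target = init if name.lower() in INIT_NAMES else other
        target[name.lower()] = value
    return init, other

# Two orderings of the same parameters classify identically.
a = classify_params(";method=post;transport=tcp;maxage=0")
b = classify_params(";transport=tcp;maxage=0;method=post")
print(a == b)
```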

I'm confused about postbody:

    postbody:  Used to set the application/x-www-form-urlencoded encoded
       [HTML4] HTTP body for "post" requests (or is otherwise ignored).
       The postbody value is the prepared application/
       x-www-form-urlencoded content, subsequently URL-encoded (see note


    Note: Special characters in Request-URI parameter values need to be
    URL-encoded as required by the SIP URI syntax, for example '?' (%3f),
    '=' (%3d), and ';' (%3b).  The VoiceXML Media Server MUST therefore
    unescape Request-URI parameter values before making use of them or
    exposing them to running VoiceXML applications.  It is important that
    the VoiceXML Media Server only unescape the parameter values once
    since the desired VoiceXML URI value could itself be URL encoded, for
    example.  When a postbody is included, its entire content including
    any line breaks (represented by a CR LF pair) is encoded as a single
    parameter value following the above rules (such that the line breaks
    would be replaced by '%0D%0A', for example).

[HTML4] says:


    This is the default content type. Forms submitted with this content
    type must be encoded as follows:

    1. Control names and values are escaped. Space characters are
       replaced by `+', and then reserved characters are escaped as
       described in [RFC1738], section 2.2: Non-alphanumeric characters
       are replaced by `%HH', a percent sign and two hexadecimal digits
       representing the ASCII code of the character. Line breaks are
       represented as "CR LF" pairs (i.e., `%0D%0A').
    2. The control names/values are listed in the order they appear in
       the document. The name is separated from the value by `=' and
       name/value pairs are separated from each other by `&'.

The interaction between these two escaping rules seems potentially 
confusing. I *think* when this is all put together it means that the 
body must first be encoded according to the [HTML4] section above. At 
that point it will already be almost conformant to the 3261 syntax of a 
token, except for the use of '&'. Then it needs to be encoded again, 
which will take care of the ampersands, but which will re-encode the '%' 
characters of the first encoding.
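
If my reading is right, the two passes could be sketched like this in
Python. This is only an illustration of my interpretation, not text from
the draft; encode_postbody and decode_postbody are hypothetical names,
and the second pass conservatively escapes every reserved character:

```python
from urllib.parse import parse_qsl, quote, unquote, urlencode

def encode_postbody(fields):
    """Form-encode the fields per [HTML4], then escape the result
    so it can be carried as a single URI parameter value."""
    form_body = urlencode(fields)      # first pass: ' ' -> '+', CR LF -> '%0D%0A'
    return quote(form_body, safe="")   # second pass: '=', '&', '+', '%' are escaped

def decode_postbody(param_value):
    """Unescape exactly once, recovering the form-encoded HTTP body."""
    return unquote(param_value)

fields = [("name", "a b"), ("note", "x\r\ny")]
param = encode_postbody(fields)
print(param)                    # name%3Da%2Bb%26note%3Dx%250D%250Ay
print(decode_postbody(param))   # name=a+b&note=x%0D%0Ay
```

Note that the '%' characters of the first pass become '%25', so
unescaping only once is what recovers the intended HTTP body.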

Exactly what, if anything, I would recommend changing depends on whether 
I understood what is expected. I guess I might just recommend that you 
clarify further. (Perhaps I'm just being dense. If so please just tell 
me so.)

Section 2.2:

    The Application Server SHOULD insert its own URI in the Record-Route
    header so that it remains in the signaling path for subsequent
    signaling related to the session.  This is of particular importance
    for call transfers so that upstream Application Servers or proxy
    servers see signaling originating from the Application Server and not
    the VoiceXML Media Server itself.

I don't understand the purpose of the above. The SHOULD strength of this 
requirement suggests to me an assumption of a particular operating 
environment. In the general case, why should this be more than MAY strength?

Section 2.3:

IMO the use of a media-less session is an entirely valid SIP usage. The
only concern I might potentially have is if it were to catch some UA
unaware, because some UAs just aren't prepared to handle this case. But
the recommended usage here always puts the choice of doing this in the
hands of the *other* UA, not the media server. So I see no problem.

I do find it disconcerting that the initial invite and subsequent 
reinvites are handled in different ways. For the initial one you stall 
the VXML awaiting a media stream, but on subsequent ones you assume the 
absence of a stream is equivalent to having a stream that doesn't send 
anything. Why isn't the behavior consistent in these cases? If you need
to support both behaviors, then it might be better to use explicit and
unique signaling for each. For instance, you might define that a media
stream with a=inactive or a=sendonly (from the client's perspective - 
client putting media server "on hold") could be treated as the absence 
of input but the absence of the stream means you should wait for a 
stream to be negotiated.

Section 2.4:

I don't feel qualified to comment on the validity of the mappings of 
History-Info. It might be good to get somebody else who knows it well to 
comment on that.

       In addition, the array's toString() function returns the full SIP
       Request-URI.  For example, assuming a Request-URI of sip:dialog@
       example.com;voicexml=http://example.com;obj={"x":1,"y":true} then

I don't believe the above URI is valid. The ',' '{' and '}' aren't 
syntactically correct in a URI pvalue. You would need to escape them.
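
As a rough sketch of the escaping (my illustration, not the draft's),
the following Python leaves unescaped only the characters that the RFC
3261 paramchar rule permits and escapes everything else. The SAFE
constant is my transcription of that rule:

```python
from urllib.parse import quote, unquote

# Non-alphanumeric characters RFC 3261 allows unescaped in a pvalue:
# the "mark" characters plus param-unreserved.
SAFE = "-_.!~*'()" + "[]/:&+$"

raw = '{"x":1,"y":true}'
escaped = quote(raw, safe=SAFE)
print(escaped)   # %7B%22x%22:1%2C%22y%22:true%7D
```

With this rule the double quotes end up escaped as well, in addition to
the ',' '{' and '}'.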

Section 2.6.2:

IIUC, message (1) contains no offer, (5) contains an offer with media, 
and (6) accepts the call but rejects the media. If so, then (9) will 
most likely be invalid. To make it valid, the o-line from (8) needs to
be replaced with one consistent with that used in (6), with the version
number incremented. And if (5) has more m-lines than (8), then (9) needs 
to be padded with extra (rejected) m-lines.
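
A minimal Python sketch of that fix-up, under my assumptions about the
flow: rebuild_sdp is a hypothetical helper, the sample SDP values are
invented, and extra slots are filled by reusing the m-lines of the
original offer with the port set to zero:

```python
def reject(m_line):
    """Return the m-line with its port set to 0 (a rejected stream)."""
    parts = m_line.split()
    parts[1] = "0"
    return " ".join(parts)

def rebuild_sdp(previous_o_line, new_sdp, original_m_lines):
    """Reuse the earlier o-line with its version incremented, and pad
    with rejected m-lines up to the number in the original offer."""
    fields = previous_o_line.split()
    fields[2] = str(int(fields[2]) + 1)   # third o= field is the session version
    out, m_count = [], 0
    for line in new_sdp.splitlines():
        if line.startswith("o="):
            out.append(" ".join(fields))
            continue
        if line.startswith("m="):
            m_count += 1
        out.append(line)
    out.extend(reject(m) for m in original_m_lines[m_count:])
    return "\r\n".join(out) + "\r\n"

prev_o = "o=ms 2890844526 2890844526 IN IP4 ms.example.com"   # as used in (6)
sdp_8 = ("v=0\r\no=as 1 1 IN IP4 as.example.com\r\ns=-\r\n"
         "c=IN IP4 192.0.2.1\r\nt=0 0\r\nm=audio 49170 RTP/AVP 0\r\n")
offer_m = ["m=audio 5004 RTP/AVP 0", "m=video 5006 RTP/AVP 31"]  # from (5)
sdp_9 = rebuild_sdp(prev_o, sdp_8, offer_m)
print(sdp_9)
```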

Section 5.2:

    On receipt of the REFER request, the VoiceXML Media Server MUST issue
    a provisional response, 100 Trying.  The 202 Accepted response
    indicates that the VoiceXML document has been fetched and parsed
    correctly.  The VoiceXML Media Server proceeds to place the outbound
    INVITE and will execute the application after the ACK is sent.

The rules of RFC 4320 need to be followed here. REFER is a non-INVITE
transaction and so the timing of the 100 must be as specified in RFC 4320.

In the call flow, the sending of the initial NOTIFY before the 202 for 
the REFER, and especially before determining if REFER is going to 
succeed or fail, is at best unusual and almost certainly incorrect. 
Sending the NOTIFY and then sending a failure response would certainly 
be incorrect.

I think you have two choices:

- wait until the GET is complete before sending the NOTIFY, and probably
send it after the 202.

- send a 202 for the REFER before doing the GET. Inform of a GET failure 
via a NOTIFY.
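
The second choice could be sketched as follows. This is a toy Python
model of the message ordering only, not a SIP stack: handle_refer and
fetch_document are hypothetical names, and the 503 status reported for
a failed fetch is my assumption, not something the draft specifies:

```python
def handle_refer(fetch_document):
    """Return the messages the media server would send, in order,
    when the 202 is sent before the VoiceXML document is fetched."""
    sent = ["202 Accepted"]                  # final response to the REFER
    sent.append("NOTIFY: SIP/2.0 100 Trying")
    try:
        fetch_document()                     # the HTTP GET happens only now
    except OSError:
        # Assumed status; the failure is reported via NOTIFY, never
        # as a failure response to the already-answered REFER.
        sent.append("NOTIFY: SIP/2.0 503 Service Unavailable")
        return sent
    sent.append("INVITE")                    # outbound call proceeds
    return sent

print(handle_refer(lambda: None))
```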

Section 6.3:

In the call flow I think you probably need another NOTIFY between
messages (6) and (7). It's potentially too long until (13).
