[Sip] Re: Requested Expert Review Request: draft-ietf-mediactrl-vxml-00

Hi Paul,

Thanks a lot for the review. Mark and I have discussed each of these and
I've responded to the comments below.

Cheers,

Dave

[snip]

Section 2.1:
>
> The ABNF of the URI seems excessively rigid in the ordering of
> parameters - first the init-parameters, then the vxml-parameters, then
> any other uri-parameters. It also doesn't note that
> 'uri-parameters' is defined in 3261.
>

DB> Good observation - a fixed ordering of the parameters was never intended.
Your proposal looks good. However, while looking over this again, we found
another issue, which crept in late in the process with the addition of JSON
support. Specifically, it is impossible to tell where vxml-params end and
uri-parameters begin, and hence which parameter values to interpret as JSON
values and which as plain strings. We propose to limit JSON values to just
the named VoiceXML session variables 'ccxml' and 'aai' (these were the two
variables for which JSON support was originally sought).

  dialog-parameters = *(";" dialog-parameter)

  dialog-parameter  = init-param / uri-parameter ; defined in [RFC3261]

  init-param        = dialog-param /
                      maxage-param /
                      maxstale-param /
                      method-param /
                      postbody-param /
                      ccxml-param /
                      aai-param

  dialog-param      = "voicexml=" vxml-url ; vxml-url follows the URI
                                           ; syntax defined in [RFC3986]
  maxage-param      = "maxage=" 1*DIGIT

  maxstale-param    = "maxstale=" 1*DIGIT

  method-param      = "method=" ("get" / "post")

  postbody-param    = "postbody=" token

  ccxml-param       = "ccxml=" json-value

  aai-param         = "aai=" json-value

  json-value        =  false /
                       null /
                       true /
                       object /
                       array /
                       number /
                       string ; see RFC4627
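
As a rough illustration of the proposal (not part of the draft), a receiver
could decode only 'ccxml' and 'aai' as JSON and leave every other parameter
as a plain string. The function name parse_dialog_params is hypothetical, and
the sketch assumes a Request-URI with no ";" in the user part and no headers
component.

```python
import json
from urllib.parse import unquote

# Only these parameters carry JSON values under the proposal above.
JSON_PARAMS = {"ccxml", "aai"}


def parse_dialog_params(request_uri):
    """Return (base_uri, params) for a SIP Request-URI.

    Simplified sketch: assumes no ";" in the user part and no
    "?headers" suffix on the URI.
    """
    base, *raw_params = request_uri.split(";")
    params = {}
    for raw in raw_params:
        name, _, value = raw.partition("=")
        name = name.lower()
        value = unquote(value)          # undo the %-escaping
        if name in JSON_PARAMS:
            value = json.loads(value)   # interpreted as a JSON value
        params[name] = value            # everything else stays a string
    return base, params
```

With this split, an unknown parameter whose value happens to look like JSON
is never parsed as JSON, which is the ambiguity the change is meant to remove.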

>
>
> I'm confused about postbody:
>
> [snip]

>
> The interaction between these two escaping rules seems potentially
> confusing. I *think* when this is all put together it means that the
> body must first be encoded according to the [HTML4] section above. At
> that point it will already be almost conformant to the 3261 syntax of a
> token, except for the use of '&'. Then it needs to be encoded again,
> which will take care of the ampersands, but which will re-encode the '%'
> characters of the first encoding.

[snip]

DB> Your interpretation of the intent is correct. The postbody value is an
application/x-www-form-urlencoded string which we subsequently re-encode (the
latter step affecting each & and % character). We will clarify this in the
text. Moreover, we'll be more specific and call out which parameter values
are subject to this encoding step, namely vxml-url, the postbody token, and
the json-value.
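
A minimal sketch of the two encoding steps, for illustration only. The helper
names encode_postbody and decode_postbody are hypothetical. The second step
escapes just the two characters named above; an implementation would
presumably also have to escape any other character the SIP URI grammar does
not allow in the value (for example "=").

```python
from urllib.parse import parse_qsl, unquote, urlencode


def encode_postbody(fields):
    # Step 1: application/x-www-form-urlencoded, as described in [HTML4].
    form = urlencode(fields)
    # Step 2: re-encode for the SIP URI. "%" is escaped first so that the
    # "%26" produced for each "&" is not itself escaped again.
    return form.replace("%", "%25").replace("&", "%26")


def decode_postbody(value):
    # Reverse order: undo the SIP URI escaping, then parse the form data.
    return parse_qsl(unquote(value))
```

For example, the fields name="a b" and q="50%" first become
"name=a+b&q=50%25" and then "name=a+b%26q=50%2525".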

>
> Section 2.2:
>
>    The Application Server SHOULD insert its own URI in the Record-Route
>    header so that it remains in the signaling path for subsequent
>    signaling related to the session.  This is of particular importance
>    for call transfers so that upstream Application Servers or proxy
>    servers see signaling originating from the Application Server and not
>    the VoiceXML Media Server itself.
>
> I don't understand the purpose of the above. The SHOULD strength of this
> requirement suggests to me an assumption of a particular operating
> environment. In the general case, why should this be more than MAY
> strength?

DB> Actually, this paragraph is quite problematic. We shouldn't be mandating
how the AS behaves (and indeed a B2BUA doesn't Record-Route anyway). This
paragraph is also a holdover from before we recommended against using
VoiceXML transfer features with application servers. We propose to delete
it.

>
> Section 2.3:
>
> IMO the use of a media-less session is an entirely valid sip usage. The
> only concern I might potentially have is if it were to catch some UA
> unaware, because some UAs just aren't prepared to handle this case. But
> the recommended usage here always puts the choice of doing this in the
> hands of the *other* UA, not the media server. So I see no problem.
>
> I do find it disconcerting that the initial invite and subsequent
> reinvites are handled in different ways. For the initial one you stall
> the VXML awaiting a media stream, but on subsequent ones you assume the
> absence of a stream is equivalent to having a stream that doesn't send
> anything. Why isn't the behavior consistent in these cases? If you need
> to support both behaviors, then it might be better to use explicit and
> unique signaling for each. For instance, you might define that a media
> stream with a=inactive or a=sendonly (from the client's perspective -
> client putting media server "on hold") could be treated as the absence
> of input but the absence of the stream means you should wait for a
> stream to be negotiated.
>

DB> We're conscious of the inconsistency, but we made a judgement call to
live with it. Preparing a session is hugely valuable for latency hiding.
However, supporting this kind of pause/resume feature once the dialog is
executing is not practical: VoiceXML has no concept of such semantics, and
it is difficult to implement, especially when external speech recognition
and synthesis systems are used that operate on their own timers without
pausing capabilities.

[snip]

>       In addition, the array's toString() function returns the full SIP
>       Request-URI.  For example, assuming a Request-URI of sip:dialog@
>       example.com;voicexml=http://example.com;obj={"x":1,"y":true}
> then
>
> I don't believe the above URI is valid. The ',' '{' and '}' aren't
> syntactically correct in a URI pvalue. You would need to escape them.

DB> Good catch, will fix
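
For illustration, the escaping Paul refers to could look like the sketch
below. It assumes the RFC 3261 rule that a parameter value may contain
unescaped only alphanumerics, the "mark" characters and the param-unreserved
characters; the helper name escape_pvalue is hypothetical.

```python
from urllib.parse import quote

# Characters RFC 3261 allows unescaped in a parameter value, in addition to
# the letters, digits and "_.-~" that quote() never escapes.
PARAM_SAFE = "[]/:&+$!*'()"


def escape_pvalue(value):
    """Percent-escape everything that is not a valid paramchar."""
    return quote(value, safe=PARAM_SAFE)
```

Under these assumptions the JSON value in the example would appear in the
Request-URI as obj=%7B%22x%22:1%2C%22y%22:true%7D.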

>
> Section 2.6.2:
>
> IIUC, message (1) contains no offer, (5) contains an offer with media,
> and (6) accepts the call but rejects the media. If so, then (9) will
> most likely be invalid. To make it valid, the o-line from (8) needs to
> be replaced with one consistent to that used in (6), with the version
> number incremented. And if (5) has more m-lines than (8), then (9) needs
> to be padded with extra (rejected) m-lines.

DB> Good catch. The intent here is that the similarly named SDP messages
share only the same m- and a- lines. We propose to update the diagram,
perhaps using a prime to denote a derivative SDP message, e.g. [offer2'],
and to clarify this in the text.
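
The adjustments Paul describes for message (9) can be sketched roughly as
follows. This is an illustration only: derive_sdp is a hypothetical helper,
it handles just the o= and m= lines, and it assumes each line is a
well-formed, single-space-separated SDP line.

```python
def derive_sdp(prev_o_line, new_m_lines, prior_m_lines):
    """Build the o-line and m-lines for a follow-up SDP in the same session.

    prev_o_line:   o-line last sent by this side in the session
    new_m_lines:   m-lines that the new SDP wants to carry
    prior_m_lines: m-lines of the SDP already exchanged in the session
    """
    # o=<username> <sess-id> <sess-version> <nettype> <addrtype> <address>
    fields = prev_o_line.split()
    fields[2] = str(int(fields[2]) + 1)   # same session, version + 1

    m_lines = list(new_m_lines)
    # Pad with rejected copies of any extra m-lines from the earlier SDP.
    for old in prior_m_lines[len(m_lines):]:
        parts = old.split()
        parts[1] = "0"                    # port 0 marks a rejected stream
        m_lines.append(" ".join(parts))
    return " ".join(fields), m_lines
```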

>
>
> Section 5.2:
>
>    On receipt of the REFER request, the VoiceXML Media Server MUST issue
>    a provisional response, 100 Trying.  The 202 Accepted response
>    indicates that the VoiceXML document has been fetched and parsed
>    correctly.  The VoiceXML Media Server proceeds to place the outbound
>    INVITE and will execute the application after the ACK is sent.
>
> The rules of RFC 4320 need to be followed here. REFER is a non-invite
> transaction and so the timing of the 100 must be as specified in 4320.
>

DB> Will fix

>
> In the call flow, the sending of the initial NOTIFY before the 202 for
> the REFER, and especially before determining if REFER is going to
> succeed or fail, is at best unusual and almost certainly incorrect.
> Sending the NOTIFY and then sending a failure response would certainly
> be incorrect.
>
> I think you have two choices:
>
> - wait until the get is complete before sending the NOTIFY, and probably
> send it after the 202.
>
> - send a 202 for the REFER before doing the GET. Inform of a GET failure
> via a NOTIFY.
>

DB> There is currently an ongoing thread on the list about whether to remove
this feature (and hence this section) altogether. For the sake of argument,
I'll assume it stays, in which case we think option 2 is the most
conventional (especially since most systems respond to the REFER
instantaneously). A GET failure would be conveyed by a NOTIFY with a 500
Server Internal Error sipfrag message (same status used for the INVITE case).

>
> Section 6.3:
>
> In the call flow I think you probably need another NOTIFY between
> messages (6) and (7). It's potentially too long until (13).

DB> This is correct, but we note in the text: "(provisional responses and
NOTIFY messages corresponding to provisional responses have been omitted for
clarity)"