Re: [Speechsc] AD review of draft-ietf-speechsc-mrcpv2-20

Arsen Chaloyan <achaloyan@yahoo.com> Wed, 30 September 2009 17:05 UTC

To: Robert Sparks <rjsparks@nostrum.com>, speechsc@ietf.org
Cc: draft-ietf-speechsc-mrcpv2@tools.ietf.org, speechsc-chairs@tools.ietf.org

Hi Robert,

It'd be great to have all the mentioned items clarified.
Please let me just add a few more items.

A. Media Type (Sections 9.4.9, 10.4.8, 11.4.11)

> Media Type in which to store captured audio or video such as the one captured and returned by the Waveform-URI header.

It would probably be reasonable to have a list of mandatory and optional media types for both audio and video.
I have noticed that .wav is used in the examples, and there is also a reference to the Sun audio format in Section 8.5.1.

B. Mapping of audio, video and control channels
There are no usage examples for video. It is also not clear how an MRCP control channel such as Recorder should be associated with audio and video streams. Perhaps a control channel should have two "cmid" attributes, one for the audio stream and another for the video stream.
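
To make the suggestion concrete, here is a rough sketch of what such an offer could look like. It is only an illustration of the idea above, not text from the draft: the ports, payload types, and the use of two "cmid" lines on one control channel are all assumptions.

```python
def recorder_offer(audio_port, video_port):
    """Build a hypothetical SDP fragment for a Recorder control channel
    that refers to both an audio and a video stream via two cmid lines."""
    lines = [
        # MRCPv2 control channel (port 9 because the offerer is the active side)
        "m=application 9 TCP/MRCPv2 1",
        "a=setup:active",
        "a=connection:new",
        "a=resource:recorder",
        "a=cmid:1",  # assumed: refers to the audio stream below
        "a=cmid:2",  # assumed: refers to the video stream below
        # Audio stream, identified by mid 1 (payload type 0 = PCMU)
        f"m=audio {audio_port} RTP/AVP 0",
        "a=sendonly",
        "a=mid:1",
        # Video stream, identified by mid 2 (payload type 31 = H.261)
        f"m=video {video_port} RTP/AVP 31",
        "a=sendonly",
        "a=mid:2",
    ]
    return "\r\n".join(lines) + "\r\n"


if __name__ == "__main__":
    print(recorder_offer(49170, 49172))
```

Whether one control channel may carry more than one "cmid" is exactly the point that needs clarifying.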


C. One more very small typo that I noticed by chance (Section 9.8)

Find "applicationt/x-nlsml" and replace with "application/x-nlsml"


Thanks,
-- 
Arsen Chaloyan
The author of UniMRCP 
http://www.unimrcp.org



________________________________
From: Robert Sparks <rjsparks@nostrum.com>
To: speechsc@ietf.org
Cc: draft-ietf-speechsc-mrcpv2@tools.ietf.org; speechsc-chairs@tools.ietf.org
Sent: Tuesday, September 29, 2009 8:06:47 PM
Subject: [Speechsc] AD review of draft-ietf-speechsc-mrcpv2-20

Hi Folks -

I'm working on moving MRCPv2 along. I've found several things so far
that I'd like to discuss and/or have the document address before we take
the document into IETF last call.

This is a large and complex document. Apologies that my review has taken so long.

After talking with Eric and Dave, I'm sending these all in one message instead
of splitting them into several threads at the beginning. When you reply to a particular
point, it would be useful to me if you adjusted the subject line to indicate which point
you are replying to.

These are not listed in any particular order. Nits are grouped at the end.

Thanks!

RjS
----------------------------------------------------------------------------------------------
(The following apply to revision -20)

1 The Introduction points to RFC4313 for a discussion on why MRCPv2
  does not use RTSP and details on alternatives, but I don't find that
  discussion in 4313. Was that discussion captured somewhere? If so,
  please point to that. Otherwise, modify this text.

2 The SIP examples throughout the draft need to be adjusted to reflect
  correct syntax and intended use. There are several aspects of the SIP
  messages, in particular, that are currently in error. Consider
  showing partial SIP headers focusing only on what's important to the
  example as an alternative to showing full messages (that will have to
  be carefully reviewed). Some examples of issues that need to be
  corrected (this is not exhaustive):
    2.1 Several responses are missing "received=" in their Via header
      fields
    2.2 The o= line in answers (as in offer/answer) must be different
      from the o= line in the offer.
    2.3 The branch parameter values need to be reviewed very carefully -
      the first example incorrectly reuses the branch from the INVITE
      in an ACK to a 200 OK. Then the _next transaction_ also reuses
      the branch.
    2.4 There is a to-tag in the OPTIONS request on page 44
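
  The branch problem in 2.3 is easy to check mechanically. A minimal
  sketch follows, assuming the RFC 3261 rule that every request needs a
  fresh branch except CANCEL and an ACK for a non-2xx final response.
  The Request type and the function name are hypothetical helpers, not
  anything from the draft.

```python
from dataclasses import dataclass

MAGIC_COOKIE = "z9hG4bK"  # RFC 3261 branch prefix


@dataclass
class Request:
    method: str
    branch: str
    acks_non_2xx: bool = False  # only meaningful for ACK


def find_branch_problems(requests):
    """Return (index, description) pairs for suspicious Via branch values.

    `requests` lists the requests sent by one client, in order.
    """
    seen = set()
    problems = []
    for index, req in enumerate(requests):
        # CANCEL and ACK-for-non-2xx legitimately share the branch of the
        # request they belong to; everything else is a new transaction.
        may_reuse = req.method == "CANCEL" or (
            req.method == "ACK" and req.acks_non_2xx
        )
        if not req.branch.startswith(MAGIC_COOKIE):
            problems.append((index, "branch lacks the z9hG4bK prefix"))
        if req.branch in seen and not may_reuse:
            problems.append((index, f"{req.method} reuses branch {req.branch}"))
        seen.add(req.branch)
    return problems


if __name__ == "__main__":
    flow = [
        Request("INVITE", "z9hG4bK74bf9"),
        Request("ACK", "z9hG4bK74bf9"),  # ACK to a 200 OK: needs its own branch
        Request("BYE", "z9hG4bK74bf9"),  # next transaction: also needs a new one
    ]
    for index, text in find_branch_problems(flow):
        print(index, text)
```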

3 The MRCP examples need to be similarly reviewed
    3.1 Are all the content-lengths correct? (I think the 2nd message's
      on page 59 isn't)
    3.2 It's ok, probably even a good idea, to elide values that are not
      important to understanding an example, but please be consistent -
      the first message on page 65, for example, has an explicit
      (probably incorrect) MRCP length, but elided the mime-body length
    3.3 The example in section 9.14 shows two RECOGNITION-COMPLETE
      messages to the same RECOGNIZE. Were these intended to show two
      alternate possible responses? If so, the document should make
      that more clear.
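
  One way to avoid hand-counting errors like those in 3.1 and 3.2 is to
  generate the lengths. The sketch below assumes the request start-line
  has the form "MRCP/2.0 message-length method-name request-id", that
  message-length counts every octet of the message including the
  start-line, and that Content-Length counts only the body. The function
  names are hypothetical.

```python
MRCP_VERSION = b"MRCP/2.0"
CRLF = "\r\n"


def build_request(method, request_id, headers, body=b""):
    """Serialize a request with computed message-length and Content-Length."""
    lines = [f"{name}: {value}" for name, value in headers]
    if body:
        lines.append(f"Content-Length: {len(body)}")
    tail = (
        f" {method} {request_id}{CRLF}"
        + "".join(line + CRLF for line in lines)
        + CRLF
    )
    tail_octets = tail.encode("utf-8") + body
    # Octets in the message apart from the message-length digits themselves.
    base = len(MRCP_VERSION) + 1 + len(tail_octets)
    # The length field counts its own digits, so iterate to a fixed point.
    length = base + 1
    while base + len(str(length)) != length:
        length = base + len(str(length))
    return MRCP_VERSION + b" " + str(length).encode("ascii") + tail_octets


def declared_length_matches(message):
    """Check the message-length token against the real octet count."""
    declared = int(message.split(b" ", 2)[1])
    return declared == len(message)


if __name__ == "__main__":
    body = b'<?xml version="1.0"?><speak>Hello</speak>'
    message = build_request(
        "SPEAK",
        543257,
        [("Channel-Identifier", "32AECB23433802@speechsynth"),
         ("Content-Type", "application/ssml+xml")],
        body,
    )
    print(message.decode("utf-8"))
    print(declared_length_matches(message))
```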

4 Returning a SIP 501 at the end of section 4.3 is not the right thing
  to do. 501 means the responding element does not implement the
  method. You are probably looking for 488 Not Acceptable Here.

5 This needs to be run through an ABNF checker. There are production
  rules and terminals missing - they either need to be defined or
  pointers to where they are defined need to be added.

6 The document occasionally mentions an MRCP proxy (there is a 503
  Proxy Timeout code even), but I can't find where such proxies are
  defined. Page 32 also talks about intermediaries.

7 Some additional discussion around connection establishment and
  sharing/reuse is probably needed
    7.1 Where does an element look in a peer's certificate to determine
      it has reached the peer it intended to reach?
    7.2 What happens if a connection gets closed?
    7.3 Must events come over the connection the request was sent on?
    7.4 There should be some guidance on only reusing connections when
      the identity of the peer matches what was confirmed when the
      connection was opened. (Specifically, if it's possible for an
      MRCPv2 server to host services for more than one domain, you
      don't want to blindly reuse the connection you made to talk to A
      to talk to B just because DNS aimed you to the same address/port
      to reach them.)
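
  The guidance in 7.4 amounts to making the confirmed peer identity part
  of the reuse key. A minimal sketch, assuming a hypothetical connect
  callable that opens the connection and checks the peer's certificate
  against the expected identity (raising on mismatch):

```python
class ConnectionCache:
    """Reuse connections only for the identity they were opened for."""

    def __init__(self, connect):
        # `connect(identity, address, port)` is a hypothetical callable that
        # opens a connection and verifies the peer against `identity`.
        self._connect = connect
        self._connections = {}

    def get(self, identity, address, port):
        # The identity is part of the key, so two domains that resolve to
        # the same address and port never share a connection.
        key = (identity.lower(), address, port)
        if key not in self._connections:
            self._connections[key] = self._connect(identity, address, port)
        return self._connections[key]

    def discard(self, identity, address, port):
        """Forget a connection, for example after it has been closed."""
        self._connections.pop((identity.lower(), address, port), None)


if __name__ == "__main__":
    cache = ConnectionCache(lambda identity, address, port: object())
    a = cache.get("a.example.com", "192.0.2.10", 32416)
    b = cache.get("b.example.com", "192.0.2.10", 32416)
    print(a is b)  # False: same address and port, different identity
```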

8 Section 6.1.2 should be explicit about what it means by "empty header
  field"

9 With respect to the URI indirection mechanisms defined in the draft:
    9.1 Much of the text assumes these URIs will be HTTP/HTTPS. But other
      parts of the text, and the syntax, go out of their way to allow
      arbitrary URI types. Please help look for places where the
      recommendations and requirements stated only make sense when the
      URI is HTTP or HTTPS.
    9.2 There's currently no discussion about authenticating the
      requester seeking access to the resource pointed to by one of
      these URIs. Security considerations should call out that if the
      URI leaks, the content leaks. There should probably also be more
      explicit discussion of how long a server should be expected to
      hold onto the state indicated by such a URI (how long can a
      client expect it to be there, and when does a server decide a
      client or set of clients is mounting a state exhaustion attack?),
      whether it should allow multiple accesses from a single client,
      whether it should allow accesses from multiple clients, and what
      it means to a client if the attempt to access the resource fails.

10 Why is there both a "Fetch Hint" and an "Audio Fetch Hint"? Why does
  the syntax allow for extensibility in the values for those fields?

11 On page 111, the document talks about timing between audio flows and
  RECOGNIZE methods. It claims there are "a number of mechanisms" for
  dealing with the race conditions. Would it be possible to list a few
  of these as informative examples? You might also consider pointing
  out that the delta between the start of an audio flow (or the point
  in an ongoing stream that you intended to start RECOGNIZEing) and the
  receipt of a RECOGNIZE command could be quite large if TCP is
  reacting to congestion. The prohibition at the end of the paragraph
  ("MUST NOT buffer anything it receives beforehand.") seems odd.
  What's the rationale for it? Finally - did the group consider
  indicating RTP timestamps in the RECOGNIZE request to indicate where
  to start recognition as one of the mechanisms pointed to above?

12 Why is the record semantic defined in 10.4.7 different from the one
  in 9.4.8/9.4.22 (specifically, by providing a way to request a server
  store something somewhere other than on that server)? Why does this
  section allow an arbitrary URI scheme to be passed in here? What is
  an implementation supposed to do if it doesn't know the scheme? What
  does it do if an attempt to use a URI with a scheme it recognizes
  results in failure? The security considerations section should
  discuss how this might be abused by providing a URI that points at a
  victim.
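
  One possible answer to the unknown-scheme question is to reject
  anything outside an explicit allow-list before acting on it. This is
  only a sketch of that policy: the allow-list contents and the function
  name are assumptions, and the draft does not currently say what an
  implementation should do.

```python
from urllib.parse import urlsplit

# Assumed policy: only store to schemes this server actually implements.
SUPPORTED_SCHEMES = frozenset({"http", "https"})


def is_acceptable_record_uri(uri):
    """Return True if the URI is absolute and uses a supported scheme."""
    parts = urlsplit(uri)
    scheme = parts.scheme.lower()
    if not scheme:
        return False  # relative reference: nothing to dispatch on
    if scheme not in SUPPORTED_SCHEMES:
        return False  # unknown or unsupported scheme
    return bool(parts.netloc)  # http(s) URIs need a host


if __name__ == "__main__":
    for candidate in (
        "https://media.example.com/rec/1234.wav",
        "ftp://victim.example.net/upload.wav",
        "recordings/1234.wav",
    ):
        print(candidate, is_acceptable_record_uri(candidate))
```

  A check like this does not address the victim-URI concern on its own,
  since an allowed scheme can still point at a third party.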

13 What should an element do if it receives a status code that it
  doesn't recognize? If that's not already specified in the document,
  it should be added.
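
  A common convention, and one the document could adopt, is to treat an
  unrecognized code as the generic member of its class. The sketch below
  assumes the status codes are grouped as 2xx success, 4xx client
  failure, and 5xx server failure. The fallback behavior itself is an
  assumption, not something the draft specifies.

```python
def classify_status(code):
    """Map a status code to its class so unknown codes still get handled."""
    if 200 <= code <= 299:
        return "success"
    if 400 <= code <= 499:
        return "client-failure"
    if 500 <= code <= 599:
        return "server-failure"
    return "unknown"  # outside every assumed class: treat as a protocol error


if __name__ == "__main__":
    for code in (200, 404, 499, 503, 799):
        print(code, classify_status(code))
```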

14 Consider additional clarification around "Note that "GET-PARAMS"
  returns header values that apply to the whole session and not values
  that have a request level scope."

15 How are parameters like "Confidence Threshold" and "Sensitivity
  Level" interoperable? Would you expect .5 to mean the same thing to
  two different implementations? I'm guessing that the intent is that
  the server gets to interpret these values in an
  implementation-specific way, and the utility of these knobs is that
  you tune them over time to a given server. If that's right, the text
  should explicitly point that out.

16 Something I'm still trying to think through and would like other
  folks to comment on - apologies if I've missed where this is treated
  already: Can a server ever issue a reINVITE affecting an MRCPv2
  session (to change codecs for example)? If so, are there any places
  in the text that need to call that out?

17 On page 15, there's a requirement that "There MUST be one SDP m-line
  for each MRCPv2 resource to be used in the session. " This looks like
  it would prevent offering things like alternates, v4 and v6, etc. Is
  this what's intended?

18 Nits
    18.1 Section 3 paragraph 2 sentence 1: SIP is not the "session
      management protocol"
    18.2 The word "pipe" is used ("control pipe", "audio pipe" for
      example) with no definition and there are well-defined terms that
      could be used instead.
    18.3 Paragraph spanning pages 15 and 16 - I suggest explicitly noting
      that the reINVITE receives an error response.
    18.4 There is an unnatural break in the flow of the prose on page 16
      when the text shifts from an overview of the protocol to giving
      an example. Suggest breaking the example into a subsection to
      make it clear what you're intending.
    18.5 Please use the terms "header" and "header
      field" consistently and align the use of those phrases with
      the definitions in section 2.1 of RFC 5322.
    18.6 The conditional language in 6.1.1 is hard to follow. In
      particular, the paragraph starting "If both error 404 and
      another" is awkward. Please consider clarifying these clauses.
    18.7 typo on page 43: "veriifcation"
    18.8 The string "ECMAScript" is used once with no definition.
    18.9 The term "kill-on-barge-in" is used without any definition.
      Please add a reference or a definition.
    18.10 Page 121 says: "The Personal-Grammar-URI,"..."is created"... . I
      think you meant to say the resource indicated by that URI is
      created.
    18.11 Consider using "octet" for "byte". In places where you are
      describing lengths, consider talking about whether leading 0s
      have meaning (it would probably be good to explicitly call out
      that you don't want such a string to be interpreted base-8).
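
  The base-8 hazard in 18.11 is easy to demonstrate. A minimal sketch,
  assuming length fields are strings of ASCII decimal digits in which
  leading zeros carry no meaning (parse_length is a hypothetical helper):

```python
def parse_length(field):
    """Parse a length field as decimal, whatever its leading zeros."""
    if not (field.isascii() and field.isdigit()):
        raise ValueError(f"not a decimal length: {field!r}")
    return int(field, 10)  # explicit base: never guess from the prefix


if __name__ == "__main__":
    print(parse_length("010"))  # decimal reading of the field
    print(int("010", 8))        # what an octal reading would give instead
```

  Parsers that infer the base from a leading 0 (C's strtol with base 0,
  for example) would read the same field as octal.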
