[Gen-art] Gen-ART review of draft-ietf-speechsc-mrcpv2-24.txt

"Miguel A. Garcia" <Miguel.A.Garcia@ericsson.com> Tue, 03 May 2011 06:39 UTC

Message-ID: <4DBFA327.7070404@ericsson.com>
Date: Tue, 03 May 2011 08:39:35 +0200
From: "Miguel A. Garcia" <Miguel.A.Garcia@ericsson.com>
To: Dan Burnett <dburnett@voxeo.com>, Saravanan Shanmugham <sarvi@cisco.com>, Eric Burger <eburger@standardstrack.com>, Dave Oran <oran@cisco.com>, Robert Sparks <rjsparks@nostrum.com>
Cc: General Area Review Team <gen-art@ietf.org>
Subject: [Gen-art] Gen-ART review of draft-ietf-speechsc-mrcpv2-24.txt

I have been selected as the General Area Review Team (Gen-ART)
reviewer for this draft. For background on Gen-ART, please see the FAQ at

Please resolve these comments along with any other comments you may receive.

Document: draft-ietf-speechsc-mrcpv2-24.txt
Reviewer: Miguel Garcia <miguel.a.garcia@ericsson.com>
Review Date: 2011-05-03
IETF LC End Date: 2011-04-13

Summary: The document is on the right track, but has some issues that 
should be addressed before publication as a standards track RFC.

Major issues: none

Minor issues:

- Section 4.2 reads:

    To remain backwards compatible with conventional SDP usage,
    the format field of the m-line MUST have the arbitrarily-selected
    value of "1".

Other protocols, for example MSRP [RFC4975], use an asterisk "*" in the
format field. I wonder whether the criteria can be unified, so that
conventional SDP parsers can be used.

- Although the draft says in general terms that it reuses the SDP
offer/answer model, I think it would be good to mention this explicitly
where it applies, to indicate that "we aren't doing anything new here".
For example, in Section 4.2, the text reads:

    When the client wants to add a media processing resource to the
    session, it issues a SIP re-INVITE transaction.

and I think it would be better to say:

    When the client wants to add a media processing resource to the
    session, it issues a new SDP offer, according to the procedures of
    RFC 3264 [RFC3264], in a SIP re-INVITE request.

Also, on page 17, the text reads:

    When the client wants to de-allocate the resource from this session,
    it issues a SIP re-INVITE transaction with the server.  The SDP MUST
    offer the control m-line with port 0.

It could be completed with:

    When the client wants to de-allocate the resource from this session,
    it issues a new SDP offer, according to RFC 3264 [RFC3264], where the
    control m-line port MUST be set to 0. This SDP offer is sent in a SIP
    re-INVITE request.
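
A minimal sketch of the de-allocation offer described above, in Python.
It assumes that control m-lines look like "m=application 9 TCP/MRCPv2 1"
(or the TCP/TLS/MRCPv2 variant); the function name and the sample SDP are
hypothetical. A real re-offer built according to RFC 3264 would also
increment the version number in the o= line, which this sketch does not do.

```python
def deallocate_control(sdp, index=0):
    """Return a copy of sdp with the index-th control m-line port set to 0."""
    out, seen = [], 0
    for line in sdp.splitlines():
        fields = line.split(" ")
        # m=<media> <port> <proto> <fmt>; the proto token is an assumption.
        is_control = (line.startswith("m=application ")
                      and len(fields) >= 4
                      and fields[2].endswith("MRCPv2"))
        if is_control:
            if seen == index:
                fields[1] = "0"  # port 0 signals removal of the stream
                line = " ".join(fields)
            seen += 1
        out.append(line)
    return "\r\n".join(out) + "\r\n"


offer = "v=0\r\nm=application 9 TCP/MRCPv2 1\r\nm=audio 49170 RTP/AVP 0\r\n"
new_offer = deallocate_control(offer)
```

Only the selected control m-line changes; the audio m-line is left alone.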

- Section 4.2 describes the usage of the a=setup attribute, but it does
not clarify if setting up MRCP sessions MUST always use the a=setup
attribute or not. I think the answer is yes, at least for TCP or TLS
transports (those defined in this version of the draft). But there should
be a "MUST use the 'setup' attribute as per RFC 4145" somewhere. I
recommend adding the following sentence at the beginning of the last
paragraph on page 16:

    MRCPv2 clients and servers using TCP as a transport protocol MUST use
    the procedures specified in RFC 4145 [RFC4145] for setting up the TCP
    connection, with the considerations described here. Similarly, MRCPv2
    clients and servers using TCP/TLS as a transport protocol MUST use the
    procedures specified in RFC 4572 [RFC4572] for setting up the TLS
    connection, with the considerations described here. The 'setup'
    attribute, as described ... [continue with the current text]

- Section 5.1. The ABNF of "mrcp-version", at the top of page 26, comes
out of the blue: there is no "mrcp-version" in the ABNF on page 25, so,
it is difficult to find out how to put "mrcp-version" into context.

- Section 5.4, in the client failure table, the description of the 407
contains the normative text "MAY BE". I think these tables are an
informative summary. All the normative text should be written in detail
elsewhere in the document. You can safely turn it to lowercase.

- On Section 8.5.1 the spec talks about using the multipart/mixed Media 
Type. It is not clear to me if implementation of multipart/mixed is a 
MUST or a MAY that is negotiated.

- On Section 8.6, first paragraph, the text reads:

   The SPEAK" method can carry voice and prosody....

I guess the "can" should be replaced by a normative "MAY".

- On Sections 9.4.9, 10.4.8, and 11.4.11, there is no indication of the 
possible values of the "media-type-value". I guess you want to say that 
these are intended to be MIME types (so far, the word "MIME type" is not 
written), and in that case, you should also say that possible values are 
any of the values included in the MIME media types registry maintained by 

- On Sections 9.4.11, 10.4.3, and 11.4.16, the ABNF merely reads:

     completion-cause       = "Completion-Cause" ":" 3DIGIT SP
                                               1*VCHAR CRLF

  I am assuming that when the ABNF refers to "3DIGIT", these are supposed 
to be any of the values included in the "Cause-Code" column of the table 
in the same section. Similarly, I am assuming that the "1*VCHAR" should 
include any of the values of the "Cause-Name" column in the same table. I 
think this should be spelled out. I suggest changing the ABNF to this one:

     completion-cause       = "Completion-Cause" ":" cause-code SP
                              cause-name CRLF
     cause-code             = 3DIGIT
     cause-name             = *VCHAR

And then do the missing mapping with text similar to:

"The 'cause-code' contains a numerical value selected from the Cause-Code 
column of the following table. The 'cause-name' contains the 
corresponding token selected from the Cause-Name column."

Note that the resolution of this issue should be also applied to Sections 
10.4.3 and 11.4.16.
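
A minimal sketch, in Python, of what the split into cause-code and
cause-name buys a parser: the two parts can be checked against the table
as a pair. The table below is a hypothetical placeholder, not the draft's
table, and the function name is made up. The sketch assumes a non-empty
cause-name (1*VCHAR, as in the draft's current rule) and no whitespace
after the colon, as the ABNF is written.

```python
import re

# Hypothetical excerpt; the real rows are those of the Completion-Cause
# table in the corresponding section of the draft.
CAUSE_TABLE = {"000": "normal", "001": "example-failure"}

# cause-code = 3DIGIT, cause-name = 1*VCHAR (visible ASCII 0x21-0x7E)
COMPLETION_CAUSE = re.compile(r"Completion-Cause:([0-9]{3}) ([\x21-\x7E]+)\r\n")


def parse_completion_cause(line, table):
    """Return (cause-code, cause-name) if the pair is a row of the table."""
    match = COMPLETION_CAUSE.fullmatch(line)
    if match is None:
        raise ValueError("not a well-formed Completion-Cause header")
    code, name = match.group(1), match.group(2)
    if table.get(code) != name:
        raise ValueError("cause %s %r is not a row of the table" % (code, name))
    return code, name


parsed = parse_completion_cause("Completion-Cause:000 normal\r\n", CAUSE_TABLE)
```

A header such as "Completion-Cause:000 wrong" is syntactically valid under
the current ABNF but is rejected here because the name does not match the
code.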

- Question: In Sections 9.4.21 and 10.4.6, should "1*UTFCHAR" be included 
in quotes? It is true that there is no other subfield in the same header, 
meaning that there is no intention to parse the text. But somehow I feel 
safer if you enclosed the text in quotes. Also, remember that this text 
comes from another protocol beyond your control, so you never know, for 
example, if the other protocol is going to add CRLF or something weird 
that will crash the recipient of the header.

Note that the resolution of this issue should also be applied to Section 

- On Section 9.4.26, the ABNF reads:

    recognition-mode   = "Recognition-Mode" ":" 1*ALPHA CRLF

However, there are only two values to choose from, "normal" and 
"hotword". I think the following ABNF better expresses that there are 
only two possible values:

   recognition-mode   = "Recognition-Mode" ":"
                        (normal-value / hotword-value) CRLF
   normal-value       = "normal"
   hotword-value      = "hotword"
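
As a minimal sketch in Python, a parser for the restricted rule could look
like the following. The function name is hypothetical, and the sketch
assumes the header line still ends in CRLF, as in the draft's current
rule. ABNF quoted strings are case-insensitive (RFC 5234), so the match
ignores case.

```python
import re

# Only the two enumerated values are accepted; quoted ABNF strings are
# case-insensitive, hence re.IGNORECASE.
RECOGNITION_MODE = re.compile(r"Recognition-Mode:(normal|hotword)\r\n",
                              re.IGNORECASE)


def parse_recognition_mode(line):
    """Return 'normal' or 'hotword', or raise ValueError."""
    match = RECOGNITION_MODE.fullmatch(line)
    if match is None:
        raise ValueError("Recognition-Mode must be 'normal' or 'hotword'")
    return match.group(1).lower()


mode = parse_recognition_mode("Recognition-Mode:hotword\r\n")
```

With the current 1*ALPHA rule, a value such as "fast" parses correctly
and has to be rejected by separate logic; with the enumerated rule it
fails at the syntax level.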

- Section 9.4.40. I wonder why the Phrase-NL is defined as UTFCHAR, while 
most of the other headers are defined as VCHAR.

- Section 9.9, third paragraph, the text reads:

    "If the client needs to explicitly control grammar weights for the 
recognition operation, it must employ method 3 below. "

   I guess the "must" should be a "MUST".

- Section 9.9, the paragraph that goes after bullet point number "3", the 
text reads:

   "In addition to performing recognition on the input, the recognizer 
may also enroll the collected utterance in a personal grammar if the ..."

   Here the "may" should be a "MAY".

- Section 9.9, bullet point number 1 on page 114:

   "the recognizer must complete ...."

   I guess the "must" should be a "MUST". There is another instance with 
the same text on bullet point number 1 on page 115.

- Section 9.9, bullet point number 2 on page 115:

   "the RECOGNIZE completes with ..."

   I guess it should be normative: "the RECOGNIZE MUST complete with ..."

- Section 9.17, the text reads:

   "The END-PHRASE-ENROLLMENT method may be called ONLY during an active 
phrase enrollment session. "

   Two issues: First, the "may" should be a "MAY". Second, I guess 
readability improves if the sentence is put in the active voice. What 
about this:

   "The client MAY call the END-PHRASE-ENROLLMENT method ONLY during ..."

- Section 9.17, the text reads:

   "... the client can abort ..."

   I guess the "can" should be a "MAY".

- Section 9.17, second paragraph. The text should clearly indicate the 
name of the header where the 'location/URI' should be included.

- In Section 10.4.7, I would like to clarify that if the record-uri is 
not sent in the header, then the actual audio should be sent as a MIME 
body. I would suggest replacing:

If this header field is not specified in the RECORD request, the server 
MUST capture the audio and send it in the "STOP" response or the 
RECORD-COMPLETE event as a message body.

with this text:

If this header field is not specified in the RECORD request, the server 
MUST capture the audio, MUST encode it as a MIME body, and MUST send it 
in the "STOP" response or the RECORD-COMPLETE event.

- In Section 10.4.7, I believe you should write normative text at the end 
of paragraph 1 and in paragraph 3. Also, in paragraph 3, you need to add 
references to all the valid schemes and add the 'cid' scheme. And perhaps 
write all the scheme names in lowercase.

    In this case, the response carrying the audio content MUST include a
    Content ID (cid) [RFC2392] value in this header pointing to the
    Content-ID in the message body.

And in the third paragraph:

Implementations MUST only use 'http', 'https' [RFC2616], 'file' 
[RFC3986], and 'cid' [RFC2392] schemes in the URI. Other URI schemes MUST 
NOT be used. Note, however, that implementations already exist that 
support other schemes.

- Section 10.5. Perhaps a reference to Section 10.4.7 should be added, 
because the details are described in 10.4.7.

- Sections 4.2 and 10.6. The text says that if physical security is 
provided, one can avoid TLS and merely use TCP. I have the feeling that 
the connection between these two concepts (physical security and lack of 
TLS) is not sufficiently justified for the security experts. One could 
access the resources of the MRCP server remotely, via a TCP connection 
that does not use TLS, or even sniff the network. So, physical security 
is not enough to justify avoiding TLS.

- Section 11.4.2. I am missing more normative strength text here. For 
   + a couple of occurrences of "is required" should be set to "MUST be 
   + "may" should be "MAY"
   + "can" should be "MAY"

- Section I am missing more normative strength text. In 
particular, a couple of occurrences of "must" in the first paragraph 
should be "MUST".

- Section 11.6, 7th paragraph. I am missing more normative strength text. 
When the text says:
   "... operations may be performed on the verifier resource". The "may" 
should be a "MAY".

- Section 12.1, bullet point 1. Although the text says that clients and 
servers MUST support digest authentication, I think they SHOULD use it 
(isn't that the intention?). In particular, the server SHOULD authenticate 
the client using SIP digest authentication. Also, a reference to RFC 3261 
should be added here.

- Section 12.1, bullet point 2. I agree that clients and servers SHOULD 
employ 'sips' URIs, but I guess the 'sips' is just the format of the 
underlying TLS security. I guess you should add "including that clients 
and servers SHOULD set up TLS [ref] connections".

- Section 12.3. The last word of this section, "recommended", should be 
normative: "RECOMMENDED".

- Section 13.1.6 describes a mechanism where vendor-specific extensions 
use the reverse DNS mechanism, for example, "com.example.foo". If the 
vendor-specific extension is tied to DNS to avoid clashes in names, why 
is there a need for an expert review policy prior to its registration? I 
see a contradiction in having a registry that manages itself, because the 
tie to DNS avoids clashes, and then having anything other than a 
volunteer registry.

- Section 13.7.2 registers new SDP "att-field" at the *session-level* (at 
least according to the title in Section 13.7.2). However, the text of both 
registers reads "media-level" in the type of attribute. So, are these two 
session-level or media-level SDP attributes? I think they are 
media-level, in which case, they both should be written in Section 
13.7.3, which is the SDP "att-field" registration at the media-level.

- I think it is not correct to have normative text (capitalized) in the 
examples, because examples are informative by nature. The text around the 
examples should describe what is happening. If there is a need for 
normative text, then it should be already written elsewhere, and if it 
isn't, it should be written in the normative sections, but not in the 
examples. So, I found the following instances of normative text in 
Section 14:
   + RECOMMENDED on page 186
   + A couple of MUST on page 187
   + MUST on page 189

- Sections 16.3 and 16.3. If these are schema definitions, then why don't 
they start with:
   <?xml version ="1.0" encoding="UTF-8"?>
   and the namespace definition, like any other XML schema document?


- Expand acronyms at first occurrence. This includes: SSML, NLSML, VXML, RTSP

- General to all the examples containing SDP. In Section 4.3, I think the 
"o=" lines are not correct, in particular, the IP address in the o= line 
is set to "" in both the offer and the answer. So, the first 
thing is that it should be a different value in the offer and the answer. 
The second thing is that it should typically be equal to the IP address 
that we see in the c= line, unless you have a very good reason for it. 
Since it is not the goal of the spec to deal with these rare use cases, I 
would recommend to set the IP address in the o= line to become equal to 
that in the c= line.

This also applies to the example in Section 7 and the examples in 
Section 14.

- General to all the examples containing SDP. According to RFC 3264 you 
must include a "t=0 0" line in all the SDP offers and answers.

- General to all the examples containing SDP. There is an "s=" line that 
contains text. According to RFC 3264, it is RECOMMENDED that it contains 
a white space or a dash "-". If you want to add a subject, the SIP 
Subject header field serves the purpose.
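
The three SDP comments above can be checked mechanically. The following
Python sketch is a hypothetical helper, not part of the draft: it compares
the address in the o= line with the one in the first c= line, looks for
the "t=0 0" line, and checks the s= line. The sample SDP uses
documentation addresses and made-up session identifiers.

```python
def check_sdp(sdp):
    """Return a list of findings for the o=, t= and s= comments."""
    lines = [line for line in sdp.splitlines() if line]
    findings = []
    origin = next((l for l in lines if l.startswith("o=")), None)
    conn = next((l for l in lines if l.startswith("c=")), None)
    # o=<user> <sess-id> <sess-version> <nettype> <addrtype> <address>
    # c=<nettype> <addrtype> <address>
    if origin and conn and origin.split()[-1] != conn.split()[-1]:
        findings.append("o= address differs from c= address")
    if "t=0 0" not in lines:
        findings.append('missing "t=0 0" line')
    subject = next((l for l in lines if l.startswith("s=")), None)
    if subject not in ("s=-", "s= "):
        findings.append('s= line should be "-" or a single space')
    return findings


good = ("v=0\r\no=- 2890844526 2890842808 IN IP4 192.0.2.10\r\ns=-\r\n"
        "c=IN IP4 192.0.2.10\r\nt=0 0\r\nm=audio 49170 RTP/AVP 0\r\n")
bad = ("v=0\r\no=- 2890844526 2890842808 IN IP4 192.0.2.99\r\n"
       "s=Example session\r\nc=IN IP4 192.0.2.10\r\n"
       "m=audio 49170 RTP/AVP 0\r\n")
good_findings = check_sdp(good)
bad_findings = check_sdp(bad)
```

The sketch only looks at the first c= line, so an SDP that carries
different connection addresses per media stream would need a finer check.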

- Section 5, add a formal reference to ISO 8859-1 when the text mentions it.

- It would be good to have numbers in those tables in Section 5.4. I
mean, there should be a caption saying "Table 1: Success 2xx response codes".

- In Sections 9.4.21 and 10.4.6, add a formal reference to RFC3629 when 
you name "UTF-8".

- Section, 1st bullet point:
   s/contains an float/contains a float

- Section, add a formal reference to ISO 8601 when the text 
mentions it.

- Section 9.9, bullet point numbered "3". Add a reference to the document 
that defines the  "3C grammar weights"

- Section 9.9, the first paragraph on page 114,
  s/The No-Input-Timer MUST BE started/The No-Input-Timer MUST be started

- Section 10, add a formal reference to RFC 2326 when mentioning RTSP.

- Around Section 10.4.7, it would be good to write a simple example 
showing a message body that includes the audio. This is to see an example 
of a 'cid' URI scheme.

- General: Names of URI schemes should be written in lowercase, for 
example: 'https' URI scheme. There is an instance in Section 10.6, page 
135. Other instances of "SIPS:" are in Section 12.1

- Starting in Section 10.6, but with many occurrences after that, there 
is a message from the Server to the Client which is written as 
"MRCP/2/0", where the second "/" should actually be a ".", such as: 

- Sections 9.20 and 9.21. The third message of the example starts with 
this line:


The request-id value is 543267. However, the request-id of the client is 
543266. I guess this request-id should also be 543266.

- Section 10.6, 10.7, 10.8. The request-id of the request is set to 
543257. However, the request-id of the following responses are all set 
to 456234. I guess they should all be set to the same value.
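
Mismatches of this kind in the examples are easy to catch with a small
script. The Python sketch below assumes the start-line layouts as I
understand them from the draft: the request-id is the fourth field in
requests and events, and the third field in responses. The function names
are hypothetical and the message lengths in the sample lines are made up.
It also rejects the "MRCP/2/0" typo, since it requires the version token
"MRCP/2.0".

```python
def request_id_of(start_line):
    """Extract the request-id from an MRCPv2 start line (assumed layouts)."""
    fields = start_line.split()
    if len(fields) < 4 or fields[0] != "MRCP/2.0":
        raise ValueError("bad start line: %r" % start_line)
    # Responses: MRCP/2.0 <length> <request-id> <status-code> <state>
    # Requests:  MRCP/2.0 <length> <method> <request-id>
    # Events:    MRCP/2.0 <length> <event> <request-id> <state>
    token = fields[2] if fields[2].isdigit() else fields[3]
    if not token.isdigit():
        raise ValueError("no request-id in: %r" % start_line)
    return int(token)


def mismatched_replies(request_line, reply_lines):
    """Return the replies whose request-id differs from the request's."""
    expected = request_id_of(request_line)
    return [l for l in reply_lines if request_id_of(l) != expected]


request = "MRCP/2.0 386 RECORD 543257"
replies = ["MRCP/2.0 48 456234 200 IN-PROGRESS",
           "MRCP/2.0 49 START-OF-INPUT 543257 IN-PROGRESS"]
wrong = mismatched_replies(request, replies)
```

Here only the first reply is reported, because it carries 456234 instead
of 543257.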

- Section 10.8, page 137, first paragraph:
   s/audio ./audio.

- Section 11, fourth paragraph:

- Section 11.1. The request-id of the fifth message is set to 314164, 
however, it should be set to 314162, because it is a response to the 
fourth message.

- Section 11.5.1. There is a closing bracket ")" at the end of the first 
sentence, for which there isn't an opening bracket.

- Section 13.1.1. The registry of the set-cookie and set-cookie2 point to 
the wrong RFC. They should point to RFCs 2109 and 2965, respectively.

- Section 13.1.3. I would suggest splitting this long registry and 
creating five separate registries, one per resource type and the Generic. 
This simplifies things if, in the future, someone wants to add a new 
header, e.g., a Synthesizer header. With the current registry, IANA will 
add the new header at the end of the registry, after the last header of 
the Verifier, and most people will miss this new hypothetical Synthesizer 
header. If you create five different registries, then the problem will be 
solved.

- Sections 13.7.1 and 13.7.2. There are two tokens being registered in 
each Section. Can you add an empty line in between these two 
registrations (within the same Section) for the sake of readability?

- Section 14.1. The example at the top of page 182, the "m=" line reads:

   m=audio 49170 RTP/AVP 0 96

   However, there is no "a=rtpmap:96" line and an "a=fmtp:96" describing 
what the media type "96" is. I guess 96 represents a telephone event, in 
which case, you should add:

   a=rtpmap:96 telephone-event/8000
   a=fmtp:96 0-15
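
A minimal Python sketch of a check for this kind of omission: it lists the
dynamic payload types (96 and above) that appear on an RTP m= line without
a matching a=rtpmap line. The function name is hypothetical, and the
sketch does not scope a=rtpmap lines to their media section, which a
complete checker would do.

```python
def unmapped_dynamic_payloads(sdp):
    """Dynamic payload types on RTP m= lines that lack an a=rtpmap line."""
    declared, mapped = set(), set()
    for line in sdp.splitlines():
        if line.startswith("m="):
            fields = line[2:].split()
            # m=<media> <port> <proto> <fmt> ...
            if len(fields) > 3 and fields[2].startswith("RTP/"):
                declared.update(int(f) for f in fields[3:] if f.isdigit())
        elif line.startswith("a=rtpmap:"):
            payload = line[len("a=rtpmap:"):].split()
            if payload and payload[0].isdigit():
                mapped.add(int(payload[0]))
    return sorted(pt for pt in declared if pt >= 96 and pt not in mapped)


before = "m=audio 49170 RTP/AVP 0 96\r\n"
after = before + "a=rtpmap:96 telephone-event/8000\r\na=fmtp:96 0-15\r\n"
missing_before = unmapped_dynamic_payloads(before)
missing_after = unmapped_dynamic_payloads(after)
```

Static payload types such as 0 are not reported, since they do not need
an a=rtpmap line.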

- Next to last paragraph on page 187, the text reads "... turn around and 
issued a ...".

- First sentence on page 189:
   s/a SIP BYE/a SIP BYE request

- Idnits reveals:
  ** Obsolete normative reference: RFC 3388 (Obsoleted by RFC 5888)
  ** Obsolete normative reference: RFC 2109 (Obsoleted by RFC 2965)
  ** Obsolete normative reference: RFC 4646 (Obsoleted by RFC 5646)
  ** Downref: Normative reference to an Experimental RFC: RFC 2483
  -- Obsolete informational reference (is this intentional?): RFC 2234 
(Obsoleted by RFC 4234)


       Miguel Garcia
Miguel A. Garcia
Ericsson Spain