Re: [Speechsc] RAI review of draft-ietf-speechsc-mrcpv2-19
Dan Burnett <dburnett@voxeo.com> Tue, 29 December 2009 11:01 UTC
From: Dan Burnett <dburnett@voxeo.com>
To: Roni Even <Even.roni@huawei.com>
Date: Tue, 29 Dec 2009 06:00:50 -0500
Cc: speechsc@ietf.org, sarvi@cisco.com, oran@cisco.com, rai@ietf.org
Subject: Re: [Speechsc] RAI review of draft-ietf-speechsc-mrcpv2-19
Hi Roni,

Just to finish up on your last comments . . .

-- dan

On Aug 12, 2009, at 3:15 AM, Roni Even wrote:

> Hi Dan,
> I understand your explanation about all these "vendor specific"
> parameters. I think that since this is a standards-track document there
> should be some text explaining the usage of these parameters, as well
> as a note that, since these values are vendor specific, you cannot
> compare the values coming from different vendors.

Thank you. I will note this in the next draft and suggest how these
parameters may be used in light of their vendor dependence.

> As for my comment number 5 on payload type 96: my comment was that
> if the m-line has a payload type number of 96 you must have an
> a=rtpmap line mapping 96 to a specific subtype name, while for PCMU
> it is not mandatory to have a=rtpmap as you have in your examples,
> since payload type number 0 is a static payload type number assigned
> to PCMU.

I'm sorry, I did not explain this very well. I understood your
comment. My reply was that, of the three examples, example 2 did
actually provide the a=rtpmap line for 96. Since payload type 96
should not have been included in the first and third examples at all,
once I removed it from those two examples all three contained the
proper a=rtpmap lines. Although an a=rtpmap line is not necessary for
payload type 0, others had requested it in the past, so I left it in.

> Roni Even
>
> From: Dan Burnett [mailto:dburnett@voxeo.com]
> Sent: Tuesday, August 11, 2009 9:22 PM
> To: Roni Even
> Cc: sarvi@cisco.com; oran@cisco.com; 'Eric Burger';
> speechsc@ietf.org; rai@ietf.org
> Subject: Re: RAI review of draft-ietf-speechsc-mrcpv2-19
>
> On Jul 7, 2009, at 3:40 PM, Roni Even wrote:
>
> Hi,
>
> I was assigned to do a RAI review of the draft. The draft looks
> ready for publication to me. I have some comments, mostly editorial.
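Regarding the payload type 96 exchange above: the rule Roni describes is that a payload type in the dynamic range (96-127) needs an a=rtpmap line in the same media description, while a static type such as 0 (PCMU) does not. The sketch below checks that rule over an SDP body. It is a hypothetical helper, not part of MRCPv2 or any SDP library; the offer is illustrative and only loosely modeled on the draft's synthesizer example, and grouping lines by m= is a simplification of full SDP parsing.

```python
# Hypothetical checker (not part of MRCPv2 or any SDP library): for each
# RTP media description, report dynamic payload types that lack a=rtpmap.
DYNAMIC_PT_MIN = 96  # RFC 3551: 96-127 are dynamic; 0 (PCMU) is static


def split_media_sections(sdp: str) -> list[list[str]]:
    """Group SDP lines by m= line; session-level lines before the first m= are dropped."""
    sections: list[list[str]] = []
    for raw in sdp.splitlines():
        line = raw.strip()
        if line.startswith("m="):
            sections.append([line])
        elif sections and line:
            sections[-1].append(line)
    return sections


def missing_rtpmaps(sdp: str) -> dict[int, list[int]]:
    """Map media-section index -> dynamic payload types with no a=rtpmap."""
    problems: dict[int, list[int]] = {}
    for index, section in enumerate(split_media_sections(sdp)):
        _media, _port, proto, *formats = section[0][len("m="):].split()
        if not proto.startswith("RTP/"):
            continue  # e.g. the TCP/MRCPv2 control channel carries no RTP payload types
        mapped = {
            int(attr[len("a=rtpmap:"):].split()[0])
            for attr in section[1:]
            if attr.startswith("a=rtpmap:")
        }
        missing = [int(pt) for pt in formats
                   if int(pt) >= DYNAMIC_PT_MIN and int(pt) not in mapped]
        if missing:
            problems[index] = missing
    return problems


# Illustrative offer: PT 0 has an (optional) rtpmap, PT 96 has none.
OFFER = """\
v=0
o=client 2890844527 2890844527 IN IP4 192.0.2.12
s=-
c=IN IP4 192.0.2.12
t=0 0
m=application 9 TCP/MRCPv2 1
a=setup:active
a=connection:new
a=resource:speechsynth
a=cmid:1
m=audio 49170 RTP/AVP 0 96
a=rtpmap:0 pcmu/8000
a=recvonly
a=mid:1
"""

if __name__ == "__main__":
    print(missing_rtpmaps(OFFER))  # {1: [96]}
```

Either fix discussed above clears the report: dropping 96 from the m-line (as was done for the synthesizer-only examples) or adding an a=rtpmap line for it (as the recognizer example already does).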
>
> The only issue I see that is not purely editorial is the issue of the
> different parameters like confidence threshold and sensitivity level
> (see comments 11, 13, 15, 16 and 17). I think that some
> clarification on the semantics and the scale (for example, are the
> values linearly spaced?), as well as on when they are useful, will be
> helpful to implementers.
>
> 1. In Figure 1, expand the abbreviations TTS, ASR, SV and SI, and
> explain how they are related to the media resource types in 3.1.
>
> Done. Added some text explaining Figure 1 and enhanced Figure 1
> slightly for clarification.
>
> 2. In Figure 1 there is a SIP dialog between the MRCPv2 client
> and the media source/sink. What is this dialog? In section 4 I only
> saw a dialog between the client and server.
>
> Clarified in the first example of section 4.2 that the SIP dialog
> with the media source/sink is not shown.
>
> 3. In section 3.2 you have “For example:
> sip:mrcpv2@example.net” twice, one after the other.
>
> Fixed.
>
> 4. In the example in section 4.2 you have “a=cmid:1”. cmid is
> specified later in the document, so maybe you can add a reference
> to where it is specified.
>
> Done.
>
> 5. In the example in section 4.2 and in following examples you
> have “m=audio 49170 RTP/AVP 0 96” but do not have an rtpmap
> parameter mapping 96 (a dynamic payload type number) to a media
> encoding name.
>
> It is not in the first or third examples (Synthesizer only), but it
> is in the second example (Recognizer). I have removed 96 as an
> option for the Synthesizer-only examples but let it remain as an
> addition for the Recognizer example.
>
> 6. In section 4.3: “Also note that more that one media session
> can be associated with a single resource if need be, but this
> scenario is not useful for the current set of resources”. There is a
> typo: the second “that” should be “than”. I am also not sure whether
> the current syntax in this document can support this mode.
>
> Fixed the typo.
>
> 7. In section 4.3: “The formatting of the"cmid" attribute in
> SDP RFC3388 [RFC4566]”. I think you meant SDP grouping, and the
> reference should be to RFC 3388.
>
> I removed the reference altogether because it already exists
> (correctly) earlier in the paragraph.
>
> 8. In section 5.1: “The message-length field specifies the
> length of the message, including the start-line”. Is the length in
> bytes? No unit is specified.
>
> Changed "length of the message" to "length of the message in bytes".
>
> 9. In section 6.3.1 there is a typo: “Verfication” instead of
> “verification”. It appears twice in the section.
>
> Fixed.
>
> 10. In the example in section 7 you have “m=audio 0 RTP/AVP 0 1 3”.
> Payload type 1 was deleted from the IANA registry; maybe use
> another payload type number.
>
> I just removed that payload type. It is not germane to the example.
>
> 11. In sections 9.4.1, 9.4.2 and 9.4.3 you specify confidence
> threshold, sensitivity level and speed vs. accuracy. What is the
> scale here; is it linear between 0 and 1? What is the absolute value
> of the number: if you receive the same confidence level from two
> recognizers, do the values mean the same thing (e.g. when using a
> context block to switch servers)? For speed vs. accuracy, how does the
> client know the relation between the value and the number of
> available sessions, since this seems to be the reason for using this
> parameter?
>
> The interpretation of all of these parameters is implementation-specific
> because the underlying technologies used to implement them
> vary and can even be proprietary. In practice the speech
> recognition and synthesis and speaker authentication communities
> have lived with this state of affairs for many years, and users of
> other APIs for this technology are well aware of this variability
> in interpretation and have built applications that accommodate it.
> It is outside the scope of this specification to attempt to
> standardize interpretations of these values.
>
> 12. In 9.4.9, 10.4.8 and 11.4.11, what are the values for
> media-type-value? You also mention audio and video, but it looks to
> me as if this document only discusses voice.
>
> Yes. Although the original intent was to record speech, application
> authors today are beginning to look at ways to incorporate other
> audio or video. The intent of the sentences in these sections is to
> clarify that the specification itself imposes no restriction on the
> types of media that are allowed.
>
> 13. In 9.4.35 and 9.4.36, what is the scale for the consistency
> here? How does one know what "close" means? What is the consistency
> between different recognizers?
>
> The answer to question 11, above, applies here as well.
>
> 14. In section 9.6.3.3, in the example (Figure 2), confidence should
> be 0.75 and not 75.
>
> Fixed.
>
> 15. In section 10.4.1 it is not clear how you measure the
> sensitivity in order to specify it. Is it based on some SNR
> translated to a 0 to 1 scale?
>
> The answer to question 11, above, applies here as well.
>
> 16. In 11.4.6 there is the same issue with the scale: how does the
> client know how to set a value when working with different speaker
> verification servers?
>
> Ditto. I should point out that in all of these cases the parameters
> are typically passed directly to the engine, and their
> interpretations are defined (and described) in the vendors'
> documentation. The most common MRCPv2 server implementations are by
> the technology vendors themselves (the providers of the synthesis,
> recognition, and verification engines). This is commonly understood
> in this technology industry (meaning among those who use this
> technology regularly).
>
> 17. In 11.5.2.9 you state that the verification-score is not a
> probability, so what is it? How can the client decide whether, for
> example, 0 is a good score for specifying the threshold?
I also > noticed that the values in the example in section 11.5.2.10 are very > precise like 0.98514 is this the expected precision. The examples > here and in section 11.11 do not show the threshold, if the > threshold is required for this flow why not show it in the example? > > This parameter, as others mentioned above, has only a vendor- > specific interpretation. In practice authors interpret these values > based both on guidance from the technology vendors and via > experimentation on large sets of recorded data. > > The Min-Verification-Score threshold is not required to be set. In > many cases the technology vendor has a fairly good understanding of > what the default threshold should be. The verification-score is > returned, however, in case the application author determines > (through experimentation, as described above) that the default > threshold is not producing optimal results for the application. In > that case the author can set the threshold to a different value or > can set it to -1 and make the determination within the application > itself based on the verification-score values. > > > > 18. In section 12.3 the suggestion is to use SRTP as the mandatory > interoperability mode. If the reason for mandating SRTP is for a > common mode you should also decide on a key exchange mechanism. I > suggest you look at http://tools.ietf.org/html/draft-ietf-avt-srtp-not-mandatory-02 > for discussion on media security. > > Based on the discussion between you and Dan York on the list, I will > change this: > > 12.3. Media session protection > Sensitive data is also carried on media sessions terminating on > MRCPv2 servers (the other end of a media channel may or may not be > on the MRCPv2 client). This data includes the user's spoken > utterances and the output of text-to-speech operations. MRCPv2 > servers MUST support SRTP for protection of audio media sessions. > MRCPv2 clients that originate or consume audio similarly MUST > support SRTP. 
Alternative media channel protection MAY be used if > desired (e.g. IPSEC). > > to this: > > 12.3. Media session protection > Sensitive data is also carried on media sessions terminating on > MRCPv2 servers (the other end of a media channel may or may not be > on the MRCPv2 client). This data includes the user's spoken > utterances and the output of text-to-speech operations. MRCPv2 > servers MUST support a security mechanism for protection of audio > media sessions. MRCPv2 clients that originate or consume audio > similarly MUST support a security mechanism for protection of the > audio. If appropriate, usage of the Secure Real-time Transport > Protocol (SRTP) [RFC3711] is recommended. > > 19. In section13.7.2 you specify the attribute resource as session > level yet in the example in section 4.2 it is a media level > attribute. The same goes for the channel attribute > > I have corrected both in section 13.7.2 to be media-level. > > > > Thanks > > Roni Even > > >
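The reply to comment 17 describes an application that sets Min-Verification-Score to -1 and makes the accept/reject decision itself from the returned verification-score values, with the threshold found by experimenting on recorded data. A minimal sketch of that workflow, under stated assumptions: the `Trial` type, the function names, and the "fewest total errors" calibration criterion are hypothetical illustrations, not anything MRCPv2 defines. Because scores are vendor-specific, each engine gets its own threshold.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Trial:
    """One labelled attempt scored by a single verification engine (hypothetical type)."""
    score: float   # verification-score as returned by that engine
    genuine: bool  # True if the claimed speaker really spoke


def calibrate_threshold(trials: list[Trial]) -> float:
    """Return the observed score t that, used as 'accept if score >= t',
    gives the fewest false accepts plus false rejects on these trials.
    Ties go to the lowest such score."""
    if not trials:
        raise ValueError("need at least one trial")
    best_t, best_errors = None, None
    for t in sorted({trial.score for trial in trials}):
        false_rejects = sum(1 for x in trials if x.genuine and x.score < t)
        false_accepts = sum(1 for x in trials if not x.genuine and x.score >= t)
        errors = false_rejects + false_accepts
        if best_errors is None or errors < best_errors:
            best_t, best_errors = t, errors
    return best_t


def accept(score: float, threshold: float) -> bool:
    """Application-side decision, assuming the server threshold was disabled (-1)."""
    return score >= threshold


if __name__ == "__main__":
    # Two hypothetical engines with unrelated score scales.
    engine_a = [Trial(0.9, True), Trial(0.7, True), Trial(0.4, False), Trial(0.1, False)]
    engine_b = [Trial(3.0, True), Trial(1.5, True), Trial(-0.5, False), Trial(-2.0, False)]
    t_a = calibrate_threshold(engine_a)
    t_b = calibrate_threshold(engine_b)
    print(t_a, t_b, accept(0.85, t_a), accept(0.85, t_b))  # 0.7 1.5 True False
```

The same score of 0.85 is accepted by one engine's threshold and rejected by the other's, which is the point made above about not comparing values across vendors. Minimizing the raw error count is only one possible criterion; an application that cares more about false accepts than false rejects would weight the two differently.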