RE: [speechsc] Hotword Recognition and Timers
"Saravanan Shanmugham \(sarvi\)" <sarvi@cisco.com> Fri, 16 June 2006 18:46 UTC
Subject: RE: [speechsc] Hotword Recognition and Timers
Date: Fri, 16 Jun 2006 11:46:04 -0700
From: "Saravanan Shanmugham (sarvi)" <sarvi@cisco.com>
To: Dan Burnett <dan_burnett2000@yahoo.com>, "IETF SPEECHSC (E-mail)" <speechsc@ietf.org>
I can see that both No-Input-Timeout and Recognition-Timeout values will be useful for hotword recognition. But saying that the Recognition-Timer is started after speech is detected bothers me. Also, what typical values do you expect for these timers based on your proposed definitions?

Hotword recognition is very often used to issue commands. So let's take the following scenario and look at the possible cases. When the system is reading out a long email, you should be able to issue commands like "speed up", "slow down", or "repeat".

1. I might never say any command at all. Defining the Recognition-Timer as starting after speech is detected makes no sense in this case. The No-Input-Timer, if defined to be applicable to hotword recognition, might make sense in this case.

2. I might say something unintelligible in the middle, which should technically be ignored, and then a little later actually speak a command, "speed up". Here, when I said something unintelligible, the No-Input-Timer would be stopped. If we went with the definition proposed, the Recognition-Timer would be started here.

   If you assume the No-Input-Timer is sufficiently large and the Recognition-Timer is relatively small, then once we say something not matching a hotword (which should technically be ignored), the RECOGNIZE would complete due to Recognition-Timeout.

   If we assume the No-Input-Timer is short and the Recognition-Timer is long, then we are requiring that the user MUST say something, intelligible or unintelligible, reasonably quickly. Otherwise the RECOGNIZE would terminate due to No-Input-Timeout.

   If we assume the No-Input-Timer is large and the Recognition-Timer is large as well, then depending on whether I say something unintelligible or not, the overall timeout could be pretty large, up to a maximum of No-Input-Timer + Recognition-Timer.
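The cases above can be walked through with a small simulation. This is only a sketch under stated assumptions, not anything from the spec draft: the function name `proposed_definition` and the `(time, matched)` utterance model are made up for illustration, utterances are treated as instantaneous, and the Recognition-Timer is assumed to start at the first detected speech and never be reset.

```python
from typing import List, Tuple

# Hypothetical model: (time in seconds at which speech is detected,
# whether that speech matches a hotword grammar).
Utterance = Tuple[float, bool]


def proposed_definition(utterances: List[Utterance],
                        no_input_timeout: float,
                        recognition_timeout: float) -> Tuple[str, float]:
    """Return (completion cause, completion time) for a hotword RECOGNIZE
    when the Recognition-Timer starts at the first detected speech."""
    utterances = sorted(utterances)

    # No speech before the No-Input-Timer expires.
    if not utterances or utterances[0][0] >= no_input_timeout:
        return ("no-input-timeout", no_input_timeout)

    # First speech stops the No-Input-Timer and starts the Recognition-Timer.
    deadline = utterances[0][0] + recognition_timeout
    for time, matched in utterances:
        if time >= deadline:
            break
        if matched:
            return ("success", time)
    return ("recognition-timeout", deadline)


# Case 1: the user never speaks.
print(proposed_definition([], 60, 10))                       # ('no-input-timeout', 60)
# Case 2, large No-Input-Timer and small Recognition-Timer: noise at 5 s
# ends the RECOGNIZE at 15 s, before the real command at 30 s.
print(proposed_definition([(5, False), (30, True)], 60, 10)) # ('recognition-timeout', 15)
# Case 2, both timers large: noise at 59 s pushes completion out to 119 s.
print(proposed_definition([(59, False)], 60, 60))            # ('recognition-timeout', 119)
```

The last call shows the worst case approaching No-Input-Timer + Recognition-Timer.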
The way I would expect this to work is that the No-Input-Timer and the Recognition-Timer are both started at the beginning of a hotword RECOGNIZE, and both have reasonably large values, with the No-Input-Timer most likely equal to or smaller than the Recognition-Timer.

Now, if I said nothing at all and the No-Input-Timer expired, the RECOGNIZE completes with no-input-timeout. The moment I say something, intelligible or unintelligible, the No-Input-Timer is stopped. The Recognition-Timer continues on. If the current speech or a future command matches a hotword grammar, the RECOGNIZE completes with success. If nothing matches and the Recognition-Timer expires, the RECOGNIZE completes with recognition-timeout.

This way, for hotword, the Recognition-Timer is the maximum recognition time for the RECOGNIZE, while the No-Input-Timer would only be equal or smaller.

Thx,
Sarvi

-----Original Message-----
From: Dan Burnett [mailto:dan_burnett2000@yahoo.com]
Sent: Thursday, June 08, 2006 5:06 AM
To: IETF SPEECHSC (E-mail)
Subject: Re: [speechsc] Hotword Recognition and Timers

This email is a result of discussions by the MRCP subgroup of the VoiceXML Forum, in which I participated, so I already agree with the proposals given here. However, I would like to hear comments from others before applying these changes to the spec draft, preferably from those who did not participate in the VoiceXML Forum discussions.

This has been added to the issue tracker (http://www.softarmor.com/roundup/speechsc) as issue 88.

-- dan

--- Andrew Wahbe <awahbe@voicegenie.com> wrote:

> The description of how timers (no-input and recognition) are used during
> hotword recognition is inconsistent. In section 9.4.7, it is stated
> that "For a hotword recognition mode, this timer is started when the
> user begins speaking. Note that for Hotword mode recognition the
> START-OF-INPUT event is not generated."
> However, section 9.9 states that for the hotword case: "The
> Recognition-Timer gets started at the beginning of RECOGNIZE."
>
> It seems that section 9.9 is incorrect (or at least is inconsistent
> with VoiceXML).
>
> Section 9.9 omits any mention of the no-input timer for the hotword
> mode recognition case; however, none of the sections that deal with
> the no-input timer make a distinction between the hotword and
> non-hotword cases. VoiceXML also does not make this distinction.
> It would seem that section 9.9 should be changed to indicate that
> no-input timers are started in the hotword case and that
> no-input-timeout is a valid completion cause for a hotword recognition.
>
> A related question worth considering is if the recognition timer is
> reset at any point, for example, on the detection of silence. Consider
> the case when maxspeech has a value of say 20 seconds (a
> typical/reasonable value) and hotword barge-in is being used on a
> prompt that is 30 seconds long. This would mean that a user that spoke
> briefly 2 seconds into the prompt (and was silent for the remainder of
> the prompt) would experience a maxspeech timeout at about 22 seconds
> into the prompt. They would not hear the whole prompt which seems
> inappropriate. The reason for maxspeech timeout is to catch continuous
> noise and keep it from occupying a recognizer; but what should happen
> in periods of silence in the hotword case?
>
> Similarly, when is the no-input timer canceled in the hotword case? Is
> it when speech (not necessarily matching) is detected? Or is it only
> upon a match?
>
> The correct behavior in my opinion is that the no-input timer is
> canceled only on a match, and that the recognition timer should be
> reset if silence (determined by complete timeout and incomplete
> timeout) is detected. If we are just processing intermittent noise,
> the no-input timer will eventually expire. Continuous noise is handled
> by the recognition timer.
> Of course, there are other possibilities as well; this is just one
> option that I think fits with VoiceXML.
>
> begin:vcard
> fn:Andrew Wahbe
> n:Wahbe;Andrew
> org:VoiceGenie Technologies INC.
> adr:8th Floor;;1120 Finch Avenue W.;Toronto;ON;M3J 3H7;Canada
> email;internet:awahbe@voicegenie.com
> title:Senior Architect
> tel;work:(416) 736-0905 ext. 258
> tel;fax:(416) 736-1551
> x-mozilla-html:TRUE
> url:http://www.voicegenie.com
> version:2.1
> end:vcard

_______________________________________________
Speechsc mailing list
Speechsc@ietf.org
https://www1.ietf.org/mailman/listinfo/speechsc
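For comparison, the alternative described at the top of this message (both timers started at the beginning of the hotword RECOGNIZE, with any speech stopping only the No-Input-Timer) can be sketched as follows. This is an illustrative sketch, not spec text: the function name `start_of_recognize_definition` and the `(time, matched)` utterance model are hypothetical, utterances are treated as instantaneous, and the No-Input-Timer is assumed to be no larger than the Recognition-Timer, as suggested above.

```python
from typing import List, Tuple

# Hypothetical model: (time in seconds at which speech is detected,
# whether that speech matches a hotword grammar).
Utterance = Tuple[float, bool]


def start_of_recognize_definition(utterances: List[Utterance],
                                  no_input_timeout: float,
                                  recognition_timeout: float) -> Tuple[str, float]:
    """Return (completion cause, completion time) when both timers start
    at the beginning of the hotword RECOGNIZE."""
    if no_input_timeout > recognition_timeout:
        # Assumption of this sketch only, taken from the description above.
        raise ValueError("No-Input-Timer is assumed <= Recognition-Timer")
    utterances = sorted(utterances)

    # Nothing was said before the No-Input-Timer expired.
    if not utterances or utterances[0][0] >= no_input_timeout:
        return ("no-input-timeout", no_input_timeout)

    # Any speech stops the No-Input-Timer; the Recognition-Timer keeps
    # running from the start of RECOGNIZE and bounds the whole request.
    for time, matched in utterances:
        if time >= recognition_timeout:
            break
        if matched:
            return ("success", time)
    return ("recognition-timeout", recognition_timeout)


print(start_of_recognize_definition([], 30, 60))                        # ('no-input-timeout', 30)
print(start_of_recognize_definition([(5, False), (40, True)], 30, 60))  # ('success', 40)
print(start_of_recognize_definition([(5, False)], 30, 60))              # ('recognition-timeout', 60)
```

Under these assumptions the RECOGNIZE never runs longer than the Recognition-Timer, regardless of when unmatched speech occurs.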