Re: [Speechsc] Speaker Verification - Insufficient or Noisy Speech
Eric Burger <eburger@standardstrack.com> Mon, 11 May 2009 13:58 UTC
From: Eric Burger <eburger@standardstrack.com>
To: Nik Waldron <nik.waldron@kaz-group.com>
Date: Mon, 11 May 2009 09:59:47 -0400
Cc: speechsc@ietf.org
Subject: Re: [Speechsc] Speaker Verification - Insufficient or Noisy Speech

I would offer we save it for the book.

On May 10, 2009, at 10:03 PM, Nik Waldron wrote:

> Thanks for your response Dan,
>
> The additional code resolves the problem (2) of noisy or otherwise
> ‘bad’ input, and (3) clarifies how to specify that additional data
> is needed for training.
>
> I had not realised that the result structure was intended to be used
> in the case of enrolments as well as verifications. I’m not sure if my
> confusion has reach beyond myself and justifies an explanatory note
> in the verification section. Thanks for the clarification in any
> case.
>
> I think that the document would benefit from an appendix (or a
> separate document as is the case for SDP) which has examples of all
> of the major use cases. In my opinion examples often resolve
> confusion for readers learning a new protocol. I note that there
> are examples in the document, although not any training (enrolment)
> examples that I recall for speaker verification.
>
> I appreciate the enormous effort that goes into producing a standard
> protocol (everyone’s a critic). I’d be happy to contribute some
> example conversations for Verification if such a section or document
> eventuates.
>
> Best regards,
>
> NIK WALDRON
>
> From: dburnett@voxeo.com [mailto:dburnett@voxeo.com]
> Sent: Wednesday, May 06, 2009 6:29 AM
> To: Nik Waldron
> Cc: speechsc@ietf.org
> Subject: Re: [Speechsc] Speaker Verification - Insufficient or Noisy
> Speech
>
> Nik,
>
> Thanks for your email.
>
> There are three cases in what you have described:
>
> 1. speech not detected (because of SNR problem, etc.). This will
> return no-input-timeout, just as it would for a speech recognizer.
>
> 2. speech detected, neither too early (speech-too-early) nor too much
> (too-much-speech-timeout), but still unusable by the training or
> verification process.
> Note that this could happen if the speech
> passes the endpointer threshold but is too garbled or noisy to be of
> use to the verification engine.
> This case is not handled in MRCP today. I have added error code 011,
> "speech-not-usable", for this case.
>
> 3. additional turns are needed: the <decision> result element can be
> used for this. "undecided" was the value we chose to represent the
> case where the engine did not yet have enough data to decide on a
> verification or training result. Note that training decisions can
> also be "accepted" or "rejected" just like verification results -- the
> former case means there is sufficient training data and the new
> voiceprint is acceptable. The latter means there is sufficient
> training data but the new voiceprint is rejected, because for example
> it is too close to an existing voiceprint.
>
> -- dan
>
> On Jan 11, 2009, at 7:06 PM, Nik Waldron wrote:
>
> > I sent an email previously requesting information on how a speaker
> > verification system implementing MRCPv2 should cope in the
> > situation where there was insufficient or poor quality speech
> > arriving on the RTP audio stream. It seemed to me that was an area
> > of some deficiency in the specification. I received no feedback
> > other than one response saying that to his knowledge there were no
> > other implementers for Speaker Verification.
> >
> > Below I outline the MRCPv2 exchanges for a training operation:
> >
> > C->S: MRCP/2.0 207 START-SESSION 314161
> >       Channel-Identifier:32AECB23433801@speakverify
> >       Repository-URI:http://www.example.com/voiceprintdbase/
> >       Voiceprint-Mode:train
> >       Voiceprint-Identifier:johnsmith.voiceprint
> >
> > S->C: MRCP/2.0 82 314161 200 COMPLETE
> >       Channel-Identifier:32AECB23433801@speakverify
> >
> > C->S: MRCP/2.0 76 VERIFY 314162
> >       Channel-Identifier:32AECB23433801@speakverify
> >
> > S->C: MRCP/2.0 85 314162 200 IN-PROGRESS
> >       Channel-Identifier:32AECB23433801@speakverify
> >
> > The end-point detector shows insufficient data (which is buffered),
> > or bad signal quality (bad SNR for example). Note that
> > START-OF-INPUT has NOT been sent although speech has begun.
> >
> > S->C: MRCP/2.0 140 VERIFICATION-COMPLETE 314162 COMPLETE
> >       Channel-Identifier:32AECB23433801@speakverify
> >       Completion-Cause:002 no-input-timeout
> >
> > This is undesirable from my perspective since it gives the
> > impression to the client that no data has been received (untrue in
> > the insufficient data case), and provides no distinction between
> > this and the "bad data" case. This information might be of utility
> > to a call-flow designer in an IVR system.
> >
> > I also note that in the case of text-independent verifiers several
> > turns worth of data may be required for a verification. Several
> > rounds of "no input" timeouts would surely be confusing to the
> > client, yet this class of verifiers may be unable to generate an
> > nlsml+xml response on the nth dialog turn.
> >
> > The enrolment might then continue:
> >
> > C->S: MRCP/2.0 76 VERIFY 314163
> >       Channel-Identifier:32AECB23433801@speakverify
> >
> > S->C: MRCP/2.0 85 314163 200 IN-PROGRESS
> >       Channel-Identifier:32AECB23433801@speakverify
> >
> > S->C: MRCP/2.0 96 START-OF-INPUT 314163 IN-PROGRESS
> >       Channel-Identifier:32AECB23433801@speakverify
> >
> > S->C: MRCP/2.0 131 VERIFICATION-COMPLETE 314163 COMPLETE
> >       Channel-Identifier:32AECB23433801@speakverify
> >       Completion-Cause:000 success
> >
> > C->S: MRCP/2.0 76 VERIFY 314164
> >       Channel-Identifier:32AECB23433801@speakverify
> >
> > S->C: MRCP/2.0 85 314164 200 IN-PROGRESS
> >       Channel-Identifier:32AECB23433801@speakverify
> >
> > S->C: MRCP/2.0 96 START-OF-INPUT 314164 IN-PROGRESS
> >       Channel-Identifier:32AECB23433801@speakverify
> >
> > S->C: MRCP/2.0 131 VERIFICATION-COMPLETE 314164 COMPLETE
> >       Channel-Identifier:32AECB23433801@speakverify
> >       Completion-Cause:000 success
> >
> > C->S: MRCP/2.0 81 END-SESSION 314174
> >       Channel-Identifier:32AECB23433801@speakverify
> >
> > S->C: MRCP/2.0 82 314174 200 COMPLETE
> >       Channel-Identifier:32AECB23433801@speakverify
> >
> > Since I received no responses (perhaps due to being close to the
> > holiday season), I will venture a proposal for extending the RFC to
> > include the bad signal cases
> > (+ indicates an addition, * a modification)
> >
> >   +------------+--------------------------+---------------------------+
> >   | Cause-Code | Cause-Name               | Description               |
> >   +------------+--------------------------+---------------------------+
> >   | 000        | success                  | VERIFY or                 |
> >   |            |                          | VERIFY-FROM-BUFFER        |
> >   |            |                          | request completed         |
> >   |            |                          | successfully. The verify  |
> >   |            |                          | decision can be           |
> >   |            |                          | "accepted", "rejected",   |
> >   |            |                          | or "undecided".           |
> >   | 001        | error                    | VERIFY or                 |
> >   |            |                          | VERIFY-FROM-BUFFER        |
> >   |            |                          | request terminated        |
> >   |            |                          | prematurely due to a      |
> >   |            |                          | verification resource or  |
> >   |            |                          | system error.             |
> >   | 002        | no-input-timeout         | VERIFY request completed  |
> >   |            |                          | with no result due to a   |
> >   |            |                          | no-input-timeout.         |
> >   | 003        | too-much-speech-timeout  | VERIFY request completed  |
> >   |            |                          | with no result due to too |
> >   |            |                          | much speech.              |
> >   | 004        | speech-too-early         | VERIFY request completed  |
> >   |            |                          | with no result due to     |
> >   |            |                          | spoke too soon.           |
> > + | 005        | insufficient-speech      | VERIFY or                 |
> > + |            |                          | VERIFY-FROM-BUFFER        |
> > + |            |                          | request completed         |
> > + |            |                          | successfully but had      |
> > + |            |                          | insufficient speech to    |
> > + |            |                          | complete. More speech     |
> > + |            |                          | will complete the current |
> > + |            |                          | incremental operation     |
> > + | 006        | bad-speech               | VERIFY or                 |
> > + |            |                          | VERIFY-FROM-BUFFER        |
> > + |            |                          | request completed         |
> > + |            |                          | unsuccessfully, the       |
> > + |            |                          | speech quality was too    |
> > + |            |                          | poor                      |
> > * | 007        | buffer-empty             | VERIFY-FROM-BUFFER        |
> >   |            |                          | request completed with no |
> >   |            |                          | result due to empty       |
> >   |            |                          | buffer.                   |
> > * | 008        | out-of-sequence          | Verification operation    |
> >   |            |                          | failed due to             |
> >   |            |                          | out-of-sequence method    |
> >   |            |                          | invocations. For example  |
> >   |            |                          | calling VERIFY before     |
> >   |            |                          | QUERY-VOICEPRINT.         |
> > * | 009        | repository-uri-failure   | Failure accessing         |
> >   |            |                          | Repository URI.           |
> > * | 010        | repository-uri-missing   | Repository-uri is not     |
> >   |            |                          | specified.                |
> > * | 011        | voiceprint-id-missing    | Voiceprint-identification |
> >   |            |                          | is not specified.         |
> > * | 012        | voiceprint-id-not-exist  | Voiceprint-identification |
> >   |            |                          | does not exist in the     |
> >   |            |                          | voiceprint repository.    |
> >   +------------+--------------------------+---------------------------+
> >
> > Alternatively the new entries could be appended for compatibility.
> > The only disadvantage to doing so would be that entries would not be
> > grouped in the table by category.
> >
> > I'll happily accept any corrections to my understanding, in case I
> > have misread the spec, or feedback on my suggestions.
> >
> > NIK WALDRON
> >
> > _______________________________________________
> > Speechsc mailing list
> > Speechsc@ietf.org
> > https://www.ietf.org/mailman/listinfo/speechsc
> > Supplemental web site:
> > <http://www.standardstrack.com/ietf/speechsc>
>
> ______________________________________________________________________
> This email has been scanned by the MessageLabs Email Security System.
> For more information please visit http://www.messagelabs.com/email
> ______________________________________________________________________
>
> This is an email from Fujitsu Australia Limited, ABN 19 001 011 427.
> It is confidential to the ordinary user of the email address to
> which it was addressed and may contain copyright and/or legally
> privileged information. No one else may read, print, store, copy or
> forward all or any of it or its attachments. If you receive this
> email in error, please return to sender. Thank you.
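
For readers following the thread, the three cases Dan lists in the quoted message (no-input-timeout, the added "speech-not-usable" cause, and an "undecided" decision that calls for more turns) could be handled on the client side roughly as below. This is only a sketch in Python under stated assumptions: the function names and the returned action labels are hypothetical and appear nowhere in the draft; only the cause names and the decision values come from the messages above.

```python
from typing import Optional, Tuple


def parse_completion_cause(message: str) -> Optional[Tuple[int, str]]:
    """Return (code, name) from a Completion-Cause header line, or None if absent."""
    for line in message.splitlines():
        name, sep, value = line.partition(":")
        if sep and name.strip().lower() == "completion-cause":
            code, _, cause = value.strip().partition(" ")
            return int(code), cause.strip()
    return None


def next_action(cause_name: str, decision: Optional[str] = None) -> str:
    """Map a completion cause and <decision> value to a hypothetical dialog action."""
    if cause_name == "no-input-timeout":
        # Case 1: no speech was detected at all.
        return "reprompt-no-speech"
    if cause_name == "speech-not-usable":
        # Case 2: speech was detected but was too garbled or noisy to use.
        return "reprompt-bad-audio"
    if cause_name == "success":
        if decision == "undecided":
            # Case 3: the engine needs further turns before it can decide.
            return "collect-more-speech"
        if decision in ("accepted", "rejected"):
            return decision
    # Anything else is treated as a failure in this sketch.
    return "fail"


if __name__ == "__main__":
    event = (
        "MRCP/2.0 140 VERIFICATION-COMPLETE 314162 COMPLETE\n"
        "Channel-Identifier:32AECB23433801@speakverify\n"
        "Completion-Cause:002 no-input-timeout\n"
    )
    parsed = parse_completion_cause(event)
    if parsed is not None:
        print(next_action(parsed[1]))
```

The point of keeping the cause and the decision separate is the one made in the thread: a call-flow designer can then distinguish "nothing heard" from "heard but unusable" from "usable but not yet enough".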