Re: [Speechsc] Speaker Verification - Insufficient or Noisy Speech
Eric Burger <eburger@standardstrack.com> Mon, 11 May 2009 16:19 UTC
From: Eric Burger <eburger@standardstrack.com>
To: Arsen Chaloyan <achaloyan@yahoo.com>
Date: Mon, 11 May 2009 12:21:12 -0400
Cc: speechsc@ietf.org
Subject: Re: [Speechsc] Speaker Verification - Insufficient or Noisy Speech
That would be awesome. The reason I am pushing back on expanding the
scope of the base specification is that we can always make it better.
However, we are already three years late, and unless something is
glaringly incorrect or will harm the Internet, and given there are so
many implementations of -16 out there, we should just get this
published as an RFC and we can work on examples, extensions, and
clarifications in separate documents.

On May 11, 2009, at 11:14 AM, Arsen Chaloyan wrote:

> Book is definitely good.
> However SDP o/a examples (RFC4317) like online resource would be
> indeed helpful.
> It's not matter of just examples, but it may also cover error cases.
> Reading the latest draft, it's still not always obvious what error
> response should be sent in some particular cases. Clearly it's not
> possible to cover all the cases in foundation specification.
> I'll be happy to create such resource in the scope (web) of open
> source MRCP project I maintain, if it makes sense.
>
> Regards,
> Arsen Chaloyan
> www.unimrcp.org
>
> From: Eric Burger <eburger@standardstrack.com>
> To: Nik Waldron <nik.waldron@kaz-group.com>
> Cc: speechsc@ietf.org
> Sent: Monday, May 11, 2009 6:59:47 PM
> Subject: Re: [Speechsc] Speaker Verification - Insufficient or Noisy
> Speech
>
> I would offer we save it for the book.
>
> On May 10, 2009, at 10:03 PM, Nik Waldron wrote:
>
> > Thanks for your response Dan,
> >
> > The additional code resolves the problem (2) of noisy or otherwise
> > ‘bad’ input, and (3) clarifies how to specify that additional data
> > is needed for training.
> >
> > I had not realised that result structure was intended be used in
> > the case of enrolments as well as verifications. I’m not sure if my
> > confusion has reach beyond myself and justifies an explanatory note
> > in the verification section. Thanks for the clarification in any
> > case.
> >
> > I think that the document would benefit from an appendix (or a
> > separate document as is the case for SDP) which has examples of all
> > of the major use cases. In my opinion examples often resolve
> > confusion for readers learning a new protocol. I note that there
> > are examples in the document, although not any training (enrolment)
> > examples that I recall for speaker verification.
> >
> > I appreciate the enormous effort that goes into producing a
> > standard protocol (everyone’s a critic). I’d be happy to contribute
> > some example conversations for Verification if such a section or
> > document eventuates.
> >
> > Best regards,
> >
> > NIK WALDRON
> >
> > From: dburnett@voxeo.com [mailto:dburnett@voxeo.com]
> > Sent: Wednesday, May 06, 2009 6:29 AM
> > To: Nik Waldron
> > Cc: speechsc@ietf.org
> > Subject: Re: [Speechsc] Speaker Verification - Insufficient or
> > Noisy Speech
> >
> > Nik,
> >
> > Thanks for your email.
> >
> > There are three cases in what you have described:
> >
> > 1. speech not detected (because of SNR problem, etc.). This will
> >    return no-input-timeout, just as it would for a speech
> >    recognizer.
> >
> > 2. speech detected, neither too early (speech-too-early) nor too
> >    much (too-much-speech-timeout), but still unusable by the
> >    training or verification process. Note that this could happen
> >    if the speech passes the endpointer threshold but is too garbled
> >    or noisy to be of use to the verification engine.
> >    This case is not handled in MRCP today. I have added error code
> >    011, "speech-not-usable", for this case.
> >
> > 3. additional turns are needed: the <decision> result element can
> >    be used for this. "undecided" was the value we chose to
> >    represent the case where the engine did not yet have enough
> >    data to decide on a verification or training result. Note that
> >    training decisions can also be "accepted" or "rejected" just
> >    like verification results -- the former case means there is
> >    sufficient training data and the new voiceprint is acceptable.
> >    The latter means there is sufficient training data but the new
> >    voiceprint is rejected, because for example it is too close to
> >    an existing voiceprint.
> >
> > -- dan
> >
> > On Jan 11, 2009, at 7:06 PM, Nik Waldron wrote:
> >
> > > I sent an email previously requesting information on how a
> > > speaker verification system implementing MRCPv2 should cope in
> > > the situation where there was insufficient or poor quality speech
> > > arriving on the RTP audio stream. It seemed to me that was an
> > > area of some deficiency in the specification. I received no
> > > feedback other than one response saying that to his knowledge
> > > there were no other implementers for Speaker Verification.
> > >
> > > Below I outline the MRCPv2 exchanges for a training operation:
> > >
> > > C->S: MRCP/2.0 207 START-SESSION 314161
> > >       Channel-Identifier:32AECB23433801@speakverify
> > >       Repository-URI:http://www.example.com/voiceprintdbase/
> > >       Voiceprint-Mode:train
> > >       Voiceprint-Identifier:johnsmith.voiceprint
> > >
> > > S->C: MRCP/2.0 82 314161 200 COMPLETE
> > >       Channel-Identifier:32AECB23433801@speakverify
> > >
> > > C->S: MRCP/2.0 76 VERIFY 314162
> > >       Channel-Identifier:32AECB23433801@speakverify
> > >
> > > S->C: MRCP/2.0 85 314162 200 IN-PROGRESS
> > >       Channel-Identifier:32AECB23433801@speakverify
> > >
> > > The end-point detector shows insufficient data (which is
> > > buffered), or bad signal quality (bad SNR for example). Note that
> > > START-OF-INPUT has NOT been sent although speech has begun.
> > >
> > > S->C: MRCP/2.0 140 VERIFICATION-COMPLETE 314162 COMPLETE
> > >       Channel-Identifier:32AECB23433801@speakverify
> > >       Completion-Cause:002 no-input-timeout
> > >
> > > This is undesirable from my perspective since it gives the
> > > impression to the client that no data has been received (untrue
> > > in the insufficient data case), and provides no distinction
> > > between this and the "bad data" case. This information might be
> > > of utility to a call-flow designer in an IVR system.
> > >
> > > I also note that in the case of text-independent verifiers
> > > several turns worth of data may be required for a verification.
> > > Several rounds of "no input" timeouts would surely be confusing
> > > to the client, yet this class of verifiers may be unable to
> > > generate an nlsml+xml response on the nth dialog turn.
> > >
> > > The enrolment might then continue:
> > >
> > > C->S: MRCP/2.0 76 VERIFY 314163
> > >       Channel-Identifier:32AECB23433801@speakverify
> > >
> > > S->C: MRCP/2.0 85 314163 200 IN-PROGRESS
> > >       Channel-Identifier:32AECB23433801@speakverify
> > >
> > > S->C: MRCP/2.0 96 START-OF-INPUT 314163 IN-PROGRESS
> > >       Channel-Identifier:32AECB23433801@speakverify
> > >
> > > S->C: MRCP/2.0 131 VERIFICATION-COMPLETE 314163 COMPLETE
> > >       Channel-Identifier:32AECB23433801@speakverify
> > >       Completion-Cause:000 success
> > >
> > > C->S: MRCP/2.0 76 VERIFY 314164
> > >       Channel-Identifier:32AECB23433801@speakverify
> > >
> > > S->C: MRCP/2.0 85 314164 200 IN-PROGRESS
> > >       Channel-Identifier:32AECB23433801@speakverify
> > >
> > > S->C: MRCP/2.0 96 START-OF-INPUT 314164 IN-PROGRESS
> > >       Channel-Identifier:32AECB23433801@speakverify
> > >
> > > S->C: MRCP/2.0 131 VERIFICATION-COMPLETE 314164 COMPLETE
> > >       Channel-Identifier:32AECB23433801@speakverify
> > >       Completion-Cause:000 success
> > >
> > > C->S: MRCP/2.0 81 END-SESSION 314174
> > >       Channel-Identifier:32AECB23433801@speakverify
> > >
> > > S->C: MRCP/2.0 82 314174 200 COMPLETE
> > >       Channel-Identifier:32AECB23433801@speakverify
> > >
> > > Since I received no responses (perhaps due to being close to the
> > > holiday season), I will venture a proposal for extending the RFC
> > > to include the bad signal cases (+ indicates an addition, * a
> > > modification):
> > >
> > >   +------------+--------------------------+---------------------------+
> > >   | Cause-Code | Cause-Name               | Description               |
> > >   +------------+--------------------------+---------------------------+
> > >   | 000        | success                  | VERIFY or                 |
> > >   |            |                          | VERIFY-FROM-BUFFER        |
> > >   |            |                          | request completed         |
> > >   |            |                          | successfully. The verify  |
> > >   |            |                          | decision can be           |
> > >   |            |                          | "accepted", "rejected",   |
> > >   |            |                          | or "undecided".           |
> > >   | 001        | error                    | VERIFY or                 |
> > >   |            |                          | VERIFY-FROM-BUFFER        |
> > >   |            |                          | request terminated        |
> > >   |            |                          | prematurely due to a      |
> > >   |            |                          | verification resource or  |
> > >   |            |                          | system error.             |
> > >   | 002        | no-input-timeout         | VERIFY request completed  |
> > >   |            |                          | with no result due to a   |
> > >   |            |                          | no-input-timeout.         |
> > >   | 003        | too-much-speech-timeout  | VERIFY request completed  |
> > >   |            |                          | with no result due to too |
> > >   |            |                          | much speech.              |
> > >   | 004        | speech-too-early         | VERIFY request completed  |
> > >   |            |                          | with no result due to     |
> > >   |            |                          | spoke too soon.           |
> > > + | 005        | insufficient-speech      | VERIFY or                 |
> > > + |            |                          | VERIFY-FROM-BUFFER        |
> > > + |            |                          | request completed         |
> > > + |            |                          | successfully but had      |
> > > + |            |                          | insufficient speech to    |
> > > + |            |                          | complete. More speech     |
> > > + |            |                          | will complete the current |
> > > + |            |                          | incremental operation.    |
> > > + | 006        | bad-speech               | VERIFY or                 |
> > > + |            |                          | VERIFY-FROM-BUFFER        |
> > > + |            |                          | request completed         |
> > > + |            |                          | unsuccessfully, the       |
> > > + |            |                          | speech quality was too    |
> > > + |            |                          | poor.                     |
> > > * | 007        | buffer-empty             | VERIFY-FROM-BUFFER        |
> > >   |            |                          | request completed with no |
> > >   |            |                          | result due to empty       |
> > >   |            |                          | buffer.                   |
> > > * | 008        | out-of-sequence          | Verification operation    |
> > >   |            |                          | failed due to             |
> > >   |            |                          | out-of-sequence method    |
> > >   |            |                          | invocations. For example  |
> > >   |            |                          | calling VERIFY before     |
> > >   |            |                          | QUERY-VOICEPRINT.         |
> > > * | 009        | repository-uri-failure   | Failure accessing         |
> > >   |            |                          | Repository URI.           |
> > > * | 010        | repository-uri-missing   | Repository-uri is not     |
> > >   |            |                          | specified.                |
> > > * | 011        | voiceprint-id-missing    | Voiceprint-identification |
> > >   |            |                          | is not specified.         |
> > > * | 012        | voiceprint-id-not-exist  | Voiceprint-identification |
> > >   |            |                          | does not exist in the     |
> > >   |            |                          | voiceprint repository.    |
> > >   +------------+--------------------------+---------------------------+
> > >
> > > Alternatively the new entries could be appended for
> > > compatibility. The only disadvantage to doing so would be that
> > > entries would not be grouped in the table by category.
> > >
> > > I'll happily accept any corrections to my understanding, in case
> > > I have misread the spec, or feedback on my suggestions.
> > >
> > > NIK WALDRON
> > >
> > > _______________________________________________
> > > Speechsc mailing list
> > > Speechsc@ietf.org
> > > https://www.ietf.org/mailman/listinfo/speechsc
> > > Supplemental web site:
> > > <http://www.standardstrack.com/ietf/speechsc>
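
To make the completion causes discussed in this thread concrete, here is a minimal Python sketch of how a client might read the Completion-Cause header from a VERIFICATION-COMPLETE event and choose its next dialog step. It is an illustration under stated assumptions, not part of the specification: the names parse_completion_cause and next_action and the action strings are hypothetical, "insufficient-speech" and "bad-speech" exist only in Nik's proposal, and "speech-not-usable" is the cause Dan reports adding. The sketch matches on the cause name rather than the numeric code because the numbering differs between the draft and the proposed table.

```python
import re
from typing import NamedTuple, Optional


class CompletionCause(NamedTuple):
    code: int
    name: str


# Header form used in the exchanges above, e.g.
# "Completion-Cause:002 no-input-timeout"
_CAUSE_RE = re.compile(r"^Completion-Cause:\s*(\d{3})\s+([a-z-]+)\s*$", re.IGNORECASE)

# Hypothetical mapping from cause name to a call-flow action.
# "insufficient-speech" and "bad-speech" come from the proposal in this
# thread only; "speech-not-usable" is the cause Dan says he added.
_ACTIONS = {
    "no-input-timeout": "reprompt",
    "speech-too-early": "reprompt",
    "too-much-speech-timeout": "reprompt",
    "speech-not-usable": "reprompt-noisy",
    "bad-speech": "reprompt-noisy",
    "insufficient-speech": "collect-more",
}


def parse_completion_cause(message: str) -> CompletionCause:
    """Return the first Completion-Cause header found in an MRCP message."""
    for line in message.splitlines():
        m = _CAUSE_RE.match(line.strip())
        if m:
            return CompletionCause(int(m.group(1)), m.group(2).lower())
    raise ValueError("no Completion-Cause header found")


def next_action(cause: CompletionCause, decision: Optional[str] = None) -> str:
    """Pick a dialog action; `decision` is the <decision> value, if any."""
    if cause.name == "success":
        # "undecided" means the engine needs more turns of speech.
        return "collect-more" if decision == "undecided" else "done"
    # Any cause not listed (error, repository problems, ...) aborts.
    return _ACTIONS.get(cause.name, "abort")


event = (
    "MRCP/2.0 140 VERIFICATION-COMPLETE 314162 COMPLETE\n"
    "Channel-Identifier:32AECB23433801@speakverify\n"
    "Completion-Cause:002 no-input-timeout\n"
)
cause = parse_completion_cause(event)
print(cause.name, "->", next_action(cause))
```

With a distinct cause for unusable speech, a call-flow designer could play a different prompt (for example, asking the caller to move somewhere quieter) instead of treating the turn as silence, which is the distinction the original message asks for.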