Re: [Speechsc] Speaker Verification - Insufficient or Noisy Speech

Eric Burger <eburger@standardstrack.com> Mon, 11 May 2009 13:58 UTC

Message-Id: <6F3109CD-FF17-43A2-A4BE-71A6A488D22D@standardstrack.com>
From: Eric Burger <eburger@standardstrack.com>
To: Nik Waldron <nik.waldron@kaz-group.com>
In-Reply-To: <OF23016286.75EB7C53-ON4A2575B3.0007D2BC@kaz-group.com>
Date: Mon, 11 May 2009 09:59:47 -0400
References: <OF23016286.75EB7C53-ON4A2575B3.0007D2BC@kaz-group.com>
Cc: speechsc@ietf.org
Subject: Re: [Speechsc] Speaker Verification - Insufficient or Noisy Speech

I would offer we save it for the book.

On May 10, 2009, at 10:03 PM, Nik Waldron wrote:

> Thanks for your response Dan,
>
> The additional code resolves problem (2), noisy or otherwise
> ‘bad’ input, and your explanation of (3) clarifies how to indicate
> that additional data is needed for training.
>
> I had not realised that the result structure was intended to be used
> in the case of enrolments as well as verifications.  I'm not sure
> whether my confusion extends beyond myself and justifies an
> explanatory note in the verification section.  Thanks for the
> clarification in any case.
>
> I think that the document would benefit from an appendix (or a
> separate document, as is the case for SDP) containing examples of
> all of the major use cases.  In my opinion, examples often resolve
> confusion for readers learning a new protocol.  I note that there
> are examples in the document, although I don't recall any training
> (enrolment) examples for speaker verification.
>
> I appreciate the enormous effort that goes into producing a standard  
> protocol (everyone’s a critic).  I’d be happy to contribute some  
> example conversations for Verification if such a section or document  
> eventuates.
>
> Best regards,
>
> NIK WALDRON
>
> From: dburnett@voxeo.com [mailto:dburnett@voxeo.com]
> Sent: Wednesday, May 06, 2009 6:29 AM
> To: Nik Waldron
> Cc: speechsc@ietf.org
> Subject: Re: [Speechsc] Speaker Verification - Insufficient or Noisy Speech
>
> Nik,
>
> Thanks for your email.
>
> There are three cases in what you have described:
>
> 1. speech not detected (because of an SNR problem, etc.).  This will
> return no-input-timeout, just as it would for a speech recognizer.
>
> 2. speech detected, neither too early (speech-too-early) nor too much
> (too-much-speech-timeout), but still unusable by the training or
> verification process.  Note that this could happen if the speech
> passes the endpointer threshold but is too garbled or noisy to be of
> use to the verification engine.
> This case is not handled in MRCP today.  I have added error code 011,
> "speech-not-usable", for this case.
>
> 3. additional turns are needed:  the <decision> result element can be
> used for this.  "undecided" was the value we chose to represent the
> case where the engine did not yet have enough data to decide on a
> verification or training result.  Note that training decisions can
> also be "accepted" or "rejected" just like verification results -- the
> former case means there is sufficient training data and the new
> voiceprint is acceptable.  The latter means there is sufficient
> training data but the new voiceprint is rejected, because for example
> it is too close to an existing voiceprint.
>
> -- dan
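Dan's three cases can be read as a small server-side decision procedure. The Python sketch below is illustrative only: `classify_turn` and its boolean inputs are hypothetical stand-ins for whatever an engine's endpointer and scorer actually report, and it assumes (following the proposed table's description of 000) that an "undecided" decision travels with Completion-Cause 000 success.

```python
from enum import Enum
from typing import Optional, Tuple


class Cause(Enum):
    # Completion-Cause values named in this thread; 011 is the code
    # Dan reports adding for the "speech-not-usable" case.
    SUCCESS = "000 success"
    NO_INPUT_TIMEOUT = "002 no-input-timeout"
    TOO_MUCH_SPEECH_TIMEOUT = "003 too-much-speech-timeout"
    SPEECH_TOO_EARLY = "004 speech-too-early"
    SPEECH_NOT_USABLE = "011 speech-not-usable"


def classify_turn(speech_detected: bool,
                  too_early: bool,
                  too_much: bool,
                  usable: bool,
                  enough_data: bool,
                  accepted: bool = False) -> Tuple[Cause, Optional[str]]:
    """Map one VERIFY turn to (Completion-Cause, <decision> value).

    Hypothetical helper: every boolean is an assumed engine observation,
    not a field defined by MRCPv2.
    """
    if not speech_detected:          # case 1: nothing passed the endpointer
        return Cause.NO_INPUT_TIMEOUT, None
    if too_early:
        return Cause.SPEECH_TOO_EARLY, None
    if too_much:
        return Cause.TOO_MUCH_SPEECH_TIMEOUT, None
    if not usable:                   # case 2: detected but garbled or noisy
        return Cause.SPEECH_NOT_USABLE, None
    if not enough_data:              # case 3: more turns are needed
        return Cause.SUCCESS, "undecided"
    return Cause.SUCCESS, "accepted" if accepted else "rejected"
```

For example, usable speech that is still too short for a decision yields `(Cause.SUCCESS, "undecided")`, so the client knows to issue another VERIFY rather than treating the turn as silence.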
>
> On Jan 11, 2009, at 7:06 PM, Nik Waldron wrote:
>
> > I sent an email previously requesting information on how a speaker
> > verification system implementing MRCPv2 should cope with the
> > situation where insufficient or poor-quality speech arrives on the
> > RTP audio stream.  It seemed to me that this was an area where the
> > specification was somewhat deficient.  I received no feedback other
> > than one response saying that, to his knowledge, there were no
> > other implementers of Speaker Verification.
> >
> > Below I outline the MRCPv2 exchanges for a training operation:
> >
> >   C->S:  MRCP/2.0 207 START-SESSION 314161
> >          Channel-Identifier:32AECB23433801@speakverify
> >          Repository-URI:http://www.example.com/voiceprintdbase/
> >          Voiceprint-Mode:train
> >          Voiceprint-Identifier:johnsmith.voiceprint
> >
> >   S->C:  MRCP/2.0 82 314161 200 COMPLETE
> >          Channel-Identifier:32AECB23433801@speakverify
> >
> >   C->S:  MRCP/2.0 76 VERIFY 314162
> >          Channel-Identifier:32AECB23433801@speakverify
> >
> >   S->C:  MRCP/2.0 85 314162 200 IN-PROGRESS
> >          Channel-Identifier:32AECB23433801@speakverify
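Each message in this exchange carries a message-length field (207, 82, 76, 85, ...) that counts every octet of the message, including the digits of the length itself. A minimal Python sketch, not taken from any MRCP library, that builds a message and finds a self-consistent length by iterating until the digit count stops changing. The helper `mrcp_message` is hypothetical, and it assumes the thread's formatting: CRLF line endings, no space after the header colon, and a blank line before the (here empty) body.

```python
from typing import List, Tuple


def mrcp_message(start_fields: List[str],
                 headers: List[Tuple[str, str]],
                 body: str = "") -> str:
    """Build an MRCPv2 message with a self-consistent message-length.

    Hypothetical helper.  start_fields are the start-line tokens after
    the length, e.g. ["VERIFY", "314162"] for a request or
    ["314162", "200", "IN-PROGRESS"] for a response.
    """
    prefix = "MRCP/2.0 "
    # Everything after the length digits: rest of the start line, the
    # headers (written "Name:value" as in the thread), blank line, body.
    rest = (" " + " ".join(start_fields) + "\r\n"
            + "".join(f"{name}:{value}\r\n" for name, value in headers)
            + "\r\n" + body)
    fixed = len(prefix.encode("utf-8")) + len(rest.encode("utf-8"))
    # The length includes its own digits, so iterate to a fixed point.
    length = fixed
    while fixed + len(str(length)) != length:
        length = fixed + len(str(length))
    return prefix + str(length) + rest
```

With the Channel-Identifier used in this thread, `mrcp_message(["VERIFY", "314162"], [("Channel-Identifier", "32AECB23433801@speakverify")])` produces a message beginning `MRCP/2.0 76 VERIFY 314162`, matching the length shown above.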
> >
> > The endpoint detector shows insufficient data (which is buffered)
> > or bad signal quality (a poor SNR, for example).  Note that
> > START-OF-INPUT has NOT been sent, although speech has begun.
> >
> >   S->C:  MRCP/2.0 140 VERIFICATION-COMPLETE 314162 COMPLETE
> >          Channel-Identifier:32AECB23433801@speakverify
> >          Completion-Cause:002 no-input-timeout
> >
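From the client's side, the event above reduces to an event name, a request-state and a Completion-Cause; nothing in it distinguishes buffered-but-insufficient speech from silence. A minimal Python sketch of client-side parsing, assuming the header layout shown in the thread; `parse_event` and `completion_cause` are hypothetical names, header names are matched case-insensitively here, and message bodies are ignored.

```python
from typing import Dict, NamedTuple, Tuple


class MrcpEvent(NamedTuple):
    name: str
    request_id: int
    request_state: str
    headers: Dict[str, str]


def parse_event(message: str) -> MrcpEvent:
    """Parse an MRCPv2 event start line and headers (body is ignored)."""
    head, _, _body = message.partition("\r\n\r\n")
    lines = head.split("\r\n")
    # Event start line: version, length, event name, request-id, state.
    version, _length, name, request_id, state = lines[0].split(" ")
    if version != "MRCP/2.0":
        raise ValueError(f"unexpected version {version!r}")
    headers: Dict[str, str] = {}
    for line in lines[1:]:
        key, _, value = line.partition(":")
        headers[key.strip().lower()] = value.strip()
    return MrcpEvent(name, int(request_id), state, headers)


def completion_cause(event: MrcpEvent) -> Tuple[int, str]:
    """Split a Completion-Cause value such as "002 no-input-timeout"."""
    code, _, cause_name = event.headers["completion-cause"].partition(" ")
    return int(code), cause_name
```

Applied to the VERIFICATION-COMPLETE event above, `completion_cause` returns `(2, "no-input-timeout")`, which is all a call-flow designer has to work with in this case.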
> > This is undesirable from my perspective, since it gives the client
> > the impression that no data has been received (untrue in the
> > insufficient-data case) and provides no distinction between this
> > and the "bad data" case.  That distinction might be useful to a
> > call-flow designer in an IVR system.
> >
> > I also note that, in the case of text-independent verifiers,
> > several turns' worth of data may be required for a verification.
> > Several rounds of "no input" timeouts would surely be confusing to
> > the client, yet this class of verifiers may be unable to generate
> > an nlsml+xml response on the nth dialog turn.
> >
> > The enrolment might then continue:
> >
> >   C->S:  MRCP/2.0 76 VERIFY 314163
> >          Channel-Identifier:32AECB23433801@speakverify
> >
> >   S->C:  MRCP/2.0 85 314163 200 IN-PROGRESS
> >          Channel-Identifier:32AECB23433801@speakverify
> >
> >   S->C:  MRCP/2.0 96 START-OF-INPUT 314163 IN-PROGRESS
> >          Channel-Identifier:32AECB23433801@speakverify
> >
> >   S->C:  MRCP/2.0 131 VERIFICATION-COMPLETE 314163 COMPLETE
> >          Channel-Identifier:32AECB23433801@speakverify
> >          Completion-Cause:000 success
> >
> >   C->S:  MRCP/2.0 76 VERIFY 314164
> >          Channel-Identifier:32AECB23433801@speakverify
> >
> >   S->C:  MRCP/2.0 85 314164 200 IN-PROGRESS
> >          Channel-Identifier:32AECB23433801@speakverify
> >
> >   S->C:  MRCP/2.0 96 START-OF-INPUT 314164 IN-PROGRESS
> >          Channel-Identifier:32AECB23433801@speakverify
> >
> >   S->C:  MRCP/2.0 131 VERIFICATION-COMPLETE 314164 COMPLETE
> >          Channel-Identifier:32AECB23433801@speakverify
> >          Completion-Cause:000 success
> >
> >   C->S:  MRCP/2.0 81 END-SESSION 314174
> >          Channel-Identifier:32AECB23433801@speakverify
> >
> >   S->C:  MRCP/2.0 82 314174 200 COMPLETE
> >          Channel-Identifier:32AECB23433801@speakverify
> >
> > Since I received no responses (perhaps because it was close to the
> > holiday season), I will venture a proposal for extending the RFC to
> > include the bad-signal cases (+ indicates an addition, * a
> > modification):
> >
> >   +------------+--------------------------+---------------------------+
> >   | Cause-Code | Cause-Name               | Description               |
> >   +------------+--------------------------+---------------------------+
> >   | 000        | success                  | VERIFY or                 |
> >   |            |                          | VERIFY-FROM-BUFFER        |
> >   |            |                          | request completed         |
> >   |            |                          | successfully. The verify  |
> >   |            |                          | decision can be           |
> >   |            |                          | "accepted", "rejected",   |
> >   |            |                          | or "undecided".           |
> >   | 001        | error                    | VERIFY or                 |
> >   |            |                          | VERIFY-FROM-BUFFER        |
> >   |            |                          | request terminated        |
> >   |            |                          | prematurely due to a      |
> >   |            |                          | verification resource or  |
> >   |            |                          | system error.             |
> >   | 002        | no-input-timeout         | VERIFY request completed  |
> >   |            |                          | with no result due to a   |
> >   |            |                          | no-input-timeout.         |
> >   | 003        | too-much-speech-timeout  | VERIFY request completed  |
> >   |            |                          | result due to too much    |
> >   |            |                          | speech.                   |
> >   | 004        | speech-too-early         | VERIFY request completed  |
> >   |            |                          | with no result due to     |
> >   |            |                          | spoke too soon.           |
> > + | 005        | insufficient-speech      | VERIFY or                 |
> > + |            |                          | VERIFY-FROM-BUFFER        |
> > + |            |                          | request completed         |
> > + |            |                          | successfully but had      |
> > + |            |                          | insufficient speech to    |
> > + |            |                          | complete. More speech     |
> > + |            |                          | will complete the current |
> > + |            |                          | incremental operation.    |
> > + | 006        | bad-speech               | VERIFY or                 |
> > + |            |                          | VERIFY-FROM-BUFFER        |
> > + |            |                          | request completed         |
> > + |            |                          | unsuccessfully; the       |
> > + |            |                          | speech quality was too    |
> > + |            |                          | poor.                     |
> > * | 007        | buffer-empty             | VERIFY-FROM-BUFFER        |
> >   |            |                          | request completed with no |
> >   |            |                          | result due to empty       |
> >   |            |                          | buffer.                   |
> > * | 008        | out-of-sequence          | Verification operation    |
> >   |            |                          | failed due to             |
> >   |            |                          | out-of-sequence method    |
> >   |            |                          | invocations. For example, |
> >   |            |                          | calling VERIFY before     |
> >   |            |                          | QUERY-VOICEPRINT.         |
> > * | 009        | repository-uri-failure   | Failure accessing         |
> >   |            |                          | Repository URI.           |
> > * | 010        | repository-uri-missing   | Repository-uri is not     |
> >   |            |                          | specified.                |
> > * | 011        | voiceprint-id-missing    | Voiceprint-identification |
> >   |            |                          | is not specified.         |
> > * | 012        | voiceprint-id-not-exist  | Voiceprint-identification |
> >   |            |                          | does not exist in the     |
> >   |            |                          | voiceprint repository.    |
> >   +------------+--------------------------+---------------------------+
> >
> > Alternatively, the new entries could be appended to the end of the
> > table, leaving the existing codes unchanged for compatibility.  The
> > only disadvantage to doing so would be that entries would not be
> > grouped in the table by category.
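The compatibility trade-off can be checked mechanically. This Python sketch assumes the existing codes are those implied by the "*" markers in the table (buffer-empty through voiceprint-id-not-exist at 005 to 010 before the proposed shift) and compares the grouped layout with an appended one; all names (`EXISTING`, `inserted_layout`, `appended_layout`, `changed_codes`) are illustrative, not from the specification.

```python
# Assumed existing verification Completion-Cause codes, inferred from
# the "*" markers: each modified entry is shifted by two in the proposal.
EXISTING = {
    "success": 0, "error": 1, "no-input-timeout": 2,
    "too-much-speech-timeout": 3, "speech-too-early": 4,
    "buffer-empty": 5, "out-of-sequence": 6,
    "repository-uri-failure": 7, "repository-uri-missing": 8,
    "voiceprint-id-missing": 9, "voiceprint-id-not-exist": 10,
}
NEW = ["insufficient-speech", "bad-speech"]


def inserted_layout() -> dict:
    """Grouped by category: new causes go right after speech-too-early."""
    names = sorted(EXISTING, key=EXISTING.get)
    at = names.index("speech-too-early") + 1
    names[at:at] = NEW
    return {name: code for code, name in enumerate(names)}


def appended_layout() -> dict:
    """Compatible: existing codes untouched, new causes take the next free codes."""
    layout = dict(EXISTING)
    first_free = max(EXISTING.values()) + 1
    for offset, name in enumerate(NEW):
        layout[name] = first_free + offset
    return layout


def changed_codes(layout: dict) -> list:
    """Existing cause names whose numeric code differs in `layout`."""
    return [name for name, code in EXISTING.items() if layout[name] != code]
```

Under the appended layout `changed_codes` returns an empty list, whereas the grouped layout renumbers six existing causes, which is the compatibility cost being weighed here. The 011 that Dan's reply assigns to speech-not-usable is the first free code under the append approach.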
> >
> > I'll happily accept any corrections to my understanding, in case I
> > have misread the spec, as well as feedback on my suggestions.
> >
> > NIK WALDRON
> >
> > _______________________________________________
> > Speechsc mailing list
> > Speechsc@ietf.org
> > https://www.ietf.org/mailman/listinfo/speechsc
> > Supplemental web site:
> > <http://www.standardstrack.com/ietf/speechsc>
>
>
> ______________________________________________________________________
> This email has been scanned by the MessageLabs Email Security System.
> For more information please visit http://www.messagelabs.com/email
> ______________________________________________________________________
>
>
> This is an email from Fujitsu Australia Limited, ABN 19 001 011 427.  
> It is confidential to the ordinary user of the email address to  
> which it was addressed and may contain copyright and/or legally  
> privileged information. No one else may read, print, store, copy or  
> forward all or any of it or its attachments. If you receive this  
> email in error, please return to sender. Thank you.