[Speechsc] Speaker Verification - Insufficient or Noisy Speech

Nik Waldron <nik.waldron@kaz-group.com> Mon, 12 January 2009 00:07 UTC

To: speechsc@ietf.org
From: Nik Waldron <nik.waldron@kaz-group.com>
Date: Mon, 12 Jan 2009 11:06:42 +1100
Cc: Nik Waldron <nik.waldron@kaz-group.com>
Subject: [Speechsc] Speaker Verification - Insufficient or Noisy Speech

I previously sent an email asking how a speaker verification system
implementing MRCPv2 should cope when insufficient or poor-quality speech
arrives on the RTP audio stream.  It seemed to me that this was an area of
some deficiency in the specification.  I received no feedback other than
one response saying that, to his knowledge, there were no other
implementers of Speaker Verification.

Below I outline the MRCPv2 exchanges for a training operation:

   C->S:  MRCP/2.0 207 START-SESSION 314161
          Channel-Identifier:32AECB23433801@speakverify
          Repository-URI:http://www.example.com/voiceprintdbase/
          Voiceprint-Mode:train
          Voiceprint-Identifier:johnsmith.voiceprint

   S->C:  MRCP/2.0 82 314161 200 COMPLETE
          Channel-Identifier:32AECB23433801@speakverify

   C->S:  MRCP/2.0 76 VERIFY 314162
          Channel-Identifier:32AECB23433801@speakverify

   S->C:  MRCP/2.0 85 314162 200 IN-PROGRESS
          Channel-Identifier:32AECB23433801@speakverify

The end-point detector shows insufficient data (which is buffered) or bad
signal quality (a bad SNR, for example).  Note that START-OF-INPUT has not
been sent although speech has begun.

   S->C:  MRCP/2.0 140 VERIFICATION-COMPLETE 314162 COMPLETE
          Channel-Identifier:32AECB23433801@speakverify
          Completion-Cause:002 no-input-timeout

This is undesirable from my perspective, since it gives the client the
impression that no data has been received (untrue in the insufficient-data
case) and provides no distinction between this and the "bad data" case.
This information might be of use to a call-flow designer in an IVR system.
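
To illustrate the ambiguity from the client side, here is a minimal Python
sketch of pulling the Completion-Cause out of the event above.  The function
names parse_mrcp_event and completion_cause are my own and hypothetical, and
the parsing is deliberately simplified (no body handling, no length check).

```python
def parse_mrcp_event(message: str) -> dict:
    """Split an MRCPv2 event into its start-line fields and headers."""
    lines = [ln.strip() for ln in message.strip().splitlines() if ln.strip()]
    # Event start-line: version, message length, event name, request id, state
    _version, _length, event, request_id, state = lines[0].split()
    headers = {}
    for ln in lines[1:]:
        name, _, value = ln.partition(":")
        headers[name.strip().lower()] = value.strip()
    return {"event": event, "request_id": request_id,
            "state": state, "headers": headers}


def completion_cause(event: dict) -> tuple:
    """Return (numeric code, cause name) from the Completion-Cause header."""
    code, _, name = event["headers"]["completion-cause"].partition(" ")
    return int(code), name


msg = """MRCP/2.0 140 VERIFICATION-COMPLETE 314162 COMPLETE
Channel-Identifier:32AECB23433801@speakverify
Completion-Cause:002 no-input-timeout"""

print(completion_cause(parse_mrcp_event(msg)))
```

The client sees only cause 002 here, whether the caller said nothing, said
too little, or spoke over a noisy line.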

I also note that, in the case of text-independent verifiers, several turns'
worth of data may be required for a verification.  Several rounds of "no
input" timeouts would surely be confusing to the client, yet this class of
verifiers may be unable to generate an nlsml+xml response on the nth dialog
turn.
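
As a rough illustration of the multi-turn case, the following Python sketch
simulates a text-independent verifier that accumulates speech across VERIFY
turns.  Everything in it is an assumption for illustration only: the function
name enrol, the 10-second threshold, and the use of a distinct
"insufficient-speech" cause (the code proposed further down in this message)
are not taken from the specification.

```python
def enrol(turn_durations, required_seconds=10.0):
    """Return the completion cause reported for each simulated VERIFY turn.

    turn_durations: seconds of usable speech captured in each turn.
    required_seconds: hypothetical amount of speech the verifier needs.
    """
    buffered = 0.0
    causes = []
    for seconds in turn_durations:
        if seconds <= 0:
            # Genuinely nothing was heard on this turn.
            causes.append("no-input-timeout")
            continue
        buffered += seconds
        if buffered < required_seconds:
            # Speech arrived and was buffered, but more is needed.
            causes.append("insufficient-speech")
        else:
            causes.append("success")
            break
    return causes


print(enrol([3.0, 0, 4.0, 5.0]))
```

With a distinct cause, the client can tell a silent turn apart from a turn
that simply did not yet provide enough speech.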

The enrolment might then continue:

   C->S:  MRCP/2.0 76 VERIFY 314163
          Channel-Identifier:32AECB23433801@speakverify

   S->C:  MRCP/2.0 85 314163 200 IN-PROGRESS
          Channel-Identifier:32AECB23433801@speakverify

   S->C:  MRCP/2.0 96 START-OF-INPUT 314163 IN-PROGRESS
          Channel-Identifier:32AECB23433801@speakverify

   S->C:  MRCP/2.0 131 VERIFICATION-COMPLETE 314163 COMPLETE
          Channel-Identifier:32AECB23433801@speakverify
          Completion-Cause:000 success

   C->S:  MRCP/2.0 76 VERIFY 314164
          Channel-Identifier:32AECB23433801@speakverify

   S->C:  MRCP/2.0 85 314164 200 IN-PROGRESS
          Channel-Identifier:32AECB23433801@speakverify

   S->C:  MRCP/2.0 96 START-OF-INPUT 314164 IN-PROGRESS
          Channel-Identifier:32AECB23433801@speakverify

   S->C:  MRCP/2.0 131 VERIFICATION-COMPLETE 314164 COMPLETE
          Channel-Identifier:32AECB23433801@speakverify
          Completion-Cause:000 success

   C->S:  MRCP/2.0 81 END-SESSION 314174
          Channel-Identifier:32AECB23433801@speakverify

   S->C:  MRCP/2.0 82 314174 200 COMPLETE
          Channel-Identifier:32AECB23433801@speakverify

Since I received no responses (perhaps because it was close to the holiday
season), I will venture a proposal for extending the RFC to include the
bad-signal cases (+ indicates an addition, * a modification):

   +------------+--------------------------+---------------------------+
   | Cause-Code | Cause-Name               | Description               |
   +------------+--------------------------+---------------------------+
   | 000        | success                  | VERIFY or                 |
   |            |                          | VERIFY-FROM-BUFFER        |
   |            |                          | request completed         |
   |            |                          | successfully.  The verify |
   |            |                          | decision can be           |
   |            |                          | "accepted", "rejected",   |
   |            |                          | or "undecided".           |
   | 001        | error                    | VERIFY or                 |
   |            |                          | VERIFY-FROM-BUFFER        |
   |            |                          | request terminated        |
   |            |                          | prematurely due to a      |
   |            |                          | verification resource or  |
   |            |                          | system error.             |
   | 002        | no-input-timeout         | VERIFY request completed  |
   |            |                          | with no result due to a   |
   |            |                          | no-input-timeout.         |
   | 003        | too-much-speech-timeout  | VERIFY request completed  |
   |            |                          | with no result due to too |
   |            |                          | much speech.              |
   | 004        | speech-too-early         | VERIFY request completed  |
   |            |                          | with no result due to     |
   |            |                          | spoke too soon.           |
 + | 005        | insufficient-speech      | VERIFY or                 |
 + |            |                          | VERIFY-FROM-BUFFER        |
 + |            |                          | request completed         |
 + |            |                          | successfully but had      |
 + |            |                          | insufficient speech to    |
 + |            |                          | complete.  More speech    |
 + |            |                          | will complete the current |
 + |            |                          | incremental operation.    |
 + | 006        | bad-speech               | VERIFY or                 |
 + |            |                          | VERIFY-FROM-BUFFER        |
 + |            |                          | request completed         |
 + |            |                          | unsuccessfully; the       |
 + |            |                          | speech quality was too    |
 + |            |                          | poor.                     |
 * | 007        | buffer-empty             | VERIFY-FROM-BUFFER        |
   |            |                          | request completed with no |
   |            |                          | result due to empty       |
   |            |                          | buffer.                   |
 * | 008        | out-of-sequence          | Verification operation    |
   |            |                          | failed due to             |
   |            |                          | out-of-sequence method    |
   |            |                          | invocations.  For example |
   |            |                          | calling VERIFY before     |
   |            |                          | QUERY-VOICEPRINT.         |
 * | 009        | repository-uri-failure   | Failure accessing         |
   |            |                          | Repository URI.           |
 * | 010        | repository-uri-missing   | Repository-uri is not     |
   |            |                          | specified.                |
 * | 011        | voiceprint-id-missing    | Voiceprint-identification |
   |            |                          | is not specified.         |
 * | 012        | voiceprint-id-not-exist  | Voiceprint-identification |
   |            |                          | does not exist in the     |
   |            |                          | voiceprint repository.    |
   +------------+--------------------------+---------------------------+
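
To show how a call-flow designer might use the extra codes, here is a small
Python sketch of the table above as a client-side lookup.  The numbering
follows the proposal in this message (005 and 006 are proposed additions,
007 to 012 are the renumbered entries), so it does not match the current
specification.  The action strings and the name next_action are hypothetical.

```python
from enum import IntEnum


class VerifyCause(IntEnum):
    """Verifier completion causes, numbered as in the proposed table."""
    SUCCESS = 0
    ERROR = 1
    NO_INPUT_TIMEOUT = 2
    TOO_MUCH_SPEECH_TIMEOUT = 3
    SPEECH_TOO_EARLY = 4
    INSUFFICIENT_SPEECH = 5       # proposed addition
    BAD_SPEECH = 6                # proposed addition
    BUFFER_EMPTY = 7              # renumbered by the proposal
    OUT_OF_SEQUENCE = 8           # renumbered by the proposal
    REPOSITORY_URI_FAILURE = 9    # renumbered by the proposal
    REPOSITORY_URI_MISSING = 10   # renumbered by the proposal
    VOICEPRINT_ID_MISSING = 11    # renumbered by the proposal
    VOICEPRINT_ID_NOT_EXIST = 12  # renumbered by the proposal


# Hypothetical IVR actions; a real call flow would define its own.
ACTIONS = {
    VerifyCause.SUCCESS: "continue",
    VerifyCause.NO_INPUT_TIMEOUT: "reprompt",
    VerifyCause.TOO_MUCH_SPEECH_TIMEOUT: "reprompt-shorter",
    VerifyCause.SPEECH_TOO_EARLY: "reprompt-wait-for-tone",
    VerifyCause.INSUFFICIENT_SPEECH: "collect-more-speech",
    VerifyCause.BAD_SPEECH: "reprompt-quieter-environment",
}


def next_action(cause_code: int) -> str:
    """Map a numeric completion cause to a call-flow action."""
    cause = VerifyCause(cause_code)  # raises ValueError for unknown codes
    return ACTIONS.get(cause, "abort-and-log")


print(next_action(5), next_action(6), next_action(9))
```

The two new causes are what let the call flow choose between asking for more
speech and asking the caller to try again under better conditions.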

Alternatively, the new entries could be appended for compatibility.  The
only disadvantage of doing so would be that entries would not be grouped
in the table by category.

I'll happily accept any corrections to my understanding, in case I have
misread the spec, or feedback on my suggestions.




NIK WALDRON
