Re: [secdir] secdir review of draft-ietf-speechsc-mrcpv2

Dan Burnett <dburnett@voxeo.com> Wed, 14 July 2010 13:19 UTC

Message-Id: <769E404D-4D62-494F-9594-80F38CD922DF@voxeo.com>
From: Dan Burnett <dburnett@voxeo.com>
To: Catherine Meadows <catherine.meadows@nrl.navy.mil>
In-Reply-To: <51173F8E-94BF-4347-B7A8-909BA5433443@nrl.navy.mil>
Date: Wed, 14 Jul 2010 09:19:47 -0400
References: <51173F8E-94BF-4347-B7A8-909BA5433443@nrl.navy.mil>
Cc: sarvi@cisco.com, oran@cisco.com, iesg@ietf.org, eburger@standardstrack.com, secdir@ietf.org
Subject: Re: [secdir] secdir review of draft-ietf-speechsc-mrcpv2

Thank you for your comments.  Our intent for the text does not exactly
match the interpretation you give below.  My general reply is that
Speaker Verification and Speaker Identification are two different
processes: verification employs a claim of identity, while
identification does not.  Neither requires prior verification that the
speaker is a member of a group.

Please see my specific replies, embedded below.

Dan Burnett

On Jul 9, 2010, at 9:21 PM, Catherine Meadows wrote:

> I have reviewed this document as part of the security directorate's
> ongoing effort to review all IETF documents being processed by the
> IESG.  These comments were written primarily for the benefit of the
> security area directors.  Document editors and WG chairs should treat
> these comments just like any other last call comments.
>
> This draft describes the Media Resource Control Protocol Version 2
> (MRCPv2) which allows client hosts to control media service resources
> residing in servers on a network.  MRCPv2 makes use of the Session
> Initiation Protocol (SIP) to initiate and manage sessions and the
> Session Description Protocol (SDP) to manage and exchange
> capabilities.  Both clients and servers rely on TLS for security.
>
> Most of the security requirements for this protocol are similar to
> requirements for any protocol that manages control data, some of
> which must be sensitive.  These are outlined in the Security
> Considerations section.  MRCPv2 also supports the use of voice
> identification to support a limited form of limitation: the
> identification of which member of a group a principal belongs to
> after the fact that the principal belongs to the group has been
> ascertained by other means.  This is known as Speaker Verification
> and Identification.

Although a population claim is implicitly included in the process,
there is no requirement that the principal be determined to belong to
the population or to a group within the population.  The
verification/identification resource may return a result indicating
that the principal is not a member of the population.

>
> I found the initial discussion of Speaker Verification and
> Identification in Section 11 a little confusing, and there is one
> sentence in particular that could be made more clear:
>
> The fourth paragraph in that section begins:
>
> Speaker identification is the process of associating an unknown
>   speaker with a member in a population.  It does not employ a claim of
>   identity.

Speaker identification does not employ a claim of identity (other than  
implicitly to the population).

>
> But the paragraph immediately before that starts
>
> In speaker verification, a recorded utterance is compared to a
>   previously stored voiceprint which is in turn associated with a
>   claimed identity for that user.
>

Speaker *verification* does employ a claim of identity.  "Speaker  
identification" identifies a speaker, while "speaker verification"  
verifies that a speaker is who he/she claims to be.

> That sounds like it *does* employ a claim of identity.
>
> The fourth paragraph goes on to say that speaker ID should
> be used when you already have verified that the speaker is a member
> of a group (e.g. by cryptographic means), and you want to verify which
> member of the group s/he is.  This suggests that

Actually, it says "When an individual claims to belong to a group  
(e.g., one of the owners of a joint bank account) a group  
authentication is performed."

Here is the way to think of it: a voice authentication database
contains a collection (called a population) of voiceprints, one per
individual who has been enrolled into the database as a member of the
population.  Each individual may have a unique (non-private) key
associated with his/her voiceprint, may share a unique (non-private)
key representing a group (an enumerated subset of the population), or
both.

Most technology providers consider identification to be the process
whereby one or more audio samples are compared to the voiceprints of
the entire population to determine whether they match an individual in
the population.  Because there is no claim of identity as an
individual or as a subgroup of the population, this is commonly
described as requiring "no claim of identity".  Note that in this case
no keys (individual or group) need to be provided as input to the
process, since all keys are implied.  Note also that if the audio
samples do not match any of the voiceprints in the population, the
resource will return a code indicating this.
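
To make the identification case concrete, here is a toy sketch in
Python.  Everything in it is an illustrative assumption, not something
defined by MRCPv2 or by any particular engine: voiceprints are modeled
as plain feature vectors, a "match" means a Euclidean distance within a
made-up MAX_DISTANCE, and the hypothetical identify() function returns
None as a stand-in for the resource's "no match" code.

```python
import math
from typing import Dict, Optional, Sequence

# Toy model (assumption): a voiceprint is a feature vector, and a sample
# "matches" a voiceprint when their Euclidean distance is small enough.
Voiceprint = Sequence[float]
MAX_DISTANCE = 0.5  # hypothetical acceptance threshold


def identify(sample: Voiceprint, population: Dict[str, Voiceprint]) -> Optional[str]:
    """Speaker identification: no key is supplied, so every voiceprint in
    the population is a candidate. Returns the closest matching key, or
    None (standing in for a "no match" code) if nothing is close enough."""
    best_key: Optional[str] = None
    best_distance = MAX_DISTANCE
    for key, voiceprint in population.items():
        d = math.dist(sample, voiceprint)
        if d <= best_distance:
            best_key, best_distance = key, d
    return best_key


population = {"alice": (1.0, 0.0), "bob": (0.0, 1.0)}
print(identify((0.9, 0.1), population))  # alice
print(identify((5.0, 5.0), population))  # None
```

Note that identify() takes no key at all; the whole population is
searched, which is all that "no claim of identity" means here.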

Most technology providers consider verification to be the process
whereby one or more audio samples are compared to the voiceprint of a
given key to determine whether they match that specific voiceprint.
This is a claim of identity.  In this case the key (the claimed
identity) needs to be provided as input to the process.  Note that if
the audio samples do not match the referenced voiceprint, the resource
will return a code indicating this.
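
For contrast, a verification sketch under the same toy assumptions
(feature-vector voiceprints and a made-up MAX_DISTANCE threshold) takes
the claimed key as an input and looks only at that one voiceprint.  The
verify() function is hypothetical, and treating an unknown key as a
failed match is also my assumption rather than defined behavior.

```python
import math
from typing import Dict, Sequence

# Toy model (assumption): voiceprints are feature vectors compared by
# Euclidean distance against a fixed threshold.
Voiceprint = Sequence[float]
MAX_DISTANCE = 0.5  # hypothetical acceptance threshold


def verify(sample: Voiceprint, claimed_key: str, population: Dict[str, Voiceprint]) -> bool:
    """Speaker verification: the claimed identity (a key) is an input, and
    the sample is compared only against that key's voiceprint."""
    voiceprint = population.get(claimed_key)
    if voiceprint is None:
        return False  # unknown key: treated here as a failure to match
    return math.dist(sample, voiceprint) <= MAX_DISTANCE


population = {"alice": (1.0, 0.0), "bob": (0.0, 1.0)}
print(verify((0.9, 0.1), "alice", population))  # True
print(verify((0.9, 0.1), "bob", population))    # False
```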

Some technology providers also offer multi-verification, where one or
more audio samples are compared to multiple voiceprints (indicated by
multiple keys) to determine whether the samples match any of them.
This is essentially a claim that the samples come from a speaker whose
voiceprint is referenced by one of the given keys.  As with
verification, if the audio samples do not match any of the referenced
voiceprints, the resource will return a code indicating failure to
match.
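
Multi-verification can be sketched the same way, again assuming
feature-vector voiceprints and a made-up MAX_DISTANCE.  The
hypothetical multi_verify() function compares the sample only against
the voiceprints named by the claimed keys and returns the closest
match, or None as a stand-in for the failure-to-match code.

```python
import math
from typing import Dict, List, Optional, Sequence

# Toy model (assumption): voiceprints are feature vectors compared by
# Euclidean distance against a fixed threshold.
Voiceprint = Sequence[float]
MAX_DISTANCE = 0.5  # hypothetical acceptance threshold


def multi_verify(
    sample: Voiceprint, claimed_keys: List[str], population: Dict[str, Voiceprint]
) -> Optional[str]:
    """Multi-verification: only the voiceprints named by the claimed keys
    are candidates. Returns the closest matching key, or None."""
    scored = [
        (math.dist(sample, population[key]), key)
        for key in claimed_keys
        if key in population
    ]
    matches = [(d, key) for d, key in scored if d <= MAX_DISTANCE]
    return min(matches)[1] if matches else None


population = {"alice": (1.0, 0.0), "bob": (0.0, 1.0), "carol": (-1.0, 0.0)}
print(multi_verify((0.9, 0.1), ["bob", "alice"], population))  # alice
print(multi_verify((0.9, 0.1), ["bob", "carol"], population))  # None
```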

Some technology providers also offer group identification, where one
or more audio samples are compared to the multiple voiceprints
represented by a group key to determine whether the samples match any
of the voiceprints associated with that group key.  Although this can
be considered a claim of identity within the group, it is not a claim
of a specific individual identity.  Upon a match, the return value can
be either the group key or the specific individual key within the
group, depending on the resource.  If the audio samples do not match
the voiceprint of any member of the group, the resource will return a
code indicating failure to match.
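
A group identification sketch, under the same toy assumptions, follows.
The group table, the "joint-account-42" key, the hypothetical
group_identify() function and its return_individual flag (standing in
for "depending on the resource") are all illustrative inventions.

```python
import math
from typing import Dict, List, Optional, Sequence, Tuple

# Toy model (assumption): voiceprints are feature vectors compared by
# Euclidean distance against a fixed threshold.
Voiceprint = Sequence[float]
MAX_DISTANCE = 0.5  # hypothetical acceptance threshold


def group_identify(
    sample: Voiceprint,
    group_key: str,
    groups: Dict[str, List[str]],
    population: Dict[str, Voiceprint],
    return_individual: bool = False,
) -> Optional[str]:
    """Group identification: the sample is compared against every member
    of the group. On a match, return the group key or, if the resource is
    configured that way, the closest member's individual key."""
    best: Optional[Tuple[float, str]] = None
    for member in groups.get(group_key, []):
        d = math.dist(sample, population[member])
        if d <= MAX_DISTANCE and (best is None or d < best[0]):
            best = (d, member)
    if best is None:
        return None  # stands in for the failure-to-match code
    return best[1] if return_individual else group_key


population = {"alice": (1.0, 0.0), "bob": (0.0, 1.0), "carol": (-1.0, 0.0)}
groups = {"joint-account-42": ["alice", "bob"]}
print(group_identify((0.1, 0.9), "joint-account-42", groups, population))
print(group_identify((0.1, 0.9), "joint-account-42", groups, population, True))
```

The first call reports only the group key; the second reports "bob".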

MRCP implements all of the above capabilities with a list of input
keys, each of which may represent an individual or a group, as
appropriate for the specific resource, and a list of output keys
representing individuals and/or groups, again as appropriate for the
specific resource.  If there is no match to any of the inputs (where
an empty input list implies that all keys are possible), an error code
is returned.
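
The input/output key lists can be sketched as a single entry point.
This is a toy model, not the MRCPv2 wire format: the match_keys()
function, the group table, the distance threshold and the "success" /
"no-match" status strings are all hypothetical, and reporting the input
key through which a match was found is my own simplification of "as
appropriate for the specific resource".

```python
import math
from typing import Dict, List, Sequence, Tuple

# Toy model (assumption): voiceprints are feature vectors compared by
# Euclidean distance against a fixed threshold.
Voiceprint = Sequence[float]
MAX_DISTANCE = 0.5  # hypothetical acceptance threshold


def match_keys(
    sample: Voiceprint,
    input_keys: List[str],
    population: Dict[str, Voiceprint],
    groups: Dict[str, List[str]],
) -> Tuple[str, List[str]]:
    """One entry point for identification, verification, multi-verification
    and group identification. Each input key names an individual or a
    group; an empty list means every individual is a candidate."""
    # Expand input keys into (reported_key, individual_key) pairs.
    if not input_keys:
        candidates = [(key, key) for key in population]
    else:
        candidates = []
        for key in input_keys:
            members = groups.get(key, [key])  # an individual expands to itself
            candidates.extend((key, m) for m in members if m in population)
    matched = sorted({
        reported
        for reported, individual in candidates
        if math.dist(sample, population[individual]) <= MAX_DISTANCE
    })
    if not matched:
        return ("no-match", [])  # hypothetical status, not an MRCP code
    return ("success", matched)


population = {"alice": (1.0, 0.0), "bob": (0.0, 1.0), "carol": (-1.0, 0.0)}
groups = {"joint-account-42": ["alice", "bob"]}
print(match_keys((0.9, 0.1), [], population, groups))       # ('success', ['alice'])
print(match_keys((0.9, 0.1), ["bob"], population, groups))  # ('no-match', [])
```

Identification (empty list), verification (one key), multi-verification
(several keys) and group identification (a group key) all share one
code path, and none of them requires establishing group membership
first.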

In none of these scenarios must a speaker first be shown to be a
member of a particular group.


>
> It does not employ a claim of
>   identity.
>
> really means that
>
> It does not provide a proof of identity by itself.
>
> If that is the case, it should say that.
>
> I also note that the speaker verification is restricted to
> identifying the identity of someone who is already verified to be a
> member of a group.  This suggests that attempting to use it without
> this prior verification is unsafe.  A quick scan through RFC 4313
> didn't turn up any references to this issue.  If it is unsafe, then
> the ID should say so, and if there is a related requirement in
> RFC 4313 that should be referenced.  Also, I would recommend saying
> that speaker verification MUST NOT be implemented without prior
> verification as a member of a group.
>
> Catherine Meadows
> Naval Research Laboratory
> Code 5543
> 4555 Overlook Ave., S.W.
> Washington DC, 20375
> phone: 202-767-3490
> fax: 202-404-7942
> email: catherine.meadows@nrl.navy.mil
>