Re: [secdir] secdir review of draft-ietf-speechsc-mrcpv2
Dan Burnett <dburnett@voxeo.com> Wed, 14 July 2010 13:19 UTC
Message-Id: <769E404D-4D62-494F-9594-80F38CD922DF@voxeo.com>
From: Dan Burnett <dburnett@voxeo.com>
To: Catherine Meadows <catherine.meadows@nrl.navy.mil>
In-Reply-To: <51173F8E-94BF-4347-B7A8-909BA5433443@nrl.navy.mil>
Content-Type: text/plain; charset="US-ASCII"; format="flowed"; delsp="yes"
Content-Transfer-Encoding: 7bit
Mime-Version: 1.0 (Apple Message framework v936)
Date: Wed, 14 Jul 2010 09:19:47 -0400
References: <51173F8E-94BF-4347-B7A8-909BA5433443@nrl.navy.mil>
X-Mailer: Apple Mail (2.936)
X-Mailman-Approved-At: Sun, 18 Jul 2010 12:34:39 -0700
Cc: sarvi@cisco.com, oran@cisco.com, iesg@ietf.org, eburger@standardstrack.com, secdir@ietf.org
Subject: Re: [secdir] secdir review of draft-ietf-speechsc-mrcpv2
Thank you for your comments. Our intent for the text does not exactly
match the interpretation you give below. The general reply I have is
that Speaker Verification and Speaker Identification are two different
processes, one of which employs a claim of identity and one of which
does not. Neither requires prior verification of the speaker being a
member of a group. Please see my specific replies, embedded below.

Dan Burnett

On Jul 9, 2010, at 9:21 PM, Catherine Meadows wrote:

> I have reviewed this document as part of the security directorate's
> ongoing effort to review all IETF documents being processed by the
> IESG. These comments were written primarily for the benefit of the
> security area directors. Document editors and WG chairs should treat
> these comments just like any other last call comments.
>
> This draft describes the Media Resource Control Protocol Version 2
> (MRCPv2) which allows client hosts to control media service resources
> residing in servers on a network. MRCPv2 makes use of the Session
> Initiation Protocol (SIP) to initiate and manage sessions and the
> Session Description Protocol (SDP) to manage and exchange
> capabilities. Both clients and servers rely on TLS for security.
>
> Most of the security requirements for this protocol are similar to
> requirements for any protocol that manages control data, some of
> which must be sensitive. These are outlined in the Security
> Considerations section. MRCPv2 also supports the use of voice
> identification to support a limited form of limitation: the
> identification of which member of a group a principal belongs to
> after the fact that the principal belongs to the group has been
> ascertained by other means. This is known as Speaker Verification
> and Identification.

Although a population claim is implicitly included in the process,
there is no requirement that the principal be determined to belong to
the population or a group within the population.
The verification/identification resource may return as a result that
the principal is not a member of the population.

> I found the initial discussion of Speaker Verification and
> Identification in Section 11 a little confusing, and there is one
> sentence in particular that could be made more clear:
>
> The fourth paragraph in that section begins:
>
>    Speaker identification is the process of associating an unknown
>    speaker with a member in a population. It does not employ a claim
>    of identity.

Speaker identification does not employ a claim of identity (other than
implicitly to the population).

> But the paragraph immediately before that starts
>
>    In speaker verification, a recorded utterance is compared to a
>    previously stored voiceprint which is in turn associated with a
>    claimed identity for that user.

Speaker *verification* does employ a claim of identity. "Speaker
identification" identifies a speaker, while "speaker verification"
verifies that a speaker is who he/she claims to be.

> That sounds like it *does* employ a claim of identity.
>
> The fourth paragraph goes on to say that speaker ID should be used
> when you already have verified that the speaker is a member of a
> group (e.g. by cryptographic means), and you want to verify which
> member of the group s/he is. This suggests that

Actually, it says "When an individual claims to belong to a group
(e.g., one of the owners of a joint bank account) a group
authentication is performed."

Here is the way to think of it: a voice authentication database
contains a collection (called a population) of voice prints, one per
individual who has been enrolled into the database as a member of the
population. Each individual may have a unique (non-private) key
associated with his/her voiceprint, or may share a unique (non-private)
key representing a group (an enumerated subset of the population), or
both.
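
The database model described above can be sketched in a few lines of Python. This is only an illustrative sketch, not anything defined by MRCPv2 or by any vendor: the names `Enrollment`, `Population`, `enroll`, and `lookup` are hypothetical, and a voiceprint is represented as opaque bytes.

```python
from dataclasses import dataclass, field
from typing import FrozenSet, List, Optional


@dataclass(frozen=True)
class Enrollment:
    """One enrolled individual: exactly one voiceprint (hypothetical type)."""
    voiceprint: bytes                         # opaque stand-in for a real voiceprint
    individual_key: Optional[str] = None      # unique, non-private key, if any
    group_keys: FrozenSet[str] = frozenset()  # shared group keys, if any


@dataclass
class Population:
    """The collection of voiceprints held by one voice authentication database."""
    enrollments: List[Enrollment] = field(default_factory=list)

    def enroll(self, voiceprint, individual_key=None, group_keys=()):
        # Assumption: every enrollment carries at least one key of some kind.
        if individual_key is None and not group_keys:
            raise ValueError("need an individual key, a group key, or both")
        # Individual keys are unique within the population.
        if individual_key is not None and any(
            e.individual_key == individual_key for e in self.enrollments
        ):
            raise ValueError("individual key already enrolled")
        self.enrollments.append(
            Enrollment(voiceprint, individual_key, frozenset(group_keys))
        )

    def lookup(self, key):
        """Return every enrollment that a key (individual or group) refers to."""
        return [
            e for e in self.enrollments
            if e.individual_key == key or key in e.group_keys
        ]
```

In this sketch a group key simply resolves to several enrollments and an individual key to at most one, which is all that the descriptions of verification and identification below rely on.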

Most technology providers consider identification to be the process
whereby one or more audio samples are compared to the voiceprints of
the entire population to determine whether they match an individual in
the population. Because there is no claim of identity as an individual
or subgroup of the population, this is commonly referred to as
requiring "no claim of identity". Note that in this case no keys
(individual or group) need to be provided as input to the process,
since all keys are implied. Note also that if the audio samples do not
match any of the voiceprints in the population, the resource will
return a code indicating this.

Most technology providers consider verification to be the process
whereby one or more audio samples are compared to the voiceprint of a
given key to determine whether they match that specific voiceprint.
This is a claim of identity. In this case the key (the claimed
identity) needs to be provided as input to the process. Note that if
the audio samples do not match the referenced voiceprint, the resource
will return a code indicating this.

Some technology providers also provide something called
multi-verification, where one or more audio samples are compared to
multiple voiceprints (indicated by multiple keys) to determine whether
the samples match any of the voiceprints. This is essentially a claim
that the samples are from a speaker whose voiceprint is referenced by
one of the given keys. As with verification, if the audio samples do
not match any of the referenced voiceprints, the resource will return
a code indicating failure to match.

Some technology providers also provide something called group
identification, where one or more audio samples are compared to
multiple voiceprints that are represented by a group key to determine
whether the samples match any of the voiceprints associated with the
group key.
Although this can be considered a claim of identity in the group, it
is not a claim of a specific individual identity. Upon a match the
return value can be either the group key or the specific individual
key within the group, depending upon the resource. If the audio
samples do not match a voiceprint of a member of the group, the
resource will return a code indicating failure to match.

The implementation in MRCP of all of the above capabilities is a list
of input keys, each of which may represent an individual or a group,
as appropriate for the specific resource, and a list of output keys
representing individuals and/or groups, as appropriate for the
specific resource. In the event that there is no match to any of the
inputs (where an empty input list implies all keys are possible), an
error code is returned.

In none of these scenarios is there a situation where a speaker must
first be identified to be a member of a particular group.

>    It does not employ a claim of identity.
>
> really means that
>
>    It does not provide a proof of identity by itself.
>
> If that is the case, it should say that.
>
> I also note that the speaker verification is restricted to
> identifying the identity of someone who is already verified to be a
> member of a group. This suggests that attempting to use it without
> this prior verification is unsafe. A quick scan through RFC 4313
> didn't turn up any references to this issue. If it is unsafe, then
> the ID should say so, and if there is a related requirement in
> RFC 4313 that should be referenced. Also, I would recommend saying
> that speaker verification MUST NOT be implemented without prior
> verification as a member of a group.
>
> Catherine Meadows
> Naval Research Laboratory
> Code 5543
> 4555 Overlook Ave., S.W.
> Washington DC, 20375
> phone: 202-767-3490
> fax: 202-404-7942
> email: catherine.meadows@nrl.navy.mil
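
The key-list behaviour described in the reply above (a list of input keys, where an empty list implies the whole population, and an error code on no match) can be illustrated with a small Python sketch. Everything here is an assumption made for illustration: `match_keys`, the `"match"`/`"no-match"` codes, and the `matches` callable standing in for a real voiceprint comparison are hypothetical and are not MRCPv2 methods, headers, or completion causes. The sketch also simplifies by giving every individual an individual key and by always returning individual keys, whereas the reply notes a resource may return a group key instead.

```python
from typing import Callable, Dict, List, Sequence, Set, Tuple


def match_keys(
    samples: bytes,
    input_keys: Sequence[str],
    voiceprints: Dict[str, bytes],   # individual key -> voiceprint
    groups: Dict[str, Set[str]],     # group key -> individual keys
    matches: Callable[[bytes, bytes], bool],
) -> Tuple[str, List[str]]:
    """Compare samples against the voiceprints selected by input_keys."""
    if not input_keys:
        # Identification: no claim of identity, so every key is implied.
        candidates = set(voiceprints)
    else:
        # Verification (one key), multi-verification (several keys), or
        # group identification (a group key): only the named voiceprints.
        candidates = set()
        for key in input_keys:
            if key in groups:
                candidates.update(m for m in groups[key] if m in voiceprints)
            elif key in voiceprints:
                candidates.add(key)

    matched = sorted(k for k in candidates if matches(samples, voiceprints[k]))
    if not matched:
        return ("no-match", [])      # hypothetical failure code
    return ("match", matched)
```

With a toy matcher such as `lambda s, vp: s == vp`, an empty key list searches the whole population, a single key checks one claimed identity, and a group key restricts the search to that group. In no case does the caller have to establish group membership beforehand: a speaker outside the named keys simply produces the no-match result.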