Re: [AVT] Comments on draft-perkins-avt-srtp-vbr-audio

Jean-Marc Valin <jean-marc.valin@octasic.com> Tue, 16 November 2010 18:52 UTC

Message-ID: <4CE2D2FC.2000203@octasic.com>
Date: Tue, 16 Nov 2010 13:52:44 -0500
From: Jean-Marc Valin <jean-marc.valin@octasic.com>
To: Ingemar Johansson S <ingemar.s.johansson@ericsson.com>
Cc: "avt@ietf.org" <avt@ietf.org>, Colin Perkins <csp@csperkins.org>
Subject: Re: [AVT] Comments on draft-perkins-avt-srtp-vbr-audio

Hi Ingemar,

On 10-11-16 07:17 AM, Ingemar Johansson S wrote:
> Today it may be quite far-fetched to imagine that much useful
> information can be extracted this way (I even have problems getting our
> automated speech recognition exchange to understand me...). However, anyone
> who has read a spy novel by Tom Clancy or John le Carré realizes that
> eavesdropping is more or less picking fragments of information from many
> different sources (including trash bins). This, in combination with
> Moore's law, says that implementors should be aware of this issue.

I think you've pretty much summed up the idea we were trying to convey. 
Conversational speech recognition is hard enough even when we have the 
actual audio, so recognizing speech from VBR packet sizes alone is quite 
far-fetched unless the vocabulary is highly constrained.

On the other hand, the real worry I have with VBR and VAD is for 
pre-recorded prompts like the ones you have in an IVR. If an attacker knows 
the IVR prompts (e.g. has an account at the same bank as you), then they can 
extract very precise patterns and obtain 100% identification on the known 
prompts. This is the case even with the counter-measures described in the 
draft. Anything that isn't pre-recorded, by contrast, is not something that 
worries me too much.
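
To make the prompt concern concrete, here is a rough Python sketch of how a known prompt could be located in a stream of observed packet sizes. It is only an illustration, not a method taken from the draft: the function name, the tolerance parameter and the byte values are all made up for the example.

```python
# Hypothetical sketch: locate a known pre-recorded prompt in a stream of
# observed (encrypted) packet sizes. All names and values are assumptions.

def find_prompt(observed_sizes, prompt_sizes, tolerance=0):
    """Return the first offset where prompt_sizes matches observed_sizes.

    A window matches when every packet size differs from the template by
    at most `tolerance` bytes. Returns -1 if no window matches.
    """
    n, m = len(observed_sizes), len(prompt_sizes)
    for start in range(n - m + 1):
        window = observed_sizes[start:start + m]
        if all(abs(o - p) <= tolerance for o, p in zip(window, prompt_sizes)):
            return start
    return -1


# Packet sizes (bytes) the attacker recorded earlier from the same prompt.
known_prompt = [38, 52, 61, 47, 20, 20, 55]
# Sizes seen on the victim's call: some live speech, then the prompt.
observed = [20, 33, 41, 38, 52, 61, 47, 20, 20, 55, 29]

print(find_prompt(observed, known_prompt))  # prints 3
```

Because the prompt is the same recording every time, the size sequence repeats closely from call to call, which is what makes this kind of matching so much easier than recognizing live speech.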

> Section 4: My personal feeling is that the recommendations go too far.
> The idea is that an eavesdropper can extract information from the length
> of the talk spurts. This to me sounds like a much more difficult task
> than the VBR case (which is difficult enough already). A hangover of e.g.
> 1s is very likely to give 100% activity time for speakers of a
> particular south-european nationality :-)

Any suggestion for a reasonable hangover half-life?
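
For reference, this is roughly what a hangover with a half-life could look like in Python. It is a sketch under my own assumptions, not text from the draft: the hangover length is drawn from an exponential distribution, so the chance that it is still running halves every `half_life_ms`. The function names, the 20 ms frame size and the 200 ms value are placeholders, not recommendations.

```python
import math
import random


def sample_hangover_ms(half_life_ms, rng=random):
    """Draw a hangover length whose survival probability halves every half_life_ms."""
    rate = math.log(2) / half_life_ms
    return rng.expovariate(rate)


def apply_hangover(vad_flags, frame_ms, half_life_ms, rng=random):
    """Return per-frame send flags with a random hangover after each talk spurt."""
    out = []
    remaining_ms = 0.0
    prev_active = False
    for active in vad_flags:
        if active:
            out.append(True)
        else:
            if prev_active:
                # Talk spurt just ended: draw a fresh hangover length.
                remaining_ms = sample_hangover_ms(half_life_ms, rng)
            if remaining_ms > 0:
                out.append(True)  # keep sending during the hangover
                remaining_ms -= frame_ms
            else:
                out.append(False)
        prev_active = bool(active)
    return out


# Example: 20 ms frames, 200 ms half-life (both values are assumptions).
vad = [True] * 5 + [False] * 30 + [True] * 3 + [False] * 30
sent = apply_hangover(vad, frame_ms=20, half_life_ms=200, rng=random.Random(1))
print(sum(vad), "active frames,", sum(sent), "frames sent")
```

The trade-off is the one raised in the quoted comment: a longer half-life hides the talk-spurt boundaries better, but pushes the activity factor towards 100%.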

> Section 5: A variation on the padding is to randomly pad a fraction of
> the packets up to a large size (less than or equal to the largest packet
> size from the codec). This, I believe, should confuse the eavesdropper's
> algorithm considerably. It is possible that the same can be applied to
> the VAD case in section 4 as well.

I'm not sure what you are suggesting. I think VAD is a bit different from 
VBR because it's a binary decision. If you decide not to send a packet 
based on the VAD data, then you can't just do padding. Similarly, if the 
VAD triggers a low-rate mode, then the padding would have to be high enough 
to make the packets indistinguishable from high-rate mode, which means it 
acts as an overhang. Or maybe I didn't quite understand what you were 
suggesting.
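
In case it helps the discussion, here is how I read the random-padding idea, as a small Python sketch. This is my interpretation only, and every name and value in it is hypothetical. A frame size of `None` stands for a packet that VAD decided not to send, which shows the difference: there is nothing there to pad.

```python
import random


def pad_randomly(frame_sizes, target_size, fraction, rng=random):
    """Pad roughly `fraction` of the sent packets up to target_size bytes.

    frame_sizes holds one entry per frame: a size in bytes, or None when
    VAD suppressed the packet entirely.
    """
    out = []
    for size in frame_sizes:
        if size is None:
            out.append(None)  # no packet was sent, so padding cannot apply
        elif size < target_size and rng.random() < fraction:
            out.append(target_size)
        else:
            out.append(size)
    return out


# Example values are assumptions: sizes in bytes, target = largest codec packet.
sizes = [38, 52, None, None, 61, 20, 47]
print(pad_randomly(sizes, target_size=61, fraction=0.3, rng=random.Random(7)))
```

With VBR only the sizes change, so padding some of them blurs the pattern. With VAD-driven suppression the gaps stay visible whatever the padding fraction is.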

Cheers,

     Jean-Marc