Re: [AVT] Comments on draft-perkins-avt-srtp-vbr-audio

Jean-Marc Valin <jean-marc.valin@octasic.com> Tue, 16 November 2010 18:52 UTC

Message-ID: <4CE2D2FC.2000203@octasic.com>
Date: Tue, 16 Nov 2010 13:52:44 -0500
From: Jean-Marc Valin <jean-marc.valin@octasic.com>
User-Agent: Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.1.15) Gecko/20101027 Lightning/1.0b1 Thunderbird/3.0.10
MIME-Version: 1.0
To: Ingemar Johansson S <ingemar.s.johansson@ericsson.com>
References: <DBB1DC060375D147AC43F310AD987DCC1DEA81037C@ESESSCMS0366.eemea.ericsson.se>
In-Reply-To: <DBB1DC060375D147AC43F310AD987DCC1DEA81037C@ESESSCMS0366.eemea.ericsson.se>
Content-Type: text/plain; charset="ISO-8859-1"; format="flowed"
Content-Transfer-Encoding: 8bit
Cc: "avt@ietf.org" <avt@ietf.org>, Colin Perkins <csp@csperkins.org>
Subject: Re: [AVT] Comments on draft-perkins-avt-srtp-vbr-audio
Precedence: list

Hi Ingemar,

On 10-11-16 07:17 AM, Ingemar Johansson S wrote:
> Today it may be quite far fetched to imagine that much useful
> information can be extracted this way (I even have problems to get our
> automated speech recognition exchange understand me...). However, anyone
> who has read a spy novel by Tom Clancy or John le Carré realize that
> eavesdropping is more or less picking fragments of information from many
> different sources (including trash-bins). This in combination with
> Moores law says that implementors should be aware of this issue.

I think you've pretty much summed up the idea we were trying to convey.
Conversational speech recognition is hard enough even when we have the
audio, so recognizing speech from VBR alone is quite far-fetched unless
the vocabulary is highly constrained.

On the other hand, the real worry I have with VBR and VAD is for
pre-recorded prompts like those in an IVR. If an attacker knows the IVR
prompts (e.g. has an account at the same bank as you), then they can
extract very precise patterns and obtain 100% identification of the
known prompts. This is the case even with the counter-measures described
in the draft. Anything that isn't pre-recorded, however, does not worry
me too much.
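
To illustrate the pre-recorded prompt concern, here is a minimal sketch
and not anything taken from the draft. It assumes the attacker has
recorded the per-packet payload sizes of a known prompt (0 meaning no
packet was sent because of VAD) and simply slides that pattern over the
observed stream. The function name, the sizes and the tolerance
parameter are all hypothetical.

```python
def find_prompt(observed_sizes, prompt_sizes, tolerance=0):
    """Return every offset at which the known prompt's size pattern
    appears in the observed stream (sizes in bytes, 0 = no packet)."""
    n = len(prompt_sizes)
    hits = []
    for start in range(len(observed_sizes) - n + 1):
        window = observed_sizes[start:start + n]
        # A hit requires every packet to be within `tolerance` bytes
        # of the corresponding packet in the known prompt.
        if all(abs(o - p) <= tolerance for o, p in zip(window, prompt_sizes)):
            hits.append(start)
    return hits


# Hypothetical size pattern of a known IVR prompt.
prompt = [42, 42, 17, 0, 0, 38, 42, 20]
# Hypothetical observed stream containing that prompt.
observed = [10, 12, 0] + prompt + [0, 15]

print(find_prompt(observed, prompt))  # [3]
```

Because a pre-recorded prompt is encoded from the same audio every time,
its pattern can repeat closely, which is why an exact or near-exact
match is plausible here and not for live speech.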

> Section 4: My personal feeling is that the recommendations are too far,
> the idea is that an eavesdropper can extract information from the length
> of the talk spurts. This to me sounds like a much more difficult task
> than the VBR case (which is difficult already that). A hangover of e.g
> 1s is very likely to give 100% activity time for speakers of a
> particular south-european nationality :-)

Any suggestion for a reasonable hangover half-life?

> Section 5: A variation to the padding is to randomly pad a fraction of
> the packets up to a large size (less or equal to the largest packet size
> from the codec) than , this, I believe should confuse the eavesdropper
> algorithm considerably. It is possible that the same can be applied to
> VAD case in section 4 as well.

I'm not sure what you are suggesting. I think VAD is a bit different from 
VBR because it's a binary decision. If you decide not to send a packet 
based on the VAD data, then you can't just do padding. Similarly, if the 
VAD triggers a low-rate mode, then the padding would have to be high enough 
to make the packets indistinguishable from high-rate mode, which means it 
acts as an overhang. Or maybe I didn't quite understand what you were
suggesting.
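
The two VAD cases above can be sketched as follows. This is only an
illustration under stated assumptions: HIGH_RATE_BYTES and the frame
sizes are made-up values, and None stands for a frame the VAD decided
not to send at all.

```python
HIGH_RATE_BYTES = 60  # hypothetical payload size of the high-rate mode


def wire_size(payload_len, high_rate_bytes=HIGH_RATE_BYTES):
    """Size seen by an eavesdropper, or None if no packet is sent."""
    if payload_len is None:
        # VAD suppressed the packet: there is nothing to pad, so the
        # gap itself remains visible.
        return None
    # A low-rate frame only becomes indistinguishable if it is padded
    # all the way up to the high-rate size.
    return max(payload_len, high_rate_bytes)


# Hypothetical frames: high-rate speech, low-rate mode, then silence.
frames = [60, 55, 8, 8, None, None, 58]

print([wire_size(f) for f in frames])
# [60, 60, 60, 60, None, None, 60]
```

In the low-rate case the padded packets cost as much as high-rate ones,
so the effect on bandwidth is the same as extending the overhang. In the
suppressed case padding cannot help at all.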

Cheers,

     Jean-Marc
