Re: [AVT] Comments on draft-perkins-avt-srtp-vbr-audio

Ingemar Johansson S <ingemar.s.johansson@ericsson.com> Wed, 17 November 2010 07:44 UTC

From: Ingemar Johansson S <ingemar.s.johansson@ericsson.com>
To: Jean-Marc Valin <jean-marc.valin@octasic.com>
Date: Wed, 17 Nov 2010 08:45:02 +0100
Thread-Topic: Comments on draft-perkins-avt-srtp-vbr-audio
Thread-Index: AcuFv3is1687e6c1QfKFvdOmYWhrBQAaS35A
Message-ID: <DBB1DC060375D147AC43F310AD987DCC1DEA8105FE@ESESSCMS0366.eemea.ericsson.se>
References: <DBB1DC060375D147AC43F310AD987DCC1DEA81037C@ESESSCMS0366.eemea.ericsson.se> <4CE2D2FC.2000203@octasic.com>
In-Reply-To: <4CE2D2FC.2000203@octasic.com>
Accept-Language: sv-SE, en-US
Content-Language: en-US
Content-Type: text/plain; charset="iso-8859-1"
Content-Transfer-Encoding: quoted-printable
MIME-Version: 1.0
Cc: "avt@ietf.org" <avt@ietf.org>, Colin Perkins <csp@csperkins.org>
Subject: Re: [AVT] Comments on draft-perkins-avt-srtp-vbr-audio
Precedence: list

Hi

Answers inline below

/Ingemar 

> -----Original Message-----
> From: Jean-Marc Valin [mailto:jean-marc.valin@octasic.com] 
> Sent: den 16 november 2010 19:53
> To: Ingemar Johansson S
> Cc: avt@ietf.org; Colin Perkins
> Subject: Re: Comments on draft-perkins-avt-srtp-vbr-audio
> 
> Hi Ingemar,
> 
> On 10-11-16 07:17 AM, Ingemar Johansson S wrote:
> > Today it may be quite far-fetched to imagine that much useful
> > information can be extracted this way (I even have problems getting
> > our automated speech recognition exchange to understand me...).
> > However, anyone who has read a spy novel by Tom Clancy or John le
> > Carré realizes that eavesdropping is more or less picking fragments
> > of information from many different sources (including trash-bins).
> > This, in combination with Moore's law, says that implementors should
> > be aware of this issue.
> 
> I think you've pretty much summed up the idea we were trying 
> to convey. 
> Conversational speech recognition is indeed hard enough when 
> we have the audio that recognizing from VBR is quite 
> far-fetched unless the vocabulary is highly constrained.
> 
> On the other hand, the real worry I have with VBR and VAD is 
> for pre-recorded prompts like you have in an IVR. If an 
> attacker knows the IVR prompts (e.g. has an account at the 
> same bank as you), then they can extract patterns that are 
> very precise and obtain 100% identification on the known 
> prompts. This is the case even with the counter-measures 
> described in the draft. On the other hand, anything that 
> isn't pre-recorded is not something that worries me too much.
Then I would say that a general recommendation is to always use padding for VBR, or to turn off DTX, in the case you describe above. This should be a very small share of the total traffic volume, so one doesn't need to worry about increased network load.
How is the padding for the VBR case negotiated? I guess you don't intend to negotiate this, or do you? DTX can be turned off, but padding cannot be negotiated today.
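
To make the padding recommendation concrete, here is a minimal sketch, assuming the sender simply pads every VBR frame up to the largest frame size in the stream so that all packets look alike on the wire. The function names and the byte counts are hypothetical and only serve as illustration; nothing here is taken from the draft.

```python
def pad_frames(frame_sizes, target_size=None):
    """Return one (padded_size, pad_bytes) tuple per frame.

    frame_sizes: encoded frame sizes in bytes (hypothetical values).
    target_size: size every frame is padded to; defaults to the largest frame.
    """
    if target_size is None:
        target_size = max(frame_sizes)
    padded = []
    for size in frame_sizes:
        if size > target_size:
            raise ValueError("frame is larger than the padding target")
        padded.append((target_size, target_size - size))
    return padded


def overhead_ratio(frame_sizes, target_size=None):
    """Ratio of padded payload volume to original payload volume."""
    padded = pad_frames(frame_sizes, target_size)
    return sum(p for p, _ in padded) / sum(frame_sizes)


# Hypothetical VBR frame sizes in bytes for a short prompt.
sizes = [10, 20, 32]
print(pad_frames(sizes))
print(overhead_ratio(sizes))
```

The overhead ratio only covers the padded flow itself; the argument above is that such flows are a small part of the total traffic.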

> 
> > Section 4: My personal feeling is that the recommendations go too
> > far. The idea is that an eavesdropper can extract information from
> > the length of the talk spurts. This to me sounds like a much more
> > difficult task than the VBR case (which is already difficult). A
> > hangover of e.g. 1 s is very likely to give 100% activity time for
> > speakers of a particular South European nationality :-)
> 
> Any suggestion for a reasonable hangover half-life?
I would actually prefer no extra hangover beyond that already inherent in the codec. This is more for (imagined) network load reasons than security reasons, however, as I believe the security concerns are not that severe. If we think security, then why not just recommend turning off DTX completely for very sensitive applications? A 1 s overhang will IMHO drive the voice activity factor so high that one may as well turn off DTX completely.
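
A rough sketch of the effect on the voice activity factor, assuming a fixed hangover appended to every talk spurt (the draft discusses a hangover half-life, so a fixed value is a simplification on my part). The talk-spurt timings are made up for illustration.

```python
def activity_factor(talk_spurts, total_duration, hangover=0.0):
    """Fraction of the call during which packets are sent.

    talk_spurts: sorted, non-overlapping (start, end) pairs in seconds.
    hangover: extra transmission time appended to each spurt, in seconds.
    """
    active = 0.0
    cur_start = cur_end = None
    for start, end in talk_spurts:
        # Extend the spurt by the hangover, but never past the end of the call.
        end = min(end + hangover, total_duration)
        if cur_end is None or start > cur_end:
            # A new active period begins; close the previous one.
            if cur_end is not None:
                active += cur_end - cur_start
            cur_start, cur_end = start, end
        else:
            # The hangover has bridged the pause, so merge the periods.
            cur_end = max(cur_end, end)
    if cur_end is not None:
        active += cur_end - cur_start
    return active / total_duration


# Hypothetical call: two 1 s talk spurts separated by 1 s pauses.
spurts = [(0, 1), (2, 3)]
print(activity_factor(spurts, 4))                # 0.5
print(activity_factor(spurts, 4, hangover=1.0))  # 1.0
```

With pauses no longer than the hangover, every pause is bridged and the sender transmits all the time, which is the same as having DTX turned off.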


> 
> > Section 5: A variation on the padding is to randomly pad a fraction
> > of the packets up to a larger size (less than or equal to the largest
> > packet size from the codec). This, I believe, should confuse the
> > eavesdropper algorithm considerably. It is possible that the same
> > can be applied to the VAD case in section 4 as well.
> 
> I'm not sure what you are suggesting. I think VAD is a bit 
> different from VBR because it's a binary decision. If you 
> decide not to send a packet based on the VAD data, then you 
> can't just do padding. Similarly, if the VAD triggers a 
> low-rate mode, then the padding would have to be high enough 
> to make the packets indistinguishable from high-rate mode, 
> which means it acts as an overhang. Or maybe I didn't quite 
> understand what you were suggesting.
Hmm, you are right (brain fart). I was thinking of, for instance, the AMR case, where you can actually transmit frames of type NO_DATA after the SID_FIRST frame. But unless you make it an overhang (as you already suggested), you still get "holes" which are easily detected by the eavesdropper.
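
To illustrate why the "holes" are easy to spot, here is a small sketch from the eavesdropper's side. It assumes one packet per 20 ms frame and a 5 ms jitter tolerance; both values, the function name and the send times are assumptions for illustration. Note that packet sizes never enter into it, so padding alone cannot hide the pauses.

```python
def find_gaps(send_times_ms, frame_interval_ms=20, tolerance_ms=5):
    """Return (last_packet_before, first_packet_after) for every pause.

    send_times_ms: observed packet times in milliseconds, in increasing order.
    A pause is any spacing larger than one frame interval plus the tolerance.
    """
    gaps = []
    for prev, cur in zip(send_times_ms, send_times_ms[1:]):
        if cur - prev > frame_interval_ms + tolerance_ms:
            gaps.append((prev, cur))
    return gaps


# Hypothetical trace: three packets, a DTX pause, then two more packets.
print(find_gaps([0, 20, 40, 200, 220]))
```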


> 
> Cheers,
> 
>      Jean-Marc
>
