Re: [AVT] Comments on draft-perkins-avt-srtp-vbr-audio

Hi,

On 17 Nov 2010, at 07:45, Ingemar Johansson S wrote:
> Hi
> 
> Answers inline below
> 
> /Ingemar 
> 
>> -----Original Message-----
>> From: Jean-Marc Valin [mailto:jean-marc.valin@octasic.com] 
>> Sent: 16 November 2010 19:53
>> To: Ingemar Johansson S
>> Cc: avt@ietf.org; Colin Perkins
>> Subject: Re: Comments on draft-perkins-avt-srtp-vbr-audio
>> 
>> Hi Ingemar,
>> 
>> On 10-11-16 07:17 AM, Ingemar Johansson S wrote:
>>> Today it may be quite far-fetched to imagine that much useful
>>> information can be extracted this way (I even have problems getting
>>> our automated speech-recognition exchange to understand me...).
>>> However, anyone who has read a spy novel by Tom Clancy or John le
>>> Carré realizes that eavesdropping is more or less a matter of picking
>>> up fragments of information from many different sources (including
>>> trash bins). This, in combination with Moore's law, says that
>>> implementors should be aware of this issue.
>> 
>> I think you've pretty much summed up the idea we were trying
>> to convey. Conversational speech recognition is hard enough even
>> when we have the audio, so recognizing speech from VBR packet sizes
>> alone is quite far-fetched unless the vocabulary is highly constrained.
>> 
>> On the other hand, the real worry I have with VBR and VAD is
>> pre-recorded prompts like those in an IVR. If an attacker knows the
>> IVR prompts (e.g. has an account at the same bank as you), then they
>> can extract very precise patterns and obtain 100% identification of
>> the known prompts. This is the case even with the countermeasures
>> described in the draft. By contrast, anything that isn't pre-recorded
>> doesn't worry me too much.
> Then I would say that a general recommendation is to always use padding for VBR, or to turn off DTX, in the case you describe above. This should be a very small fraction of the total traffic volume, so one doesn't need to worry about increased network load.
> How is the padding for the VBR case negotiated? I guess you don't intend to negotiate it? DTX can be turned off, but padding cannot be negotiated today.
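For illustration, a minimal sketch of padding every VBR frame to one fixed size, assuming RTP padding as defined in RFC 3550 (padding octets follow the payload, the last padding octet carries the padding count including itself, and the sender sets the P bit). MAX_FRAME, pad_to_fixed and strip_padding are hypothetical names, and 32 octets is an arbitrary placeholder for the codec's largest frame. Because the padding is added before SRTP encryption, and SRTP's default counter-mode cipher does not change the payload length, every protected packet then has the same size on the wire.

```python
# Sketch: pad every VBR frame to one fixed size so packet lengths no longer
# track the codec's bit rate. Assumes RFC 3550 RTP padding: padding octets
# follow the payload and the final octet holds the number of padding octets,
# including itself. MAX_FRAME is a hypothetical placeholder value.

MAX_FRAME = 32  # hypothetical: largest frame (octets) the codec can produce


def pad_to_fixed(payload: bytes, target_len: int = MAX_FRAME) -> tuple[bytes, bool]:
    """Return (padded_payload, p_bit); p_bit is True if padding was appended."""
    pad_len = target_len - len(payload)
    if pad_len < 0:
        raise ValueError("payload exceeds target size")
    if pad_len == 0:
        return payload, False  # already full size; P bit stays clear
    if pad_len > 255:
        raise ValueError("padding count must fit in one octet")
    # pad_len - 1 zero octets, then one octet carrying the padding count
    return payload + bytes(pad_len - 1) + bytes([pad_len]), True


def strip_padding(data: bytes, p_bit: bool) -> bytes:
    """Receiver side: remove RTP padding when the P bit is set."""
    return data[: -data[-1]] if p_bit else data


frames = [b"\x01" * n for n in (12, 20, 31, 32)]
sent = [pad_to_fixed(f) for f in frames]
print(sorted({len(d) for d, _ in sent}))  # [32]
```

The cost is that every frame is sent at the maximum size, which is the network-load trade-off discussed above.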
> 
>> 
>>> Section 4: My personal feeling is that the recommendations go too
>>> far. The idea is that an eavesdropper can extract information from
>>> the length of the talk spurts. To me this sounds like a much more
>>> difficult task than the VBR case (which is already difficult). A
>>> hangover of e.g. 1 s is very likely to give 100% activity time for
>>> speakers of a particular southern European nationality :-)
>> 
>> Any suggestion for a reasonable hangover half-life?
> I would actually prefer no extra hangover other than that already inherent in the codec. This is more for (imagined) network-load reasons than for security reasons, as I believe the security concerns are not that severe. If we are thinking about security, why not just recommend turning off DTX completely for very sensitive applications? A 1 s overhang will IMHO drive the voice activity factor so high that one may as well turn off DTX completely.

I just submitted -05 that attempts to give some more nuanced guidance here. Feedback would be appreciated.

Cheers,
Colin

>> 
>>> Section 5: A variation on padding is to randomly pad a fraction of
>>> the packets up to a large size (less than or equal to the largest
>>> packet size the codec produces). This, I believe, should confuse the
>>> eavesdropper's algorithm considerably. It is possible that the same
>>> can be applied to the VAD case in Section 4 as well.
>> 
>> I'm not sure what you are suggesting. I think VAD is a bit 
>> different from VBR because it's a binary decision. If you 
>> decide not to send a packet based on the VAD data, then you 
>> can't just do padding. Similarly, if the VAD triggers a 
>> low-rate mode, then the padding would have to be high enough 
>> to make the packets indistinguishable from high-rate mode, 
>> which means it acts as an overhang. Or maybe I didn't quite
>> understand what you were suggesting.
> Hmm, you are right (brain fart). I was thinking of, for instance, the AMR case, where you can actually transmit frames of type NO_DATA after the SID_FIRST frame; but unless you make it an overhang (as you already suggested), you still get "holes" that are easily detected by the eavesdropper.
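A minimal sketch of the random-padding variant from Section 5, under the same RFC 3550 padding assumption (last padding octet = padding count, P bit set when padding is present). maybe_pad, fraction and max_len are hypothetical names chosen for illustration; the fixed seed only makes the example repeatable.

```python
import random

# Sketch of the "randomly pad a fraction of packets" variant: each packet is,
# with probability `fraction`, padded up to max_len octets using RFC 3550-style
# padding (last octet = number of padding octets, including itself); otherwise
# it is sent unpadded. `fraction` and `max_len` are hypothetical parameters.


def maybe_pad(payload: bytes, max_len: int, fraction: float,
              rng: random.Random) -> tuple[bytes, bool]:
    """Return (payload, p_bit), padding to max_len with probability `fraction`."""
    if len(payload) > max_len:
        raise ValueError("payload exceeds max_len")
    pad_len = max_len - len(payload)
    if pad_len == 0 or rng.random() >= fraction:
        return payload, False  # sent as-is; P bit stays clear
    if pad_len > 255:
        raise ValueError("padding count must fit in one octet")
    return payload + bytes(pad_len - 1) + bytes([pad_len]), True


rng = random.Random(2010)  # fixed seed only so the example is repeatable
sizes = (10, 18, 25, 32, 14, 22)
lengths = [len(maybe_pad(b"\x00" * n, 32, 0.5, rng)[0]) for n in sizes]
print(lengths)  # each entry is either the original size or 32
```

Packets that are not padded still reveal their true size, so this blurs the size statistics rather than hiding them; and, as noted above for the VAD case, padding cannot fill in packets that are never sent.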
> 
> 
>> 
>> Cheers,
>> 
>>     Jean-Marc
>> 
> _______________________________________________
> Audio/Video Transport Working Group
> avt@ietf.org
> https://www.ietf.org/mailman/listinfo/avt

-- 
Colin Perkins
http://csperkins.org/