Re: [AVT] Comments on draft-perkins-avt-srtp-vbr-audio

Colin Perkins <csp@csperkins.org> Sun, 12 December 2010 13:46 UTC

From: Colin Perkins <csp@csperkins.org>
In-Reply-To: <DBB1DC060375D147AC43F310AD987DCC1DEA8105FE@ESESSCMS0366.eemea.ericsson.se>
Date: Sun, 12 Dec 2010 13:47:58 +0000
Message-Id: <94B9F600-7846-400A-874A-08D1E0C732BB@csperkins.org>
References: <DBB1DC060375D147AC43F310AD987DCC1DEA81037C@ESESSCMS0366.eemea.ericsson.se> <4CE2D2FC.2000203@octasic.com> <DBB1DC060375D147AC43F310AD987DCC1DEA8105FE@ESESSCMS0366.eemea.ericsson.se>
To: Ingemar Johansson S <ingemar.s.johansson@ericsson.com>
Cc: "avt@ietf.org" <avt@ietf.org>
Subject: Re: [AVT] Comments on draft-perkins-avt-srtp-vbr-audio
List-Id: Audio/Video Transport Working Group <avt.ietf.org>

Hi,

On 17 Nov 2010, at 07:45, Ingemar Johansson S wrote:
> Hi
> 
> Answers inline below
> 
> /Ingemar 
> 
>> -----Original Message-----
>> From: Jean-Marc Valin [mailto:jean-marc.valin@octasic.com] 
>> Sent: 16 November 2010 19:53
>> To: Ingemar Johansson S
>> Cc: avt@ietf.org; Colin Perkins
>> Subject: Re: Comments on draft-perkins-avt-srtp-vbr-audio
>> 
>> Hi Ingemar,
>> 
>> On 10-11-16 07:17 AM, Ingemar Johansson S wrote:
>>> Today it may seem quite far-fetched to imagine that much useful
>>> information can be extracted this way (I even have trouble getting our
>>> automated speech recognition exchange to understand me...). However,
>>> anyone who has read a spy novel by Tom Clancy or John le Carré realizes
>>> that eavesdropping is largely a matter of picking up fragments of
>>> information from many different sources (including trash bins). This,
>>> in combination with Moore's law, means that implementors should be
>>> aware of this issue.
>> 
>> I think you've pretty much summed up the idea we were trying to convey.
>> Conversational speech recognition is indeed hard enough when we have the
>> audio itself that recognizing speech from VBR packet sizes is quite
>> far-fetched, unless the vocabulary is highly constrained.
>> 
>> On the other hand, the real worry I have with VBR and VAD is for
>> pre-recorded prompts like those in an IVR. If an attacker knows the IVR
>> prompts (e.g., has an account at the same bank as you), then they can
>> extract patterns that are very precise and identify the known prompts
>> with 100% accuracy. This is the case even with the counter-measures
>> described in the draft. By contrast, anything that isn't pre-recorded
>> doesn't worry me too much.
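Jean-Marc's IVR scenario amounts to template matching on packet sizes. The following is a minimal Python sketch, not taken from the draft, assuming the attacker has already recorded the per-packet payload sizes of each known prompt from their own sessions; the names `match_prompt`, `distance`, `LENGTH_PENALTY` and `KNOWN_PROMPTS`, the distance measure, and all sizes are hypothetical.

```python
from typing import Dict, List

# Arbitrary cost per missing or extra packet when sequence lengths differ
# (a hypothetical choice made only for this sketch).
LENGTH_PENALTY = 1000


def distance(a: List[int], b: List[int]) -> int:
    """Sum of absolute packet-size differences over the overlapping part,
    plus a fixed penalty for each packet by which the lengths differ."""
    overlap = sum(abs(x - y) for x, y in zip(a, b))
    return overlap + LENGTH_PENALTY * abs(len(a) - len(b))


def match_prompt(observed: List[int], known: Dict[str, List[int]]) -> str:
    """Return the name of the known prompt whose recorded packet-size
    sequence is closest to the observed sequence."""
    return min(known, key=lambda name: distance(observed, known[name]))


# Hypothetical packet-size signatures (bytes) for two known prompts.
KNOWN_PROMPTS = {
    "balance": [40, 52, 61, 33],
    "transfer": [40, 44, 70, 70, 35],
}

print(match_prompt([41, 52, 60, 33], KNOWN_PROMPTS))  # prints "balance"
```

In this toy model, padding every packet to one constant size collapses each signature to a run of identical values, leaving only the packet count (the prompt duration) to match on.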
> Then I would say that the general recommendation should be to always use padding for VBR, or to turn off DTX, in the case you describe above. Such prompts should be a very small fraction of the total traffic volume, so one need not worry about increased network load.
> How is the padding for the VBR case negotiated? I guess you don't intend to negotiate it? DTX can be turned off, but padding cannot be negotiated today.
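For reference, RTP padding (RFC 3550, Section 5.1) is self-describing: the P bit is set in the header and the last padding octet holds the number of padding octets, including itself, so a receiver can strip it. In SRTP (RFC 3711) the padding lies inside the encrypted portion, so an observer sees only the total length. The sketch below pads every frame to one fixed size; the helper names, the 60-byte maximum frame size and the 61-byte target are hypothetical, and whether such padding needs signalling is the open question above.

```python
def pad_to_size(payload: bytes, target_len: int) -> bytes:
    """Append RTP-style padding so the result is exactly target_len bytes.
    The last padding octet holds the padding count (including itself);
    the caller is responsible for setting the P bit in the RTP header."""
    pad = target_len - len(payload)
    # Require at least one padding octet so that every packet carries the
    # P bit; otherwise unpadded (largest) frames would stand out by that bit.
    if not 1 <= pad <= 255:
        raise ValueError(f"cannot pad {len(payload)} bytes to {target_len}")
    return payload + bytes(pad - 1) + bytes([pad])


def strip_padding(data: bytes) -> bytes:
    """Remove RTP-style padding from a payload whose P bit was set."""
    count = data[-1]
    if not 1 <= count <= len(data):
        raise ValueError("invalid padding count")
    return data[:-count]


# Hypothetical VBR frames; assume the codec's largest frame is 60 bytes,
# so pad everything to 61 bytes.
frames = [b"\x01" * 20, b"\x02" * 33, b"\x03" * 60]
padded = [pad_to_size(f, 61) for f in frames]
print([len(p) for p in padded])  # prints [61, 61, 61]
```

Choosing a target one octet above the largest frame keeps the visible P bit constant across all packets, at the cost of one extra octet on the largest frames.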
> 
>> 
>>> Section 4: My personal feeling is that the recommendations go too
>>> far. The idea is that an eavesdropper can extract information from the
>>> length of the talk spurts. To me this sounds like a much more
>>> difficult task than the VBR case (which is already difficult). A
>>> hangover of, e.g., 1 s is very likely to give 100% activity time for
>>> speakers of a particular southern European nationality :-)
>> 
>> Any suggestion for a reasonable hangover half-life?
> I would actually prefer no extra hangover beyond what is already inherent in the codec. This is due more to (imagined) network load concerns than to security, however, as I believe the security concerns are not that severe. If we are thinking about security, why not simply recommend turning off DTX completely for very sensitive applications? A 1 s overhang will, IMHO, drive the voice activity factor so high that one may as well turn off DTX completely.
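To put numbers on the 1 s overhang argument, here is a minimal sketch assuming 20 ms frames (so 1 s is 50 frames) and a simple fixed-length hangover. This is a simplification for illustration, not the hangover scheme described in the draft, and the VAD pattern is invented.

```python
from typing import List


def apply_hangover(vad: List[bool], hangover_frames: int) -> List[bool]:
    """Keep transmitting for hangover_frames frames after each active
    frame; return the per-frame transmit decisions."""
    out = []
    remaining = 0
    for active in vad:
        if active:
            remaining = hangover_frames
            out.append(True)
        elif remaining > 0:
            remaining -= 1
            out.append(True)
        else:
            out.append(False)
    return out


def activity_factor(tx: List[bool]) -> float:
    """Fraction of frames that are transmitted."""
    return sum(tx) / len(tx) if tx else 0.0


# Hypothetical 10 s of speech at 20 ms per frame: 1 s talk spurts
# alternating with 1 s pauses.
vad = ([True] * 50 + [False] * 50) * 5

for hangover in (0, 10, 50):  # 0 ms, 200 ms and 1 s of hangover
    print(hangover * 20, "ms:", activity_factor(apply_hangover(vad, hangover)))
```

With this pattern the activity factor is 0.5 without hangover, 0.6 with 200 ms, and 1.0 with 1 s: the hangover fills every pause, so DTX saves nothing.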

I just submitted -05, which attempts to give more nuanced guidance here. Feedback would be appreciated.

Cheers,
Colin



>> 
>>> Section 5: A variation on the padding is to randomly pad a fraction of
>>> the packets up to a large size (less than or equal to the largest
>>> packet size from the codec). This, I believe, should confuse the
>>> eavesdropper's algorithm considerably. It is possible that the same
>>> can be applied to the VAD case in Section 4 as well.
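Ingemar's random-padding variation can be sketched as follows, working on payload sizes only. This is a minimal illustration assuming a hypothetical helper `randomly_pad_sizes`, a padding probability `fraction`, and a `max_size` equal to the codec's largest frame; none of these come from the draft.

```python
import random
from typing import List


def randomly_pad_sizes(sizes: List[int], fraction: float, max_size: int,
                       rng: random.Random) -> List[int]:
    """Return the payload sizes seen on the wire after padding a random
    fraction of packets up to max_size. Unselected packets keep their
    original size."""
    out = []
    for size in sizes:
        if size > max_size:
            raise ValueError("frame larger than max_size")
        out.append(max_size if rng.random() < fraction else size)
    return out


rng = random.Random(1)  # fixed seed so the sketch is reproducible
print(randomly_pad_sizes([20, 33, 47, 25, 60], fraction=0.3, max_size=60, rng=rng))
```

Compared with padding every packet, this costs less bandwidth, but each unpadded packet still reveals its true size, so the eavesdropper sees a noisier version of the same signal rather than none at all.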
>> 
>> I'm not sure what you are suggesting. I think VAD is a bit 
>> different from VBR because it's a binary decision. If you 
>> decide not to send a packet based on the VAD data, then you 
>> can't just do padding. Similarly, if the VAD triggers a 
>> low-rate mode, then the padding would have to be high enough 
>> to make the packets indistinguishable from high-rate mode, 
>> which means it acts as an overhang. Or maybe I didn't quite
>> understand what you were suggesting.
> Hmm, you are right (brain fart). I was thinking of, for instance, the AMR case, where you can actually transmit frames of type NO_DATA after the SID_FIRST frame; but unless you make it an overhang (as you already suggested), you still get "holes" that are easily detected by the eavesdropper.
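The "holes" point can be made concrete. SRTP (RFC 3711) encrypts the payload but not the RTP header, so an observer sees timestamps and sequence numbers even when every packet is padded to the same size. A minimal sketch, assuming timestamps already converted to milliseconds and a hypothetical 20 ms frame interval, of how talk spurts could be recovered from the gaps:

```python
from typing import List, Tuple


def talk_spurts(times_ms: List[int], frame_ms: int = 20) -> List[Tuple[int, int]]:
    """Group packet times into talk spurts. A gap longer than one frame
    interval starts a new spurt. Returns (start_ms, duration_ms) pairs."""
    spurts: List[Tuple[int, int]] = []
    if not times_ms:
        return spurts
    start = prev = times_ms[0]
    for t in times_ms[1:]:
        if t - prev > frame_ms:
            # Gap detected: close the current spurt, including its last frame.
            spurts.append((start, prev - start + frame_ms))
            start = t
        prev = t
    spurts.append((start, prev - start + frame_ms))
    return spurts


# Hypothetical timestamps: three frames, a silence gap, then two frames.
print(talk_spurts([0, 20, 40, 200, 220]))  # prints [(0, 60), (200, 40)]
```

Padding leaves these gaps untouched; only continuing to send packets through the silence (an overhang, or disabling DTX) removes them.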
> 
> 
>> 
>> Cheers,
>> 
>>     Jean-Marc
>> 



-- 
Colin Perkins
http://csperkins.org/