Re: [AVT] Comments on draft-perkins-avt-srtp-vbr-audio

Ingemar Johansson S <ingemar.s.johansson@ericsson.com> Wed, 17 November 2010 07:44 UTC

From: Ingemar Johansson S <ingemar.s.johansson@ericsson.com>
To: Jean-Marc Valin <jean-marc.valin@octasic.com>
Date: Wed, 17 Nov 2010 08:45:02 +0100
Thread-Topic: Comments on draft-perkins-avt-srtp-vbr-audio
Message-ID: <DBB1DC060375D147AC43F310AD987DCC1DEA8105FE@ESESSCMS0366.eemea.ericsson.se>
References: <DBB1DC060375D147AC43F310AD987DCC1DEA81037C@ESESSCMS0366.eemea.ericsson.se> <4CE2D2FC.2000203@octasic.com>
In-Reply-To: <4CE2D2FC.2000203@octasic.com>
Cc: "avt@ietf.org" <avt@ietf.org>, Colin Perkins <csp@csperkins.org>
Subject: Re: [AVT] Comments on draft-perkins-avt-srtp-vbr-audio

Hi

Answers inline below

/Ingemar 

> -----Original Message-----
> From: Jean-Marc Valin [mailto:jean-marc.valin@octasic.com] 
> Sent: 16 November 2010 19:53
> To: Ingemar Johansson S
> Cc: avt@ietf.org; Colin Perkins
> Subject: Re: Comments on draft-perkins-avt-srtp-vbr-audio
> 
> Hi Ingemar,
> 
> On 10-11-16 07:17 AM, Ingemar Johansson S wrote:
> > Today it may be quite far-fetched to imagine that much useful
> > information can be extracted this way (I even have problems getting our
> > automated speech recognition exchange to understand me...). However,
> > anyone who has read a spy novel by Tom Clancy or John le Carré realizes
> > that eavesdropping is more or less picking up fragments of information
> > from many different sources (including trash bins). This, in
> > combination with Moore's law, says that implementors should be aware
> > of this issue.
> 
> I think you've pretty much summed up the idea we were trying 
> to convey. 
> Conversational speech recognition is indeed hard enough when 
> we have the audio that recognizing from VBR is quite 
> far-fetched unless the vocabulary is highly constrained.
> 
> On the other hand, the real worry I have with VBR and VAD is 
> for pre-recorded prompts like you have in an IVR. If an 
> attacker knows the IVR prompts (e.g. has an account at the 
> same bank as you), then they can extract patterns that are 
> very precise and obtain 100% identification on the known 
> prompts. This is the case even with the counter-measures 
> described in the draft. On the other hand, anything that 
> isn't pre-recorded is not something that worries me too much.
Then I would say that a general recommendation is to always use padding for VBR, or to turn off DTX, in the case you describe above. This traffic should be a very small fraction of the total traffic volume, so one doesn't need to worry about increased network load.
How is padding negotiated for the VBR case? I guess you don't intend to negotiate it, or do you? DTX can be turned off, but padding cannot be negotiated today.
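
One sender-side way to "always pad" VBR frames is the padding mechanism RTP already defines (RFC 3550): the sender sets the P bit and appends padding octets, the last of which holds the number of padding octets including itself; under SRTP (RFC 3711) this padding lies inside the encrypted portion. The sketch below is a minimal illustration, assuming every payload is padded to a fixed target length (for example the codec's largest frame size). The function names `rtp_pad` and `rtp_unpad` are hypothetical, and the sketch says nothing about the negotiation question raised above.

```python
def rtp_pad(payload: bytes, target_len: int) -> tuple[bytes, bool]:
    """Pad payload to exactly target_len bytes, RFC 3550 style.

    Returns (data, padding_bit). When padding is added, the final octet
    carries the padding count (including itself); the others are zero.
    """
    pad = target_len - len(payload)
    if pad < 0:
        raise ValueError("payload longer than target length")
    if pad == 0:
        return payload, False          # already at target: P bit stays clear
    if pad > 255:
        raise ValueError("RTP padding count must fit in one octet")
    return payload + bytes(pad - 1) + bytes([pad]), True


def rtp_unpad(data: bytes, padding_bit: bool) -> bytes:
    """Strip RFC 3550-style padding when the P bit was set."""
    if not padding_bit:
        return data
    count = data[-1]
    if count == 0 or count > len(data):
        raise ValueError("invalid RTP padding count")
    return data[:-count]
```

Padding every frame to the same target makes all packets the same size on the wire, which is exactly what removes the VBR size signal, at the cost of constant-bitrate overhead.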

> 
> > Section 4: My personal feeling is that the recommendations go too
> > far. The idea is that an eavesdropper can extract information from the
> > length of the talk spurts. To me this sounds like a much more
> > difficult task than the VBR case (which is already difficult). A
> > hangover of e.g. 1 s is very likely to give 100% activity time for
> > speakers of a particular southern European nationality :-)
> 
> Any suggestion for a reasonable hangover half-life?
I would actually prefer no extra hangover beyond what is already inherent in the codec. This is more for (imagined) network-load reasons than security reasons, however, as I believe the security concerns are not that severe. If we are thinking about security, then why not just recommend turning off DTX completely for very sensitive applications? A 1 s overhang will IMHO drive the voice activity factor so high that one may as well turn off DTX completely.
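
To put a number on the point about a 1 s overhang, here is a minimal sketch that applies a fixed-length hangover to a per-frame VAD decision sequence and computes the resulting activity factor. It assumes 20 ms frames (so 1 s is 50 frames) and a synthetic talk pattern; it models a fixed hangover rather than the decaying "half-life" variant asked about above, and the names `apply_hangover` and `activity_factor` are hypothetical.

```python
FRAME_MS = 20  # assumed codec frame duration in milliseconds


def apply_hangover(vad, hangover_frames):
    """Keep transmitting for hangover_frames frames after each active frame."""
    out = []
    remaining = 0
    for active in vad:
        if active:
            remaining = hangover_frames   # restart hangover on speech
            out.append(True)
        elif remaining > 0:
            remaining -= 1                # silence, but still in hangover
            out.append(True)
        else:
            out.append(False)             # silence after hangover expired
    return out


def activity_factor(flags):
    """Fraction of frames that are transmitted as speech."""
    return sum(flags) / len(flags) if flags else 0.0


if __name__ == "__main__":
    # Synthetic pattern: 1 s talk spurts separated by 0.5 s pauses.
    spurt = 1000 // FRAME_MS   # 50 frames
    pause = 500 // FRAME_MS    # 25 frames
    vad = ([True] * spurt + [False] * pause) * 4
    for hangover_ms in (0, 200, 1000):
        flags = apply_hangover(vad, hangover_ms // FRAME_MS)
        print(f"hangover {hangover_ms} ms: activity {activity_factor(flags):.2f}")
```

With this pattern the activity factor goes from 0.67 with no hangover to 0.80 with 200 ms and 1.00 with 1 s: whenever pauses are shorter than the hangover, every gap is filled, which is the sense in which a 1 s overhang approaches simply turning DTX off.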


> 
> > Section 5: A variation on the padding is to randomly pad a fraction of
> > the packets up to a larger size (less than or equal to the largest
> > packet size from the codec). This, I believe, should confuse the
> > eavesdropper's algorithm considerably. It is possible that the same can
> > be applied to the VAD case in section 4 as well.
> 
> I'm not sure what you are suggesting. I think VAD is a bit 
> different from VBR because it's a binary decision. If you 
> decide not to send a packet based on the VAD data, then you 
> can't just do padding. Similarly, if the VAD triggers a 
> low-rate mode, then the padding would have to be high enough 
> to make the packets indistinguishable from high-rate mode, 
> which means it acts as an overhang. Or maybe I didn't quite 
> understand what you were suggesting.
Hmm, you are right (brain fart). I was thinking of, for instance, the AMR case, where you can actually transmit frames of type NO_DATA after the SID_FIRST frame; but unless you make it an overhang (as you already suggested), you still get "holes" that are easily detected by the eavesdropper.
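
For the VBR case, where a packet is still sent for every frame, the random-padding variation from the quoted Section 5 comment can be sketched as follows. This is a minimal illustration only: it assumes per-packet payload sizes in bytes, a hypothetical `max_size` equal to the codec's largest frame size, and a sender-chosen padding probability `fraction`; it computes on-the-wire sizes and does not build packets. The function name `random_pad_sizes` is hypothetical.

```python
import random


def random_pad_sizes(sizes, max_size, fraction, rng):
    """Pad a random fraction of packets up to max_size.

    sizes    -- payload sizes in bytes, one per VBR frame
    max_size -- assumed largest frame size the codec can produce
    fraction -- probability that any given packet is padded to max_size
    rng      -- object with a .random() method, e.g. random.SystemRandom()
    """
    padded = []
    for size in sizes:
        if size > max_size:
            raise ValueError("frame larger than max_size")
        # rng.random() is in [0.0, 1.0), so fraction=1.0 always pads
        # and fraction=0.0 never does.
        padded.append(max_size if rng.random() < fraction else size)
    return padded


if __name__ == "__main__":
    frames = [18, 32, 60, 12, 45, 60, 20]
    print(random_pad_sizes(frames, 60, 0.3, random.SystemRandom()))
```

Unlike padding every packet, this only blurs the size sequence, so it trades some residual leakage for lower overhead; an unpredictable source such as `random.SystemRandom` seems preferable here, since a predictable padding pattern could in principle be undone by the eavesdropper.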


> 
> Cheers,
> 
>      Jean-Marc
>