Re: [Cfrg] questions on performance and side channel resistance for ChaCha20 and Poly1305 for IPsec and TLS

David McGrew <mcgrew@cisco.com> Fri, 24 January 2014 13:48 UTC

Message-ID: <52E26E81.4080204@cisco.com>
Date: Fri, 24 Jan 2014 08:45:37 -0500
From: David McGrew <mcgrew@cisco.com>
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:17.0) Gecko/20130922 Icedove/17.0.9
MIME-Version: 1.0
To: Adam Langley <agl@google.com>
References: <180998C7-B6E5-489E-9C79-80D9CAC0DE68@checkpoint.com> <CAL9PXLy9hrq+i_neP96FbTJRvRLbLEXnMYdBdwSeHunFAwF+jQ@mail.gmail.com> <A867BB8E-4556-44B1-A0AF-16771626BF5C@checkpoint.com> <52CB358D.3050603@cisco.com> <A6BDE08D-1F7D-4813-A9C4-61AF8C14412B@checkpoint.com> <52CB482D.6090807@cisco.com> <09031D92-9A14-4CF0-A000-123E71D4F784@checkpoint.com> <3861F1D4-B412-42BE-AE6C-FF5DE213854C@checkpoint.com> <CAL9PXLzgo5a2dk0JM-kWvawPhO1arpurcYSuqcffTWGdrCGY7A@mail.gmail.com> <52E12D1F.80701@cisco.com> <CAL9PXLzurJbXL1nY5YCQ7ZotscQZ6F-Uj4duH_QyA=Z4zXP7tw@mail.gmail.com>
In-Reply-To: <CAL9PXLzurJbXL1nY5YCQ7ZotscQZ6F-Uj4duH_QyA=Z4zXP7tw@mail.gmail.com>
Content-Type: text/plain; charset="UTF-8"; format="flowed"
Content-Transfer-Encoding: 7bit
Cc: "cfrg@irtf.org" <cfrg@irtf.org>
Subject: Re: [Cfrg] questions on performance and side channel resistance for ChaCha20 and Poly1305 for IPsec and TLS

Hi Adam,

On 01/23/2014 11:51 AM, Adam Langley wrote:
> On Thu, Jan 23, 2014 at 9:54 AM, David McGrew <mcgrew@cisco.com> wrote:
>> Hi Adam and Yoav,
>>
>> I have some questions and comments on these crypto algorithms and their use
>> in TLS and IPsec.
>>
>> On 01/21/2014 01:06 PM, Adam Langley wrote:
>>> On Tue, Jan 21, 2014 at 11:47 AM, Yoav Nir <ynir@checkpoint.com> wrote:
>>>> Reviews and comments would be greatly appreciated, as well as anyone
>>>> checking my examples.
>>> In the introduction: I think ChaCha20+Poly1305 are useful for software
>>> implementations, beyond their use as a backup to AES. AES is not
>>> suitable for pure software implementations: they tend to be
>>> slow and have side-channels. (AES-GCM even more so.)
>>
>> The claims that ChaCha20+Poly1305 are faster than AES GCM in pure software
>> environments should be quantified in (at least one of) the drafts.
> I have no problem with that, but it's not something that I typically
> see in IETF drafts and so I didn't do any actual numbers for it.

Agreed that it is not something one would expect to see in a TLS draft, 
but if the definitive algorithm specification is going to be an RFC, it 
should be there.   Watson suggested having a separate RFC that defines 
this algorithm combination, which makes sense to me.

>
> ARM testing is done with the AES-GCM from OpenSSL 1.0.1f (which
> has optimised asm routines for both AES and GCM, although they are
> *not* side-channel free). ChaCha20 and Poly1305 code come from
> SUPERCOP, but integrated into OpenSSL so the overhead is the same. A
> 1KB block size is used.
>
> OMAP 4460: 24.1 MB/s (AES-GCM), 75.3 MB/s (ChaCha20-Poly1305)
> Snapdragon S4 Pro: 41.5 MB/s (AES-GCM), 130.9 MB/s (ChaCha20-Poly1305)
>
> So ChaCha20-Poly1305 is ~3x faster, with side-channel protection,
> little dcache pressure and with a 256-bit cipher rather than 128-bit.
>
> On a Sandy Bridge Xeon, AES-GCM runs at ~900 MB/s and
> ChaCha20-Poly1305 at ~500 MB/s.
>
> In both cases, I'm ignoring the extra cache pressure of AES-GCM.

Thanks, this is good data.
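
The ratios implied by the figures quoted above can be worked out directly. The following is only an arithmetic sketch over those numbers; the table layout and the function name `speedup` are illustrative assumptions, not part of any benchmark tool.

```python
# Throughput figures quoted above, in MB/s, as (AES-GCM, ChaCha20-Poly1305).
# The names used here are illustrative only.
THROUGHPUT_MBPS = {
    "OMAP 4460": (24.1, 75.3),
    "Snapdragon S4 Pro": (41.5, 130.9),
    "Sandy Bridge Xeon": (900.0, 500.0),  # approximate ("~") figures
}


def speedup(platform):
    """Return ChaCha20-Poly1305 throughput divided by AES-GCM throughput."""
    aes_gcm, chacha_poly = THROUGHPUT_MBPS[platform]
    return chacha_poly / aes_gcm


for name in THROUGHPUT_MBPS:
    print(f"{name}: {speedup(name):.2f}x")
```

A value above 1 favors ChaCha20-Poly1305. On the two ARM parts the ratio comes out a little above 3; on the Xeon it is below 1, meaning AES-GCM is roughly 1.8x faster there.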

> Over time, AESNI will get faster but also the vector units will get
> wider, which helps ChaCha20 and Poly1305. I expect that on high-end,
> Intel chips, the gap will continue to be a small multiple, but it might
> not be as small as 2x on future chips.
>
>> It would also be interesting to look into the power used by the different
>> algorithms, since battery life is important on mobile devices.
> I don't have data on power usage I'm afraid because I don't have the
> equipment to test it. I'm happy to provide a testing binary for
> Android if anyone does.

Maybe it would be possible to write a "run until low battery" app on 
Android?   I am no expert on the platform, but it seems that something 
like this would be possible using a receiver for ACTION_BATTERY_LOW.

Just for clarity's sake: I did not mean to suggest that the draft should 
include this data.   It would be good info to know, and it might support 
the case for the newer algorithms, but if the data doesn't exist, it's 
not fair to ask that it be in there.

>> The design rationale for Salsa
>> describes how timing channels are avoided by not using multiplication in
>> that function.   However, Poly1305 uses *lots* of multiplication operations,
>> by a fixed constant.  Unless I am missing something, this is an
>> inconsistency with the motivation for the ciphersuite.  In any case, if
>> Poly1305 requires implementation techniques to avoid side channels, they
>> should be documented in the draft that specifies that function.
> Some CPUs (the G4e is mentioned in the Salsa paper) do have
> non-constant time multiplication and, on those systems, a
> constant-time Poly1305 implementation will be troublesome.
>
> However, those CPUs are rare I believe. (Does anyone know if the PPC64
> chips have the same issue?) For the case of common Intel and ARM CPUs
> it's not a problem. But you're right that it's worth pointing out,
> although I'm not aware that anyone has attempted a constant-time
> Poly1305 on such a chip.

I bet that Dan has thought about the timing issues - it would be great 
to get his input.    He said that it was "often" an issue for 
multiplication, back in 2007, but Robert Ransom mentions in a separate 
note that many modern CPUs have constant-time integer multiplication.
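
To make concrete where the multiplications occur, here is a minimal reference sketch of Poly1305 using Python's arbitrary-precision integers. It is an illustration only: Python big-integer arithmetic is not constant-time, so this shows the structure of the computation (one multiplication by the clamped key value `r` per 16-byte block), not a side-channel-safe implementation. The function name `poly1305_tag` is an assumption made for this example.

```python
P = (1 << 130) - 5  # the prime modulus 2^130 - 5
R_CLAMP = 0x0FFFFFFC0FFFFFFC0FFFFFFC0FFFFFFF


def poly1305_tag(key: bytes, msg: bytes) -> bytes:
    """Compute a 16-byte Poly1305 tag. NOT constant-time; illustration only."""
    if len(key) != 32:
        raise ValueError("key must be 32 bytes")
    # First half of the key is r (clamped), second half is s.
    r = int.from_bytes(key[:16], "little") & R_CLAMP
    s = int.from_bytes(key[16:], "little")
    acc = 0
    for i in range(0, len(msg), 16):
        # Append a 0x01 byte to each block before reading it as an integer.
        block = int.from_bytes(msg[i:i + 16] + b"\x01", "little")
        # This is the per-block multiplication by the fixed secret r.
        acc = ((acc + block) * r) % P
    return ((acc + s) & ((1 << 128) - 1)).to_bytes(16, "little")


# Test vector as later published in RFC 7539, section 2.5.2.
key = bytes.fromhex(
    "85d6be7857556d337f4452fe42d506a8"
    "0103808afb0db2fd4abff6af4149f51b"
)
tag = poly1305_tag(key, b"Cryptographic Forum Research Group")
print(tag.hex())
```

Since `r` is secret and is an operand of every multiplication, any data-dependent timing in the multiplier is exactly where a leak would arise on the CPUs discussed above.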

>
>> I assume that the timing channels are the major concern, and that the attack
>> model is one in which an untrusted process running on a trusted operating
>> system aims to discover the key of a ciphersuite in use by some other
>> process.   It would be good to document that fact, to give implementers an
>> idea of what the important channel is.   (Otherwise, they might mistakenly
>> think that the attacker is only able to observe network traffic.)  Is it
>> also a goal to avoid power analysis attacks?    It would be good to document
>> that fact, either way.
> I think the goal is to make crypto as boring as possible. Avoiding
> clear side-channels is just more relaxing because there's less to
> worry about and all the things that you list (simple power analysis,
> mixing mutually untrusting code on the same system etc) are all
> concerns.
>
> But yes, a section on this in the draft is warranted.
>

My guess is that power analysis attacks are less relevant for TLS and 
for mobile phones, but more relevant for Internet of Things scenarios 
and long-term secret keys.


>> I think your goal here is to have both client and server use a
>> hardware-friendly ciphersuite when they both have the right hardware, and to
>> use a software-friendly ciphersuite otherwise. This would imply that the
>> hardware-friendly ciphersuite comes first in the proposal when it is
>> available; does it also mean that, when the hardware support is absent,
>> the hardware-friendly ciphersuite is not offered?    Also, it seems that it
>> would be necessary, in TLS, to have ciphersuites offered in
>> hardware/software pairs, to ensure that an appropriate authentication method
>> is available.   Maybe I misunderstood the goal, if so I trust that you will
>> set me straight.   But if this is the goal, it would make sense to include
>> guidance to this effect in the draft.
> I'm not sure whether this draft is the correct place for it, but it's
> currently the case that Chrome will switch the positions of AES-GCM
> and CHACHA20-POLY1305 in its ciphersuite preferences depending on
> whether AES hardware support is present. Additionally, some Google
> servers (the rollout is in progress) have partial ciphersuite
> preferences: they will respect the client's preference between some
> ciphersuites. This allows AES-GCM to be used when the client has
> hardware and CHACHA20-POLY1305 otherwise.

To me it seems worthwhile to have some guidance on this subject in the 
TLS draft, but if there is a separate draft defining the algorithm, then 
negotiation seems out of scope.
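
The negotiation behaviour described above can be sketched as follows. This is a simplified model, not Chrome's or any server's actual code: the function names, the grouping structure, and the "AES-CBC" fallback entry are all hypothetical, chosen only to illustrate a client that reorders its list and a server with partial preferences.

```python
AES_GCM = "AES-GCM"
CHACHA20_POLY1305 = "CHACHA20-POLY1305"


def client_preferences(has_aes_hardware):
    """Order the two AEAD suites depending on AES hardware support."""
    if has_aes_hardware:
        return [AES_GCM, CHACHA20_POLY1305]
    return [CHACHA20_POLY1305, AES_GCM]


def server_select(client_prefs, server_groups):
    """Pick a suite using partial server preferences.

    server_groups is an ordered list of sets. Earlier groups are preferred
    by the server; within a group the client's ordering decides.
    """
    for group in server_groups:
        for suite in client_prefs:
            if suite in group:
                return suite
    return None  # no suite in common


# Hypothetical server policy: the two AEAD suites are equally preferred,
# and a legacy suite is accepted only as a last resort.
SERVER_GROUPS = [{AES_GCM, CHACHA20_POLY1305}, {"AES-CBC"}]

print(server_select(client_preferences(True), SERVER_GROUPS))
print(server_select(client_preferences(False), SERVER_GROUPS))
```

Under this model a client with AES hardware ends up with AES-GCM and a client without it ends up with CHACHA20-POLY1305, without the server needing to know anything about the client's hardware.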

thanks,

David

>
>
> Cheers
>
> AGL
> .
>