Re: Hardware acceleration and packet number encryption

Ian Swett <ianswett@google.com> Sun, 25 March 2018 17:48 UTC

From: Ian Swett <ianswett@google.com>
Date: Sun, 25 Mar 2018 17:48:28 +0000
Message-ID: <CAKcm_gO0B_iQs=Dc_RxjUU852b_o0V4q2U29UyHe4KK-+uSSZA@mail.gmail.com>
Subject: Re: Hardware acceleration and packet number encryption
To: Mikkel Fahnøe Jørgensen <mikkelfj@gmail.com>
Cc: Christian Huitema <huitema@huitema.net>, Subodh Iyengar <subodh@fb.com>, Eric Rescorla <ekr@rtfm.com>, IETF QUIC WG <quic@ietf.org>, "Deval, Manasi" <manasi.deval@intel.com>, Kazuho Oku <kazuhooku@gmail.com>
Archived-At: <https://mailarchive.ietf.org/arch/msg/quic/yPIQ6PTZjFbgtVsQn-MrWhfVRXw>

To answer Eric's original two questions:
1) I really like PN encryption's properties, and the fact that it doesn't
require moving to a PN space per path in QUIC v1 (Christian's option #4).
2) I'm pretty sad about how difficult the current design is to offload.

Adding an extra nonce does not feel like the right solution, for all the
reasons Eric cited, and it adds a fair bit of byte overhead.

One note on ossification that Subodh and I discussed last week: the current
PN encryption is certainly possible to implement in hardware, but only with
some costly, QUIC-specific hardware. Baking the entire format of QUIC into
hardware could cause its own type of ossification. Implementing only bulk
encryption in hardware allows a much wider variety of APIs and should allow
re-use of existing AES-GCM offload hardware with minimal changes.

I believe PN encryption will cause datacenter users of QUIC to create a
version of QUIC without PN encryption (at least that'd be my plan). I don't
know whether the WG sees this as a bug or a feature.

I'd really like an alternate way to transform the packet number that would
be sufficiently unlinkable but not require a 2-pass solution. AERO may
work, but I don't understand it well enough from a quick read of the draft
to understand its properties. In my opinion/use cases, it's OK if we
can't offload the PN encryption. The issue is that we can't do the PN
encryption in software until the bulk encryption is finished, so we can't
offload anything.
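A minimal Python sketch of the ordering problem described above, assuming a PR 1079-style design in which the packet number mask is derived from a sample of the AEAD output. Python's standard library has no AES, so `prf` (HMAC-SHA256) stands in for the block cipher, and `toy_seal`/`toy_open` are a hypothetical, insecure toy AEAD; for brevity the toy also leaves the PN out of the authenticated data. The point is the data dependency: `protect` cannot compute the PN mask until `toy_seal` has produced the ciphertext it samples.

```python
import hashlib
import hmac


def prf(key: bytes, data: bytes) -> bytes:
    """Stand-in for the block cipher: HMAC-SHA256 (the stdlib has no AES)."""
    return hmac.new(key, data, hashlib.sha256).digest()


def _keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    """Counter-mode keystream built from the PRF, 32 octets per call."""
    blocks = (length + 31) // 32
    return b"".join(prf(key, nonce + i.to_bytes(4, "big")) for i in range(blocks))


def toy_seal(key: bytes, pn: int, header: bytes, payload: bytes) -> bytes:
    """Hypothetical toy AEAD (NOT secure): keystream XOR plus a 16-octet tag.

    The plaintext packet number is used as the nonce.
    """
    nonce = pn.to_bytes(8, "big")
    ct = bytes(p ^ s for p, s in zip(payload, _keystream(key, nonce, len(payload))))
    return ct + prf(key, b"tag" + nonce + header + ct)[:16]


def toy_open(key: bytes, pn: int, header: bytes, sealed: bytes) -> bytes:
    nonce = pn.to_bytes(8, "big")
    ct, tag = sealed[:-16], sealed[-16:]
    if not hmac.compare_digest(tag, prf(key, b"tag" + nonce + header + ct)[:16]):
        raise ValueError("authentication failed")
    return bytes(c ^ s for c, s in zip(ct, _keystream(key, nonce, len(ct))))


def protect(aead_key: bytes, pn_key: bytes, pn: int, header: bytes,
            payload: bytes, pn_len: int = 4) -> bytes:
    # Pass 1: bulk encryption of the payload.
    sealed = toy_seal(aead_key, pn, header, payload)
    # Pass 2: the mask samples pass-1 output, so it cannot start any earlier.
    mask = prf(pn_key, sealed[:16])[:pn_len]
    enc_pn = bytes(a ^ b for a, b in zip(pn.to_bytes(pn_len, "big"), mask))
    return header + enc_pn + sealed


def unprotect(aead_key: bytes, pn_key: bytes, packet: bytes, header_len: int,
              pn_len: int = 4):
    header = packet[:header_len]
    enc_pn = packet[header_len:header_len + pn_len]
    sealed = packet[header_len + pn_len:]
    # The sample is read straight off the wire, so the PN is unmasked first...
    mask = prf(pn_key, sealed[:16])[:pn_len]
    pn = int.from_bytes(bytes(a ^ b for a, b in zip(enc_pn, mask)), "big")
    # ...and only then can the payload be authenticated and decrypted.
    return pn, toy_open(aead_key, pn, header, sealed)
```

On the receive side the order is reversed but still serial: the PN is unmasked from wire bytes, then used as the AEAD nonce.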


On Sun, Mar 25, 2018 at 12:57 PM Mikkel Fahnøe Jørgensen <mikkelfj@gmail.com>
wrote:

> Forgot:
>
> The PN length needs to be known from cleartext, e.g. variable-length
> encoded with the length bits in cleartext, a fixed size, or trial-decrypted
> based on past experience.
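A sketch of the first option, assuming a hypothetical encoding (not taken from any draft) in which the top two bits of the first PN octet are a cleartext length code for 1, 2 or 4 octets, and only the remaining value bits are masked. The receiver can then learn the PN length before removing the mask.

```python
# Hypothetical 2-bit length code in the top bits of the first octet.
LENGTH_CODES = {0b00: 1, 0b01: 2, 0b10: 4}


def encode_pn(pn: int) -> bytes:
    """Encode pn in the shortest of 1, 2 or 4 octets, length code in the top 2 bits."""
    for code, n in LENGTH_CODES.items():
        if pn < 1 << (8 * n - 2):
            return ((code << (8 * n - 2)) | pn).to_bytes(n, "big")
    raise ValueError("packet number does not fit in 30 bits")


def pn_length(first_octet: int) -> int:
    """Read the length from the cleartext bits; code 0b11 is unassigned here."""
    code = first_octet >> 6
    if code not in LENGTH_CODES:
        raise ValueError("unassigned length code")
    return LENGTH_CODES[code]


def apply_mask(encoded: bytes, mask: bytes) -> bytes:
    """XOR the mask over the value bits only; the 2 length bits stay in cleartext.

    The same call both masks and unmasks, since XOR is its own inverse.
    """
    if len(mask) < len(encoded):
        raise ValueError("mask too short")
    out = bytearray(e ^ m for e, m in zip(encoded, mask))
    out[0] = (encoded[0] & 0xC0) | (out[0] & 0x3F)
    return bytes(out)


def decode_pn(encoded: bytes) -> int:
    n = pn_length(encoded[0])
    return int.from_bytes(encoded[:n], "big") & ((1 << (8 * n - 2)) - 1)
```

For example, PN 1000 encodes as `43 e8`; masking with all-ones gives `7c 17`, whose first octet still reveals a 2-octet length.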
>
>
> On 25 March 2018 at 18.52.00, Mikkel Fahnøe Jørgensen (mikkelfj@gmail.com)
> wrote:
>
> SPARX will keep the buffer clean, and it is also reasonably fast at
> about 500 cycles per byte, but it is going to add significantly more
> latency than a full AES-NI block encryption.
>
> Consider the following PN encryption scheme, which is not very different
> from the current proposal but avoids touching any AEAD data:
>
> Derive a key K_pn and encrypt the last 16 octets of the AEAD tag:
> X_pn = E(K_pn, tail(tag, 16)).
> Then C_pn = (PN XOR X_pn), truncated to len(PN).
> Store C_pn as the encrypted packet number.
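The three steps above, sketched in Python. `block_encrypt` is an assumption: HMAC-SHA256 truncated to 16 octets stands in for AES under K_pn, because the standard library has no AES. The tag bytes are only read, so the AEAD-protected buffer is never modified.

```python
import hashlib
import hmac


def block_encrypt(k_pn: bytes, block: bytes) -> bytes:
    """Stand-in for E(K_pn, .): HMAC-SHA256 truncated to 16 octets (not AES)."""
    return hmac.new(k_pn, block, hashlib.sha256).digest()[:16]


def encrypt_pn(k_pn: bytes, tag: bytes, pn: int, pn_len: int) -> bytes:
    # X_pn = E(K_pn, tail(tag, 16)); the AEAD tag itself is only read.
    x_pn = block_encrypt(k_pn, tag[-16:])
    # C_pn = (PN XOR X_pn) truncated to len(PN): zip stops at the shorter input.
    return bytes(p ^ x for p, x in zip(pn.to_bytes(pn_len, "big"), x_pn))


def decrypt_pn(k_pn: bytes, tag: bytes, c_pn: bytes) -> int:
    # Same manner in reverse: recompute X_pn from the received tag and XOR again.
    x_pn = block_encrypt(k_pn, tag[-16:])
    return int.from_bytes(bytes(c ^ x for c, x in zip(c_pn, x_pn)), "big")
```

The encrypted PN occupies exactly `pn_len` octets (1 to 4), matching the note that follows.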
>
> I don’t think there is any repeated-nonce / XOR weakness in this, but even
> if there is, the protected secret is not that critical.
>
> The encrypted packet number now consumes only between 1 and 4 octets
> and can be decrypted in exactly the same manner.
>
> This will not work out of the box, because the AEAD covers the packet
> number, so we would need to modify the buffer. However, that coverage isn’t
> required, so we can move the packet number out of the AEAD. But this is not
> so easy, because the rest of the header still needs to be included and we
> don’t want the packet number to be first in the packet. Placing it last
> could actually be an option, since it is useless until we see the AEAD tag
> anyway. In particular, you would then have all the required data in a
> single cache line, which matters. (If you want a key for detecting
> duplicates, you can use the first encrypted bytes in the packet rather than
> the packet number.)
>
> This approach still has the downside of consuming a full AES block
> encryption that cannot be parallelised, so we are at about 24 ns on current
> Intel cores. However, any other encryption scheme that does not depend on
> guessing that the packet number is in a given range is likely to be slower.
>
> I still think this is not worthwhile compared to segmented packet numbers,
> but if it has to be, it might be workable.
>
>
> Kind Regards,
> Mikkel Fahnøe Jørgensen
>
>
> On 25 March 2018 at 18.10.04, Christian Huitema (huitema@huitema.net)
> wrote:
>
> If we are exploring research ideas, one possibility would be to use 64-bit
> sequence numbers and encrypt them using a modern 64-bit block cipher like
> SPARX (https://www.cryptolux.org/index.php/SPARX). We can exclude the PN
> bits from the authenticated data, since the actual sequence number is part
> of the AEAD nonce. With that, 64-bit encryption of the PN and AEAD
> encryption of the payload can proceed in parallel. Decryption requires first
> decrypting the PN to initialize the AEAD nonce, but that can be done
> without double buffering.
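A sketch of the shape of this idea. SPARX is not in Python's standard library, so a hypothetical 8-round Feistel network over 64 bits with an HMAC-SHA256 round function stands in for it; this only illustrates a keyed 64-bit permutation and is neither SPARX nor a vetted cipher. Because `encrypt_pn64` reads only the key and the PN, a sender could run it alongside the AEAD over the payload, and a receiver runs `decrypt_pn64` first to recover the AEAD nonce.

```python
import hashlib
import hmac

ROUNDS = 8  # hypothetical round count for this toy network


def _f(key: bytes, rnd: int, half: int) -> int:
    """Round function: 32 bits of HMAC-SHA256 over the round index and a 32-bit half."""
    digest = hmac.new(key, bytes([rnd]) + half.to_bytes(4, "big"), hashlib.sha256).digest()
    return int.from_bytes(digest[:4], "big")


def encrypt_pn64(key: bytes, pn: int) -> int:
    """Permute a 64-bit packet number; depends only on key and pn, not the payload."""
    left, right = pn >> 32, pn & 0xFFFFFFFF
    for rnd in range(ROUNDS):
        left, right = right, left ^ _f(key, rnd, right)
    return (left << 32) | right


def decrypt_pn64(key: bytes, c: int) -> int:
    """Undo the rounds in reverse order to recover the packet number (the AEAD nonce)."""
    left, right = c >> 32, c & 0xFFFFFFFF
    for rnd in reversed(range(ROUNDS)):
        left, right = right ^ _f(key, rnd, left), left
    return (left << 32) | right
```

The cost noted in the next paragraph is visible here: the encrypted PN is always a full 64-bit value on the wire.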
>
> Of course, the cost of that is header overhead, since the PN always
> occupies 64 bits. So we are trading some overhead for hardware
> acceleration. And we have to have some faith in the 64-bit encryption
> algorithm. (SPARX was suggested to me by Jean-Philippe Aumasson, the author
> of IPCrypt.)
>
> -- Christian Huitema
>
> On Mar 25, 2018, at 8:48 AM, Mikkel Fahnøe Jørgensen <mikkelfj@gmail.com>
> wrote:
>
> The tag used as IV in ECB mode for PN encryption will use a full block,
> which is 16 octets. The proposal was to encrypt the tag and XOR the
> result over the packet number and what follows.
>
> If this is what is meant by “tag as IV”, it is problematic for what I
> assume is meant by double buffering, i.e. the need to modify the packet
> buffer during decryption. This is because the packet number and what
> follows must be un-XORed before verification can take place.
>
> You could keep the packet number out of the AEAD, but you cannot afford to
> waste the additional 16 - 4 = 12 octets or more that an AES block
> encryption uses, so you are stuck with modifying the buffer after AEAD.
>
> Finding alternative nonces won’t fix this problem. If you encrypted the
> header completely separately from the body, you could do something, but
> then you waste space on extra header tags.
>
>
> My suggestion with GF(2^n) will not work: even if it works in principle
> (finding an ideal in GF(2^32) and multiplying a seed by the packet number
> modulo the ideal), it is easy to brute-force 2^32. Alternatively, you can
> do chained hashing similar to how GCM’s GHASH works, but then it is not a
> unique mapping, it is no better than CTR-mode encryption used PRNG-style,
> and it is likely slower. Why would you do this at all, if it worked? Because
> it allows you to encrypt only the packet number, which can stay outside the
> AEAD and thus avoid buffer modification. But I don’t see how it can work.
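The brute-force objection can be made concrete. To keep the example fast, this hypothetical sketch works in GF(2^8), reducing modulo AES's irreducible polynomial x^8 + x^4 + x^3 + x + 1 rather than a degree-32 polynomial. Multiplying by a nonzero seed is a unique (bijective) mapping, but one observed (PN, output) pair is enough to search out the seed; at 32 bits the same loop runs over 2^32 - 1 seeds, which is still feasible.

```python
def gf_mul(a: int, b: int, poly: int = 0x11B, bits: int = 8) -> int:
    """Carry-less multiply a*b, reducing modulo the irreducible polynomial poly."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        b >>= 1
        a <<= 1
        if a >> bits:  # degree reached `bits`: reduce
            a ^= poly
    return result


def mask_pn(seed: int, pn: int) -> int:
    """Hypothetical PN 'encryption': multiply the PN by a secret nonzero seed."""
    return gf_mul(seed, pn)


def brute_force_seed(pn: int, observed: int) -> list:
    """Try every nonzero seed; at 32 bits this is the same loop over 2^32 - 1 seeds."""
    return [s for s in range(1, 1 << 8) if mask_pn(s, pn) == observed]
```

With a nonzero PN the search returns exactly one seed, so a single observed packet breaks the scheme.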
>
> Mikkel
>
> On 25 March 2018 at 14.25.07, Eric Rescorla (ekr@rtfm.com) wrote:
>
>
>
> On Sat, Mar 24, 2018 at 9:41 PM, Subodh Iyengar <subodh@fb.com> wrote:
>
>> When we were first discussing PNE, we proposed that the tag be used as
>> the IV for the CTR operation. The PR samples encrypted data in the packet.
>> Did we change that for a reason?
>>
>
> I believe that's my alternative #1 and PR#1079.
>
>
>> Would that help alleviate the buffering of the stream data? Because the
>> tag is always the last thing in the packet.
>>
>
> I will let Manasi answer this.
>
>
> -Ekr
>
>
>>
>> Subodh
>>
>>
>> On Mar 25, 2018, at 2:56 AM, Eric Rescorla <ekr@rtfm.com> wrote:
>>
>>
>>
>> On Sun, Mar 25, 2018 at 2:09 AM, Deval, Manasi <manasi.deval@intel.com>
>> wrote:
>>
>>> From talking to several of the folks last week, I understand that
>>> unlinkability is the goal of this protocol and there may be some
>>> flexibility in how that can be achieved.
>>>
>>>
>>>
>>> Christian’s e-mail has a detailed list of options.  Here is the list of
>>> favored options as I understand them.
>>>
>>>
>>>
>>> 1.      Packet number encrypted as currently suggested - The current
>>> proposal in PR 1079 uses a two-stage serialized approach in which the
>>> stream header(s) and payload(s) must be encrypted first, and the output of
>>> that encryption forms the nonce for the packet number encryption.
>>>
>>>
>>>
>>> 2.      Packet number encrypted alternative 1 - One of the ideas
>>> suggested was to encrypt the stream header(s) and payload(s) with the
>>> packet number as nonce, but have an additional nonce in the clear to
>>> encrypt the packet number. A scheme like this can allow for these two
>>> encryption operations to occur in parallel. This still has the issue of
>>> serialization in decryption.
>>>
>>>
>>>
>>> 3.      Packet number encrypted alternative 2 – Another option is to
>>> generate 2 IVs – one for PN and the other for stream header(s) and
>>> payload(s). The nonce can be a random value in the clear. This allows us to
>>> encrypt and decrypt the two fields in parallel. The packet number is
>>> encrypted so it also solves the ossification problem. Another variation of
>>> this is to generate a single IV but use one part of it to encrypt the PN.
>>>
>> Neither of these alternatives seems ideal. Once you are carrying an
>> explicit per-packet nonce, you might as well concatenate the payload and
>> the PN and encrypt them together. That will require the least amount of
>> nonce material.
>>
>> -Ekr
>>
>>> 4.      PN in the clear – this is a complex scheme and in the discussion
>>> with Ian, Jana and Praveen, they seemed to think this may be ok. If folks
>>> think this is implementable, then we may need to find an alternate solution
>>> for ossification.
>>>
>>>
>>>
>>> Thanks,
>>>
>>> Manasi
>>>
>>> *From:* Eric Rescorla [mailto:ekr@rtfm.com]
>>> *Sent:* Saturday, March 24, 2018 3:18 PM
>>> *To:* Mikkel Fahnøe Jørgensen <mikkelfj@gmail.com>
>>> *Cc:* Kazuho Oku <kazuhooku@gmail.com>; Deval, Manasi <
>>> manasi.deval@intel.com>; Christian Huitema <huitema@huitema.net>;
>>> IETF QUIC WG <quic@ietf.org>
>>> *Subject:* Re: Hardware acceleration and packet number encryption
>>>
>>> On Sat, Mar 24, 2018 at 9:35 PM, Mikkel Fahnøe Jørgensen <
>>> mikkelfj@gmail.com> wrote:
>>>
>>> AERO: I did not read all of it, but it does indeed sound esoteric.
>>>
>>> It can do two things of interest: reduce space used by packet numbers,
>>> and presumably fix the encryption issue.
>>>
>>>
>>>
>>> However, it has a W parameter, the reordering limit, which defaults to
>>> 64 and is recommended to be at most 255 for security reasons. This is way
>>> too low (I would assume) if packet clusters take multiple transatlantic
>>> paths.
>>>
>>>
>>>
>>> That's just a function of how the packet numbers are encoded. It's not
>>> difficult to come up with a design that tolerates more reordering.
>>>
>>>
>>>
>>> -Ekr
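As an illustration of that point, the sketch below recovers a full packet number from its truncated low bits. It is adapted from the sample decoding algorithm later published in RFC 9000, Appendix A.3 (which post-dates this thread), not from AERO. The tolerated reordering is about half the window, 2^(n-1) packets for n encoded bits, so it doubles with each extra encoded bit.

```python
def decode_packet_number(largest_pn: int, truncated_pn: int, pn_nbits: int) -> int:
    """Recover a full packet number from its low pn_nbits bits.

    Adapted from RFC 9000, Appendix A.3: the decoded value is the one closest
    to the next expected packet number, within half a window either way.
    """
    expected_pn = largest_pn + 1
    pn_win = 1 << pn_nbits
    pn_hwin = pn_win // 2
    pn_mask = pn_win - 1
    # Splice the received low bits onto the high bits of the expected PN.
    candidate_pn = (expected_pn & ~pn_mask) | truncated_pn
    if candidate_pn <= expected_pn - pn_hwin and candidate_pn < (1 << 62) - pn_win:
        return candidate_pn + pn_win
    if candidate_pn > expected_pn + pn_hwin and candidate_pn >= pn_win:
        return candidate_pn - pn_win
    return candidate_pn
```

For example, with 8 encoded bits and a largest received PN of 300, a late packet 250 (low byte 250) still decodes as 250.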
>>>
>>> If we accepted such a limit, I could very trivially come up with an
>>> efficient solution to PN encryption. Since we cover at most 64 packets, we
>>> only need a 6-bit packet number and can reject false positives on the AEAD
>>> tag. To simplify, make it 8 bits. The algorithm is to AES-encrypt a counter,
>>> similar to a typical AES-based PRNG. Then, for each packet, take one byte
>>> from the stream and use it as the packet number. The receiver creates the
>>> same stream and maps the received byte to an index it has. It might
>>> occasionally have to try multiple packet numbers, since the mapping is not
>>> unique. Longer packet numbers reduce this conflict ratio. To help with this
>>> detection, some short trial decryption might be included. The PN size can
>>> be extended as needed.
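A minimal sketch of the scheme just described, assuming HMAC-SHA256 as a stand-in for AES-encrypting the counter (so each PRF call yields 32 keystream bytes rather than AES's 16). `wire_pn`, `candidate_indices` and the window bounds are hypothetical names; a real receiver would confirm a candidate with the AEAD tag or a short trial decryption, as the text suggests.

```python
import hashlib
import hmac

BLOCK = 32  # octets per PRF call; AES would give 16 (HMAC-SHA256 is a stand-in)


def keystream_block(key: bytes, counter: int) -> bytes:
    """Stand-in for AES(key, counter): one block of the PN keystream."""
    return hmac.new(key, counter.to_bytes(8, "big"), hashlib.sha256).digest()


def wire_pn(key: bytes, index: int) -> int:
    """Sender: the 8-bit on-the-wire PN for packet `index` is keystream byte `index`."""
    block, offset = divmod(index, BLOCK)
    return keystream_block(key, block)[offset]


def candidate_indices(key: bytes, observed: int, lo: int, hi: int) -> list:
    """Receiver: every index in [lo, hi) whose keystream byte equals the observed PN.

    The mapping is not unique, so several candidates may match; the AEAD tag
    (or a short trial decryption) rejects the false positives.
    """
    return [i for i in range(lo, hi) if wire_pn(key, i) == observed]
```

One block-cipher call covers many packets, which is the amortisation the cost argument below relies on; this sketch recomputes blocks for simplicity, whereas an implementation would generate each block once.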
>>>
>>>
>>>
>>> The cost of doing this is much lower than direct encryption as proposed
>>> in the PR, because 1) a single encryption covers multiple packets, and 2)
>>> the encryption can be parallelised, resulting in a 4-5 fold performance
>>> increase. Combined, this results in sub-nanosecond overhead with AES-NI.
>>>
>>>
>>>
>>> However, you have to deal with uncertainties, which is why this isn’t a
>>> very good idea unless you have very good knowledge of the traffic
>>> pattern. It also complicates HW offloading, but I don’t see why it couldn’t
>>> be done efficiently.
>>>
>>> Mikkel
>>>
>>>
>>>
>>> On 24 March 2018 at 17.26.47, Eric Rescorla (ekr@rtfm.com) wrote:
>>>
>>> 3. A more exotic solution like AERO
>>> (https://tools.ietf.org/html/draft-mcgrew-aero-00#ref-MF07).