RE: Hardware acceleration and packet number encryption

"Deval, Manasi" <manasi.deval@intel.com> Mon, 26 March 2018 15:49 UTC

From: "Deval, Manasi" <manasi.deval@intel.com>
To: Ian Swett <ianswett@google.com>, Mikkel Fahnøe Jørgensen <mikkelfj@gmail.com>
CC: Christian Huitema <huitema@huitema.net>, Subodh Iyengar <subodh@fb.com>, Eric Rescorla <ekr@rtfm.com>, IETF QUIC WG <quic@ietf.org>, Kazuho Oku <kazuhooku@gmail.com>
Subject: RE: Hardware acceleration and packet number encryption
Date: Mon, 26 Mar 2018 15:49:02 +0000
Archived-At: <https://mailarchive.ietf.org/arch/msg/quic/Z9-i0fgGCbVIbLqEh4UaHxCpxuA>

Even if the PN encryption is done in hardware, it does not automatically follow that the QUIC packet format is baked into hardware. Modern server-side hardware that can deploy QUIC is quite flexible in interpreting header formats.

There are three new ideas on this list - AES CTR, AERO and SPARX. It would take some time to evaluate the implementation and performance of these, both in software and as a hardware offload. I am on vacation this week and will pick this up when I return next week.

Thanks,
Manasi


From: Ian Swett [mailto:ianswett@google.com]
Sent: Sunday, March 25, 2018 10:48 AM
To: Mikkel Fahnøe Jørgensen <mikkelfj@gmail.com>
Cc: Christian Huitema <huitema@huitema.net>; Subodh Iyengar <subodh@fb.com>; Eric Rescorla <ekr@rtfm.com>; IETF QUIC WG <quic@ietf.org>; Deval, Manasi <manasi.deval@intel.com>; Kazuho Oku <kazuhooku@gmail.com>
Subject: Re: Hardware acceleration and packet number encryption

To answer Eric's original two questions:
1) I really like PN encryption's properties, and the fact that it doesn't require moving to a PN space per path in QUIC v1 (Christian's option #4).
2) I'm pretty sad about how difficult the current design is to offload.

Adding an extra nonce does not feel like the right solution, for all the reasons Eric cited, and it adds a fair bit of byte overhead.

One note on ossification that Subodh and I discussed last week.  The current PN encryption is certainly possible to implement in hardware, with some costly and QUIC-specific hardware.  But baking the entire format of QUIC into hardware could cause its own type of ossification.  Only implementing bulk encryption in hardware allows a much wider variety of APIs and should allow re-use of existing AES-GCM offload hardware with minimal changes.

I believe PN encryption will cause datacenter users of QUIC to create a version of QUIC without PN encryption (at least that'd be my plan).  I don't know whether the WG sees this as a bug or a feature.

I'd really like an alternate way to transform the packet number that would be sufficiently unlinkable but not require a 2-pass solution.  AERO may work, but from a quick read of the draft I don't understand it well enough to know its properties.  In my opinion/use cases, it's OK if we can't offload the PN encryption.  The issue is that we can't do the PN encryption in software until the bulk encryption is finished, so we can't offload anything.

On Sun, Mar 25, 2018 at 12:57 PM Mikkel Fahnøe Jørgensen <mikkelfj@gmail.com> wrote:
Forgot:

The PN length needs to be known from cleartext, e.g. varlen-encoded with the length bits in the clear, or fixed size, or found by trial decryption based on past experience.



On 25 March 2018 at 18.52.00, Mikkel Fahnøe Jørgensen (mikkelfj@gmail.com) wrote:
SPARX will keep the buffer clean and it is also reasonably fast at about 500 cycles per byte, but it is going to add significantly more latency than a full AES-NI block encryption.

Consider the following PN encryption scheme, which is not very different from the current proposal but avoids touching any AEAD data:

Derive a key K_pn and encrypt the last 16 octets of the AEAD tag: X_pn = E(K_pn, tail(tag, 16)).
Then C_pn = (PN XOR X_pn), truncated to len(PN).
Store C_pn as the encrypted packet number.
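
Spelled out as a rough Python sketch (illustrative only: it assumes AES for E(), a 16-octet K_pn, and the pyca/cryptography package; none of the names come from any draft):

    from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes

    def mask_pn(k_pn: bytes, tag: bytes, pn: int, pn_len: int) -> bytes:
        # X_pn = E(K_pn, tail(tag, 16)): one AES block over the last 16 octets of the tag
        enc = Cipher(algorithms.AES(k_pn), modes.ECB()).encryptor()
        x_pn = enc.update(tag[-16:]) + enc.finalize()
        # C_pn = (PN XOR X_pn), truncated to len(PN)
        pn_bytes = pn.to_bytes(pn_len, "big")
        return bytes(a ^ b for a, b in zip(pn_bytes, x_pn[:pn_len]))

    def unmask_pn(k_pn: bytes, tag: bytes, c_pn: bytes) -> int:
        # Decryption is identical: regenerate X_pn from the received tag and XOR it off
        enc = Cipher(algorithms.AES(k_pn), modes.ECB()).encryptor()
        x_pn = enc.update(tag[-16:]) + enc.finalize()
        return int.from_bytes(bytes(a ^ b for a, b in zip(c_pn, x_pn)), "big")

Since the AEAD tag changes with every packet, the single-block AES output (and hence the XOR mask) also changes per packet.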

I don’t think there is any repeat-nonce / XOR weakness in this, but even if there is, the protected secret is not that critical.

The encrypted packet number now consumes only between 1 and 4 octets and can be decrypted in the exact same manner.

This will not work out of the box, because the AEAD currently covers the packet number, so we would still need to modify the buffer. However, that coverage isn’t strictly required, so we could move the packet number out of the AEAD. That is not so easy either, because the rest of the header still needs to be included and we don’t want the packet number to be the first thing in the packet. Placing it last could actually be an option, since it is useless until we see the AEAD tag anyway; in particular, you would then have all the required data in a single cache line, which matters. (If you want to key duplicate detection, you can use the first encrypted bytes of the packet rather than the packet number.)

This approach still has the downside of consuming a full AES block encryption that cannot be parallelised, so we are at about 24 ns on current Intel cores. However, any other encryption scheme that does not depend on guessing that the packet number is in a given range is likely to be slower.

I still think this is not worthwhile compared to segmented packet numbers, but if it has to be, it might be workable.


Kind Regards,
Mikkel Fahnøe Jørgensen


On 25 March 2018 at 18.10.04, Christian Huitema (huitema@huitema.net) wrote:
If we are exploring research ideas, one possibility would be to use 64 bit sequence numbers, and encrypt them using a modern 64 bit cipher like SPARX (https://www.cryptolux.org/index.php/SPARX). We can exclude the PN bits from the authenticated data, since the actual sequence number is part of the AEAD nonce.  With that, 64 bit encryption of the PN and AEAD encryption of the payload can proceed in parallel. Decryption requires first decrypting the PN to initialize the AEAD nonce, but that can be done without double buffering.
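
A structural sketch of that split, in Python (illustrative names only; sparx_encrypt_64 is a placeholder, since SPARX is not shipped by common crypto libraries, and a 12-octet static IV XORed with the sequence number is assumed for the AEAD nonce):

    from cryptography.hazmat.primitives.ciphers.aead import AESGCM

    def sparx_encrypt_64(k_pn: bytes, pn: int) -> bytes:
        # Placeholder: any 64-bit block cipher (e.g. SPARX) would slot in here.
        raise NotImplementedError

    def seal_packet(payload_key: bytes, k_pn: bytes, iv: bytes,
                    pn: int, header: bytes, payload: bytes) -> bytes:
        # The plaintext sequence number feeds the AEAD nonce, so the PN field
        # itself does not need to be covered by the AEAD.
        nonce = bytes(a ^ b for a, b in zip(iv, pn.to_bytes(12, "big")))
        # No data dependency between the next two lines: they can run in
        # parallel, or either one can be offloaded on its own.
        enc_pn = sparx_encrypt_64(k_pn, pn)                       # 8-octet encrypted PN
        ct = AESGCM(payload_key).encrypt(nonce, payload, header)  # AEAD over the payload
        return header + enc_pn + ct

On receive the order is fixed but still single-pass: decrypt the 8-octet PN, rebuild the nonce, then run the AEAD over the untouched buffer.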

Of course, the cost of that is header overhead, since the PN always occupies 64 bits. So we are trading some overhead for hardware acceleration. And we have to have some faith in the 64 bit encryption algorithm. (SPARX was suggested to me by Jean-Philippe Aumasson, the author of IPCrypt.)

-- Christian Huitema

On Mar 25, 2018, at 8:48 AM, Mikkel Fahnøe Jørgensen <mikkelfj@gmail.com> wrote:
The tag used as IV in ECB mode for PN encryption will use a full block size, which is 16 octets. The proposal was to encrypt the tag and XOR the result over the packet number and what follows.

If this is what is meant by “tag as IV”, it is problematic for what I assume is meant by double buffering, i.e. the need to modify the packet buffer during decryption. This is because the packet number and what follows must be un-XORed before verification can take place.

You could keep the packet number out of the AEAD, but you cannot afford to waste the additional 16-4=12 octets or more that an AES block encryption uses, so you are stuck with modifying the buffer post-AEAD.

Finding alternative nonces won’t fix this problem. If you encrypted the header completely separately from the body, you could do something, but then you waste space on extra header tags.


My suggestion with GF(2^n) will not work: even if it works in principle (finding an ideal in GF(2^32) and multiplying a seed with the packet number modulo that ideal), it is easy to brute-force 2^32. Alternatively, you can do chained hashing similar to how GCM’s GHASH works, but then it is not a unique mapping, and that is no better than the CTR-mode encryption PRNG style, and likely slower. Why would you do this at all, if it worked? Because it allows you to stick to encrypting only the packet number, which can stay outside the AEAD and thus avoid buffer modification. But I don’t see how it can work.

Mikkel


On 25 March 2018 at 14.25.07, Eric Rescorla (ekr@rtfm.com) wrote:


On Sat, Mar 24, 2018 at 9:41 PM, Subodh Iyengar <subodh@fb.com> wrote:
When we were first discussing PNE, we proposed that the tag be used as the IV for the CTR operation. The PR samples encrypted data in the packet. Did we change that for a reason?

I believe that's my alternative #1 and PR#1079.


Would that help alleviate the buffering of the stream data? Because the tag is always the last thing in the packet.

I will let Manasi answer this.


-Ekr


Subodh


On Mar 25, 2018, at 2:56 AM, Eric Rescorla <ekr@rtfm.com> wrote:


On Sun, Mar 25, 2018 at 2:09 AM, Deval, Manasi <manasi.deval@intel.com> wrote:
From talking to several of the folks last week, I understand that unlinkability is the goal of this protocol and there may be some flexibility in how that can be achieved.

Christian’s e-mail has a detailed list of options.  Here is the list of favored options as I understand them.



1. Packet number encrypted as current suggestion – The current proposal, PR 1079, uses a two-stage serialized approach: the stream header(s) and payload(s) are encrypted first, and the outcome of that encryption forms the nonce for the packet number encryption.


2. Packet number encrypted alternative 1 – One of the ideas suggested was to encrypt the stream header(s) and payload(s) with the packet number as the nonce, but to carry an additional nonce in the clear to encrypt the packet number. A scheme like this allows the two encryption operations to occur in parallel, but it still has the issue of serialization on decrypt.



3. Packet number encrypted alternative 2 – Another option is to generate two IVs, one for the PN and the other for the stream header(s) and payload(s). The nonce can be a random value sent in the clear. This allows us to encrypt and decrypt the two fields in parallel, and since the packet number is still encrypted it also solves the ossification problem (a rough sketch of one reading follows after this list). Another variation is to generate a single IV but use one part of it to encrypt the PN.
Neither of these alternatives seems ideal. Once you are carrying an explicit per-packet nonce, you might as well concatenate the payload and the PN and encrypt them together. That will require the least amount of nonce material.

-Ekr


4. PN in the clear – This is a complex scheme, and in the discussion with Ian, Jana and Praveen, they seemed to think it may be OK. If folks think this is implementable, then we may need to find an alternate solution for ossification.
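
For illustration, here is one possible reading of alternative 2 above, sketched in Python (the keys, IV lengths and names are assumptions made up for the example, not from any draft): a per-packet random nonce travels in the clear and is combined with two separately derived IVs, so PN protection (AES-CTR) and payload protection (AES-GCM) have no ordering dependency.

    import os
    from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes
    from cryptography.hazmat.primitives.ciphers.aead import AESGCM

    def seal_alt2(pn_key: bytes, pn_iv: bytes, payload_key: bytes, payload_iv: bytes,
                  pn: int, pn_len: int, header: bytes, payload: bytes) -> bytes:
        rand = os.urandom(12)                        # per-packet nonce, sent in the clear
        ctr_nonce = bytes(a ^ b for a, b in zip(rand + b"\x00" * 4, pn_iv))  # 16-octet pn_iv assumed
        gcm_nonce = bytes(a ^ b for a, b in zip(rand, payload_iv))           # 12-octet payload_iv assumed
        # The two operations below are independent and can run in parallel.
        enc = Cipher(algorithms.AES(pn_key), modes.CTR(ctr_nonce)).encryptor()
        enc_pn = enc.update(pn.to_bytes(pn_len, "big")) + enc.finalize()
        ct = AESGCM(payload_key).encrypt(gcm_nonce, payload, header)
        return header + rand + enc_pn + ct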


Thanks,
Manasi





From: Eric Rescorla [mailto:ekr@rtfm.com]
Sent: Saturday, March 24, 2018 3:18 PM
To: Mikkel Fahnøe Jørgensen <mikkelfj@gmail.com>
Cc: Kazuho Oku <kazuhooku@gmail.com>; Deval, Manasi <manasi.deval@intel.com>; Christian Huitema <huitema@huitema.net>; IETF QUIC WG <quic@ietf.org>
Subject: Re: Hardware acceleration and packet number encryption



On Sat, Mar 24, 2018 at 9:35 PM, Mikkel Fahnøe Jørgensen <mikkelfj@gmail.com> wrote:
AERO: I did not read all of it, but it does indeed sound esoteric.
It can do two things of interest: reduce space used by packet numbers, and presumably fix the encryption issue.

However, it has a W parameter, which is the reordering limit; it defaults to 64 and is recommended to be at most 255 for security reasons. This is way too low (I would assume) if packet clusters take multiple transatlantic paths.

That's just a function of how the packet numbers are encoded. It's not difficult to come up with a design that tolerates more reordering.

-Ekr


If we accepted such a limit, I could very trivially come up with an efficient solution to PN encryption. Since we cover at most 64 packets, we only need a 5-bit packet number and can reject false positives on the AEAD tag. To simplify, make it 8 bits. The algorithm is to AES-encrypt a counter, similar to a typical AES-based PRNG. Then, for each packet, take one byte from the stream and use it as the packet number. The receiver generates the same stream and maps the received byte back to an index. It might occasionally have to try multiple packet numbers, since the mapping is not unique; longer packet numbers reduce this conflict ratio. To help with this detection, some short trial decryption might be included. The PN size can be extended as needed.
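
A toy version of that keystream idea in Python (the key, nonce, window and names are made up for the example; the pyca/cryptography package supplies the AES-CTR keystream):

    from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes

    def pn_stream(key: bytes, nonce: bytes, count: int) -> bytes:
        # AES-CTR over zeros is just the raw keystream; one block yields 16 packet numbers.
        enc = Cipher(algorithms.AES(key), modes.CTR(nonce)).encryptor()
        return enc.update(bytes(count)) + enc.finalize()

    def candidate_indices(stream: bytes, wire_byte: int, window: range) -> list[int]:
        # Receiver side: the byte-to-index mapping is not unique, so return every
        # plausible index in the reordering window and let trial decryption against
        # the AEAD tag pick the right one.
        return [i for i in window if stream[i] == wire_byte]

    # Sender emits stream[i] as the wire packet number of packet i; a receiver
    # expecting packets near index 103 recovers the candidate indices.
    stream = pn_stream(b"\x00" * 16, b"\x01" * 16, 256)
    print(candidate_indices(stream, stream[103], range(100, 164)))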

The cost of doing this is much lower than the direct encryption proposed in the PR because 1) a single encryption covers multiple packets, and 2) the encryption can be parallelised, resulting in a 4-5 fold performance increase. Combined, this results in sub-nanosecond overhead for AES-NI.

However, you have to deal with uncertainties, which is why this isn’t a very good idea unless you have very good knowledge of the traffic pattern. It also complicates HW offloading, but I don’t see why it couldn’t be done efficiently.


Mikkel


On 24 March 2018 at 17.26.47, Eric Rescorla (ekr@rtfm.com) wrote:
3. A more exotic solution like AERO (https://tools.ietf.org/html/draft-mcgrew-aero-00#ref-MF07).