Re: Packet number encryption

Mikkel Fahnøe Jørgensen <> Wed, 21 February 2018 14:09 UTC

From: Mikkel Fahnøe Jørgensen <>
References: <DB6PR10MB176692436653B08C1CA949C2ACF10@DB6PR10MB1766.EURPRD10.PROD.OUTLOOK.COM>
Date: Wed, 21 Feb 2018 09:09:16 -0500
Message-ID: <>
Subject: Re: Packet number encryption
To: Victor Vasiliev <>, Kazuho Oku <>
Cc: Praveen Balasubramanian <>, "" <>, Marten Seemann <>, huitema <>
List-Id: Main mailing list of the IETF QUIC working group <>

If the encrypted packet number is truncated, e.g. wrapping after 4 octets,
it would not be possible to use the encrypted packet number as a nonce
unless it is acceptable to use the octets that follow, which could be hard
to justify. The alternatives are to either a) have full-length packet
numbers, b) renegotiate keys before the number wraps, or c) drop the CCM
header and compute it based on the decrypted packet number - which saves
space as discussed below, but which also interferes with plain TLS CCM
operation.

As I have argued before, it might be simpler to have a 32-bit sequential
counter restricted to a single path, and then use the upper implicit bits
for a path identifier (segmented packet numbers). In this case there is no
encryption overhead and the CCM nonce is trivially available. The only
remaining question is whether the packet number should be redundantly
carried in both the QUIC header and the CCM header. This could be further
generalised by moving the packet number out of the QUIC header and into an
AEAD-specific header field - but that of course impacts monitoring and
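
The segmented scheme can be sketched as follows; the XOR-into-static-IV
step mirrors how TLS 1.3 forms per-record nonces, and the helper name is
made up for illustration:

```python
def ccm_nonce(path_id: int, counter32: int, iv: bytes) -> bytes:
    """Build a per-packet nonce from implicit path bits and a 32-bit counter.

    The low 32 bits are the per-path sequential packet number; the upper
    implicit bits carry the path identifier, so the combined value never
    repeats under one key as long as keys rotate before path_id wraps.
    """
    assert 0 <= counter32 < (1 << 32)
    seq = (path_id << 32) | counter32        # segmented packet number
    seq_bytes = seq.to_bytes(len(iv), "big")
    return bytes(a ^ b for a, b in zip(iv, seq_bytes))

nonce = ccm_nonce(path_id=3, counter32=7, iv=bytes(12))
```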

Kind Regards,
Mikkel Fahnøe Jørgensen

On 21 February 2018 at 14.53.23, Mikkel Fahnøe Jørgensen ( wrote:

I suppose that CCM is not a major concern since the nonce would be computed
from the encrypted packet number and thus not reveal anything surprising.
It does, however, add a fair bit of overhead:

16 octets of CCM header including nonce and length data. The authenticated
data is 0-padded, so the packet header will consume 16 octets; this
includes the now-redundant encrypted packet number. At the end there is a
0-15 octet padding, which might be 7.5 octets on average, or 0 if the
packet is filled perfectly, and of course the 16-octet auth tag.

This means that CCM mode will use at least 3x16 octets in addition to the
encrypted payload and UDP headers.
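
A back-of-the-envelope helper for the tally above, following the message
layout as described (note that in standard CCM the padding is internal to
the MAC computation, so how much of this actually travels on the wire
depends on the framing):

```python
BLOCK = 16  # AES block size in octets

def ccm_packet_overhead(aad_len: int, msg_len: int, tag_len: int = 16) -> int:
    """Octets of CCM framing on top of the plaintext: header block,
    zero-padded authenticated data, message padding, and auth tag."""
    header = BLOCK                             # nonce + length data
    aad_padded = -(-aad_len // BLOCK) * BLOCK  # AAD rounded up to a block
    msg_pad = (-msg_len) % BLOCK               # 0-15 octets, ~7.5 on average
    return header + aad_padded + msg_pad + tag_len

# 16-octet packet header as AAD, perfectly block-aligned payload:
overhead = ccm_packet_overhead(aad_len=16, msg_len=1024)  # 3 * 16 = 48
```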

It would probably be easy to remove 32 octets of that overhead, but it
would require special adaptation of CCM for QUIC that might not be
worthwhile - as we wait for devices that include GCM hardware support.

Kind Regards,
Mikkel Fahnøe Jørgensen

On 21 February 2018 at 13.42.12, Mikkel Fahnøe Jørgensen (

Maybe I’m missing something (such as exactly how TLS/QUIC maps encryption
headers to packets), but it appears that the packet number encryption /
linkage discussion is focused entirely on AES-GCM.

The CCM mode which is part of TLS 1.3 has a header of 16 octets that
includes a nonce. This nonce would appear to be the packet number.

In summary: CCM has the unencrypted form H, LA, A, PA, LM, M, PM, T, where
H is a 16-octet header including the nonce, LA is the length of the
authenticated data A, LM is the length of the plaintext data M, PA and PM
are 0-padding up to the block length, and T is a tag of up to 16 octets.
Some fields are optional depending on the first octet (flags) in H. T is
computed as a chain of AES encryptions. Encryption is applied to LM, M,
PM, T in CTR mode. (I suppose there is also padding after A.)
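
The flags octet that controls these optional fields can be sketched from
the RFC 3610 layout of the first octet of H (the helper name is
illustrative):

```python
def ccm_b0_flags(has_aad: bool, tag_len: int, len_field_octets: int) -> int:
    """First octet of the CCM B_0 block per RFC 3610: bit 6 marks the
    presence of authenticated data, bits 5..3 encode (tag_len - 2) / 2,
    and bits 2..0 encode L - 1, the size of the message-length field."""
    assert tag_len in (4, 6, 8, 10, 12, 14, 16)
    assert 2 <= len_field_octets <= 8
    return ((0x40 if has_aad else 0)
            | (((tag_len - 2) // 2) << 3)
            | (len_field_octets - 1))

flags = ccm_b0_flags(has_aad=True, tag_len=16, len_field_octets=3)  # 0x7A
```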

CCM is relevant to devices that have AES acceleration but no CLMUL
instruction for the GHASH needed by efficient AES-GCM implementations,
e.g. ESP32 microcontrollers.

Of course the CCM header could partly or fully be removed if the
authenticated data and encrypted data lengths can be implied otherwise, or
part of it could be encrypted.

On 12 February 2018 at 12.49.38, Mikkel Fahnøe Jørgensen ( wrote:

I can add some more numbers here

> The numbers deduced from the Solarflare report are interesting; however,
> I am not sure that they reflect the ordinary use of a protocol; my
> understanding is that HFT is exceptional.

HFT is the extreme case, which is why the benchmark is interesting - not
necessarily the use case itself.

The same concern applies to how fast a distributed database can achieve
consensus, which is something I do care about.

On 12 February 2018 at 12.44.01, Kazuho Oku ( wrote:

Victor, Mikkel, thank you for the estimations.

To me, the estimated overhead of 0.1% seems like a natural number that we
would see in a production environment, where the average packet size is
not small.

I also agree that it would be worthwhile to look at the case where small
packets are exchanged. The numbers deduced from the Solarflare report are
interesting; however, I am not sure that they reflect the ordinary use of
a protocol; my understanding is that HFT is exceptional.

One benchmark that might give us a more meaningful number is the DNS
benchmark, which lists several authoritative DNS servers. The TLD
benchmark and the Hosting (10k) benchmark give us a rough estimate of how
much PPS we can achieve when exchanging small packets.

The numbers we see in the benchmark are roughly 2M RPS (4M pps) on a
16-core (8x2) Intel Xeon processor running with HT enabled, when the
fastest DNS server is being used.

Running `openssl speed -evp` on a similar CPU gives me the following result:

$ /usr/local/openssl-1.1.0/bin/openssl speed -evp aes128
Doing aes-128-cbc for 3s on 16 size blocks: 133387877 aes-128-cbc's in 3.00s

This shows that roughly 40M AES block operations can be run per second on
a single core, which in turn means that a server with 16 cores can perform
640M AES block operations per second (or 1,280M if HT is enabled and AES
is not the bottleneck).

To summarize, if DNS had packet encryption, the overhead would be
somewhere from 0.3% to 0.6%.
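
The 0.3% to 0.6% range follows directly from the figures quoted above;
redoing the arithmetic, assuming packet number encryption costs one extra
AES block per packet:

```python
# openssl speed figure quoted above: aes-128-cbc blocks in 3.00s on one core
per_core = 133_387_877 / 3            # ~44M AES block ops/s (text rounds to 40M)
capacity = per_core * 16              # 16 cores, no HT
capacity_ht = capacity * 2            # optimistic doubling with HT

pps = 4_000_000                       # ~4M packets/s from the DNS benchmark
overhead_high = pps / capacity        # one extra block per packet, no HT
overhead_low = pps / capacity_ht      # with HT: roughly half
```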

Considering that, I would anticipate that the overhead of packet number
encryption will be negligible even for short packet workloads.

2018-02-12 16:19 GMT+09:00 Victor Vasiliev <>:
> I have no idea how you get that number (by which I mean, I have a lot of
> guesses, but none of which are solid or useful here). I looked at the
> profiles of various workloads we're running in production, and I estimate
> that none of them would be impacted by PN encryption by worse than 0.1%.
> On Sat, Feb 10, 2018 at 12:12 PM, Praveen Balasubramanian
> <> wrote:
>> Makes sense. The 1%+ overhead I had quoted was in comparison to full
>> protocol processing all the way from app to NIC. This is with an early
>> QUIC implementation and a not fully optimized UDP/IP stack (we have
>> focused much more on optimizing TCP in the past). After software
>> optimizations, the crypto share of the cost will only keep going up, for
>> which we have no technique other than to offload when such support is
>> available in hardware - and offload is way more challenging when the
>> workload is hosted in VMs and containers.
>> From: Mikkel Fahnøe Jørgensen []
>> Sent: Saturday, February 10, 2018 3:36 AM
>> To: Victor Vasiliev <>
>> Cc: Praveen Balasubramanian <>;; Marten
>> Seemann <>; huitema <>
>> Subject: Re: Packet number encryption
>> To put numbers into perspective using Intel 2015 data:
>> A 64 byte message in AES-GCM AEAD in HW would use 1.03 cycles per byte or
>> 66 cycles total, or 22ns on a 3GHz core.
>> For packet numbers we use the CBC encrypt numbers because here AES cannot
>> exploit block parallelism.
>> Here we see 4.44 cycles/byte in HW or 71 cycles per block. With a 3GHz
>> setup that would amount to about 24ns overhead for packet encryption.
>> Clearly it makes no sense that AES-GCM is faster than a single AES block
>> encryption, but these are only approximate numbers and CBC mode might
>> have a little overhead, so we clamp packet numbers to 22ns.
>> Taking the 98ns overhead from the Solarflare report, we get a total
>> (simplified) processing time of 98ns non-crypto, 22ns for packet number,
>> and 22ns for AEAD, totalling 142ns. So the packet number encryption
>> overhead would be 22/(98+22)*100% = 18%. The numbers ignore other QUIC
>> processing, but that can be done on other cores or outside the latency
>> critical path.
>> This does not take into account that the AEAD operation may operate less
>> than optimally because the packet number must be extracted first. On the
>> other hand, it is also not a disastrous overhead if no good alternative
>> can be found.
>> Earlier AES-NI numbers I’ve seen from Intel docs suggest around 100
>> cycles in HW for a single AES-128 block, which would be 33ns per packet
>> number in the above example.
>> On 10 February 2018 at 06.18.25, Mikkel Fahnøe Jørgensen
>> ( wrote:
>> 98ns for 68 byte messages

Kazuho Oku