Re: AEAD and header encryption overhead in QUIC

Kazuho Oku <> Fri, 19 June 2020 04:42 UTC

From: Kazuho Oku <>
Date: Fri, 19 Jun 2020 13:42:05 +0900
To: Matt Joras <>
Cc: Mikkel Fahnøe Jørgensen <>, Nick Banks <>, IETF QUIC WG <>

On Fri, Jun 19, 2020 at 8:24 Matt Joras <> wrote:

> Thanks for the explanations, Kazuho! Would it be possible to share the
> commands you used to produce the given results? It would be interesting so
> others can do direct comparisons with their own performance benchmarking.
> It is always hard to do comparisons across implementations with varying
> hardware and OS versions.

That's a reasonable thing to ask.

While I'm not fully certain that direct comparisons between the stacks are
meaningful (everybody has different requirements and deployments), I am at
least certain that sharing the setup being used is a good idea. It also
helps me reproduce the results in the future.

I've written down the setup that we used for the benchmark shown in at Regarding the
AES-GCM benchmark, I think you can find sufficient information in the last
paragraph of

> Matt Joras
> On Thu, Jun 18, 2020 at 2:40 PM Kazuho Oku <> wrote:
>> Nick, Matt, Mikkel, thank you for your comments.
>> On Thu, Jun 18, 2020 at 23:52 Nick Banks <> wrote:
>>> We’ve found that an easy way to minimize the CPU cost of the header
>>> protection (on send and receive) is batching. If you copy 8 packet headers
>>> into a single contiguous block of memory, you can do a single crypto
>>> operation on it. The cost of doing one batch of 8 is essentially the same
>>> as doing just a single header. This effectively cuts the cost of header
>>> protection to 1/8th the original cost. Obviously this only works if you
>>> have enough packets to batch process, but for high throughput tests, you
>>> usually do.
>>> Feel free to take a look at the MsQuic code (receive path
>>> <>
>>> , send path
>>> <>).
>>> The nice thing about this design (for those of us who don’t have as much
>>> special crypto expertise) is that it doesn’t require any special crypto
>>> changes. I’d be interested to see if our two approaches could be combined
>>> for even better performance!
>> Thank you for sharing your experience. This is indeed a sensible approach
>> for minimizing the cost of header protection. On the receive side,
>> processing multiple packets in a batch is the only way to minimize the cost
>> of header protection, because AEAD unprotection cannot start until the
>> result of header unprotection is obtained. Considering that modern
>> x86-64 CPUs can run 4 to 8 AES block operations in parallel, doing header
>> protection for one packet at a time is a waste of CPU resources.
>> To paraphrase, the two approaches are orthogonal on the receive side, and
>> combining them would give better performance. My napkin math tells me that
>> I can expect at most a 3% reduction in CPU cycles spent in crypto.
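Nick's batching trick needs no special crypto changes; assuming AES-based header protection as defined in RFC 9001 (mask = AES-ECB of a 16-byte ciphertext sample), it can be sketched like this. The Python `cryptography` package stands in for whichever backend an implementation actually uses, and the function name is hypothetical:

```python
from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes

def batched_hp_masks(hp_key: bytes, samples: list[bytes]) -> list[bytes]:
    """Compute header-protection masks for many packets with a single
    ECB call: each 16-byte ciphertext sample is one AES block, so a
    batch of 8 packets costs one multi-block encryption instead of 8
    separate ones, letting the AES units run the blocks in parallel."""
    assert all(len(s) == 16 for s in samples)
    enc = Cipher(algorithms.AES(hp_key), modes.ECB()).encryptor()
    out = enc.update(b"".join(samples)) + enc.finalize()
    # Only the first 5 mask bytes per block are actually used by QUIC.
    return [out[i * 16 : i * 16 + 5] for i in range(len(samples))]
```

Since ECB processes each 16-byte block independently, the batched result is bit-identical to computing each mask on its own; the win is purely in keeping the AES pipeline full.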
>> On the send side, however, I do not expect a performance increase from
>> combining the two approaches. This "game" is about keeping the CPU
>> pipelines that do AES-NI running at full speed. The cost does not
>> change as long as the AES operations of header protection run in
>> parallel with other AES operations, regardless of whether those belong to
>> AEAD or to the header protection of other packets.
>> If there's a chance of further improving performance, I tend to think it
>> would come from overlapping the AEAD operations of multiple packets.
>> This "game" is about keeping the AES-NI pipeline busy. With Fusion, we've
>> reached 90% of the theoretical maximum speed (9th-gen Intel Core CPUs can
>> do 64 bytes of AES encryption in 40 clocks). The question for us is whether
>> we want to change our API for a single-digit performance gain.
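For reference, the napkin math behind that ceiling. The 40-clocks-per-64-bytes figure is from the text above; the 3 GHz core clock is my assumption, so scale accordingly:

```python
# Theoretical AES throughput ceiling for a core that encrypts
# 64 bytes in 40 clocks, at an assumed 3 GHz clock frequency.
CLOCKS_PER_64B = 40
CLOCK_HZ = 3.0e9                       # assumed core frequency

bytes_per_clock = 64 / CLOCKS_PER_64B            # 1.6 bytes/clock
throughput_gbps = bytes_per_clock * CLOCK_HZ * 8 / 1e9
print(round(throughput_gbps, 1))                 # prints 38.4
```

So at 3 GHz the AES pipeline tops out around 38 Gbit/s of raw encryption; reaching 90% of that leaves only a few Gbit/s on the table, which is why the remaining gain is single-digit.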
>> On Fri, Jun 19, 2020 at 2:21 Matt Joras <> wrote:
>>> I was curious, since it's not mentioned on the PRs that I saw, were
>>> these tests done with any adjustments to the ACK frequency on the client?
>>> Or is this with the default ACK policy of ACKing every other?
>> This is an important point, thank you for asking. We did reduce the ACK
>> frequency to once every 1/8 of CWND, as we did in our previous report [1].
>> In fact, you cannot turn that off with quicly.
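As an illustration only (the exact policy in quicly may differ), an ACK-frequency rule of roughly once per 1/8 of CWND, instead of ACKing every other packet, could look like:

```python
def should_ack(packets_since_ack: int, cwnd_bytes: int,
               max_udp_payload: int = 1200) -> bool:
    """Illustrative ACK policy: acknowledge once roughly 1/8 of the
    congestion window worth of packets has arrived, with a floor of
    2 so small windows still behave like the every-other default."""
    cwnd_packets = max(1, cwnd_bytes // max_udp_payload)
    threshold = max(2, cwnd_packets // 8)
    return packets_since_ack >= threshold
```

With a large window this sends roughly one ACK per 1/8 CWND, cutting the receiver's per-packet crypto and syscall cost at the price of a coarser congestion-control signal.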
>> On Fri, Jun 19, 2020 at 2:35 Mikkel Fahnøe Jørgensen <> wrote:
>>> I am a bit surprised that it is possible to bulk-encrypt headers as
>>> Nick suggests, as I would have thought the key / nonce was associated with
>>> the packet, but I haven't looked closely recently.
>> That's a keen observation. The nonce used for header protection does depend
>> on the output of AEAD. However, because the nonce is taken from the very
>> first few bytes of the AEAD output, it is possible to start calculating the
>> header protection mask before the AEAD operation finishes, unless the packet
>> is tiny. The position of the nonce was deliberately chosen to leave room
>> for this type of optimization. I recall at least one person (Martin
>> Thomson) arguing for having this possibility.
>> [1]
>> --
>> Kazuho Oku

Kazuho Oku