Re: AEAD and header encryption overhead in QUIC

Matt Joras <> Thu, 18 June 2020 23:24 UTC

To: Kazuho Oku <>
Cc: Mikkel Fahnøe Jørgensen <>, Nick Banks <>, IETF QUIC WG <>

Thanks for the explanations, Kazuho! Would it be possible to share the
commands you used to produce these results? That would let others run
direct comparisons against their own performance benchmarks; it is always
hard to compare across implementations when hardware and OS versions vary.

Matt Joras

On Thu, Jun 18, 2020 at 2:40 PM Kazuho Oku <> wrote:

> Nick, Matt, Mikkel, thank you for your comments.
> On Thu, Jun 18, 2020 at 11:52 PM Nick Banks <> wrote:
>> We’ve found that an easy way to minimize the CPU cost of the header
>> protection (on send and receive) is batching. If you copy 8 packet headers
>> into a single contiguous block of memory, you can do a single crypto
>> operation on it. The cost of doing one batch of 8 is essentially the same
>> as doing just a single header. This effectively cuts the cost of header
>> protection to 1/8th the original cost. Obviously this only works if you
>> have enough packets to batch process, but for high throughput tests, you
>> usually do.
>> Feel free to take a look at the MsQuic code (receive path
>> <>
>> , send path
>> <>).
>> The nice thing about this design (for those of us who don’t have as much
>> special crypto expertise) is that it doesn’t require any special crypto
>> changes. I’d be interested to see if our two approaches could be combined
>> for even better performance!
> Thank you for sharing your experience. This is indeed a sensible approach
> for minimizing the cost of header protection. On the receive side,
> processing multiple packets in batch is the only way of minimizing the cost
> of header protection, because AEAD unprotection cannot start until the
> result of header unprotection has been obtained. Considering that modern
> x86-64 CPUs can run 4 to 8 AES block operations in parallel, doing header
> protection for one packet at a time is a waste of CPU resources.
> In other words, the two approaches are orthogonal on the receive side, and
> combining them would give better performance. My back-of-the-napkin estimate
> suggests at most a 3% reduction in CPU cycles spent in crypto.
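As an illustration of the batching idea discussed above (this is not code from MsQuic or quicly): a rough Python sketch using the `cryptography` package, with the 5-byte mask rule taken from RFC 9001's AES-based header protection. The key, the samples, and the helper name are made up for the example.

```python
# Sketch of batched QUIC header-protection mask computation.
# Per RFC 9001, the mask is the first 5 bytes of AES-ECB(hp_key, sample),
# where the sample is 16 bytes taken from the packet's ciphertext.
# Copying N samples into one contiguous buffer and issuing a single
# encrypt call lets the crypto backend pipeline the AES blocks,
# which is the essence of the batching approach described above.
from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes

def hp_masks_batched(hp_key: bytes, samples: list) -> list:
    """Compute header-protection masks for a batch of 16-byte samples."""
    assert all(len(s) == 16 for s in samples)
    enc = Cipher(algorithms.AES(hp_key), modes.ECB()).encryptor()
    # One contiguous encrypt call covers all N samples (N AES blocks).
    out = enc.update(b"".join(samples)) + enc.finalize()
    # Keep only the first 5 bytes of each 16-byte block as the mask.
    return [out[i:i + 5] for i in range(0, len(out), 16)]

# Illustrative key and samples only (all-zero key, synthetic samples).
masks = hp_masks_batched(bytes(16), [bytes([i]) * 16 for i in range(8)])
```

Each mask is then XORed into one packet's first byte and packet-number field; the batching changes only how the AES blocks are scheduled, not the per-packet result.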
> On the send side, however, I do not expect a performance increase from
> combining the two approaches. This "game" is about keeping the CPU
> pipelines that run AES-NI at their full speed. The cost does not change as
> long as the AES operations of header protection run in parallel with other
> AES operations, regardless of whether those belong to AEAD or to the header
> protection of other packets.
> If there is a chance of further improving performance, I tend to think it
> would come from overlapping the AEAD operations of multiple packets. This
> "game" is about keeping the AES-NI pipeline busy. With Fusion, we have
> reached 90% of the theoretical maximum speed (9th-gen Intel Core CPUs can
> do 64 bytes of AES encryption in 40 clocks). The question for us is whether
> we want to change our API for a single-digit performance gain.
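For scale, the quoted figure (64 bytes of AES in 40 clocks) works out as follows; the clock frequency used here is an assumed value for illustration, not something stated in the message:

```python
# Back-of-envelope throughput from the figure quoted above:
# 64 bytes of AES encryption per 40 clocks.
# The 3.6 GHz clock frequency is an ASSUMED value for illustration;
# it is not given in the original message.
bytes_per_cycle = 64 / 40                          # 1.6 bytes/cycle
clock_hz = 3.6e9                                   # assumed core frequency
max_gbit_s = bytes_per_cycle * clock_hz * 8 / 1e9  # -> 46.08 Gbit/s
```

At that rate, raw AES throughput is far above typical link speeds, which is why the remaining gains discussed here are single-digit percentages.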
> On Fri, Jun 19, 2020 at 2:21 AM Matt Joras <> wrote:
>> I was curious, since it's not mentioned on the PRs that I saw, were these
>> tests done with any adjustments to the ACK frequency on the client? Or is
>> this with the default ACK policy of ACKing every other?
> This is an important point, thank you for asking. We did reduce the ACK
> frequency to 1/8 CWND, as in our previous report [1]. In fact, quicly does
> not provide a way to turn that off.
> On Fri, Jun 19, 2020 at 2:35 AM Mikkel Fahnøe Jørgensen <> wrote:
>> I am a bit surprised that it is possible to bulk-encrypt headers as Nick
>> suggests, as I would have thought the key/nonce was associated with the
>> packet, but I haven't looked closely recently.
> That's a keen observation. The nonce used for header protection does depend
> on the output of the AEAD. However, because the nonce is taken from the very
> first few bytes of the AEAD output, it is possible to start calculating the
> header protection vector before the AEAD operation finishes, unless the
> packet is tiny. The position of the nonce was deliberately chosen to leave
> room for this type of optimization. I recall at least one person (Martin
> Thomson) arguing for preserving this possibility.
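For readers following along: RFC 9001 calls this value the "sample", and it is taken starting 4 bytes after the start of the Packet Number field, which is why it lands within the first bytes of the AEAD output. A minimal sketch of that offset arithmetic (the helper name and example connection-ID length are illustrative):

```python
# Where the header-protection sample sits, per RFC 9001 Section 5.4.2:
# sample_offset = pn_offset + 4, sample length = one AES block (16 bytes).
# Because this range begins just past the packet-number field, a sender
# can start the header-protection AES block as soon as the first ~16
# bytes of AEAD ciphertext exist, overlapping it with the rest of the
# AEAD computation -- the optimization described above.
PN_MAX_LEN = 4    # the offset assumes the maximum packet-number length
SAMPLE_LEN = 16   # one AES block

def sample_range(pn_offset: int):
    """Byte range (start, end) of the HP sample within the packet."""
    start = pn_offset + PN_MAX_LEN
    return start, start + SAMPLE_LEN

# Example: short-header packet with 1 flags byte + 8-byte DCID,
# so the packet number starts at offset 9.
start, end = sample_range(1 + 8)
```

Only a packet whose ciphertext is shorter than this range ("tiny", as Kazuho puts it) prevents the overlap.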
> [1]
> --
> Kazuho Oku