Re: AEAD and header encryption overhead in QUIC

Kazuho Oku <kazuhooku@gmail.com> Fri, 19 June 2020 04:42 UTC

From: Kazuho Oku <kazuhooku@gmail.com>
Date: Fri, 19 Jun 2020 13:42:05 +0900
Message-ID: <CANatvzxsPhZvyLtk80aPCnr+g7rdpDngvSLxp6eWRYppEHjTrQ@mail.gmail.com>
Subject: Re: AEAD and header encryption overhead in QUIC
To: Matt Joras <matt.joras@gmail.com>
Cc: Mikkel Fahnøe Jørgensen <mikkelfj@gmail.com>, Nick Banks <nibanks@microsoft.com>, IETF QUIC WG <quic@ietf.org>
Archived-At: <https://mailarchive.ietf.org/arch/msg/quic/VBgI7A6sxzU8UhVmQ9jE7YrpkPI>

On Fri, Jun 19, 2020 at 8:24 Matt Joras <matt.joras@gmail.com> wrote:

> Thanks for the explanations, Kazuho! Would it be possible to share the
> commands you used to produce the given results? It would be interesting so
> others can do direct comparisons with their own performance benchmarking.
> It is always hard to do comparisons across implementations with varying
> hardware and OS versions.
>

That's a reasonable thing to ask.

While I'm not fully certain that direct comparisons between the stacks
are helpful (everybody has different requirements and deployments), I am
at least certain that sharing the setup we used is a good idea. It also
helps me reproduce the results in the future.

I've written down the setup that we used for the benchmark shown in
https://github.com/h2o/quicly/pull/359 at
https://github.com/h2o/quicly/wiki/Benchmarking-CPU-usage. Regarding the
AES-GCM benchmark, I think you can find sufficient information in the last
paragraph of https://github.com/h2o/picotls/pull/310.
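
When comparing AES-GCM numbers across machines, it may help to convert
between cycles per byte and throughput. The sketch below uses only the
figures quoted further down in this message (64 bytes of AES encryption in
40 clocks, and 90% of that peak). The 3.0 GHz clock and the function names
are assumptions made for illustration, not measurements, and treating "90%"
as a throughput ratio is my reading of the figure.

```python
def cycles_per_byte(cycles: float, nbytes: float) -> float:
    """Cost of processing one byte, in CPU cycles."""
    return cycles / nbytes


def throughput_gb_s(cpb: float, clock_hz: float) -> float:
    """Bytes per second at a given clock, expressed in GB/s (10**9 bytes)."""
    return clock_hz / cpb / 1e9


# Figure quoted in this thread: 64 bytes of AES encryption in 40 clocks.
PEAK_CPB = cycles_per_byte(40, 64)  # 0.625 cycles per byte

# Assumption: "90% of the theoretical maximum" is a throughput ratio,
# so the achieved cost per byte is the peak cost divided by 0.90.
ACHIEVED_CPB = PEAK_CPB / 0.90

# Assumption: a 3.0 GHz core clock, chosen only to make the numbers concrete.
ASSUMED_CLOCK_HZ = 3.0e9

if __name__ == "__main__":
    print(f"peak:     {PEAK_CPB:.3f} c/B, "
          f"{throughput_gb_s(PEAK_CPB, ASSUMED_CLOCK_HZ):.2f} GB/s")
    print(f"achieved: {ACHIEVED_CPB:.3f} c/B, "
          f"{throughput_gb_s(ACHIEVED_CPB, ASSUMED_CLOCK_HZ):.2f} GB/s")
```

Reporting cycles per byte rather than GB/s removes the clock frequency from
the comparison, though differences in microarchitecture still remain.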


>
> Matt Joras
>
> On Thu, Jun 18, 2020 at 2:40 PM Kazuho Oku <kazuhooku@gmail.com> wrote:
>
>> Nick, Matt, Mikkel, thank you for your comments.
>>
>> On Thu, Jun 18, 2020 at 23:52 Nick Banks <nibanks@microsoft.com> wrote:
>>
>>> We’ve found that an easy way to minimize the CPU cost of the header
>>> protection (on send and receive) is batching. If you copy 8 packet headers
>>> into a single contiguous block of memory, you can do a single crypto
>>> operation on it. The cost of doing one batch of 8 is essentially the same
>>> as doing just a single header. This effectively cuts the cost of header
>>> protection to 1/8th the original cost. Obviously this only works if you
>>> have enough packets to batch process, but for high throughput tests, you
>>> usually do.
>>>
>>>
>>>
>>> Feel free to take a look at the MsQuic code (receive path
>>> <https://github.com/microsoft/msquic/blob/master/src/core/connection.c#L4784>
>>> , send path
>>> <https://github.com/microsoft/msquic/blob/master/src/core/packet_builder.c#L728>).
>>> The nice thing about this design (for those of us who don’t have as much
>>> special crypto expertise) is that it doesn’t require any special crypto
>>> changes. I’d be interested to see if our two approaches could be combined
>>> for even better performance!
>>>
>>
>> Thank you for sharing your experience. This is indeed a sensible approach
>> for minimizing the cost of header protection. On the receive side,
>> processing multiple packets in a batch is the only way of minimizing the
>> cost of header protection, because AEAD unprotection cannot start until the
>> result of header unprotection has been obtained. Considering that modern
>> x86-64 CPUs can run 4 to 8 AES block operations in parallel, doing header
>> protection for one packet at a time is a waste of CPU resources.
>>
>> To paraphrase, the two approaches are orthogonal on the receive-side, and
>> combining them would give better performance. My napkin tells me that I can
>> expect at most 3% reduction in CPU cycles spent in crypto.
>>
>> On the send side, however, I do not expect to see a performance increase
>> from combining the two approaches. This "game" is about keeping the CPU
>> pipelines that do AES-NI running at their full speed. The cost does not
>> change as long as the AES operations of header protection are run in
>> parallel with other AES operations, regardless of whether those are the
>> AEAD or the header protection of other packets.
>>
>> If there's a chance of further improving performance, I tend to think
>> that it would come from overlapping the AEAD operations of multiple packets.
>> This "game" is about keeping the AES-NI pipeline busy. With Fusion, we've
>> reached 90% of the theoretical maximum speed (9th-gen Intel Core CPUs can
>> do 64 bytes of AES encryption in 40 clocks). The question for us is whether
>> we want to consider changing our API for a single-digit performance gain.
>>
>>
>> On Fri, Jun 19, 2020 at 2:21 Matt Joras <matt.joras@gmail.com> wrote:
>>
>>> I was curious, since it's not mentioned on the PRs that I saw, were
>>> these tests done with any adjustments to the ACK frequency on the client?
>>> Or is this with the default ACK policy of ACKing every other?
>>>
>>
>> This is an important point, thank you for asking. We did reduce ACK
>> frequency to 1/8 CWND, as we did in our previous report [1]. In fact, you
>> cannot turn that off with quicly.
>>
>>
>> On Fri, Jun 19, 2020 at 2:35 Mikkel Fahnøe Jørgensen <mikkelfj@gmail.com> wrote:
>>
>>> I am a bit surprised that it is possible to use bulk encrypt headers as
>>> Nick suggests as I would have thought the key / nonce was associated with
>>> the packet, but I haven’t looked closely recently.
>>>
>>
>> That's a keen observation. The nonce used for header protection does
>> depend on the output of AEAD. However, because the nonce is taken from the
>> very first few bytes of AEAD output, it is possible to start calculating the
>> header protection vector before finishing the AEAD operation, unless the packet
>> is tiny. The position of the nonce was deliberately chosen to provide room
>> for this type of optimization. I recall at least one person (Martin
>> Thomson) arguing for having this possibility.
>>
>> [1]
>> https://www.fastly.com/blog/measuring-quic-vs-tcp-computational-efficiency
>>
>> --
>> Kazuho Oku
>>
>
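
To make the two header-protection points quoted above concrete (the sample
comes from the first bytes of the AEAD output, and several samples can be
processed in one call), here is a minimal sketch. It is not the quicly or
MsQuic code. The offsets and bit masks follow the QUIC-TLS header protection
rules as I understand them. toy_ecb is a hypothetical stand-in for AES-ECB so
that the example runs with the standard library only; it is NOT a cipher, and
all function names are made up for illustration.

```python
import hashlib
from typing import Callable, List

SAMPLE_LEN = 16  # header protection samples 16 bytes of AEAD output


def toy_ecb(blocks: bytes) -> bytes:
    """Hypothetical stand-in for AES-ECB under a fixed header protection key.

    Like ECB, it maps each 16-byte block independently. NOT a real cipher.
    """
    assert len(blocks) % SAMPLE_LEN == 0
    out = bytearray()
    for i in range(0, len(blocks), SAMPLE_LEN):
        block = blocks[i:i + SAMPLE_LEN]
        out += hashlib.sha256(b"hypothetical-hp-key" + block).digest()[:SAMPLE_LEN]
    return bytes(out)


def sample_of(packet: bytes, pn_offset: int) -> bytes:
    """The sample starts 4 bytes after the start of the packet number field,
    so it is available as soon as the AEAD has produced its first bytes."""
    start = pn_offset + 4
    sample = bytes(packet[start:start + SAMPLE_LEN])
    if len(sample) != SAMPLE_LEN:
        raise ValueError("packet too short to sample")
    return sample


def apply_mask(packet: bytearray, pn_offset: int, mask: bytes) -> None:
    """Send side: read the packet number length before masking byte 0."""
    pn_len = (packet[0] & 0x03) + 1
    # Long headers protect the low 4 bits of byte 0, short headers the low 5.
    packet[0] ^= mask[0] & (0x0F if packet[0] & 0x80 else 0x1F)
    for i in range(pn_len):
        packet[pn_offset + i] ^= mask[1 + i]


def protect_batch(packets: List[bytearray], pn_offsets: List[int],
                  ecb: Callable[[bytes], bytes]) -> None:
    """Gather all samples into one contiguous buffer and make a single
    ECB call, then apply each 16-byte mask to its own packet."""
    samples = b"".join(sample_of(p, o) for p, o in zip(packets, pn_offsets))
    masks = ecb(samples)
    for i, (p, o) in enumerate(zip(packets, pn_offsets)):
        apply_mask(p, o, masks[i * SAMPLE_LEN:(i + 1) * SAMPLE_LEN])
```

Because each 16-byte block is processed independently, one call over eight
samples yields the same masks as eight separate calls. With real AES-NI the
saving comes from keeping several block operations in flight at once.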

-- 
Kazuho Oku