Re: AEAD and header encryption overhead in QUIC
Kazuho Oku <kazuhooku@gmail.com> Fri, 19 June 2020 04:42 UTC
References: <CANatvzz8F1H=DXMkBEhmKHnYM-HVG48TS9KwY=OP881Txkcodw@mail.gmail.com> <CADdTf+i+LZ98GgNhFNVcuoVczC=jCQE-TqWbCqhrpR7=Z2knWg@mail.gmail.com> <CAN1APdft3UU1dfKY_UxRaLy2xeCYSXQT3=k53=96OO_Gu1X_cw@mail.gmail.com> <CANatvzw2pj-CaXbAJ_-kUrvmoC3_oyvnSo7Yn+mX-kBuqVkr3w@mail.gmail.com> <CADdTf+jtAd5rh+fW2RMLjTgXzNZ30xxSQMXKRtRb4z7bZcRsbg@mail.gmail.com>
In-Reply-To: <CADdTf+jtAd5rh+fW2RMLjTgXzNZ30xxSQMXKRtRb4z7bZcRsbg@mail.gmail.com>
From: Kazuho Oku <kazuhooku@gmail.com>
Date: Fri, 19 Jun 2020 13:42:05 +0900
Message-ID: <CANatvzxsPhZvyLtk80aPCnr+g7rdpDngvSLxp6eWRYppEHjTrQ@mail.gmail.com>
Subject: Re: AEAD and header encryption overhead in QUIC
To: Matt Joras <matt.joras@gmail.com>
Cc: Mikkel Fahnøe Jørgensen <mikkelfj@gmail.com>, Nick Banks <nibanks@microsoft.com>, IETF QUIC WG <quic@ietf.org>
Content-Type: multipart/alternative; boundary="00000000000077698f05a868844a"
Archived-At: <https://mailarchive.ietf.org/arch/msg/quic/VBgI7A6sxzU8UhVmQ9jE7YrpkPI>
On Fri, 19 Jun 2020 at 8:24, Matt Joras <matt.joras@gmail.com> wrote:

> Thanks for the explanations, Kazuho! Would it be possible to share the
> commands you used to produce the given results? It would be interesting so
> others can do direct comparisons with their own performance benchmarking.
> It is always hard to do comparisons across implementations with varying
> hardware and OS versions.
>

That's a reasonable thing to ask. While I'm not fully certain that a direct
comparison between the stacks is helpful (everybody has different
requirements and deployments), I am at least certain that sharing the setup
being used is a good idea. It also helps me reproduce the results in the
future.

I've written down the setup that we used for the benchmark shown in
https://github.com/h2o/quicly/pull/359 at
https://github.com/h2o/quicly/wiki/Benchmarking-CPU-usage.

Regarding the AES-GCM benchmark, I think you can find sufficient
information in the last paragraph of
https://github.com/h2o/picotls/pull/310.

>
> Matt Joras
>
> On Thu, Jun 18, 2020 at 2:40 PM Kazuho Oku <kazuhooku@gmail.com> wrote:
>
>> Nick, Matt, Mikkel, thank you for your comments.
>>
>> On Thu, 18 Jun 2020 at 23:52, Nick Banks <nibanks@microsoft.com> wrote:
>>
>>> We’ve found that an easy way to minimize the CPU cost of the header
>>> protection (on send and receive) is batching. If you copy 8 packet headers
>>> into a single contiguous block of memory, you can do a single crypto
>>> operation on it. The cost of doing one batch of 8 is essentially the same
>>> as doing just a single header. This effectively cuts the cost of header
>>> protection to 1/8th the original cost. Obviously this only works if you
>>> have enough packets to batch process, but for high throughput tests, you
>>> usually do.
>>>
>>> Feel free to take a look at the MsQuic code (receive path
>>> <https://github.com/microsoft/msquic/blob/master/src/core/connection.c#L4784>
>>> , send path
>>> <https://github.com/microsoft/msquic/blob/master/src/core/packet_builder.c#L728>).
>>> The nice thing about this design (for those of us who don’t have as much
>>> special crypto expertise) is that it doesn’t require any special crypto
>>> changes. I’d be interested to see if our two approaches could be combined
>>> for even better performance!
>>>
>>
>> Thank you for sharing your experience. This is indeed a sensible approach
>> for minimizing the cost of header protection. On the receive side,
>> processing multiple packets in batch is the only way of minimizing the cost
>> of header protection, because AEAD unprotection cannot start until the
>> result of header unprotection is obtained. Considering that modern
>> x86-64 CPUs can run 4 to 8 AES block operations in parallel, doing header
>> protection for one packet at a time is a waste of CPU resources.
>>
>> To paraphrase, the two approaches are orthogonal on the receive side, and
>> combining them would give better performance. My napkin tells me that I can
>> expect at most a 3% reduction in CPU cycles spent in crypto.
>>
>> On the send side, however, I do not expect to see a performance increase
>> by combining the two approaches. This "game" is about keeping the CPU
>> pipelines that do AES-NI running at their full speed. The cost does not
>> change as long as the AES operations of header protection are run in
>> parallel with other AES operations, regardless of whether those are AEAD
>> or header protection of other packets.
>>
>> If there's a chance of further improving performance, I tend to think that
>> it would come from overlapping the AEAD operations of multiple packets.
>> This "game" is about keeping the AES-NI pipeline busy. With Fusion, we've
>> reached 90% of the theoretical maximum speed (9th-gen Intel Core CPUs can
>> do 64 bytes of AES encryption in 40 clocks). The question for us is whether
>> we want to consider changing our API for a single-digit performance gain.
>>
>>
>> On Fri, 19 Jun 2020 at 2:21, Matt Joras <matt.joras@gmail.com> wrote:
>>
>>> I was curious, since it's not mentioned on the PRs that I saw, were
>>> these tests done with any adjustments to the ACK frequency on the client?
>>> Or is this with the default ACK policy of ACKing every other?
>>>
>>
>> This is an important point, thank you for asking. We did reduce ACK
>> frequency to 1/8 CWND, as we did in our previous report [1]. In fact, you
>> cannot turn that off with quicly.
>>
>>
>> On Fri, 19 Jun 2020 at 2:35, Mikkel Fahnøe Jørgensen <mikkelfj@gmail.com> wrote:
>>
>>> I am a bit surprised that it is possible to use bulk encrypt headers as
>>> Nick suggests as I would have thought the key / nonce was associated with
>>> the packet, but I haven’t looked closely recently.
>>>
>>
>> That's a keen observation. The nonce used for header protection does
>> depend on the output of AEAD. However, because the nonce is taken from the
>> very first few bytes of the AEAD output, it is possible to start
>> calculating the header protection vector before finishing the AEAD
>> operation, unless the packet is tiny. The position of the nonce was
>> deliberately chosen to provide room for this type of optimization. I recall
>> at least one person (Martin Thomson) arguing for having this possibility.
>>
>> [1]
>> https://www.fastly.com/blog/measuring-quic-vs-tcp-computational-efficiency
>>
>> --
>> Kazuho Oku
>>
>

--
Kazuho Oku
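
For readers who want to see the shape of the batching idea discussed in this thread, here is a minimal sketch in Python. It is not the MsQuic, quicly, or picotls code. It assumes short-header packets on the send side, where the packet number offset and length are already known, and it follows the QUIC rule that the 16-byte sample starts 4 bytes after the start of the packet number field. The names `gather_samples`, `apply_masks`, `protect_batch`, and the `encrypt_blocks` callable are hypothetical; `encrypt_blocks` stands in for a single AES-ECB call under the header protection key over all samples at once.

```python
from typing import Callable, List

SAMPLE_LEN = 16  # one AES block; QUIC samples 16 bytes of ciphertext


def gather_samples(packets: List[bytes], pn_offsets: List[int]) -> bytes:
    """Copy each packet's sample into one contiguous buffer."""
    out = bytearray()
    for pkt, pn_off in zip(packets, pn_offsets):
        start = pn_off + 4  # sample begins 4 bytes after the PN field starts
        sample = pkt[start:start + SAMPLE_LEN]
        if len(sample) != SAMPLE_LEN:
            raise ValueError("packet too short to sample")
        out += sample
    return bytes(out)


def apply_masks(packets: List[bytes], pn_offsets: List[int],
                pn_lens: List[int], masks: bytes) -> List[bytes]:
    """XOR the first 5 bytes of each mask block into the packet header."""
    result = []
    for i, (pkt, pn_off, pn_len) in enumerate(zip(packets, pn_offsets, pn_lens)):
        mask = masks[i * SAMPLE_LEN:i * SAMPLE_LEN + 5]
        buf = bytearray(pkt)
        buf[0] ^= mask[0] & 0x1F  # short header: low 5 bits of first byte
        for j in range(pn_len):
            buf[pn_off + j] ^= mask[1 + j]
        result.append(bytes(buf))
    return result


def protect_batch(packets: List[bytes], pn_offsets: List[int],
                  pn_lens: List[int],
                  encrypt_blocks: Callable[[bytes], bytes]) -> List[bytes]:
    """Header-protect a batch with a single call into the block cipher.

    encrypt_blocks is an assumed stand-in for AES-ECB with the header
    protection key; it must return one output block per input block.
    """
    samples = gather_samples(packets, pn_offsets)
    masks = encrypt_blocks(samples)  # one crypto call for the whole batch
    return apply_masks(packets, pn_offsets, pn_lens, masks)
```

The point of the sketch is only that the cipher is invoked once per batch rather than once per packet, which is what lets the AES pipeline work on several blocks in parallel. On the receive side the packet number length is not known until the first byte has been unmasked, so a real receiver would need an extra step that this sketch omits.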