[CFRG] Re: BLAKE3 I-D

Jack O'Connor <oconnor663@gmail.com> Mon, 19 August 2024 19:05 UTC

MIME-Version: 1.0
References: <CAGiyFdfKZ1qsPR62kb8M_EqfGOfuU4nkEY4JjLCwBb_JOZdxOA@mail.gmail.com> <CAMr0u6kpcRvsifS3GRX0LNCD1LODo_pePZo51K7okfQtatEgNA@mail.gmail.com> <CAGiyFdfAFT4HzxNLB4QKdGs8F8QD-y5LmMpnH=C+O8+2XF8eBQ@mail.gmail.com> <CAG2Zi20x1WvGH3FdhOW0HjpDfJhgfnSJUvXsoqywgn4vy_1eGA@mail.gmail.com> <CA+6di1kw4rPcseBUfAc=kTLbQSXGyph9wHZV-fn9CEg5KjOkgA@mail.gmail.com> <CAG2Zi21v9pDu_EOB1aOyFwsJ+ztoZ5tnk7Dimhap7xGMryJttQ@mail.gmail.com> <CAGiyFdeUaYaKfDwe1xyRQmB1svW3OBpCRXKvOnA-hcyi5zec-w@mail.gmail.com> <CAG2Zi2277O_aJhY1v5N6vGFK1_TPFHQ5w89RJgmzfbSBmGhmcw@mail.gmail.com> <CA+6di1nvyi53OUquH-JBGm_nk34f+R+UTLPB81ct9mSbSmOUeQ@mail.gmail.com>
In-Reply-To: <CA+6di1nvyi53OUquH-JBGm_nk34f+R+UTLPB81ct9mSbSmOUeQ@mail.gmail.com>
From: Jack O'Connor <oconnor663@gmail.com>
Date: Mon, 19 Aug 2024 12:04:56 -0700
Message-ID: <CA+6di1=22A4WJVaTGPdVp7SeEh-4dqqXq+jnRv6agNuCZuNW3g@mail.gmail.com>
To: Christopher Patton <cpatton@cloudflare.com>
CC: cfrg@ietf.org, cfrg-chairs@ietf.org, Zooko O'Whielacronx <zookog@gmail.com>
Subject: [CFRG] Re: BLAKE3 I-D
List-Id: Crypto Forum Research Group <cfrg.irtf.org>
Archived-At: <https://mailarchive.ietf.org/arch/msg/cfrg/zWSjMF3Vozxcp_gn0IfikoHZATU>
List-Archive: <https://mailarchive.ietf.org/arch/browse/cfrg>
List-Help: <mailto:cfrg-request@irtf.org?subject=help>
List-Owner: <mailto:cfrg-owner@irtf.org>
List-Post: <mailto:cfrg@irtf.org>
List-Subscribe: <mailto:cfrg-join@irtf.org>
List-Unsubscribe: <mailto:cfrg-leave@irtf.org>

Update: Those SIMD optimizations (AVX-512-only, Unix-only) have now been
published as v1.5.4 of the `blake3` crate. Here's the before and after on
my laptop.

Master but with the new optimizations commented out
<https://gist.github.com/oconnor663/4183a403cc09974aa03046ead42dffe3>:
$ cargo +nightly bench xof
...
test bench_xof_01_block                ... bench:          40.68 ns/iter (+/- 1.74) = 1600 MB/s
test bench_xof_02_blocks               ... bench:          77.85 ns/iter (+/- 3.37) = 1662 MB/s
test bench_xof_04_blocks               ... bench:         152.42 ns/iter (+/- 3.35) = 1684 MB/s
test bench_xof_08_blocks               ... bench:         300.24 ns/iter (+/- 3.17) = 1706 MB/s
test bench_xof_16_blocks               ... bench:         598.23 ns/iter (+/- 5.45) = 1712 MB/s
test bench_xof_32_blocks               ... bench:       1,194.53 ns/iter (+/- 119.65) = 1715 MB/s
test bench_xof_64_blocks               ... bench:       2,379.19 ns/iter (+/- 13.14) = 1721 MB/s

v1.5.4 <https://crates.io/crates/blake3/1.5.4>:
$ cargo +nightly bench xof
...
test bench_xof_01_block                ... bench:          40.95 ns/iter (+/- 4.41) = 1600 MB/s
test bench_xof_02_blocks               ... bench:          47.62 ns/iter (+/- 1.04) = 2723 MB/s
test bench_xof_04_blocks               ... bench:          49.32 ns/iter (+/- 1.78) = 5224 MB/s
test bench_xof_08_blocks               ... bench:          93.44 ns/iter (+/- 3.44) = 5505 MB/s
test bench_xof_16_blocks               ... bench:         119.09 ns/iter (+/- 8.88) = 8605 MB/s
test bench_xof_32_blocks               ... bench:         235.22 ns/iter (+/- 3.97) = 8714 MB/s
test bench_xof_64_blocks               ... bench:         468.84 ns/iter (+/- 26.65) = 8752 MB/s
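
As a rough sanity check on the two runs above, here is a minimal sketch that
recomputes the MB/s column and the per-row speedup from the ns/iter figures.
It assumes each benchmark "block" is 64 bytes (BLAKE3's block length) and that
the harness prints bytes * 1000 / ns using the whole-nanosecond part of the
timing; both assumptions are consistent with the numbers shown but are not
stated in the output itself. The names `BLOCK_LEN`, `reported_mb_per_s`, and
`speedup` are hypothetical helpers, not part of the `blake3` crate.

```rust
/// Assumed size of one benchmark "block" in bytes (BLAKE3's block length).
const BLOCK_LEN: u64 = 64;

/// Throughput in MB/s, computed as bytes * 1000 / ns with integer arithmetic
/// on the truncated nanosecond count. This formula is an assumption that
/// happens to reproduce the MB/s column in the benchmark output.
fn reported_mb_per_s(blocks: u64, ns_per_iter: f64) -> u64 {
    let ns = (ns_per_iter as u64).max(1);
    blocks * BLOCK_LEN * 1000 / ns
}

/// Ratio of the old time per iteration to the new one.
fn speedup(before_ns: f64, after_ns: f64) -> f64 {
    before_ns / after_ns
}

fn main() {
    // (blocks, ns/iter with optimizations commented out, ns/iter on v1.5.4),
    // copied from the benchmark output.
    let rows = [
        (1u64, 40.68, 40.95),
        (2, 77.85, 47.62),
        (4, 152.42, 49.32),
        (8, 300.24, 93.44),
        (16, 598.23, 119.09),
        (32, 1194.53, 235.22),
        (64, 2379.19, 468.84),
    ];
    for (blocks, before, after) in rows {
        println!(
            "{:>2} blocks: {:>4} -> {:>4} MB/s, {:.2}x",
            blocks,
            reported_mb_per_s(blocks, before),
            reported_mb_per_s(blocks, after),
            speedup(before, after),
        );
    }
}
```

On the 64-block row this works out to roughly a 5.07x speedup, in line with
the ~5x estimate in the quoted message below, while the single-block case is
essentially unchanged.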

If you happen to have an AVX-512-supporting machine, I'd be curious to see
if your benchmarks improve.

On Thu, Aug 15, 2024 at 4:33 PM Jack O'Connor <oconnor663@gmail.com> wrote:

> I'm slightly embarrassed to report that our XOF implementation is slower
> than it should be. It should benefit from all the same SIMD optimizations
> as the input side, but our current assembly implementations only
> parallelize input, and the XOF uses a slower codepath with less
> parallelism. Concretely, on a CPU with AVX-512 support, for outputs longer
> than 1 KiB or so, it should be ~5x faster than it is. (Not as fast as
> hardware-accelerated AES-CTR though.)
>
> If you do have an AVX-512 Linux machine, and you want to benchmark the
> properly optimized XOF, it's currently on this branch:
> https://github.com/BLAKE3-team/BLAKE3/tree/xof_integration_rebase. I've
> been dragging my feet on shipping that, but I should go ahead and push it
> out, even though it doesn't cover all our target platforms.
>
> On Thu, Aug 15, 2024 at 2:06 PM Christopher Patton <cpatton@cloudflare.com>
> wrote:
>
>> Hi all,
>>
>> Before adopting BLAKE3, I think it would be useful to see how much of a
>> difference it would make in our applications. I would suggest looking
>> through RFCs published by CFRG and assessing how performance would change if
>> they could have used BLAKE3. Off the top of my head:
>> - RFC 9180 - HPKE (replace HKDF?)
>> - draft-irtf-cfrg-opaque - OPAQUE
>> - RFC 9380 - hashing to elliptic curves
>>
>> I'll add my own data point: draft-irtf-cfrg-vdaf. This draft specifies an
>> incremental distributed point function (IDPF), a type of function secret
>> sharing used in some MPC protocols. Most of the computation is spent on XOF
>> evaluation. For performance reasons, we try to use AES wherever we can in
>> order to get hardware support. We end up with a mix of TurboSHAKE128 and
>> AES, which is not ideal. It would be much nicer if we could afford to use a
>> dedicated XOF, but TurboSHAKE128 is not fast enough in software. I threw
>> together some benchmarks for B3:
>>
>> https://github.com/cjpatton/libprio-rs/compare/main...cjpatton:libprio-rs:exp/blake3-for-idpf?expand=1
>>
>> The results were interesting. Compared to Turbo, B3 is 30% faster, as
>> expected. Compared to the baseline (mix of Turbo and AES), B3 is 2-3x
>> slower for the client operation, as expected; but the server was slightly
>> faster, which frankly is a bit of a mystery. We'll need to dig into the
>> code more to be certain, as there may be some obvious inefficiencies on the
>> client side. But preliminarily, I would say B3 is probably too slow in
>> software for this application.
>>
>> Chris P.