[CFRG] Re: BLAKE3 I-D

Christopher Patton <cpatton@cloudflare.com> Tue, 20 August 2024 13:50 UTC

To: Jack O'Connor <oconnor663@gmail.com>
CC: cfrg@ietf.org, cfrg-chairs@ietf.org, Zooko O'Whielacronx <zookog@gmail.com>
Archived-At: <https://mailarchive.ietf.org/arch/msg/cfrg/OG9WI4mFMA-HLJ-hg9iT8QL8M54>

Hi Jack, I didn't see a performance improvement, but I may be on the wrong
platform. If you'd like to try it out, check out:
https://github.com/cjpatton/libprio-rs/tree/exp/blake3-for-idpf
cargo bench --features experimental --bench speed_tests -- "^idpf"
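
For reference, here is a rough sketch of how the before/after figures quoted below translate into speedup ratios. The ns/iter values are copied from Jack's `cargo +nightly bench xof` output; the dictionary and function names are only illustrative, and the 64-byte block size is the standard BLAKE3 block size rather than something stated in the benchmark output.

```python
# ns/iter figures copied from the quoted `cargo +nightly bench xof` runs.
# Keys are the number of output blocks (bench_xof_NN_blocks).
BLOCK_BYTES = 64  # assumption: one BLAKE3 block is 64 bytes

baseline_ns = {1: 40.68, 2: 77.85, 4: 152.42, 8: 300.24,
               16: 598.23, 32: 1194.53, 64: 2379.19}
v154_ns = {1: 40.95, 2: 47.62, 4: 49.32, 8: 93.44,
           16: 119.09, 32: 235.22, 64: 468.84}


def speedup(blocks):
    """Ratio of baseline time to v1.5.4 time for a given output length."""
    return baseline_ns[blocks] / v154_ns[blocks]


if __name__ == "__main__":
    for blocks in sorted(baseline_ns):
        out_bytes = blocks * BLOCK_BYTES
        print(f"{out_bytes:5d} bytes of output: {speedup(blocks):.2f}x")
```

On those numbers the ratio is about 1x for a single block and levels off at roughly 5x from 16 blocks (1 KiB) upward, which matches the "~5x" estimate in the earlier message.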


On Mon, Aug 19, 2024 at 12:05 PM Jack O'Connor <oconnor663@gmail.com> wrote:

> Update: Those SIMD optimizations (AVX-512-only, Unix-only) have now been
> published as v1.5.4 of the `blake3` crate. Here's the before and after on
> my laptop.
>
> Master but with the new optimizations commented out
> <https://gist.github.com/oconnor663/4183a403cc09974aa03046ead42dffe3>:
> $ cargo +nightly bench xof
> ...
> test bench_xof_01_block  ... bench:      40.68 ns/iter (+/- 1.74)   = 1600 MB/s
> test bench_xof_02_blocks ... bench:      77.85 ns/iter (+/- 3.37)   = 1662 MB/s
> test bench_xof_04_blocks ... bench:     152.42 ns/iter (+/- 3.35)   = 1684 MB/s
> test bench_xof_08_blocks ... bench:     300.24 ns/iter (+/- 3.17)   = 1706 MB/s
> test bench_xof_16_blocks ... bench:     598.23 ns/iter (+/- 5.45)   = 1712 MB/s
> test bench_xof_32_blocks ... bench:   1,194.53 ns/iter (+/- 119.65) = 1715 MB/s
> test bench_xof_64_blocks ... bench:   2,379.19 ns/iter (+/- 13.14)  = 1721 MB/s
>
> v1.5.4 <https://crates.io/crates/blake3/1.5.4>:
> $ cargo +nightly bench xof
> ...
> test bench_xof_01_block  ... bench:      40.95 ns/iter (+/- 4.41)   = 1600 MB/s
> test bench_xof_02_blocks ... bench:      47.62 ns/iter (+/- 1.04)   = 2723 MB/s
> test bench_xof_04_blocks ... bench:      49.32 ns/iter (+/- 1.78)   = 5224 MB/s
> test bench_xof_08_blocks ... bench:      93.44 ns/iter (+/- 3.44)   = 5505 MB/s
> test bench_xof_16_blocks ... bench:     119.09 ns/iter (+/- 8.88)   = 8605 MB/s
> test bench_xof_32_blocks ... bench:     235.22 ns/iter (+/- 3.97)   = 8714 MB/s
> test bench_xof_64_blocks ... bench:     468.84 ns/iter (+/- 26.65)  = 8752 MB/s
>
> If you happen to have an AVX-512-supporting machine, I'd be curious to see
> if your benchmarks improve.
>
> On Thu, Aug 15, 2024 at 4:33 PM Jack O'Connor <oconnor663@gmail.com> wrote:
>
>> I'm slightly embarrassed to report that our XOF implementation is slower
>> than it should be. It should benefit from all the same SIMD optimizations
>> as the input side, but our current assembly implementations only
>> parallelize input, and the XOF uses a slower codepath with less
>> parallelism. Concretely on a CPU with AVX-512 support, for outputs longer
>> than 1 KiB or so, it should be ~5x faster than it is. (Not as fast as
>> hardware-accelerated AES-CTR though.)
>>
>> If you do have an AVX-512 Linux machine, and you want to benchmark the
>> properly optimized XOF, it's currently on this branch:
>> https://github.com/BLAKE3-team/BLAKE3/tree/xof_integration_rebase. I've
>> been dragging my feet on shipping that, but I should go ahead and push it
>> out, even though it doesn't cover all our target platforms.
>>
>> On Thu, Aug 15, 2024 at 2:06 PM Christopher Patton <cpatton@cloudflare.com> wrote:
>>
>>> Hi all,
>>>
>>> Before adopting BLAKE3, I think it would be useful to see how much of a
>>> difference it would make in our applications. I would suggest looking
>>> through RFCs published by CFRG and assessing how performance would change
>>> if they had used BLAKE3. Off the top of my head:
>>> - RFC 9180 - HPKE (replace HKDF?)
>>> - draft-irtf-cfrg-opaque - OPAQUE
>>> - RFC 9380 - hashing to elliptic curves
>>>
>>> I'll add my own data point: draft-irtf-cfrg-vdaf. This draft specifies
>>> an incremental distributed point function (IDPF), a type of function secret
>>> sharing used in some MPC protocols. Most of the computation is spent on XOF
>>> evaluation. For performance reasons, we try to use AES wherever we can in
>>> order to get hardware support. We end up with a mix of TurboSHAKE128 and
>>> AES, which is not ideal. It would be much nicer if we could afford to use a
>>> dedicated XOF, but TurboSHAKE128 is not fast enough in software. I threw
>>> together some benchmarks for B3:
>>>
>>> https://github.com/cjpatton/libprio-rs/compare/main...cjpatton:libprio-rs:exp/blake3-for-idpf?expand=1
>>>
>>> The results were interesting. Compared to Turbo, B3 is 30% faster, as
>>> expected. Compared to the baseline (mix of Turbo and AES), B3 is 2-3x
>>> slower for the client operation, as expected; but the server was slightly
>>> faster, which frankly is a bit of a mystery. We'll need to dig into the
>>> code more to be certain, as there may be some obvious inefficiencies on the
>>> client side. But preliminarily, I would say B3 is probably too slow in
>>> software for this application.
>>>
>>> Chris P.