Jean-Philippe Aumasson <jeanphilippe.aumasson@gmail.com> Thu, 15 August 2024 23:46 UTC

From: Jean-Philippe Aumasson <jeanphilippe.aumasson@gmail.com>
Date: Thu, 15 Aug 2024 16:46:17 -0700
To: Jack O'Connor <oconnor663@gmail.com>
CC: cfrg@ietf.org, cfrg-chairs@ietf.org, Zooko O'Whielacronx <zookog@gmail.com>
Subject: [CFRG] Re: BLAKE3 I-D
Was this received by the CFRG mailing list? (Apologies for the double post if so.)

On Thu, Aug 15, 2024 at 4:34 PM Jack O'Connor <oconnor663@gmail.com> wrote:

> I'm slightly embarrassed to report that our XOF implementation is slower
> than it should be. It should benefit from all the same SIMD optimizations
> as the input side, but our current assembly implementations only
> parallelize input, and the XOF uses a slower codepath with less
> parallelism. Concretely, on a CPU with AVX-512 support, for outputs longer
> than 1 KiB or so, it should be ~5x faster than it is. (Not as fast as
> hardware-accelerated AES-CTR, though.)
>
> If you do have an AVX-512 Linux machine, and you want to benchmark the
> properly optimized XOF, it's currently on this branch:
> https://github.com/BLAKE3-team/BLAKE3/tree/xof_integration_rebase. I've
> been dragging my feet on shipping that, but I should go ahead and push it
> out, even though it doesn't cover all our target platforms.
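The XOF can, in principle, reuse the input side's SIMD machinery because of how BLAKE3 produces extended output. It recompresses the root node with an incrementing output-block counter, and each counter value yields 64 bytes. Every output block therefore depends only on the root inputs and its own counter, so blocks can be computed independently and in any order.

The Python sketch below illustrates that counter-mode structure under explicit assumptions:

- `toy_root_block` is a hypothetical stand-in built on `hashlib.blake2b`. It is not the BLAKE3 compression function.
- The thread pool only demonstrates that blocks are independent. A real implementation gets its speedup from computing several counters in parallel SIMD lanes, not from Python threads.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor
from functools import partial

BLOCK_LEN = 64  # BLAKE3 emits 64 bytes of output per root-compression call


def toy_root_block(root_state: bytes, counter: int) -> bytes:
    """HYPOTHETICAL stand-in for BLAKE3's root compression: one 64-byte
    output block per counter value. Uses BLAKE2b, NOT the BLAKE3 function."""
    return hashlib.blake2b(root_state + counter.to_bytes(8, "little"),
                           digest_size=BLOCK_LEN).digest()


def _block_range(offset: int, length: int) -> tuple[int, int]:
    """Counters covering bytes [offset, offset + length); end is exclusive."""
    first = offset // BLOCK_LEN
    last = (offset + length + BLOCK_LEN - 1) // BLOCK_LEN
    return first, last


def xof_sequential(root_state: bytes, offset: int, length: int) -> bytes:
    """Compute only the blocks that overlap the requested range, one by one."""
    first, last = _block_range(offset, length)
    buf = b"".join(toy_root_block(root_state, c) for c in range(first, last))
    start = offset - first * BLOCK_LEN
    return buf[start:start + length]


def xof_parallel(root_state: bytes, offset: int, length: int,
                 workers: int = 4) -> bytes:
    """Same output, but the blocks are computed concurrently. This is only
    possible because no block depends on any other block."""
    first, last = _block_range(offset, length)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Executor.map returns results in submission order.
        blocks = pool.map(partial(toy_root_block, root_state), range(first, last))
        buf = b"".join(blocks)
    start = offset - first * BLOCK_LEN
    return buf[start:start + length]
```

The same independence allows seeking to an arbitrary output offset without generating the bytes before it. The `blake3` Rust crate exposes this through `OutputReader::set_position`.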
>
> On Thu, Aug 15, 2024 at 2:06 PM Christopher Patton <cpatton@cloudflare.com> wrote:
>> Hi all,
>>
>> Before adopting BLAKE3, I think it would be useful to see how much of a
>> difference it would make in our applications. I would suggest looking
>> through RFCs published by the CFRG and assessing how performance would
>> change if they could have used BLAKE3. Off the top of my head:
>> - RFC 9180 - HPKE (replace HKDF?)
>> - draft-irtf-cfrg-opaque - OPAQUE
>> - RFC 9380 - hashing to elliptic curves
>>
>> I'll add my own data point: draft-irtf-cfrg-vdaf. This draft specifies an
>> incremental distributed point function (IDPF), a type of function secret
>> sharing used in some MPC protocols. Most of the computation is spent on XOF
>> evaluation. For performance reasons, we try to use AES wherever we can in
>> order to get hardware support. We end up with a mix of TurboSHAKE128 and
>> AES, which is not ideal. It would be much nicer if we could afford to use a
>> dedicated XOF, but TurboSHAKE128 is not fast enough in software. I threw
>> together some benchmarks for B3:
>> https://github.com/cjpatton/libprio-rs/compare/main...cjpatton:libprio-rs:exp/blake3-for-idpf?expand=1
>>
>> The results were interesting. Compared to Turbo, B3 is 30% faster, as
>> expected. Compared to the baseline (mix of Turbo and AES), B3 is 2-3x
>> slower for the client operation, as expected, but slightly faster for the
>> server operation, which frankly is a bit of a mystery. We'll need to dig into the
>> code more to be certain, as there may be some obvious inefficiencies on the
>> client side. But preliminarily, I would say B3 is probably too slow in
>> software for this application.
>>
>> Chris P.
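Chris notes above that most of the IDPF's computation is spent on XOF evaluation. DPF and IDPF constructions are generally built on a GGM-style tree. In such a tree, each node's seed is stretched by the XOF into two child seeds, so expanding a full domain of depth n takes 2^n - 1 XOF calls, each on a short input.

The sketch below counts those calls. It is a simplified illustration, not the IDPF specified in draft-irtf-cfrg-vdaf, and it makes these assumptions:

- It omits control bits and correction words.
- The 16-byte seed size is a hypothetical choice.
- `hashlib.shake_128` (full-round SHAKE128) stands in for TurboSHAKE128, which Python's standard library does not provide.

```python
import hashlib

SEED_LEN = 16  # hypothetical seed size in bytes


class CountingXof:
    """Wraps hashlib.shake_128 as a stand-in XOF and counts invocations."""

    def __init__(self) -> None:
        self.calls = 0

    def expand(self, seed: bytes, out_len: int) -> bytes:
        self.calls += 1
        return hashlib.shake_128(seed).digest(out_len)


def expand_tree(xof: CountingXof, root_seed: bytes, depth: int) -> list[bytes]:
    """GGM-style full-domain expansion: each internal node's seed is
    stretched by one XOF call into a left and a right child seed.
    Returns the 2**depth leaf seeds."""
    level = [root_seed]
    for _ in range(depth):
        next_level = []
        for seed in level:
            out = xof.expand(seed, 2 * SEED_LEN)
            next_level.append(out[:SEED_LEN])  # left child
            next_level.append(out[SEED_LEN:])  # right child
        level = next_level
    return level


if __name__ == "__main__":
    xof = CountingXof()
    leaves = expand_tree(xof, bytes(SEED_LEN), depth=10)
    print(len(leaves), xof.calls)  # 1024 leaves, 1023 XOF calls
```

Because the call count grows with the domain size while each call processes only a few dozen bytes, the XOF's per-call latency matters as much as its bulk throughput. Bulk throughput is where long-output optimizations help most.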