### Re: [CFRG] hash_to_field requires implementing a new 25519 field op

Loup Vaillant-David <loup@loup-vaillant.fr> Wed, 21 July 2021 23:10 UTC

Return-Path: <loup@loup-vaillant.fr>

X-Original-To: cfrg@ietfa.amsl.com

Delivered-To: cfrg@ietfa.amsl.com

Received: from localhost (localhost [127.0.0.1]) by ietfa.amsl.com (Postfix) with ESMTP id 8450F3A2E11 for <cfrg@ietfa.amsl.com>; Wed, 21 Jul 2021 16:10:52 -0700 (PDT)

X-Virus-Scanned: amavisd-new at amsl.com

X-Spam-Flag: NO

X-Spam-Score: -1.898

X-Spam-Level:

X-Spam-Status: No, score=-1.898 tagged_above=-999 required=5 tests=[BAYES_00=-1.9, RCVD_IN_MSPIKE_H2=-0.001, SPF_HELO_NONE=0.001, SPF_NONE=0.001, URIBL_BLOCKED=0.001] autolearn=ham autolearn_force=no

Received: from mail.ietf.org ([4.31.198.44]) by localhost (ietfa.amsl.com [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id 5KFj2V4Nb61x for <cfrg@ietfa.amsl.com>; Wed, 21 Jul 2021 16:10:47 -0700 (PDT)

Received: from relay8-d.mail.gandi.net (relay8-d.mail.gandi.net [217.70.183.201]) (using TLSv1.2 with cipher ADH-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ietfa.amsl.com (Postfix) with ESMTPS id C892B3A2E0E for <cfrg@irtf.org>; Wed, 21 Jul 2021 16:10:45 -0700 (PDT)

Received: (Authenticated sender: loup@loup-vaillant.fr) by relay8-d.mail.gandi.net (Postfix) with ESMTPSA id 554051BF204; Wed, 21 Jul 2021 23:10:40 +0000 (UTC)

Message-ID: <05222110505718e6aa884adb78e185f17ea40ac2.camel@loup-vaillant.fr>

From: Loup Vaillant-David <loup@loup-vaillant.fr>

To: Rene Struik <rstruik.ext@gmail.com>, "Riad S. Wahby" <rsw@jfet.org>, Filippo Valsorda <filippo@ml.filippo.io>

Cc: cfrg@irtf.org

Date: Thu, 22 Jul 2021 01:10:39 +0200

In-Reply-To: <aebaaa16-8fda-bcf0-067c-e81d257a768a@gmail.com>

References: <aaa46d82-f05d-4558-8a2a-6d945fe9cb1d@www.fastmail.com> <20210721191123.i3f33p3qvkwxlbtl@kaon.local> <c16f6fe23f1b11e9a311f4b57cabfbef4a517b58.camel@loup-vaillant.fr> <aebaaa16-8fda-bcf0-067c-e81d257a768a@gmail.com>

Content-Type: text/plain; charset="UTF-8"

X-Mailer: Evolution 3.28.5-0ubuntu0.18.04.2

Mime-Version: 1.0

Content-Transfer-Encoding: 7bit

Archived-At: <https://mailarchive.ietf.org/arch/msg/cfrg/w-xPGrg4x0P5kfa8g566Bp4RiH8>

Subject: Re: [CFRG] hash_to_field requires implementing a new 25519 field op

X-BeenThere: cfrg@irtf.org

X-Mailman-Version: 2.1.29

Precedence: list

List-Id: Crypto Forum Research Group <cfrg.irtf.org>

List-Unsubscribe: <https://www.irtf.org/mailman/options/cfrg>, <mailto:cfrg-request@irtf.org?subject=unsubscribe>

List-Archive: <https://mailarchive.ietf.org/arch/browse/cfrg/>

List-Post: <mailto:cfrg@irtf.org>

List-Help: <mailto:cfrg-request@irtf.org?subject=help>

List-Subscribe: <https://www.irtf.org/mailman/listinfo/cfrg>, <mailto:cfrg-request@irtf.org?subject=subscribe>

X-List-Received-Date: Wed, 21 Jul 2021 23:10:53 -0000

Hi Rene, Something similar could indeed work (untested code below): void reduce(uint8_t r[32], uint8_t w[64]) { fe low, high; fe_frombytes(low , w); fe_frombytes(high, w + 32); fe_mul_small(high, high, 19); fe_add (low , low, high); fe_tobytes(r, low); // maybe wipe low and high } There's a little problem though: this reduction ignores *two* bits from the input, one of which is right in the middle. As much as I love how it makes the implementation easy, you might hate how it makes the specification weird. Or I could use the same trick to read the two missing bits, and have a full, clean, wide reduction. Let's see (untested code): void reduce(uint8_t r[32], uint8_t w[64]) { fe low, high; fe_frombytes(low , w); fe_frombytes(high, w + 32); fe_mul_small(high, high, 38); fe_add (low , low, high); low[0] += (w[31] >> 7) * 19; low[0] += (w[63] >> 7) * 19 * 38; // may be omited fe_tobytes(r, low); // maybe wipe low and high } What do you know, that's much less ugly than I anticipated. Nice. Slower than a dedicated reduction, but if I'm being honest the performance difference probably never matters. Thanks for the tip, Loup. On Wed, 2021-07-21 at 18:22 -0400, Rene Struik wrote: > Hi Loup, Filippo: > > I do not see the modular reduction implementation issue: > > if x=x0+x1*2^m and N=2^m-alpha, x (mod N) = (x0 +alpha*x1) (mod N) = > ZNadd(x0,ZNmul(alpha,x1)), where ZNadd and ZNmul are operators on > the > set ZN=[0..N-1). This may not be the most efficient, but using two > invocations of a binary operator to implement a single modular > reduction > should work for "Arthur Random Programmer" (whoever that new [to me] > creature is). For p=2^255-19, one could take m=255 and and alpha=19 > (i.e., N=p) or m=256 and alpha=38 (i.e., N=2*p), depending on libary > details. The above also works for NIST prime curves, Brainpool > curves, > Gost curves, secp256k1 (all with negligible relative incremental > cost). > > I do not think a specification needs to spell out all implementation > details, as long as it is plausible that effective implementations > are > possible. > > This all being said, the suitability of curve maps (including > detailed > steps) depends on the application(s) one wishes to cater to. {I do > not > recall having seen such discussion, e.g., related to password > schemes > and pairing-based applications.} > > Rene > > On 2021-07-21 5:42 p.m., Loup Vaillant-David wrote: > > Hi, > > > > > > > But (as you know) in general it is not safe to reduce > > > ceil(log2(p)) > > > bits mod p, since this can give significant bias for general p. > > > The > > > goal in the draft is to specify a hash-to-field method that is > > > safe > > > for any p without requiring any parameter except the security > > > level > > > in bits, k, and which gives bias roughly 2^-k. Adding parameters > > > is > > > dangerous in the sense that incorrectly choosing parameters gives > > > a > > > hash-to-field function that looks OK but is subtly broken. > > > > The three primes I know of are pretty clear cut. Two are clearly > > safe, > > the other clearly is not. > > > > 1. 2^255 - 2^4 - 2^1 - 1 (Curve25519) > > 2. 2^448 - 2^224 - 1 (Curve448) > > 3. 2^256 - 2^224 + 2^96 + 2^192 - 1 (P-256) > > > > The first has a bias of about 2^-(255-4) == 2^-252. => Safe > > The second has a bias of about 2^-(448-224) == 2^-224. => Safe > > The third has a bias of about 2^-(256-224) == 2^-32. => Unsafe > > > > So, Curve25519 and Curve448 have biases well below 2^-128, and as > > such > > are virtually undetectable (you need about 2^128 samples, which you > > are > > not going to get). P-256's bias however can be detected with as > > little > > as a few billion samples, and as such is easily detected in > > realistic > > settings. > > > > Yes, having yet another parameter is a bummer, but as long as we > > don't > > use primes with a bias close to 2^-128, choosing an acceptable > > threshold (like 2^-128) should not be too much of a problem. > > > > > > > In my mind the second is the much more convincing argument, > > > because > > > every EC-relevant field library that I'm aware of implements a > > > wide > > > reduction already, inside the multiplication routine (which gives > > > a > > > p^2-size intermediate result). > > > > > > So this > > > > > > > Curve25519 implementations will all have to implement and > > > > expose > > > > a new field operation (like https://git.io/JlU6L) which will be > > > > only > > > > used for h2c, meaning it will not be well tested or maintained, > > > > or h2c implementations will have to roll their own, probably > > > > in variable time. > > > > > > isn't really true. If one exposes the ceil(log2(p^2)) bit > > > reduction > > > that's already implemented for multiplication, then there's no > > > need > > > for extra code, and maybe even no need for extra tests. > > > > Wait a minute, that reduction is not *nearly* as large as you make > > it > > out to be. Let's take a look at Curve25519's ref10 implementation > > on > > SUPERCOP. You would think that because the result of the > > multiplication > > (before carry propagation) is stored in 10 64-bit signed integers, > > the > > number they represent easily exceeds 2^512, or even approaches > > 2^640. > > > > Nope. > > > > In fact, that number those wide limbs represent does not even > > exceed > > 2^293. That's because the limbs overlap: even though they're 63-bit > > wide, they're only 25 or 26 bits apart. > > > > How then can we compute the multiplication *at all*, would you say? > > I've explained it on > > https://loup-vaillant.fr/tutorials/poly1305-design > > Search for "Cheating at modular arithmetic". > > > > The trick is to perform part of the modular reduction by taking > > advantage of the fact that you're dealing with a pseudo Mersene > > prime. > > See, when you work modulo 2^255-19, multiplying 2^255 is the same > > as > > multiplying by 19. So you can just shift numbers to the right 255 > > bits, > > and multiply by 19 to compensate. That's why field multiplication > > has > > those *19 all over the place: instead of representing the number > > over > > twenty limbs, we just shift directly to one of the ten least > > significant limbs, and multiply by 19 to compensate. > > > > Long story short, your trick won't work. We will need to implement > > a > > separate reduction routine. > > > > --- > > > > And before someone is tempted to point out that EdDSA involves the > > reduction of a 512-bit modulo the prime order of the curve that we > > could reuse, hold your horses: while TweetNaCl and Monocypher uses > > a > > fairly generic algorithm that can easily be adapted by changing a > > single constant, I've seen ref10 use a highly optimised reduction > > where > > the value of the order of the curve is deeply embedded in the > > algorithm. I personally could not even make sense of it, so I don't > > think it can be adapted. > > > > Moreover, keep in mind that 2^255-19 is a pseudo Mersene prime, and > > as > > such can use a much faster reduction algorithm: just shift the most > > significant bits to the right, multiply by 19, and add it to the > > least > > significant bits. You'll get a full reduction in a couple > > iterations > > (less if you're clever). Still, implementing this is not for Arthur > > Random Programmer. I personally would test it quite extensively, > > and > > write a proof of correctness on top. > > > > > > > Would it partially allay your concerns if the hash-to-field > > > section > > > explicitly pointed out that it's possible to use the > > > multiplication > > > routine (or, well, half of it) for the wide modular reduction? > > > > Since to my knowledge it is *not* possible, it would actually make > > me > > very worried: either I missed something big (and I want to know), > > or > > the RFC writers did (and it needs to be pointed out). > > > > Loup. > > > > > > _______________________________________________ > > CFRG mailing list > > CFRG@irtf.org > > https://www.irtf.org/mailman/listinfo/cfrg > >

- [CFRG] hash_to_field requires implementing a new … Filippo Valsorda
- Re: [CFRG] hash_to_field requires implementing a … Riad S. Wahby
- Re: [CFRG] hash_to_field requires implementing a … Filippo Valsorda
- Re: [CFRG] hash_to_field requires implementing a … Loup Vaillant-David
- Re: [CFRG] hash_to_field requires implementing a … Rene Struik
- Re: [CFRG] hash_to_field requires implementing a … Riad S. Wahby
- Re: [CFRG] hash_to_field requires implementing a … Riad S. Wahby
- Re: [CFRG] hash_to_field requires implementing a … Riad S. Wahby
- Re: [CFRG] hash_to_field requires implementing a … Loup Vaillant-David