Re: [CFRG] hash_to_field requires implementing a new 25519 field op

Loup Vaillant-David <loup@loup-vaillant.fr> Wed, 21 July 2021 23:10 UTC

Return-Path: <loup@loup-vaillant.fr>
X-Original-To: cfrg@ietfa.amsl.com
Delivered-To: cfrg@ietfa.amsl.com
Received: from localhost (localhost [127.0.0.1]) by ietfa.amsl.com (Postfix) with ESMTP id 8450F3A2E11 for <cfrg@ietfa.amsl.com>; Wed, 21 Jul 2021 16:10:52 -0700 (PDT)
X-Virus-Scanned: amavisd-new at amsl.com
X-Spam-Flag: NO
X-Spam-Score: -1.898
X-Spam-Level:
X-Spam-Status: No, score=-1.898 tagged_above=-999 required=5 tests=[BAYES_00=-1.9, RCVD_IN_MSPIKE_H2=-0.001, SPF_HELO_NONE=0.001, SPF_NONE=0.001, URIBL_BLOCKED=0.001] autolearn=ham autolearn_force=no
Received: from mail.ietf.org ([4.31.198.44]) by localhost (ietfa.amsl.com [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id 5KFj2V4Nb61x for <cfrg@ietfa.amsl.com>; Wed, 21 Jul 2021 16:10:47 -0700 (PDT)
Received: from relay8-d.mail.gandi.net (relay8-d.mail.gandi.net [217.70.183.201]) (using TLSv1.2 with cipher ADH-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ietfa.amsl.com (Postfix) with ESMTPS id C892B3A2E0E for <cfrg@irtf.org>; Wed, 21 Jul 2021 16:10:45 -0700 (PDT)
Received: (Authenticated sender: loup@loup-vaillant.fr) by relay8-d.mail.gandi.net (Postfix) with ESMTPSA id 554051BF204; Wed, 21 Jul 2021 23:10:40 +0000 (UTC)
Message-ID: <05222110505718e6aa884adb78e185f17ea40ac2.camel@loup-vaillant.fr>
From: Loup Vaillant-David <loup@loup-vaillant.fr>
To: Rene Struik <rstruik.ext@gmail.com>, "Riad S. Wahby" <rsw@jfet.org>, Filippo Valsorda <filippo@ml.filippo.io>
Cc: cfrg@irtf.org
Date: Thu, 22 Jul 2021 01:10:39 +0200
In-Reply-To: <aebaaa16-8fda-bcf0-067c-e81d257a768a@gmail.com>
References: <aaa46d82-f05d-4558-8a2a-6d945fe9cb1d@www.fastmail.com> <20210721191123.i3f33p3qvkwxlbtl@kaon.local> <c16f6fe23f1b11e9a311f4b57cabfbef4a517b58.camel@loup-vaillant.fr> <aebaaa16-8fda-bcf0-067c-e81d257a768a@gmail.com>
Content-Type: text/plain; charset="UTF-8"
X-Mailer: Evolution 3.28.5-0ubuntu0.18.04.2
Mime-Version: 1.0
Content-Transfer-Encoding: 7bit
Archived-At: <https://mailarchive.ietf.org/arch/msg/cfrg/w-xPGrg4x0P5kfa8g566Bp4RiH8>
Subject: Re: [CFRG] hash_to_field requires implementing a new 25519 field op
X-BeenThere: cfrg@irtf.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Crypto Forum Research Group <cfrg.irtf.org>
List-Unsubscribe: <https://www.irtf.org/mailman/options/cfrg>, <mailto:cfrg-request@irtf.org?subject=unsubscribe>
List-Archive: <https://mailarchive.ietf.org/arch/browse/cfrg/>
List-Post: <mailto:cfrg@irtf.org>
List-Help: <mailto:cfrg-request@irtf.org?subject=help>
List-Subscribe: <https://www.irtf.org/mailman/listinfo/cfrg>, <mailto:cfrg-request@irtf.org?subject=subscribe>
X-List-Received-Date: Wed, 21 Jul 2021 23:10:53 -0000

Hi Rene,

Something similar could indeed work (untested code below):

    void reduce(uint8_t r[32], uint8_t w[64])
    {
        fe low, high;
        fe_frombytes(low , w);
        fe_frombytes(high, w + 32);
        fe_mul_small(high, high, 19);
        fe_add      (low , low, high);
        fe_tobytes(r, low);
        // maybe wipe low and high
    }

There's a little problem though: this reduction ignores *two* bits from
the input, one of which is right in the middle. As much as I love how
it makes the implementation easy, you might hate how it makes the
specification weird.

Or I could use the same trick to read the two missing bits, and have a
full, clean, wide reduction. Let's see (untested code):

    void reduce(uint8_t r[32], uint8_t w[64])
    {
        fe low, high;
        fe_frombytes(low , w);
        fe_frombytes(high, w + 32);
        fe_mul_small(high, high, 38);
        fe_add      (low , low, high);
        low[0] += (w[31] >> 7) * 19;
        low[0] += (w[63] >> 7) * 19
* 38; // may be omited
        fe_tobytes(r, low);
        // maybe wipe low and high
    }

What do you know, that's much less ugly than I anticipated. Nice.
Slower than a dedicated reduction, but if I'm being honest the
performance difference probably never matters.

Thanks for the tip,
Loup.



On Wed, 2021-07-21 at 18:22 -0400, Rene Struik wrote:
> Hi Loup, Filippo:
> 
> I do not see the modular reduction implementation issue:
> 
> if x=x0+x1*2^m and N=2^m-alpha, x (mod N) = (x0 +alpha*x1) (mod N) = 
> ZNadd(x0,ZNmul(alpha,x1)), where ZNadd and ZNmul are operators on
> the 
> set ZN=[0..N-1). This may not be the most efficient, but using two 
> invocations of a binary operator to implement a single modular
> reduction 
> should work for "Arthur Random Programmer" (whoever that new [to me] 
> creature is). For p=2^255-19, one could take m=255 and and alpha=19 
> (i.e., N=p) or m=256 and alpha=38 (i.e., N=2*p), depending on libary 
> details. The above also works for NIST prime curves, Brainpool
> curves, 
> Gost curves, secp256k1 (all with negligible relative incremental
> cost).
> 
> I do not think a specification needs to spell out all implementation 
> details, as long as it is plausible that effective implementations
> are 
> possible.
> 
> This all being said, the suitability of curve maps (including
> detailed 
> steps) depends on the application(s) one wishes to cater to. {I do
> not 
> recall having seen such discussion, e.g., related to password
> schemes 
> and pairing-based applications.}
> 
> Rene
> 
> On 2021-07-21 5:42 p.m., Loup Vaillant-David wrote:
> > Hi,
> > 
> > 
> > > But (as you know) in general it is not safe to reduce
> > > ceil(log2(p))
> > > bits mod p, since this can give significant bias for general p.
> > > The
> > > goal in the draft is to specify a hash-to-field method that is
> > > safe
> > > for any p without requiring any parameter except the security
> > > level
> > > in bits, k, and which gives bias roughly 2^-k. Adding parameters
> > > is
> > > dangerous in the sense that incorrectly choosing parameters gives
> > > a
> > > hash-to-field function that looks OK but is subtly broken.
> > 
> > The three primes I know of are pretty clear cut. Two are clearly
> > safe,
> > the other clearly is not.
> > 
> > 1. 2^255 - 2^4 - 2^1 - 1               (Curve25519)
> > 2. 2^448 - 2^224 - 1                   (Curve448)
> > 3. 2^256 - 2^224 + 2^96 + 2^192 - 1    (P-256)
> > 
> > The first  has a bias of about 2^-(255-4)   == 2^-252. => Safe
> > The second has a bias of about 2^-(448-224) == 2^-224. => Safe
> > The third  has a bias of about 2^-(256-224) == 2^-32.  => Unsafe
> > 
> > So, Curve25519 and Curve448 have biases well below 2^-128, and as
> > such
> > are virtually undetectable (you need about 2^128 samples, which you
> > are
> > not going to get). P-256's bias however can be detected with as
> > little
> > as a few billion samples, and as such is easily detected in
> > realistic
> > settings.
> > 
> > Yes, having yet another parameter is a bummer, but as long as we
> > don't
> > use primes with a bias close to 2^-128, choosing an acceptable
> > threshold (like 2^-128) should not be too much of a problem.
> > 
> > 
> > > In my mind the second is the much more convincing argument,
> > > because
> > > every EC-relevant field library that I'm aware of implements a
> > > wide
> > > reduction already, inside the multiplication routine (which gives
> > > a
> > > p^2-size intermediate result).
> > > 
> > > So this
> > > 
> > > > Curve25519 implementations will all have to implement and
> > > > expose
> > > > a new field operation (like https://git.io/JlU6L) which will be
> > > > only
> > > > used for h2c, meaning it will not be well tested or maintained,
> > > > or h2c implementations will have to roll their own, probably
> > > > in variable time.
> > > 
> > > isn't really true. If one exposes the ceil(log2(p^2)) bit
> > > reduction
> > > that's already implemented for multiplication, then there's no
> > > need
> > > for extra code, and maybe even no need for extra tests.
> > 
> > Wait a minute, that reduction is not *nearly* as large as you make
> > it
> > out to be.  Let's take a look at Curve25519's ref10 implementation
> > on
> > SUPERCOP. You would think that because the result of the
> > multiplication
> > (before carry propagation) is stored in 10 64-bit signed integers,
> > the
> > number they represent easily exceeds 2^512, or even approaches
> > 2^640.
> > 
> > Nope.
> > 
> > In fact, that number those wide limbs represent does not even
> > exceed
> > 2^293. That's because the limbs overlap: even though they're 63-bit
> > wide, they're only 25 or 26 bits apart.
> > 
> > How then can we compute the multiplication *at all*, would you say?
> > I've explained it on 
> > https://loup-vaillant.fr/tutorials/poly1305-design
> > Search for "Cheating at modular arithmetic".
> > 
> > The trick is to perform part of the modular reduction by taking
> > advantage of the fact that you're dealing with a pseudo Mersene
> > prime.
> > See, when you work modulo 2^255-19, multiplying 2^255 is the same
> > as
> > multiplying by 19. So you can just shift numbers to the right 255
> > bits,
> > and multiply by 19 to compensate. That's why field multiplication
> > has
> > those *19 all over the place: instead of representing the number
> > over
> > twenty limbs, we just shift directly to one of the ten least
> > significant limbs, and multiply by 19 to compensate.
> > 
> > Long story short, your trick won't work. We will need to implement
> > a
> > separate reduction routine.
> > 
> > ---
> > 
> > And before someone is tempted to point out that EdDSA involves the
> > reduction of a 512-bit modulo the prime order of the curve that we
> > could reuse, hold your horses: while TweetNaCl and Monocypher uses
> > a
> > fairly generic algorithm that can easily be adapted by changing a
> > single constant, I've seen ref10 use a highly optimised reduction
> > where
> > the value of the order of the curve is deeply embedded in the
> > algorithm. I personally could not even make sense of it, so I don't
> > think it can be adapted.
> > 
> > Moreover, keep in mind that 2^255-19 is a pseudo Mersene prime, and
> > as
> > such can use a much faster reduction algorithm: just shift the most
> > significant bits to the right, multiply by 19, and add it to the
> > least
> > significant bits. You'll get a full reduction in a couple
> > iterations
> > (less if you're clever). Still, implementing this is not for Arthur
> > Random Programmer. I personally would test it quite extensively,
> > and
> > write a proof of correctness on top.
> > 
> > 
> > > Would it partially allay your concerns if the hash-to-field
> > > section
> > > explicitly pointed out that it's possible to use the
> > > multiplication
> > > routine (or, well, half of it) for the wide modular reduction?
> > 
> > Since to my knowledge it is *not* possible, it would actually make
> > me
> > very worried: either I missed something big (and I want to know),
> > or
> > the RFC writers did (and it needs to be pointed out).
> > 
> > Loup.
> > 
> > 
> > _______________________________________________
> > CFRG mailing list
> > CFRG@irtf.org
> > https://www.irtf.org/mailman/listinfo/cfrg
> 
>