[CFRG] Comment on draft-irtf-cfrg-hash-to-curve-10

Daira Hopwood <daira@jacaranda.org> Mon, 04 January 2021 13:40 UTC

Return-Path: <daira@jacaranda.org>
X-Original-To: cfrg@ietfa.amsl.com
Delivered-To: cfrg@ietfa.amsl.com
Received: from localhost (localhost []) by ietfa.amsl.com (Postfix) with ESMTP id 98E373A0D28 for <cfrg@ietfa.amsl.com>; Mon, 4 Jan 2021 05:40:51 -0800 (PST)
X-Virus-Scanned: amavisd-new at amsl.com
X-Spam-Flag: NO
X-Spam-Score: -0.199
X-Spam-Status: No, score=-0.199 tagged_above=-999 required=5 tests=[DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, RCVD_IN_DNSWL_BLOCKED=0.001, SPF_PASS=-0.001, URIBL_BLOCKED=0.001] autolearn=ham autolearn_force=no
Authentication-Results: ietfa.amsl.com (amavisd-new); dkim=pass (2048-bit key) header.d=jacaranda.org
Received: from mail.ietf.org ([]) by localhost (ietfa.amsl.com []) (amavisd-new, port 10024) with ESMTP id dpNr4GF3texf for <cfrg@ietfa.amsl.com>; Mon, 4 Jan 2021 05:40:49 -0800 (PST)
Received: from krystal1.wisercloud.co.uk (krystal1.wisercloud.co.uk []) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ietfa.amsl.com (Postfix) with ESMTPS id D4EDF3A0D27 for <cfrg@ietf.org>; Mon, 4 Jan 2021 05:40:48 -0800 (PST)
DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=jacaranda.org; s=default; h=Content-Transfer-Encoding:Content-Type: MIME-Version:Date:Message-ID:Subject:Cc:From:To:Sender:Reply-To:Content-ID: Content-Description:Resent-Date:Resent-From:Resent-Sender:Resent-To:Resent-Cc :Resent-Message-ID:In-Reply-To:References:List-Id:List-Help:List-Unsubscribe: List-Subscribe:List-Post:List-Owner:List-Archive; bh=ayCzSiEOjM81+2Qf2Une5w4Ejoz525jS6QDa+9psahg=; b=XxL+wyLWibK1ePA0Pc4V5xJmAk 6rwPvKc/ECCERh1MTSSvzT3EMPCGm/M66J648nEIVUvD/pe9ZL+l1Pvs8nZ7oSj24VZhe9JZAh0oG YY/p9oRt4speVHSmZJKKBR/vlx9zNJBB65rp8yCf43levdjOYMrLX4CJo1okFrpecvYHteHn0y0kG jU/8WDbZBiQZS2085D3BsClFfOjvFxK8wh+rJ+VQiBVcnEylcTieP5+YNkvcUy8IOgymyqev9a+2i 3qXfjjxo3WMkQJJavTE6xHEHAEon19R0gwzhDThyEIqW0SMmwVDc1dtNuCwgaFFMZwkKYaiN5k7Z/ ykC6aCiw==;
Received: from host86-188-90-147.range86-188.btcentralplus.com ([]:38426 helo=[]) by krystal1.wisercloud.co.uk with esmtpsa (TLS1.2) tls TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256 (Exim 4.93) (envelope-from <daira@jacaranda.org>) id 1kwQ6G-00FjDN-M0; Mon, 04 Jan 2021 13:40:46 +0000
To: cfrg@ietf.org
From: Daira Hopwood <daira@jacaranda.org>
Cc: Sean Bowe <sean@electriccoin.co>, str4d@electriccoin.co
Message-ID: <e270e62d-941d-0a87-7dc9-cf80f73b5aeb@jacaranda.org>
Date: Mon, 4 Jan 2021 13:40:44 +0000
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:78.0) Gecko/20100101 Thunderbird/78.5.0
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Language: en-US
Content-Transfer-Encoding: 8bit
X-OutGoing-Spam-Status: No, score=-0.5
X-AntiAbuse: This header was added to track abuse, please include it with any abuse report
X-AntiAbuse: Primary Hostname - krystal1.wisercloud.co.uk
X-AntiAbuse: Original Domain - ietf.org
X-AntiAbuse: Originator/Caller UID/GID - [47 12] / [47 12]
X-AntiAbuse: Sender Address Domain - jacaranda.org
X-Get-Message-Sender-Via: krystal1.wisercloud.co.uk: authenticated_id: daira@jacaranda.org
X-Authenticated-Sender: krystal1.wisercloud.co.uk: daira@jacaranda.org
Archived-At: <https://mailarchive.ietf.org/arch/msg/cfrg/ZYlKNd4JJiJxNTYSwrY-ZFb4LeI>
Subject: [CFRG] Comment on draft-irtf-cfrg-hash-to-curve-10
X-BeenThere: cfrg@irtf.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Crypto Forum Research Group <cfrg.irtf.org>
List-Unsubscribe: <https://www.irtf.org/mailman/options/cfrg>, <mailto:cfrg-request@irtf.org?subject=unsubscribe>
List-Archive: <https://mailarchive.ietf.org/arch/browse/cfrg/>
List-Post: <mailto:cfrg@irtf.org>
List-Help: <mailto:cfrg-request@irtf.org?subject=help>
List-Subscribe: <https://www.irtf.org/mailman/listinfo/cfrg>, <mailto:cfrg-request@irtf.org?subject=subscribe>
X-List-Received-Date: Mon, 04 Jan 2021 13:40:52 -0000


This is a response to the RG Last Call for draft-irtf-cfrg-hash-to-curve-10.

The current draft is based in large part on 
However, it does not cover most of the optimizations described in that
paper. This is understandable given that the paper is focussed on
hashing to BLS12-381 G1 and G2, and does not explain in much detail how
to apply these optimizations to other curves.

In fact, *all* of the optimizations applied for BLS12-381 G1 in [WB2019]
can be adapted to general short Weierstrass curves over an arbitrary
prime field:

* merging an inversion and square root in map_to_curve;
* eliminating all remaining inversions by use of Jacobian coordinates

   * in map_to_curve;
   * in the addition, for the random oracle variant; and
   * in the isogeny map, for curves with AB = 0.

These are all compatible with a constant-time implementation; actually,
they result in an algorithm that is naturally constant time. The
unoptimized algorithm, on the other hand, entails a trade-off between
constant-timeness and speed. That could tempt implementors to rely on
arguments about inapplicability of timing attacks in their context,
that may be incorrect, or may fail when code is reused in a different

(Yes I have read section 10 of the ID. From experience, implementors
are very likely to ignore even strong recommendations if they have a
performance issue to solve and they believe, correctly or incorrectly,
that they can get away with it for their application.)

Probably all of the above is also true for curves over extension fields,
but I haven't verified that.

Recent developments in proof systems have made it more important to be
able to efficiently hash to curves over F_p where p = 1 + m * 2^n for
reasonably large n, say n ~= 32 (informally, "highly 2-adic" fields).
This is effectively a requirement for fast polynomial arithmetic using
FFTs. Some proof systems, such as
[PLONK](https://eprint.iacr.org/2019/953) and its variants, depend
fundamentally on the resulting multiplicative subgroup structure.

My main concern is a lack of advice in the draft on how to apply the
[WB2019] optimizations to such fields, which is not at all obvious.
The paper only discusses the cases needed for BLS12-381 G1 (where
p = 3 (mod 4)), and G2 (a quadratic extension field where
p^2 = 9 (mod 16)).
The [reference 
only has some of the optimizations for simplified SWU on fields where
p = 3 (mod 4), p = 5 (mod 8), and p^2 = 9 (mod 16).

It took me a couple of days to work out the detail of other cases.
Therefore I think it is likely that the draft as it stands would lead
to many sub-optimal implementations. Also, in my opinion optimization
attempts will carry a great risk of introducing errors or
incompatibilities in corner cases (including some that are
exceptionally unlikely to be picked up in tests not focussed
specifically on those cases), with the specification as it stands.

The description in the current draft also obscures the extent of the
performance advantage for the simplified SWU mapping, potentially
leading practitioners to make sub-optimal choices in protocol design.
By my estimates, an optimized constant-time simplified SWU
implementation can be at least 3 times faster than a constant-time
SW implementation. Some of the same optimization ideas could in
principle be applied to SW, but some cannot.


All of the following assumes that F is a prime field. Again, it is
likely to be easily adaptable to extension fields but I haven't
checked it for that case.

In order to obtain the speed-ups discussed above for arbitrary prime
fields, it's first necessary to change the interface to the square
root algorithm. We need an algorithm, that I'll call divsqrt, with
the following specification.

Let h be a fixed nonsquare in F.

   divsqrt(N: F, D: F \ {0}) -> (F, Bool):
       if N/D is square in F, return (sqrt(N/D), true)
       else return (sqrt(h * N/D), false).

where sqrt returns an arbitrary choice of square root.

It's possible to implement this specification in constant time
without computing two square roots or any explicit inversion. For
example this can be done using an adaptation of the algorithm in
[Sarkar2020](https://eprint.iacr.org/2020/1407), combined with
[BDL+12](https://cr.yp.to/papers.html#ed25519) section 5, and some
hints in [WB2019] section 4.2.

A prototype specialized for n = 32 and the
curves is at https://github.com/zcash/pasta/squareroottab.sage .
I can provide an implementation that is not dependent on n if needed.
Note that this is also more efficient than the commonly used
Tonelli–Shanks, especially its constant-time variant.

(That code does data-dependent look-ups, but could be modified to also
be resistant to cache-based side channel attacks by using smaller
tables and accessing all elements. [Sarkar2020] also has a variant
that doesn't use tables, and is still significantly faster than the
constant-time Tonelli–Shanks in Appendix I of the ID.)

The overhead of handling the division in the square root, relative
to a plain sqrt using a similar algorithm, is just over n squarings
and log2(n) multiplications for F_p where p = 1 + m * 2^n. This is a
substantial saving over a constant-time inversion, which would take
around |p| squarings.

Now given divsqrt, we can apply another optimization in section 4.2
of [WB2019] that avoids needing to test whether gx1 (using the notation
of the Internet Draft) is square. In brief: we first compute the
numerators and denominators for x1 = N/D and gx1 = U/V as in the
paper. If gx1 is not square, then we have sqrt(h * gx1); using a
precomputed square root of Z/h, we can calculate the numerators and
denominators of x2 and y2, without ever needing to calculate gx2.
The detail is explained in comments at
https://github.com/zcash/pasta/hashtocurve.sage .

Note that the same procedure works for *all* short Weierstrass curves
to which simplified SWU is applicable. Therefore IMHO (and this is
the main point of this email), it may as well be included in the
description of the basic algorithm rather than presented as an
optimization — especially given that it is naturally constant time,
and the currently presented algorithm isn't!

As far as I can see, there are only three potential arguments for not
making this change:

* That divsqrt is more complicated than sqrt. This doesn't really
   hold up, because it is just as easy to implement divsqrt naively
   as it would be to implement the current map_to_curve naively.

* That it's less obvious how this is derived from the original
   algorithm of [BCI+10](https://eprint.iacr.org/2009/340). But the
   current algorithm already diverges from that paper a little. What
   matters IMHO is that the algorithm is correct and its derivation
   is explained so that anyone with a mathematical background can
   check it.

* That it would make the map_to_curve dependent on Jacobian coordinates,
   which might not be what an implementor wants to use in the rest of
   their code. But it is straightforward to also say how to obtain the
   output in homogeneous or affine coordinates.

A naive constant-time implementation of a full hash_to_curve using the
simplified SWU algorithm from the draft takes 1001S + 168M + 4I for
the Pallas curve (a 255-bit non-pairing friendly curve with A = 0 and
a degree-3 isogeny from the curve on which hashing is performed).
With optimizations, it takes 577S + 150M and no inversions, if output
is in Jacobian coordinates.

Aside: the current draft prohibits Z = -1. It is unclear why, since
that is exactly what [WB2019] uses for BLS12-381 G1.


Another possible change I would suggest is in the method of choosing
the sign of the y-coordinate. (Unlike the optimizations discussed
above, this would actually change the map_to_curve function, and so
would probably need to be optional.) The current method involves
ensuring that the low bit of u (as named in the ID) is consistent
with the low bit of y.

This is fine if we're on a conventional processor where taking the
low bit of a field element is cheap. However, it is unnecessarily
inefficient in the setting of an arithmetic circuit. To take the low
bit of a field element in that setting requires a full decomposition
of the value, with cost O(|p|). In an R1CS-based proof system that
requires one constraint per bit of p. In PLONK it requires one range
check (e.g. implemented using
[plookup](https://eprint.iacr.org/2020/315)) for each chunk of k bits
where k is limited by the circuit size.

Since it is very cheap to check a square root in an arithmetic circuit
using nondeterminism, these bit decompositions become the major cost
bottleneck of hash-to-curve in a circuit.

The following modification halves this cost for for arithmetic circuits
while not imposing significant overhead in other settings: simply
require that sgn0(u - y) = 0. This is not the same as requiring that
u and y have the same low bit, but it has a similar effect on the
distribution of y.

Either for this modification or for the existing algorithm, it would
be desirable to have a proof that it suffices to use the low bit of u
(rather than using an extra random bit independent of u), since this
is not entirely obvious.


I work for Electric Coin Company, but I'm not speaking for them here.

None of the optimizations or changes discussed here are to my knowledge
affected by, or likely to be affected by, patents or other IP issues.

Daira Hopwood