### [CFRG] Comment on draft-irtf-cfrg-hash-to-curve-10

Daira Hopwood <daira@jacaranda.org> Mon, 04 January 2021 13:40 UTC

Return-Path: <daira@jacaranda.org>

X-Original-To: cfrg@ietfa.amsl.com

Delivered-To: cfrg@ietfa.amsl.com

Received: from localhost (localhost [127.0.0.1])
by ietfa.amsl.com (Postfix) with ESMTP id 98E373A0D28
for <cfrg@ietfa.amsl.com>; Mon, 4 Jan 2021 05:40:51 -0800 (PST)

X-Virus-Scanned: amavisd-new at amsl.com

X-Spam-Flag: NO

X-Spam-Score: -0.199

X-Spam-Level:

X-Spam-Status: No, score=-0.199 tagged_above=-999 required=5
tests=[DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1,
DKIM_VALID_EF=-0.1, RCVD_IN_DNSWL_BLOCKED=0.001, SPF_PASS=-0.001,
URIBL_BLOCKED=0.001] autolearn=ham autolearn_force=no

Authentication-Results: ietfa.amsl.com (amavisd-new); dkim=pass (2048-bit key)
header.d=jacaranda.org

Received: from mail.ietf.org ([4.31.198.44])
by localhost (ietfa.amsl.com [127.0.0.1]) (amavisd-new, port 10024)
with ESMTP id dpNr4GF3texf for <cfrg@ietfa.amsl.com>;
Mon, 4 Jan 2021 05:40:49 -0800 (PST)

Received: from krystal1.wisercloud.co.uk (krystal1.wisercloud.co.uk
[185.53.58.188])
(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
(No client certificate requested)
by ietfa.amsl.com (Postfix) with ESMTPS id D4EDF3A0D27
for <cfrg@ietf.org>; Mon, 4 Jan 2021 05:40:48 -0800 (PST)

DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed;
d=jacaranda.org; s=default; h=Content-Transfer-Encoding:Content-Type:
MIME-Version:Date:Message-ID:Subject:Cc:From:To:Sender:Reply-To:Content-ID:
Content-Description:Resent-Date:Resent-From:Resent-Sender:Resent-To:Resent-Cc
:Resent-Message-ID:In-Reply-To:References:List-Id:List-Help:List-Unsubscribe:
List-Subscribe:List-Post:List-Owner:List-Archive;
bh=ayCzSiEOjM81+2Qf2Une5w4Ejoz525jS6QDa+9psahg=; b=XxL+wyLWibK1ePA0Pc4V5xJmAk
6rwPvKc/ECCERh1MTSSvzT3EMPCGm/M66J648nEIVUvD/pe9ZL+l1Pvs8nZ7oSj24VZhe9JZAh0oG
YY/p9oRt4speVHSmZJKKBR/vlx9zNJBB65rp8yCf43levdjOYMrLX4CJo1okFrpecvYHteHn0y0kG
jU/8WDbZBiQZS2085D3BsClFfOjvFxK8wh+rJ+VQiBVcnEylcTieP5+YNkvcUy8IOgymyqev9a+2i
3qXfjjxo3WMkQJJavTE6xHEHAEon19R0gwzhDThyEIqW0SMmwVDc1dtNuCwgaFFMZwkKYaiN5k7Z/
ykC6aCiw==;

Received: from host86-188-90-147.range86-188.btcentralplus.com
([86.188.90.147]:38426 helo=[192.168.1.85])
by krystal1.wisercloud.co.uk with esmtpsa (TLS1.2) tls
TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256 (Exim 4.93)
(envelope-from <daira@jacaranda.org>)
id 1kwQ6G-00FjDN-M0; Mon, 04 Jan 2021 13:40:46 +0000

To: cfrg@ietf.org

From: Daira Hopwood <daira@jacaranda.org>

Cc: Sean Bowe <sean@electriccoin.co>, str4d@electriccoin.co

Message-ID: <e270e62d-941d-0a87-7dc9-cf80f73b5aeb@jacaranda.org>

Date: Mon, 4 Jan 2021 13:40:44 +0000

User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:78.0) Gecko/20100101
Thunderbird/78.5.0

MIME-Version: 1.0

Content-Type: text/plain; charset=utf-8; format=flowed

Content-Language: en-US

Content-Transfer-Encoding: 8bit

X-OutGoing-Spam-Status: No, score=-0.5

X-AntiAbuse: This header was added to track abuse,
please include it with any abuse report

X-AntiAbuse: Primary Hostname - krystal1.wisercloud.co.uk

X-AntiAbuse: Original Domain - ietf.org

X-AntiAbuse: Originator/Caller UID/GID - [47 12] / [47 12]

X-AntiAbuse: Sender Address Domain - jacaranda.org

X-Get-Message-Sender-Via: krystal1.wisercloud.co.uk: authenticated_id:
daira@jacaranda.org

X-Authenticated-Sender: krystal1.wisercloud.co.uk: daira@jacaranda.org

X-Source:

X-Source-Args:

X-Source-Dir:

Archived-At: <https://mailarchive.ietf.org/arch/msg/cfrg/ZYlKNd4JJiJxNTYSwrY-ZFb4LeI>

Subject: [CFRG] Comment on draft-irtf-cfrg-hash-to-curve-10

X-BeenThere: cfrg@irtf.org

X-Mailman-Version: 2.1.29

Precedence: list

List-Id: Crypto Forum Research Group <cfrg.irtf.org>

List-Unsubscribe: <https://www.irtf.org/mailman/options/cfrg>,
<mailto:cfrg-request@irtf.org?subject=unsubscribe>

List-Archive: <https://mailarchive.ietf.org/arch/browse/cfrg/>

List-Post: <mailto:cfrg@irtf.org>

List-Help: <mailto:cfrg-request@irtf.org?subject=help>

List-Subscribe: <https://www.irtf.org/mailman/listinfo/cfrg>,
<mailto:cfrg-request@irtf.org?subject=subscribe>

X-List-Received-Date: Mon, 04 Jan 2021 13:40:52 -0000

Hello, This is a response to the RG Last Call for draft-irtf-cfrg-hash-to-curve-10. The current draft is based in large part on [WB2019](https://eprint.iacr.org/2019/403). However, it does not cover most of the optimizations described in that paper. This is understandable given that the paper is focussed on hashing to BLS12-381 G1 and G2, and does not explain in much detail how to apply these optimizations to other curves. In fact, *all* of the optimizations applied for BLS12-381 G1 in [WB2019] can be adapted to general short Weierstrass curves over an arbitrary prime field: * merging an inversion and square root in map_to_curve; * eliminating all remaining inversions by use of Jacobian coordinates * in map_to_curve; * in the addition, for the random oracle variant; and * in the isogeny map, for curves with AB = 0. These are all compatible with a constant-time implementation; actually, they result in an algorithm that is naturally constant time. The unoptimized algorithm, on the other hand, entails a trade-off between constant-timeness and speed. That could tempt implementors to rely on arguments about inapplicability of timing attacks in their context, that may be incorrect, or may fail when code is reused in a different context. (Yes I have read section 10 of the ID. From experience, implementors are very likely to ignore even strong recommendations if they have a performance issue to solve and they believe, correctly or incorrectly, that they can get away with it for their application.) Probably all of the above is also true for curves over extension fields, but I haven't verified that. Recent developments in proof systems have made it more important to be able to efficiently hash to curves over F_p where p = 1 + m * 2^n for reasonably large n, say n ~= 32 (informally, "highly 2-adic" fields). This is effectively a requirement for fast polynomial arithmetic using FFTs. Some proof systems, such as [PLONK](https://eprint.iacr.org/2019/953) and its variants, depend fundamentally on the resulting multiplicative subgroup structure. My main concern is a lack of advice in the draft on how to apply the [WB2019] optimizations to such fields, which is not at all obvious. The paper only discusses the cases needed for BLS12-381 G1 (where p = 3 (mod 4)), and G2 (a quadratic extension field where p^2 = 9 (mod 16)). The [reference code](https://github.com/cfrg/draft-irtf-cfrg-hash-to-curve/tree/master/poc) only has some of the optimizations for simplified SWU on fields where p = 3 (mod 4), p = 5 (mod 8), and p^2 = 9 (mod 16). It took me a couple of days to work out the detail of other cases. Therefore I think it is likely that the draft as it stands would lead to many sub-optimal implementations. Also, in my opinion optimization attempts will carry a great risk of introducing errors or incompatibilities in corner cases (including some that are exceptionally unlikely to be picked up in tests not focussed specifically on those cases), with the specification as it stands. The description in the current draft also obscures the extent of the performance advantage for the simplified SWU mapping, potentially leading practitioners to make sub-optimal choices in protocol design. By my estimates, an optimized constant-time simplified SWU implementation can be at least 3 times faster than a constant-time SW implementation. Some of the same optimization ideas could in principle be applied to SW, but some cannot. ---- All of the following assumes that F is a prime field. Again, it is likely to be easily adaptable to extension fields but I haven't checked it for that case. In order to obtain the speed-ups discussed above for arbitrary prime fields, it's first necessary to change the interface to the square root algorithm. We need an algorithm, that I'll call divsqrt, with the following specification. Let h be a fixed nonsquare in F. divsqrt(N: F, D: F \ {0}) -> (F, Bool): if N/D is square in F, return (sqrt(N/D), true) else return (sqrt(h * N/D), false). where sqrt returns an arbitrary choice of square root. It's possible to implement this specification in constant time without computing two square roots or any explicit inversion. For example this can be done using an adaptation of the algorithm in [Sarkar2020](https://eprint.iacr.org/2020/1407), combined with [BDL+12](https://cr.yp.to/papers.html#ed25519) section 5, and some hints in [WB2019] section 4.2. A prototype specialized for n = 32 and the [Pallas/Vesta](https://electriccoin.co/blog/the-pasta-curves-for-halo-2-and-beyond/) curves is at https://github.com/zcash/pasta/squareroottab.sage . I can provide an implementation that is not dependent on n if needed. Note that this is also more efficient than the commonly used Tonelli–Shanks, especially its constant-time variant. (That code does data-dependent look-ups, but could be modified to also be resistant to cache-based side channel attacks by using smaller tables and accessing all elements. [Sarkar2020] also has a variant that doesn't use tables, and is still significantly faster than the constant-time Tonelli–Shanks in Appendix I of the ID.) The overhead of handling the division in the square root, relative to a plain sqrt using a similar algorithm, is just over n squarings and log2(n) multiplications for F_p where p = 1 + m * 2^n. This is a substantial saving over a constant-time inversion, which would take around |p| squarings. Now given divsqrt, we can apply another optimization in section 4.2 of [WB2019] that avoids needing to test whether gx1 (using the notation of the Internet Draft) is square. In brief: we first compute the numerators and denominators for x1 = N/D and gx1 = U/V as in the paper. If gx1 is not square, then we have sqrt(h * gx1); using a precomputed square root of Z/h, we can calculate the numerators and denominators of x2 and y2, without ever needing to calculate gx2. The detail is explained in comments at https://github.com/zcash/pasta/hashtocurve.sage . Note that the same procedure works for *all* short Weierstrass curves to which simplified SWU is applicable. Therefore IMHO (and this is the main point of this email), it may as well be included in the description of the basic algorithm rather than presented as an optimization — especially given that it is naturally constant time, and the currently presented algorithm isn't! As far as I can see, there are only three potential arguments for not making this change: * That divsqrt is more complicated than sqrt. This doesn't really hold up, because it is just as easy to implement divsqrt naively as it would be to implement the current map_to_curve naively. * That it's less obvious how this is derived from the original algorithm of [BCI+10](https://eprint.iacr.org/2009/340). But the current algorithm already diverges from that paper a little. What matters IMHO is that the algorithm is correct and its derivation is explained so that anyone with a mathematical background can check it. * That it would make the map_to_curve dependent on Jacobian coordinates, which might not be what an implementor wants to use in the rest of their code. But it is straightforward to also say how to obtain the output in homogeneous or affine coordinates. A naive constant-time implementation of a full hash_to_curve using the simplified SWU algorithm from the draft takes 1001S + 168M + 4I for the Pallas curve (a 255-bit non-pairing friendly curve with A = 0 and a degree-3 isogeny from the curve on which hashing is performed). With optimizations, it takes 577S + 150M and no inversions, if output is in Jacobian coordinates. Aside: the current draft prohibits Z = -1. It is unclear why, since that is exactly what [WB2019] uses for BLS12-381 G1. ---- Another possible change I would suggest is in the method of choosing the sign of the y-coordinate. (Unlike the optimizations discussed above, this would actually change the map_to_curve function, and so would probably need to be optional.) The current method involves ensuring that the low bit of u (as named in the ID) is consistent with the low bit of y. This is fine if we're on a conventional processor where taking the low bit of a field element is cheap. However, it is unnecessarily inefficient in the setting of an arithmetic circuit. To take the low bit of a field element in that setting requires a full decomposition of the value, with cost O(|p|). In an R1CS-based proof system that requires one constraint per bit of p. In PLONK it requires one range check (e.g. implemented using [plookup](https://eprint.iacr.org/2020/315)) for each chunk of k bits where k is limited by the circuit size. Since it is very cheap to check a square root in an arithmetic circuit using nondeterminism, these bit decompositions become the major cost bottleneck of hash-to-curve in a circuit. The following modification halves this cost for for arithmetic circuits while not imposing significant overhead in other settings: simply require that sgn0(u - y) = 0. This is not the same as requiring that u and y have the same low bit, but it has a similar effect on the distribution of y. Either for this modification or for the existing algorithm, it would be desirable to have a proof that it suffices to use the low bit of u (rather than using an extra random bit independent of u), since this is not entirely obvious. ---- I work for Electric Coin Company, but I'm not speaking for them here. None of the optimizations or changes discussed here are to my knowledge affected by, or likely to be affected by, patents or other IP issues. -- Daira Hopwood

- [CFRG] Comment on draft-irtf-cfrg-hash-to-curve-10 Daira Hopwood
- Re: [CFRG] Comment on draft-irtf-cfrg-hash-to-cur… Daira Hopwood