Re: [CFRG] Comment on draft-irtf-cfrg-hash-to-curve-10

Christopher Wood <> Thu, 18 March 2021 00:24 UTC

Return-Path: <>
Received: from localhost (localhost []) by (Postfix) with ESMTP id 9899E3A18D3 for <>; Wed, 17 Mar 2021 17:24:01 -0700 (PDT)
X-Virus-Scanned: amavisd-new at
X-Spam-Flag: NO
X-Spam-Score: -2.119
X-Spam-Status: No, score=-2.119 tagged_above=-999 required=5 tests=[BAYES_00=-1.9, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, RCVD_IN_DNSWL_BLOCKED=0.001, RCVD_IN_MSPIKE_H3=-0.01, RCVD_IN_MSPIKE_WL=-0.01, SPF_PASS=-0.001, URIBL_BLOCKED=0.001] autolearn=ham autolearn_force=no
Authentication-Results: (amavisd-new); dkim=pass (2048-bit key) header.b=IIcRi1ex; dkim=pass (2048-bit key) header.b=UwACY7Y+
Received: from ([]) by localhost ( []) (amavisd-new, port 10024) with ESMTP id TU7r4JqTFill for <>; Wed, 17 Mar 2021 17:23:57 -0700 (PDT)
Received: from ( []) (using TLSv1.2 with cipher ADH-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by (Postfix) with ESMTPS id E2D0B3A18F8 for <>; Wed, 17 Mar 2021 17:23:56 -0700 (PDT)
Received: from compute4.internal (compute4.nyi.internal []) by mailout.nyi.internal (Postfix) with ESMTP id DB0095C00EB for <>; Wed, 17 Mar 2021 20:23:55 -0400 (EDT)
Received: from imap4 ([]) by compute4.internal (MEProxy); Wed, 17 Mar 2021 20:23:55 -0400
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;; h=mime-version:message-id:in-reply-to:references:date:from:to :subject:content-type:content-transfer-encoding; s=fm3; bh=1avPv AtvuUwVQpLwgppJMCVUENRhr9Bmz4G3gj2sqI4=; b=IIcRi1exxOX+HZcA9ggeo ZQU6HNNI40nJ7MHopgE36LfgSGBxiaruVQf0LWPgIGb406eHyveSCe+ipQBeYVEs mX8XjYoOWaFIdmlFDPPmG5PcBvlOf6UeosL6UyzH1teuwduxVaA5WafLtExm2dD4 s9r7I6uza3vIpmmBkleMSId2RbiHeIdP9Sow4XZAaounIWZKRDgq3cEVerFfqdtx RAeASxc+QHdczVxgUADcQuUrlCGSI59GCjroj+lNXWc8t/0GMe4wXtZQVCZ+aw7n qtRGzkeTe4svIBmQoBeWqpoCrbc9XTirHaqqtmsHCjAv6Kd9IfF586h7gL9Of6RK Q==
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=; h=content-transfer-encoding:content-type :date:from:in-reply-to:message-id:mime-version:references :subject:to:x-me-proxy:x-me-proxy:x-me-sender:x-me-sender :x-sasl-enc; s=fm2; bh=1avPvAtvuUwVQpLwgppJMCVUENRhr9Bmz4G3gj2sq I4=; b=UwACY7Y+JDMd2+W4/odiMbBFfI1FKbIHPwW2MFrGhRvyD7X+/YbGwyChU k0mHxiBhZjB/SEMERmJknhZAmIwbWBL9XSxfDo7VzAmLMrcuhzxPfSoAa92dnPwz LfPxfrd+WDZbBh+EtC2JVtwd+NZ+DYaqtmfJEuQmunWavS5ot/ggYUkUy4vDhJue FiOLQQV/1PMEpqhv9pwUy3YWsuyZVQ94gaTcGLV95H84WWKh4ZEVA32xVT5pKcan V2qBs4RpyQ4UAW7Fdh/KXYuC6lZ51f80oXpt9ca2Ja/9RbI6ccE70rWeqDJLbnN6 rH0rGuONZWHqKmUKM2MJHPCqViPGg==
X-ME-Sender: <xms:m51SYPyp5DtlwZwrPpC9toZpjqgPMeqmPITyBM8rzz16YQNE8uPPPg> <xme:m51SYHTkWnW5uyucYEg9YMUxU9ma5g6ICHhAteLWOq4Ld6-gDa7IhoFhZRTOzto4m XizzGa8yqDx4ethZkc>
X-ME-Proxy-Cause: gggruggvucftvghtrhhoucdtuddrgeduledrudefhedgvddtucetufdoteggodetrfdotf fvucfrrhhofhhilhgvmecuhfgrshhtofgrihhlpdfqfgfvpdfurfetoffkrfgpnffqhgen uceurghilhhouhhtmecufedttdenucenucfjughrpefofgggkfgjfhffhffvufgtgfesth hqredtreerjeenucfhrhhomhepfdevhhhrihhsthhophhhvghrucghohhougdfuceotggr fieshhgvrghpihhnghgsihhtshdrnhgvtheqnecuggftrfgrthhtvghrnhepleduteetve ffgeeuvedvlefhgfetkeeltdffueffheffleehueetiedtleevhfdvnecuffhomhgrihhn pehirggtrhdrohhrghdpghhithhhuhgsrdgtohhmpdihphdrthhopdgvlhgvtghtrhhitg gtohhinhdrtghopdhirhhtfhdrohhrghenucevlhhushhtvghrufhiiigvpedtnecurfgr rhgrmhepmhgrihhlfhhrohhmpegtrgifsehhvggrphhinhhgsghithhsrdhnvght
X-ME-Proxy: <xmx:m51SYJXVQ8szo42x1MKKjEDQYUKeDt1xEo-NPMUmDEJKUAdyO8AjRA> <xmx:m51SYJg6Bd5lbAnNguv3N75Kpr8BD8aHRWjGtNO09sjI5H_QTgl61A> <xmx:m51SYBBHQktjQo2tbcX5VSCUpQ9ZOC-qyKFegPLethiM96l7mIFZPw> <xmx:m51SYEPYyV8HMQR-bdCA6eousH8er-WTvqVZvb8hiLli5CmLiZIPhw>
Received: by mailuser.nyi.internal (Postfix, from userid 501) id 650A116006B; Wed, 17 Mar 2021 20:23:55 -0400 (EDT)
X-Mailer: Webmail Interface
User-Agent: Cyrus-JMAP/3.5.0-alpha0-206-g078a48fda5-fm-20210226.001-g078a48fd
Mime-Version: 1.0
Message-Id: <>
In-Reply-To: <>
References: <>
Date: Wed, 17 Mar 2021 17:23:35 -0700
From: "Christopher Wood" <>
Content-Type: text/plain;charset=utf-8
Content-Transfer-Encoding: quoted-printable
Archived-At: <>
Subject: Re: [CFRG] Comment on draft-irtf-cfrg-hash-to-curve-10
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Crypto Forum Research Group <>
List-Unsubscribe: <>, <>
List-Archive: <>
List-Post: <>
List-Help: <>
List-Subscribe: <>, <>
X-List-Received-Date: Thu, 18 Mar 2021 00:24:02 -0000

Hi Daira,

Thanks for the feedback, and apologies for the delayed reply! Given the maturity of this document and its state in the RG, I've two very high-level responses:

1. Changing the sign bit breaks backwards compatibility. Absent more support for this change, especially from those familiar with the challenges in the current design, perhaps now is not the time. I note that future ciphersuites could change the sign computation, should that be desired for certain applications.
2. I think the optimized, curve-specific algorithms in the appendix are constant time as described, so it doesn't *seem* necessary to make any substantial editorial changes to lift those to the main text. The mappings in Section 6 are not intended to be translated directly into code. Instead, each mapping points to a corresponding straight-line implementation in Appendix F. 

I would love to hear from others who may have opinions on either of these points, or on the particular suggestions below.


On Mon, Jan 4, 2021, at 5:40 AM, Daira Hopwood wrote:
> Hello,
> This is a response to the RG Last Call for draft-irtf-cfrg-hash-to-curve-10.
> The current draft is based in large part on 
> [WB2019](
> However, it does not cover most of the optimizations described in that
> paper. This is understandable given that the paper is focussed on
> hashing to BLS12-381 G1 and G2, and does not explain in much detail how
> to apply these optimizations to other curves.
> In fact, *all* of the optimizations applied for BLS12-381 G1 in [WB2019]
> can be adapted to general short Weierstrass curves over an arbitrary
> prime field:
> * merging an inversion and square root in map_to_curve;
> * eliminating all remaining inversions by use of Jacobian coordinates
>    * in map_to_curve;
>    * in the addition, for the random oracle variant; and
>    * in the isogeny map, for curves with AB = 0.
> These are all compatible with a constant-time implementation; actually,
> they result in an algorithm that is naturally constant time. The
> unoptimized algorithm, on the other hand, entails a trade-off between
> constant-timeness and speed. That could tempt implementors to rely on
> arguments about inapplicability of timing attacks in their context,
> that may be incorrect, or may fail when code is reused in a different
> context.
> (Yes I have read section 10 of the ID. From experience, implementors
> are very likely to ignore even strong recommendations if they have a
> performance issue to solve and they believe, correctly or incorrectly,
> that they can get away with it for their application.)
> Probably all of the above is also true for curves over extension fields,
> but I haven't verified that.
> Recent developments in proof systems have made it more important to be
> able to efficiently hash to curves over F_p where p = 1 + m * 2^n for
> reasonably large n, say n ~= 32 (informally, "highly 2-adic" fields).
> This is effectively a requirement for fast polynomial arithmetic using
> FFTs. Some proof systems, such as
> [PLONK]( and its variants, depend
> fundamentally on the resulting multiplicative subgroup structure.
> My main concern is a lack of advice in the draft on how to apply the
> [WB2019] optimizations to such fields, which is not at all obvious.
> The paper only discusses the cases needed for BLS12-381 G1 (where
> p = 3 (mod 4)), and G2 (a quadratic extension field where
> p^2 = 9 (mod 16)).
> The [reference 
> code](
> only has some of the optimizations for simplified SWU on fields where
> p = 3 (mod 4), p = 5 (mod 8), and p^2 = 9 (mod 16).
> It took me a couple of days to work out the detail of other cases.
> Therefore I think it is likely that the draft as it stands would lead
> to many sub-optimal implementations. Also, in my opinion optimization
> attempts will carry a great risk of introducing errors or
> incompatibilities in corner cases (including some that are
> exceptionally unlikely to be picked up in tests not focussed
> specifically on those cases), with the specification as it stands.
> The description in the current draft also obscures the extent of the
> performance advantage for the simplified SWU mapping, potentially
> leading practitioners to make sub-optimal choices in protocol design.
> By my estimates, an optimized constant-time simplified SWU
> implementation can be at least 3 times faster than a constant-time
> SW implementation. Some of the same optimization ideas could in
> principle be applied to SW, but some cannot.
> ----
> All of the following assumes that F is a prime field. Again, it is
> likely to be easily adaptable to extension fields but I haven't
> checked it for that case.
> In order to obtain the speed-ups discussed above for arbitrary prime
> fields, it's first necessary to change the interface to the square
> root algorithm. We need an algorithm, that I'll call divsqrt, with
> the following specification.
> Let h be a fixed nonsquare in F.
>    divsqrt(N: F, D: F \ {0}) -> (F, Bool):
>        if N/D is square in F, return (sqrt(N/D), true)
>        else return (sqrt(h * N/D), false).
> where sqrt returns an arbitrary choice of square root.
> It's possible to implement this specification in constant time
> without computing two square roots or any explicit inversion. For
> example this can be done using an adaptation of the algorithm in
> [Sarkar2020](, combined with
> [BDL+12]( section 5, and some
> hints in [WB2019] section 4.2.
> A prototype specialized for n = 32 and the
> [Pallas/Vesta](
> curves is at .
> I can provide an implementation that is not dependent on n if needed.
> Note that this is also more efficient than the commonly used
> Tonelli–Shanks, especially its constant-time variant.
> (That code does data-dependent look-ups, but could be modified to also
> be resistant to cache-based side channel attacks by using smaller
> tables and accessing all elements. [Sarkar2020] also has a variant
> that doesn't use tables, and is still significantly faster than the
> constant-time Tonelli–Shanks in Appendix I of the ID.)
> The overhead of handling the division in the square root, relative
> to a plain sqrt using a similar algorithm, is just over n squarings
> and log2(n) multiplications for F_p where p = 1 + m * 2^n. This is a
> substantial saving over a constant-time inversion, which would take
> around |p| squarings.
> Now given divsqrt, we can apply another optimization in section 4.2
> of [WB2019] that avoids needing to test whether gx1 (using the notation
> of the Internet Draft) is square. In brief: we first compute the
> numerators and denominators for x1 = N/D and gx1 = U/V as in the
> paper. If gx1 is not square, then we have sqrt(h * gx1); using a
> precomputed square root of Z/h, we can calculate the numerators and
> denominators of x2 and y2, without ever needing to calculate gx2.
> The detail is explained in comments at
> .
> Note that the same procedure works for *all* short Weierstrass curves
> to which simplified SWU is applicable. Therefore IMHO (and this is
> the main point of this email), it may as well be included in the
> description of the basic algorithm rather than presented as an
> optimization — especially given that it is naturally constant time,
> and the currently presented algorithm isn't!
> As far as I can see, there are only three potential arguments for not
> making this change:
> * That divsqrt is more complicated than sqrt. This doesn't really
>    hold up, because it is just as easy to implement divsqrt naively
>    as it would be to implement the current map_to_curve naively.
> * That it's less obvious how this is derived from the original
>    algorithm of [BCI+10]( But the
>    current algorithm already diverges from that paper a little. What
>    matters IMHO is that the algorithm is correct and its derivation
>    is explained so that anyone with a mathematical background can
>    check it.
> * That it would make the map_to_curve dependent on Jacobian coordinates,
>    which might not be what an implementor wants to use in the rest of
>    their code. But it is straightforward to also say how to obtain the
>    output in homogeneous or affine coordinates.
> A naive constant-time implementation of a full hash_to_curve using the
> simplified SWU algorithm from the draft takes 1001S + 168M + 4I for
> the Pallas curve (a 255-bit non-pairing friendly curve with A = 0 and
> a degree-3 isogeny from the curve on which hashing is performed).
> With optimizations, it takes 577S + 150M and no inversions, if output
> is in Jacobian coordinates.
> Aside: the current draft prohibits Z = -1. It is unclear why, since
> that is exactly what [WB2019] uses for BLS12-381 G1.
> ----
> Another possible change I would suggest is in the method of choosing
> the sign of the y-coordinate. (Unlike the optimizations discussed
> above, this would actually change the map_to_curve function, and so
> would probably need to be optional.) The current method involves
> ensuring that the low bit of u (as named in the ID) is consistent
> with the low bit of y.
> This is fine if we're on a conventional processor where taking the
> low bit of a field element is cheap. However, it is unnecessarily
> inefficient in the setting of an arithmetic circuit. To take the low
> bit of a field element in that setting requires a full decomposition
> of the value, with cost O(|p|). In an R1CS-based proof system that
> requires one constraint per bit of p. In PLONK it requires one range
> check (e.g. implemented using
> [plookup]( for each chunk of k bits
> where k is limited by the circuit size.
> Since it is very cheap to check a square root in an arithmetic circuit
> using nondeterminism, these bit decompositions become the major cost
> bottleneck of hash-to-curve in a circuit.
> The following modification halves this cost for for arithmetic circuits
> while not imposing significant overhead in other settings: simply
> require that sgn0(u - y) = 0. This is not the same as requiring that
> u and y have the same low bit, but it has a similar effect on the
> distribution of y.
> Either for this modification or for the existing algorithm, it would
> be desirable to have a proof that it suffices to use the low bit of u
> (rather than using an extra random bit independent of u), since this
> is not entirely obvious.
> ----
> I work for Electric Coin Company, but I'm not speaking for them here.
> None of the optimizations or changes discussed here are to my knowledge
> affected by, or likely to be affected by, patents or other IP issues.
> -- 
> Daira Hopwood
> _______________________________________________
> CFRG mailing list