Re: [CFRG] thoughts on clearing the cofactor in hash to curve

Loup Vaillant-David <loup@loup-vaillant.fr> Wed, 14 April 2021 14:10 UTC

Return-Path: <loup@loup-vaillant.fr>
X-Original-To: cfrg@ietfa.amsl.com
Delivered-To: cfrg@ietfa.amsl.com
Received: from localhost (localhost [127.0.0.1]) by ietfa.amsl.com (Postfix) with ESMTP id 0138F3A0C9E for <cfrg@ietfa.amsl.com>; Wed, 14 Apr 2021 07:10:38 -0700 (PDT)
X-Virus-Scanned: amavisd-new at amsl.com
X-Spam-Flag: NO
X-Spam-Score: -1.917
X-Spam-Level:
X-Spam-Status: No, score=-1.917 tagged_above=-999 required=5 tests=[BAYES_00=-1.9, RCVD_IN_MSPIKE_H4=-0.01, RCVD_IN_MSPIKE_WL=-0.01, SPF_HELO_NONE=0.001, SPF_NONE=0.001, URIBL_BLOCKED=0.001] autolearn=unavailable autolearn_force=no
Received: from mail.ietf.org ([4.31.198.44]) by localhost (ietfa.amsl.com [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id Tp7OsYwxX053 for <cfrg@ietfa.amsl.com>; Wed, 14 Apr 2021 07:10:33 -0700 (PDT)
Received: from relay8-d.mail.gandi.net (relay8-d.mail.gandi.net [217.70.183.201]) (using TLSv1.2 with cipher ADH-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ietfa.amsl.com (Postfix) with ESMTPS id DB1C73A0FF2 for <cfrg@irtf.org>; Wed, 14 Apr 2021 07:10:12 -0700 (PDT)
X-Originating-IP: 78.198.246.40
Received: from grey-fade (unknown [78.198.246.40]) (Authenticated sender: loup@loup-vaillant.fr) by relay8-d.mail.gandi.net (Postfix) with ESMTPSA id 03A191BF212; Wed, 14 Apr 2021 14:10:05 +0000 (UTC)
Message-ID: <731729dc5317238feb3cf5e7a66ebea4e5148c4d.camel@loup-vaillant.fr>
From: Loup Vaillant-David <loup@loup-vaillant.fr>
To: Armando Faz <armfazh=40cloudflare.com@dmarc.ietf.org>, "Hao, Feng" <Feng.Hao@warwick.ac.uk>
Cc: IRTF CFRG <cfrg@irtf.org>
Date: Wed, 14 Apr 2021 16:10:05 +0200
In-Reply-To: <CABZxKYnTxM_es9tkDHd+cN4X0dT3WeOuaR2zki7LqFWzp17dgA@mail.gmail.com>
References: <CABZxKYnTxM_es9tkDHd+cN4X0dT3WeOuaR2zki7LqFWzp17dgA@mail.gmail.com>
Content-Type: text/plain; charset="UTF-8"
X-Mailer: Evolution 3.28.5-0ubuntu0.18.04.2
Mime-Version: 1.0
Content-Transfer-Encoding: 7bit
Archived-At: <https://mailarchive.ietf.org/arch/msg/cfrg/papKcpUVL9PGfFMs38tc5_vTJvY>
Subject: Re: [CFRG] thoughts on clearing the cofactor in hash to curve
X-BeenThere: cfrg@irtf.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Crypto Forum Research Group <cfrg.irtf.org>
List-Unsubscribe: <https://www.irtf.org/mailman/options/cfrg>, <mailto:cfrg-request@irtf.org?subject=unsubscribe>
List-Archive: <https://mailarchive.ietf.org/arch/browse/cfrg/>
List-Post: <mailto:cfrg@irtf.org>
List-Help: <mailto:cfrg-request@irtf.org?subject=help>
List-Subscribe: <https://www.irtf.org/mailman/listinfo/cfrg>, <mailto:cfrg-request@irtf.org?subject=subscribe>
X-List-Received-Date: Wed, 14 Apr 2021 14:10:38 -0000

On Mon, 2021-04-12 at 15:50 -0700, Armando Faz wrote:
> > "Hao, Feng" <Feng.Hao@warwick.ac.uk>
> > I asked for clarification on whether the small subgroup points can
> > be removed from map-to-curve by design. The replies from the hash-
> > to-curve authors indicated no: because 1) too much hassle; 2) not
> > worth it for the negligible probability. I think the rationale is
> > clear.
> 
> There were several comments about why clear-cofactor helps to map
> low-order points to the identity, and why this might or might not be
> an issue in higher protocols.
> However, few comments addressed the problem of devising a map-to-
> curve algorithm that always returns points of a given order. I
> consider this is still an open problem in general, (which is another
> reason why the draft takes a simpler  approach, namely clear-
> cofactor).
> Also note that a similar function will be useful for CSIDH, which
> needs to sample points in a desired torsion group. If I am not wrong,
> CSIDH also uses the clear-cofactor technique to achieve this task.
> Happy to hear more comments about this specific problem, (which is a
> different  discussion from the implications of not having such a
> map).

Hi,

If we separate the map/hash to curve step from the higher-level
protocol that uses it, it is generally simpler to clear the cofactor
before the high-level protocol sees anything. (This ease of use is why
I wouldn't dare object to clearing the cofactor in the context of the
hash-to-curve RFC.)

On the other hand, there might be a small performance benefits in *not*
clearing the cofactor. Curve25519's cofactor for instance is 8 (2^3),
and requires 3 point doublings to clear. Depending on the context, this
could be considered a significant overhead, or a negligible one:

- Elligator2 based Map to Curve requires a single exponentiation.
  Adding 3 point doubling on top of that will likely add about 10%
  of overhead.
- If we're performing a scalar multiplication after clearing the
  cofactor (OPRF does), then the overhead drops down to 1% or so.

(The overhead for Curve448 are even lower: the curve is bigger, and
 requires only 2 points doubling to clear the cofactor.)

Negligible or no, the overhead is measurable, so there's the temptation
to skip it. And in many cases, we can: it turns out that regular X25519
(and X448) scalar multiplications ignore the cofactor to begin with.
They do this by clearing the 3 (or 2) least significant bit of the
scalar, thus making sure that the scalar is a multiple of the cofactor.

To sum it up, if cofactor clearing happens as a natural side effect of
the higher level protocol (such as scalar multiplication, where the
scalar is tweaked to clear the cofactor), then we can save a few cycles
and omit it. On the other hand, you really have to know what you're
doing.

---

Another reason not to clear the cofactor is when you also want to
provide the reverse map. Reverse map facilitate steganography and other
forms of meta-data hiding: if you map a random point on the curve, the
representative is indistinguishable from random. So when you send it
over the network, it looks random, without the statistical biases
associated with raw public keys.

The problem with the reverse mapping is that we need to pick a random
point over the *whole* curve, not just the prime order group (let's
ignore the fact that half of the points do not map at all). If we
restricted ourselves to the prime order group, the attacker could just
map it back and notice that by some amazing "coincidence", they always
get a point on the prime order group.

So the representative we send over the network for attackers to
eavesdrop must map to a point whose cofactor is *not* cleared. To do
that, it makes sense for a low-level cryptographic library to expose an
interface that does not clear the cofactors. One can then implement
Hash-To-Curve (which does clear the cofactor) on top of it.

---

Speaking of which, I found rather unfortunate incompatibilities between
the RFC's map-to-point step, and existing Elligator2 implementations
over Curve25519 I was referring to when I added Elligator2 to
Monocypher last year. I believe Mike Hamburg asked about them then.

As one can guess, I'm rather found of the reverse map, and what it
enables. In the specific case of Curve25519, mapping a point to a
representative will yield a 255-bit number. Note however that
representatives and their opposite (modulo 2^255-19) map to the same
curve point, so I always chose the positive representative. (The one
between 0 and p-2. I could instead discriminate with parity instead
like Ed25519 does, but the Elligator2 paper suggested otherwise).

The result is a 254-bit number. I can then just pad the two other bits
with random noise, and the decoding step will have to ignore those
bits. If I recall correctly, the Hash-To-Curve RFC however does not
ignore those two bit:

- Bit 256 is used to flip the sign of the mapped curve point, instead
  of just being ignored.
- The representative is encoded in 255 bits instead of 254 bits, which
  are enough for positive representatives.

Both of these incompatibilities are fairly easily worked around: we
don't care about the sign of the point to begin with if its purpose is
to use it in X-only ladders (and in many cases, it is), and
representatives are easily negated if need be. I also understand why
representatives use 255 bits: that's how existing decoding functions
work.

What I don't get is the rationale behind using bit 256 to flip the sign
of the curve point. It doesn't do anything that I can tell: if a point
can be mapped, then so can its opposite, so we won't even cover more
points that way. Unless perhaps as a way of highlighting implementation
errors?

---

Anyway, my thoughts.
Loup.