Re: [CFRG] How to construct a hybrid signature combiner?

"D. J. Bernstein" <djb@cr.yp.to> Mon, 25 March 2024 09:27 UTC

Return-Path: <djb-dsn2-1406711340.7506@cr.yp.to>
X-Original-To: cfrg@ietfa.amsl.com
Delivered-To: cfrg@ietfa.amsl.com
Received: from localhost (localhost [127.0.0.1]) by ietfa.amsl.com (Postfix) with ESMTP id A049DC14F685 for <cfrg@ietfa.amsl.com>; Mon, 25 Mar 2024 02:27:29 -0700 (PDT)
X-Virus-Scanned: amavisd-new at amsl.com
X-Spam-Flag: NO
X-Spam-Score: -0.905
X-Spam-Level:
X-Spam-Status: No, score=-0.905 tagged_above=-999 required=5 tests=[BAYES_00=-1.9, GB_AFFORDABLE=1, RCVD_IN_ZEN_BLOCKED_OPENDNS=0.001, SPF_HELO_NONE=0.001, SPF_PASS=-0.001, T_SCC_BODY_TEXT_LINE=-0.01, UNPARSEABLE_RELAY=0.001, URIBL_BLOCKED=0.001, URIBL_DBL_BLOCKED_OPENDNS=0.001, URIBL_ZEN_BLOCKED_OPENDNS=0.001] autolearn=no autolearn_force=no
Received: from mail.ietf.org ([50.223.129.194]) by localhost (ietfa.amsl.com [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id V4Vmg_shEY1j for <cfrg@ietfa.amsl.com>; Mon, 25 Mar 2024 02:27:25 -0700 (PDT)
Received: from salsa.cs.uic.edu (salsa.cs.uic.edu [131.193.32.108]) by ietfa.amsl.com (Postfix) with SMTP id 2A8E9C14F682 for <cfrg@irtf.org>; Mon, 25 Mar 2024 02:27:24 -0700 (PDT)
Received: (qmail 30890 invoked by uid 1010); 25 Mar 2024 09:27:23 -0000
Received: from unknown (unknown) by unknown with QMTP; 25 Mar 2024 09:27:23 -0000
Received: (qmail 964866 invoked by uid 1000); 25 Mar 2024 09:27:11 -0000
Date: Mon, 25 Mar 2024 09:27:11 -0000
Message-ID: <20240325092711.964864.qmail@cr.yp.to>
From: "D. J. Bernstein" <djb@cr.yp.to>
To: cfrg@irtf.org
Mail-Followup-To: cfrg@irtf.org
In-Reply-To: <87o7b6szhh.fsf@kaka.sjd.se>
Archived-At: <https://mailarchive.ietf.org/arch/msg/cfrg/LdvasJBpseekZtQkQF1nuPPDH_s>
Subject: Re: [CFRG] How to construct a hybrid signature combiner?
X-BeenThere: cfrg@irtf.org
X-Mailman-Version: 2.1.39
Precedence: list
List-Id: Crypto Forum Research Group <cfrg.irtf.org>
List-Unsubscribe: <https://mailman.irtf.org/mailman/options/cfrg>, <mailto:cfrg-request@irtf.org?subject=unsubscribe>
List-Archive: <https://mailarchive.ietf.org/arch/browse/cfrg/>
List-Post: <mailto:cfrg@irtf.org>
List-Help: <mailto:cfrg-request@irtf.org?subject=help>
List-Subscribe: <https://mailman.irtf.org/mailman/listinfo/cfrg>, <mailto:cfrg-request@irtf.org?subject=subscribe>
X-List-Received-Date: Mon, 25 Mar 2024 09:27:29 -0000

Simon Josefsson writes:
> Can we give some examples of bad hybrid signature schemes, to better
> understand the failure modes of signature combiners?

There have been some papers (e.g., https://eprint.iacr.org/2020/1525)
pointing out attacks against protocols for which it's reasonable to
think that extra hashing in signature schemes would stop the attacks.

The extra hashing can also be carried out by protocols, so signature
schemes can blame protocols for not being based purely on standard
properties such as EUF-CMA, while protocols can blame signature schemes
for not doing some cheap hashing to provide additional properties. The
finger-pointing lets both sides avoid taking any action. It's rare for
there to be any engineering analysis of how the ecosystem as a whole
should avoid the attacks.

I think that combiners are a good place to put the hashing. We want
combiners anyway to be sure that adding post-quantum systems isn't
losing security. The basic reasons for having multiple signature systems
and multiple protocols don't apply to combiners, so we should expect
fewer combiners than signature systems and protocols, meaning less
review work for the hashing.

> What properties should a good generic hybrid signature combiner have?
> Can we describe one example?

Here are some design goals and a specific combiner proposal.

As a starting point, to make it less likely for people to skip signature
verification, it's good to have APIs producing signed messages rather
than signatures. A hybrid of two signature systems then internally has

   (1) message -> first signature system -> signed message,
   (2) signed message -> second signature system -> double-signed message.

This also forces the attacker to break the second signature system
before being able to feed forgeries to the first signature system.

I think that auditors trusting the pre-quantum system will be happiest
if the first signature system is the pre-quantum system. It's then clear
that the message going into the application has been accepted by that
system, even if something goes rather horribly wrong with the software
for the post-quantum system. (Obviously a buffer overflow is game over
in any case, but for code with input-independent buffer sizes there are
tools that convincingly verify that there aren't buffer overflows.)

So far all of this compatible with your example of "s1 := EdDSA(m)"
followed by "s2 := ML-DSA(s1||m)": EdDSA signs first, and the entire
EdDSA signed message s1||m is given as input to ML-DSA. Now let's look
at what papers have recommended for extra hashing.

One recommendation (in, e.g., https://eprint.iacr.org/2020/1525) is to
include, as an extra output attached to the signed message, a hash of
the message and public key. The rationale for this recommendation has
two parts: first, without the hashing, signatures might be reusable for
other public keys; second, without the hashing, signatures might be
reusable for other messages under weak public keys (even if they aren't
reusable for other messages under legitimately generated public keys).

This extra hash costs 32 bytes in signature size using SHA3-256. The
extra size requires discussion:

   * Most proposals for extra hashing in combiners are clearly
     affordable because the CPU time for hashing is roughly 1% of the
     cost of communicating the data being hashed. This proposal is
     different, since it's incurring extra network traffic.

   * On the other hand, none of the claims that I've seen of
     unaffordability of post-quantum signatures are based on such small
     size differences. If there are real examples of 32 extra bytes
     being unaffordable on top of post-quantum signatures, then why are
     anti-hybrid arguments resorting to unquantified "less efficient"
     statements? (See https://blog.cr.yp.to/20240102-hybrid.html.)

   * In the context of web pages (normally >2MB spread across <=10
     connections, with maybe 6 signatures per connection), code signing,
     etc., it's not plausible that an extra 32 bytes per signature is an
     issue.

All things considered, I think it's best to include this hash.

Another recommendation (from the same paper) is to sign H(m,pk) instead
of signing m. This is more controversial:

   * This breaks collision-resilience of the signature system.

   * The rationale for the recommendation is a specific protocol that's
     broken if an attacker, without seeing m, can convert a signature on
     m into a signature on m under another public key. Saying that a
     signature system should provide "non-resignability" is inherently
     incompatible with a traditional signed-message ("message recovery")
     API, which, as noted above, is less error-prone than a
     detached-signature API.

   * The paper gave a definition of "non-resignability" and claimed that
     signing H(m,pk) provably achieves this property, but, as pointed
     out in https://eprint.iacr.org/2023/423, the proof is wrong, and
     the defined property is always breakable. There are now attempts to
     fix the definition, but having "proofs" sitting there for years
     without ever actually being checked doesn't make me think that the
     security properties here have actually been thought through.

The 2023 paper recommends signing H(m,pk,r) for a random r. This is more
attractive:

   * This is compatible with collision-resilience.

   * No matter how weak the arguments for "non-resignability" are, there
     are other papers recommending randomization as a countermeasure
     against faults. Having this as an extra layer avoids the test
     questions raised by randomizing the underlying signature system.

I'd put the hash inputs in the opposite order in case someone plugs in
SHA-2.

One more recommendation is to include the signature-system name and
application name as extra hash inputs (along with any context specified
by the application), in a way that can be uniquely parsed across
applications, so that signatures can't be moved across protocols.

(https://eprint.iacr.org/2023/423 describes other ways to avoid having
non-hybrid signatures being extractable from hybrid signatures, but I
don't think any of those are compatible with using signature systems as
black boxes inside a combiner, and meanwhile I think that people asking
for contexts want more than just this hybrid-specific property.)

A combiner proposal that combines (ahem) all of the above is for a
signed message to be (s2,s1,r,h,m) where

   m = the message being signed,
   r = H(fresh randomness chosen during signing),
   h = H(r,H(hybridpk),hybridsigname,appname,appcontext,m),
   s1 = Ed25519 signature of (r,h),
   s2 = post-quantum signature of (s1,r,h),
   H = SHA3-256.

Using H(hybridpk) instead of hybridpk makes clear that H(hybridpk) can
be saved alongside hybridpk, guaranteeing that the key is hashed only
once when it's generated or received (so, as above, the cost of hashing
long keys will be only about 1% of the cost of communicating the keys).
This allows a 2^128 attack producing two keys with the same H(hybridpk),
but EUF-CMA is still collision-resilient. I presume that we can manage
fixed lengths for hybridsigname, appname, and appcontext. 

---D. J. Bernstein