[Cfrg] review of draft-mcgrew-hash-sigs-08

Paul Selkirk <paul@psgd.org> Mon, 13 November 2017 20:24 UTC

To: cfrg@ietf.org
From: Paul Selkirk <paul@psgd.org>
Message-ID: <6a3950fc-c98b-9617-160f-beb51f0adc31@psgd.org>
Date: Mon, 13 Nov 2017 15:24:32 -0500
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Thunderbird/52.4.0
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"
Content-Language: en-US
Content-Transfer-Encoding: 7bit
Archived-At: <https://mailarchive.ietf.org/arch/msg/cfrg/Xl4EKjU61Uh76Hnm_Ty_0bNuIs0>
Subject: [Cfrg] review of draft-mcgrew-hash-sigs-08
Precedence: list

Greetings. I've been asked to look into implementing this draft for the
CrypTech project (http://cryptech.is). I'm not a cryptographer, and I
had no previous familiarity with hash-based signatures. But I do have
some experience both writing and implementing standards, and I have to
admit it took several readings before I understood this one.

There is a certain amount of sloppiness and imprecision in terminology
in this document, e.g. "type" and "typecode" mean different things at
different levels, and it's not always immediately clear which typecode
is being talked about.

Also, some concepts are under-explained for the uninitiated, most
notably the hierarchical signing system (both the justification and the
construction).

For the record, I support this work, and believe it can be implemented
as-is. The following comments are mostly editorial, to improve the
clarity, consistency, and readability of the spec.

1.  Introduction

After paragraph 3, I'd add a short explanation of Merkle trees for the
benefit of newcomers like myself, e.g.

   In the Merkle scheme, a binary tree of height h is used to hold 2^h
   OTS key pairs. Each inner node of the tree holds a hash of the
   concatenation of its children's public keys. The public key of the
   tree is the hash value contained in the root node (a recursive hash
   of the OTS public keys), while the private key of the tree is the
   collection of all the OTS private keys, together with the index of
   the next OTS private key to use.

I would break up paragraph 4, and add a short explanation of the HSS, e.g.

   In this note we describe the Leighton-Micali Signature (LMS) system,
   which is a variant of the Merkle scheme, and a Hierarchical Signature
   System (HSS) built on top of it that can efficiently scale to larger
   numbers of signatures. In order to support signing large numbers of
   messages on resource-constrained systems, the Merkle tree can be
   subdivided into a number of smaller trees. Only the bottom-most tree
   is used to sign messages, while trees above that are used to sign
   their children. For example, in the simplest case with 2 levels, the
   root tree is used to sign 2^h trees with 2^h OTS key pairs, for a
   total of 2^(2h) key pairs. The advantage of this scheme is that only
   the "active" trees are instantiated, which saves both time (for key
   generation) and space (for key storage). Signing is also somewhat
   faster, although signatures are larger and verification is slower.

Paragraph break before "This note is structured as follows."

2.  Interface

It's not clear why paragraphs 2-4 are indented. Are you quoting another
document?

3.2.  Security string

As a small nit, you say things like "this is bytes 0-15 of the hash",
where you really mean "of the string being hashed". I tend to think of
"the hash" as being the output of the hash function.

A much larger nit is that the "security string" is not a single thing,
and this section is full of forward references to later parts of the
document, so it's very hard to parse it without having previously read
the rest of the document.

4.  LM-OTS One-Time Signatures

   In order to facilitate its use in an N-time signature system, the LM-
   OTS key generation, signing, and verification algorithms all take as
   input a diversification parameter q

This is the only use of the phrase "diversification parameter". I'm
really not sure what it means.

q is the leaf number, or alternately the current index into the LMS
secret key.  This feels like a layer violation.

   When the LM-OTS signature
   system is used outside of an N-time signature system, this value
   SHOULD be set to the all-zero value.

Using LM-OTS outside of LMS is outside the scope of this document.

4.1.  Parameters

   Increasing w will shorten the
   signature, but at a cost of a larger computation to generate and
   verify a signature.

Remove this sentence because it's redundant to the slightly fuller
discussion in section 4.2.

4.2.  Parameter Sets

As a purely editorial matter, it's unclear why this is a separate
section from "Parameters", where they're a single section 5.1 for LMS. I
might combine the sections, and merge the last paragraph of 4.1 with the
first paragraph of 4.2.

4.3.  Private Key

   The format of the LM-OTS private key is an internal matter to the
   implementation, and this document does not attempt to define it.

But then it goes on to suggest a format, and the rest of the document
assumes that format is used.

  1. retrieve the value p from the type,

This is redundant to step 3.

     and the value I the 16 byte
     identifier of the LMS public/private keypair that this LM-OTS private
     key will be used with

This is phrased awkwardly, and implies that it's someone else's job to
generate I.

I suggest changing step 1 to something like the following (with language
borrowed from section 3.2):

  1. Generate I, a 16 byte identifier.  It MUST be chosen uniformly at
     random, or via a pseudorandom process.

(Or insert this step in between steps 3 and 4, for consistency with
section 4.4.)

As a side note, our project already uses RFC4122 version 4 UUIDs, which
are effectively 122-bit random numbers with a few fixed bits. I'm
assuming that the security of this system doesn't require a full 128
bits for I.

4.6.  Signature Generation

   The LM-OTS signature of a message is generated by first prepending
   the LM key identifier I, the LM leaf identifier q,

Change "LM" to "LMS".

5.  Leighton Micali Signatures

"Leighton Micali" should be consistently hypenated as "Leighton-Micali"
throughout the document.

   In general, the j^th node at level L has node number 2^L + j.

This use of L is distinct from HSS, but could possibly cause confusion.
I might change L to i here.

I might also move this whole description of node numbers to section 5.3,
which is the only place it's actually used.

5.1.  Parameters

      H SHOULD be the same as in Section 4.1, but MAY be different.

   There are 2^h leaves in the tree.  The hash function used within the
   LMS system SHOULD be the same as the hash function used within the
   LM-OTS system used to generate the leaves.

You say the same thing in immediately adjacent paragraphs.

5.2.  LMS Private Key

   The format of the LMS private key is an internal matter to the
   implementation, and this document does not attempt to define it.

Again, you then define a suggested format, and the rest of the document
assumes that format is used. In particular, q is a mandatory and visible
part of the signature.

5.3.  LMS Public Key

The public key format should really be on a line by itself, pulled out
of the paragraph, e.g.

   The LMS public key is the string

      u32str(type) || u32str(otstype) || I || T[1]

Better yet, include an algorithm, e.g.

  1. Generate I, a 16 byte identifier.  It MUST be chosen uniformly at
     random, or via a pseudorandom process.

  2. compute the hash value of each leaf node as follows:
     r = 2 * 2^h - 1
     while (r >= 2^h) {
       T[r] = H(I||u32str(r)||u16str(D_LEAF)||OTS_PUB[r-2^h]);
       r = r - 1
     }

  3. compute the hash value of each inner node as follows:
     r = 2^h - 1
     while (r > 0) {
       T[r] = H(I||u32str(r)||u16str(D_INTR)||T[2*r]||T[2*r+1]);
       r = r - 1
     }

  4. return u32str(type) || u32str(otstype) || I || T[1]

5.4.  LMS Signature

Again, I'd pull out the symbolic representation of the signature onto
its own line (or into pseudo-code, as below), for clarity and emphasis.

More seriously, the authentication path is under-specified.

   The array of values contains the siblings of the
   nodes on the path from the leaf to the root but does not contain the
   nodes on the path themselves.  The array for a tree with height h
   will have h values.  The first value is the sibling of the leaf, the
   next value is the sibling of the parent of the leaf, and so on up the
   path to the root.

This never says what is in path[i]. A close reading of the verification
algorithm tells us that it's the m-byte hash value of the tree node, but
it really needs to be made explicit here.

Also, in the symbolic version, you say path[0] || path[1] || ... ||
path[h-1]. This reverses the usual ordering, where layer 0 is the root
of the tree, and layer h-1 is the bottom of the tree. I understand why
it's phrased this way, but it's rather jarring.

I think it would really clarify things to include an algorithm, e.g.

  1. set type to the typecode of the LMS algorithm

  2. using the current value q from the LMS private key, create the
     LM-OTS signature for the message, using OTS_PRIV[q]

  3. compute the array path as follows:
     r = 2^h + q
     i = 0
     while (r > 1) {
       if (r is odd):
         path[i] = T[r-1]
       else
         path[i] = T[r+1]
       r = r/2
       i = i + 1
     }

  4. S = u32str(q) || ots_signature || u32str(type) || path

  5. q = q + 1

  6. return S

6.  Hierarchical signatures

This is the most under-specified part of the whole document, which is
unfortunate, because it's the point of the whole document. The prose
should describe the rationale and form of the tree-of-trees (as I
suggested above), and there should be pseudo-code.

It's also worth explaining that the meta-tree is conceptual, as only one
tree at each level is instantiated at a time. On a first reading, it
appears to be a list of trees rather than a tree of trees.

Perhaps there should be a Parameters subsection, to discuss the number
of levels L. There isn't an algorithm or symbol associated with it, but
it still needs to be established before anything is instantiated.

L has a maximum value of 8, which seems oddly prescriptive. Is there an
engineering reason for that limit? The HSS scheme can sign a maximum of
2^200 messages with h=25, which is pretty huge, but only 2^40 messages
with h=5. I'm not trying to cause trouble, but I get curious when I see
magic numbers.

HSS does not require all trees to use the same LMS type code (i.e. to be
the same height). While most reasonable people would use same-height
trees, is there a use case for trees of different heights (or even
different hash algorithms, once there are codes to support that)?  It
seems like there should be some guidance here.

6.1.  Key Generation

While it's true that only the root tree needs to be instantiated before
the public key is published, the child trees need to be instantiated
before any messages can be signed, so I'd prefer to do that in the key
generation process.

Again, this would really benefit from some pseudo-code, e.g.

  1. generate L LMS key pairs, as specified in sections 5.2 and 5.3

  2. sign each child public key with the private key of its parent:
     i = 0
     while (i < L-1) {
       sig[i] = lms_signature(pub[i+1], priv[i])
       i = i + 1
     }

  3. return u32str(L) || pub[0]

As a notational issue, I think of the signature as belonging with the
thing being signed, rather than with the key that is used to sign it, e.g.

  2. sign each non-root public key with the private key of its parent:
     i = 1
     while (i < L) {
       sig[i] = lms_signature(pub[i], priv[i-1])
       i = i + 1
     }

I understand that it's done the first way so that sig[L-1] can be the
signature of the message, but that could be designated sig_msg or some
other way.

6.2.  Signature Generation

This is harder to pseudo-code, given the number of things that need to
be explained.

  1. If the message-signing key prv[L-1] is exhausted, regenerate that
     key pair, together with any parent key pairs that might be
     necessary. If the root key pair is exhausted, then the HSS key pair
     is exhausted, and it MUST NOT generate any more signatures.

     if (prv[L-1].q == 2^h) {
       d = L-1
       while (prv[d-1].q == 2^h) {
         if (d == 0)
           return FAILURE
         d = d - 1
       }
       while (d < L) {
         regenerate keypair pub[d], prv[d]
         sign pub[d] with prv[d-1]
       }
     }

  2. Sign the message
     sig[L-1] = lms_sign(msg, prv[L-1])

  3. Create the list of signed public keys
     i = 0
     while (i < L-1) {
       signed_pub_key[i] = sig[i] || pub[i+1]
       i = i + 1
     }

  4. return u32str(L-1) || signed_pub_key[0] || ... ||
signed_pub_key[L-2] || sig[L-1]

7.  Formats

I would prefer to rename some of the types for consistency:
ots_algorithm_type -> lmots_algorithm_type
ots_signature -> lmots_signature
hbs_algorithm_type -> lms_algorithm_type
hbs_reserved -> lms_reserved
hbs_public_key -> lms_public_key

Why is there a gap between hbs_reserved and lms_sha256_n32_h5? Is it
because the completely different enum ots_algorithm_type has values 1-4?

   Many of the objects start with a typecode.  A verifier MUST check
   each of these typecodes, and a verification operation on a signature
   with an unknown type, or a type that does not correspond to the type
   within the public key MUST return INVALID.  The expected length of a
   variable-length object can be determined from its typecode, and if an
   object has a different length, then any signature computed from the
   object is INVALID.

This seems out of place in a section devoted to XDR formats.

[Cfrg] review of draft-mcgrew-hash-sigs-08 Paul Selkirk