[Cfrg] review of draft-mcgrew-hash-sigs-08
Paul Selkirk <paul@psgd.org> Mon, 13 November 2017 20:24 UTC
Return-Path: <paul@psgd.org>
X-Original-To: cfrg@ietfa.amsl.com
Delivered-To: cfrg@ietfa.amsl.com
Received: from localhost (localhost [127.0.0.1]) by ietfa.amsl.com (Postfix) with ESMTP id 44AAC1241F3 for <cfrg@ietfa.amsl.com>; Mon, 13 Nov 2017 12:24:37 -0800 (PST)
X-Virus-Scanned: amavisd-new at amsl.com
X-Spam-Flag: NO
X-Spam-Score: -6.234
X-Spam-Level:
X-Spam-Status: No, score=-6.234 tagged_above=-999 required=5 tests=[BAYES_00=-1.9, RCVD_IN_DNSWL_HI=-5, SPF_SOFTFAIL=0.665, URIBL_BLOCKED=0.001] autolearn=ham autolearn_force=no
Received: from mail.ietf.org ([4.31.198.44]) by localhost (ietfa.amsl.com [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id coOLOoZhRYG5 for <cfrg@ietfa.amsl.com>; Mon, 13 Nov 2017 12:24:34 -0800 (PST)
Received: from psg.com (psg.com [IPv6:2001:418:1::62]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ietfa.amsl.com (Postfix) with ESMTPS id 585D81201F8 for <cfrg@ietf.org>; Mon, 13 Nov 2017 12:24:34 -0800 (PST)
Received: from [73.119.134.196] (helo=[192.168.1.170]) by psg.com with esmtpsa (TLSv1.2:ECDHE-RSA-AES128-GCM-SHA256:128) (Exim 4.89 (FreeBSD)) (envelope-from <paul@psgd.org>) id 1eELHV-0008oQ-Km for cfrg@ietf.org; Mon, 13 Nov 2017 20:24:33 +0000
To: cfrg@ietf.org
From: Paul Selkirk <paul@psgd.org>
Message-ID: <6a3950fc-c98b-9617-160f-beb51f0adc31@psgd.org>
Date: Mon, 13 Nov 2017 15:24:32 -0500
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Thunderbird/52.4.0
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"
Content-Language: en-US
Content-Transfer-Encoding: 7bit
Archived-At: <https://mailarchive.ietf.org/arch/msg/cfrg/Xl4EKjU61Uh76Hnm_Ty_0bNuIs0>
Subject: [Cfrg] review of draft-mcgrew-hash-sigs-08
X-BeenThere: cfrg@irtf.org
X-Mailman-Version: 2.1.22
Precedence: list
List-Id: Crypto Forum Research Group <cfrg.irtf.org>
List-Unsubscribe: <https://www.irtf.org/mailman/options/cfrg>, <mailto:cfrg-request@irtf.org?subject=unsubscribe>
List-Archive: <https://mailarchive.ietf.org/arch/browse/cfrg/>
List-Post: <mailto:cfrg@irtf.org>
List-Help: <mailto:cfrg-request@irtf.org?subject=help>
List-Subscribe: <https://www.irtf.org/mailman/listinfo/cfrg>, <mailto:cfrg-request@irtf.org?subject=subscribe>
X-List-Received-Date: Mon, 13 Nov 2017 20:24:37 -0000
Greetings. I've been asked to look into implementing this draft for the CrypTech project (http://cryptech.is). I'm not a cryptographer, and I had no previous familiarity with hash-based signatures. But I do have some experience both writing and implementing standards, and I have to admit it took several readings before I understood this one. There is a certain amount of sloppiness and imprecision in terminology in this document, e.g. "type" and "typecode" mean different things at different levels, and it's not always immediately clear which typecode is being talked about. Also, some concepts are under-explained for the uninitiated, most notably the hierarchical signing system (both the justification and the construction). For the record, I support this work, and believe it can be implemented as-is. The following comments are mostly editorial, to improve the clarity, consistency, and readability of the spec. 1. Introduction After paragraph 3, I'd add a short explanation of Merkle trees for the benefit of newcomers like myself, e.g. In the Merkle scheme, a binary tree of height h is used to hold 2^h OTS key pairs. Each inner node of the tree holds a hash of the concatenation of its children's public keys. The public key of the tree is the hash value contained in the root node (a recursive hash of the OTS public keys), while the private key of the tree is the collection of all the OTS private keys, together with the index of the next OTS private key to use. I would break up paragraph 4, and add a short explanation of the HSS, e.g. In this note we describe the Leighton-Micali Signature (LMS) system, which is a variant of the Merkle scheme, and a Hierarchical Signature System (HSS) built on top of it that can efficiently scale to larger numbers of signatures. In order to support signing large numbers of messages on resource-constrained systems, the Merkle tree can be subdivided into a number of smaller trees. Only the bottom-most tree is used to sign messages, while trees above that are used to sign their children. For example, in the simplest case with 2 levels, the root tree is used to sign 2^h trees with 2^h OTS key pairs, for a total of 2^(2h) key pairs. The advantage of this scheme is that only the "active" trees are instantiated, which saves both time (for key generation) and space (for key storage). Signing is also somewhat faster, although signatures are larger and verification is slower. Paragraph break before "This note is structured as follows." 2. Interface It's not clear why paragraphs 2-4 are indented. Are you quoting another document? 3.2. Security string As a small nit, you say things like "this is bytes 0-15 of the hash", where you really mean "of the string being hashed". I tend to think of "the hash" as being the output of the hash function. A much larger nit is that the "security string" is not a single thing, and this section is full of forward references to later parts of the document, so it's very hard to parse it without having previously read the rest of the document. 4. LM-OTS One-Time Signatures In order to facilitate its use in an N-time signature system, the LM- OTS key generation, signing, and verification algorithms all take as input a diversification parameter q This is the only use of the phrase "diversification parameter". I'm really not sure what it means. q is the leaf number, or alternately the current index into the LMS secret key. This feels like a layer violation. When the LM-OTS signature system is used outside of an N-time signature system, this value SHOULD be set to the all-zero value. Using LM-OTS outside of LMS is outside the scope of this document. 4.1. Parameters Increasing w will shorten the signature, but at a cost of a larger computation to generate and verify a signature. Remove this sentence because it's redundant to the slightly fuller discussion in section 4.2. 4.2. Parameter Sets As a purely editorial matter, it's unclear why this is a separate section from "Parameters", where they're a single section 5.1 for LMS. I might combine the sections, and merge the last paragraph of 4.1 with the first paragraph of 4.2. 4.3. Private Key The format of the LM-OTS private key is an internal matter to the implementation, and this document does not attempt to define it. But then it goes on to suggest a format, and the rest of the document assumes that format is used. 1. retrieve the value p from the type, This is redundant to step 3. and the value I the 16 byte identifier of the LMS public/private keypair that this LM-OTS private key will be used with This is phrased awkwardly, and implies that it's someone else's job to generate I. I suggest changing step 1 to something like the following (with language borrowed from section 3.2): 1. Generate I, a 16 byte identifier. It MUST be chosen uniformly at random, or via a pseudorandom process. (Or insert this step in between steps 3 and 4, for consistency with section 4.4.) As a side note, our project already uses RFC4122 version 4 UUIDs, which are effectively 122-bit random numbers with a few fixed bits. I'm assuming that the security of this system doesn't require a full 128 bits for I. 4.6. Signature Generation The LM-OTS signature of a message is generated by first prepending the LM key identifier I, the LM leaf identifier q, Change "LM" to "LMS". 5. Leighton Micali Signatures "Leighton Micali" should be consistently hypenated as "Leighton-Micali" throughout the document. In general, the j^th node at level L has node number 2^L + j. This use of L is distinct from HSS, but could possibly cause confusion. I might change L to i here. I might also move this whole description of node numbers to section 5.3, which is the only place it's actually used. 5.1. Parameters H SHOULD be the same as in Section 4.1, but MAY be different. There are 2^h leaves in the tree. The hash function used within the LMS system SHOULD be the same as the hash function used within the LM-OTS system used to generate the leaves. You say the same thing in immediately adjacent paragraphs. 5.2. LMS Private Key The format of the LMS private key is an internal matter to the implementation, and this document does not attempt to define it. Again, you then define a suggested format, and the rest of the document assumes that format is used. In particular, q is a mandatory and visible part of the signature. 5.3. LMS Public Key The public key format should really be on a line by itself, pulled out of the paragraph, e.g. The LMS public key is the string u32str(type) || u32str(otstype) || I || T[1] Better yet, include an algorithm, e.g. 1. Generate I, a 16 byte identifier. It MUST be chosen uniformly at random, or via a pseudorandom process. 2. compute the hash value of each leaf node as follows: r = 2 * 2^h - 1 while (r >= 2^h) { T[r] = H(I||u32str(r)||u16str(D_LEAF)||OTS_PUB[r-2^h]); r = r - 1 } 3. compute the hash value of each inner node as follows: r = 2^h - 1 while (r > 0) { T[r] = H(I||u32str(r)||u16str(D_INTR)||T[2*r]||T[2*r+1]); r = r - 1 } 4. return u32str(type) || u32str(otstype) || I || T[1] 5.4. LMS Signature Again, I'd pull out the symbolic representation of the signature onto its own line (or into pseudo-code, as below), for clarity and emphasis. More seriously, the authentication path is under-specified. The array of values contains the siblings of the nodes on the path from the leaf to the root but does not contain the nodes on the path themselves. The array for a tree with height h will have h values. The first value is the sibling of the leaf, the next value is the sibling of the parent of the leaf, and so on up the path to the root. This never says what is in path[i]. A close reading of the verification algorithm tells us that it's the m-byte hash value of the tree node, but it really needs to be made explicit here. Also, in the symbolic version, you say path[0] || path[1] || ... || path[h-1]. This reverses the usual ordering, where layer 0 is the root of the tree, and layer h-1 is the bottom of the tree. I understand why it's phrased this way, but it's rather jarring. I think it would really clarify things to include an algorithm, e.g. 1. set type to the typecode of the LMS algorithm 2. using the current value q from the LMS private key, create the LM-OTS signature for the message, using OTS_PRIV[q] 3. compute the array path as follows: r = 2^h + q i = 0 while (r > 1) { if (r is odd): path[i] = T[r-1] else path[i] = T[r+1] r = r/2 i = i + 1 } 4. S = u32str(q) || ots_signature || u32str(type) || path 5. q = q + 1 6. return S 6. Hierarchical signatures This is the most under-specified part of the whole document, which is unfortunate, because it's the point of the whole document. The prose should describe the rationale and form of the tree-of-trees (as I suggested above), and there should be pseudo-code. It's also worth explaining that the meta-tree is conceptual, as only one tree at each level is instantiated at a time. On a first reading, it appears to be a list of trees rather than a tree of trees. Perhaps there should be a Parameters subsection, to discuss the number of levels L. There isn't an algorithm or symbol associated with it, but it still needs to be established before anything is instantiated. L has a maximum value of 8, which seems oddly prescriptive. Is there an engineering reason for that limit? The HSS scheme can sign a maximum of 2^200 messages with h=25, which is pretty huge, but only 2^40 messages with h=5. I'm not trying to cause trouble, but I get curious when I see magic numbers. HSS does not require all trees to use the same LMS type code (i.e. to be the same height). While most reasonable people would use same-height trees, is there a use case for trees of different heights (or even different hash algorithms, once there are codes to support that)? It seems like there should be some guidance here. 6.1. Key Generation While it's true that only the root tree needs to be instantiated before the public key is published, the child trees need to be instantiated before any messages can be signed, so I'd prefer to do that in the key generation process. Again, this would really benefit from some pseudo-code, e.g. 1. generate L LMS key pairs, as specified in sections 5.2 and 5.3 2. sign each child public key with the private key of its parent: i = 0 while (i < L-1) { sig[i] = lms_signature(pub[i+1], priv[i]) i = i + 1 } 3. return u32str(L) || pub[0] As a notational issue, I think of the signature as belonging with the thing being signed, rather than with the key that is used to sign it, e.g. 2. sign each non-root public key with the private key of its parent: i = 1 while (i < L) { sig[i] = lms_signature(pub[i], priv[i-1]) i = i + 1 } I understand that it's done the first way so that sig[L-1] can be the signature of the message, but that could be designated sig_msg or some other way. 6.2. Signature Generation This is harder to pseudo-code, given the number of things that need to be explained. 1. If the message-signing key prv[L-1] is exhausted, regenerate that key pair, together with any parent key pairs that might be necessary. If the root key pair is exhausted, then the HSS key pair is exhausted, and it MUST NOT generate any more signatures. if (prv[L-1].q == 2^h) { d = L-1 while (prv[d-1].q == 2^h) { if (d == 0) return FAILURE d = d - 1 } while (d < L) { regenerate keypair pub[d], prv[d] sign pub[d] with prv[d-1] } } 2. Sign the message sig[L-1] = lms_sign(msg, prv[L-1]) 3. Create the list of signed public keys i = 0 while (i < L-1) { signed_pub_key[i] = sig[i] || pub[i+1] i = i + 1 } 4. return u32str(L-1) || signed_pub_key[0] || ... || signed_pub_key[L-2] || sig[L-1] 7. Formats I would prefer to rename some of the types for consistency: ots_algorithm_type -> lmots_algorithm_type ots_signature -> lmots_signature hbs_algorithm_type -> lms_algorithm_type hbs_reserved -> lms_reserved hbs_public_key -> lms_public_key Why is there a gap between hbs_reserved and lms_sha256_n32_h5? Is it because the completely different enum ots_algorithm_type has values 1-4? Many of the objects start with a typecode. A verifier MUST check each of these typecodes, and a verification operation on a signature with an unknown type, or a type that does not correspond to the type within the public key MUST return INVALID. The expected length of a variable-length object can be determined from its typecode, and if an object has a different length, then any signature computed from the object is INVALID. This seems out of place in a section devoted to XDR formats.
- [Cfrg] review of draft-mcgrew-hash-sigs-08 Paul Selkirk