[Mathmesh] UDF Design notes

Phillip Hallam-Baker <phill@hallambaker.com> Tue, 13 August 2019 16:49 UTC

From: Phillip Hallam-Baker <phill@hallambaker.com>
Date: Tue, 13 Aug 2019 12:49:02 -0400
Message-ID: <CAMm+LwgqiUNNqR73uO6=iuK0VUPEBWdVLVp7H=KxiB-O8_y+hQ@mail.gmail.com>
To: mathmesh@ietf.org, Christian Huitema <huitema@huitema.net>
Archived-At: <https://mailarchive.ietf.org/arch/msg/mathmesh/j3ChQCEb5aeAubFI_-I2FHI32No>



I try to avoid giving explanations in documents that contain normative text
as this can be a source of ambiguity. So I will try to summarize the design
concerns here.

The original starting point was the need for a 'better PGP fingerprint'. In
particular, I wanted a digest representation that:

* Was reasonably compact in both printed and voiced form
* Supported the use of SHA-2
* Could be expressed at varying precision as the application demanded
* Could be upgraded to make use of new algorithms if absolutely necessary
* Could be input via a keyboard
* Could be compared for equality
* Could be used to create a fingerprint of any type of data (not just keys)
* Could be used in QR codes, DNS labels, etc.

I soon realized I would also need to ensure that:

* UDFs could not be confused with OpenPGP fingerprints
* A UDF of a key could not be confused with a UDF of a document or

This led to the construct described in the document, i.e.

* A binary UDF is an octet stream in which the first octet describes the
digest algorithm used to create it.
* The binary UDF is canonically presented in Base32

The choice of Base32 comes from the observation that fingerprints are
frequently read aloud, and having to distinguish case makes the process
tedious and error prone. So it cannot be Base64, but Base32 gives us five
bits per character instead of the four that hexadecimal gives.

The Type Identifiers for SHA-2 and SHA-3 are chosen so that the first
letter of a SHA-2 digest is always an M (for Merkle-Damgard, and to remind
us of the Rivest/MD5 legacy) and the first letter of a SHA-3 digest is
always a K (for Keccak).
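
The first-letter property follows from how Base32 consumes bits: the first
character encodes the top five bits of the first octet. A small check,
assuming the RFC 4648 Base32 alphabet. The helper name first_letter is
hypothetical, and the sketch only shows which octet values can produce each
letter; the identifiers actually assigned are in the draft, not in this note.

```python
import base64

def first_letter(type_octet: int) -> str:
    """Return the first Base32 character of any octet string that
    starts with type_octet."""
    # Pad with four zero octets so b32encode sees one full 40-bit group
    # and emits no '=' padding.
    encoded = base64.b32encode(bytes([type_octet, 0, 0, 0, 0]))
    return encoded.decode("ascii")[0]

# Octets whose top five bits are 01100 all present as 'M'.
m_octets = [v for v in range(256) if first_letter(v) == "M"]
# Octets whose top five bits are 01010 all present as 'K'.
k_octets = [v for v in range(256) if first_letter(v) == "K"]

print("M:", m_octets)
print("K:", k_octets)
```

Each letter therefore corresponds to a block of eight consecutive octet
values, which leaves room for several variants under the same initial.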

I have tried various groupings of digits and ended up choosing 4 because it
makes the code least complex.
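
To illustrate the presentation step, here is a minimal sketch that
Base32-encodes a binary UDF, truncates it to a requested precision and
splits the result into groups of four. The function name present_udf, the
'-' separator, the rounding of the precision up to a whole character and
the sample octets are all assumptions made for the example.

```python
import base64

def present_udf(binary_udf: bytes, bits: int, group: int = 4) -> str:
    """Present a binary UDF as grouped Base32 text at the given precision."""
    # Each Base32 character carries 5 bits; round up to whole characters.
    chars_needed = -(-bits // 5)
    text = base64.b32encode(binary_udf).decode("ascii").rstrip("=")
    text = text[:chars_needed]
    # Split into fixed-size groups joined by the (assumed) '-' separator.
    return "-".join(text[i:i + group] for i in range(0, len(text), group))

# Hypothetical 20-octet binary UDF: one type octet followed by digest octets.
example = bytes([96]) + bytes(range(1, 20))
print(present_udf(example, bits=100))  # five groups of four characters
```

With groups of four, every group is exactly 20 bits, so precisions that are
multiples of 20 bits never leave a short trailing group.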

A UDF content digest is a digest of a typed content sequence. First
the Content Digest Value (CDV) is determined by applying the digest
algorithm to the content data:

CDV = H(<Data>)

Then the Typed Content Digest Value (TCDV) is determined by applying the
digest algorithm to the content type identifier and the CDV:

TCDV = H(<Content-ID> + ':' + CDV)

The reason for this two-step process is that while there are tens of
thousands of content types in use, 99% of documents out there use one of
fewer than 50 content types. The expensive pass over the data is needed
only once, to compute the CDV; each candidate content type then costs one
digest over a short input. So if we have a fingerprint of a file and we are
not sure what content type was intended, we can check the 50 most likely
content types as quickly on a 1GB file as on a 1KB file.
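
A minimal sketch of that search, assuming SHA-2-512 as H, a UTF-8 encoded
content type identifier and an untruncated TCDV as the fingerprint. The
function names and the candidate list are hypothetical.

```python
import hashlib
from typing import Iterable, Optional

def cdv(data: bytes) -> bytes:
    """Content Digest Value: CDV = H(<Data>)."""
    return hashlib.sha512(data).digest()

def tcdv(content_id: str, cdv_value: bytes) -> bytes:
    """Typed Content Digest Value: TCDV = H(<Content-ID> + ':' + CDV)."""
    return hashlib.sha512(content_id.encode("utf-8") + b":" + cdv_value).digest()

def guess_content_type(data: bytes, fingerprint: bytes,
                       candidates: Iterable[str]) -> Optional[str]:
    """Return the candidate content type that reproduces fingerprint, if any."""
    digest = cdv(data)  # the only pass over the (possibly huge) data
    for content_id in candidates:
        # Each trial hashes only the short identifier plus a 64-octet CDV.
        if tcdv(content_id, digest) == fingerprint:
            return content_id
    return None

candidates = ["text/plain", "application/pdf", "image/png"]
data = b"example document body"
fingerprint = tcdv("application/pdf", cdv(data))
print(guess_content_type(data, fingerprint, candidates))  # application/pdf
```

The cost of each extra candidate is independent of the size of the data,
which is the point of hashing the content type in a second step.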

Finally, I started seeing that people have Bitcoin addresses of the form
faceb-ookds-lkjh or whatever. Pretty kewl huh, the first letters of my key
tell you who owns it?

Erm, no. What people have done there is provide a new attack vector,
because most people will remember only the first ten digits of an address
at most, and this mnemonic encourages people to rely on memory. Anyone who
wants to impersonate the above just needs to search for a public key whose
first eight digits match and they will probably trick enough people to be

So I decided to encourage people not to do this and then thought, what if
they could have a shorter fingerprint that was just as strong?
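
For a rough sense of how cheap the prefix-matching attack is, the
arithmetic can be sketched as follows. It assumes each character carries
five bits, as in Base32, and that candidate keys hash uniformly; Bitcoin
addresses use a different alphabet, so the figures are only illustrative.
The function name expected_trials is hypothetical.

```python
def expected_trials(prefix_chars: int, bits_per_char: int = 5) -> int:
    """Expected number of keys to generate before one matches a chosen prefix."""
    # Each trial matches with probability 2**-(bits), so the expected
    # number of trials is 2**(bits).
    return 2 ** (prefix_chars * bits_per_char)

for n in (4, 8, 10):
    print(f"{n} characters: about 2^{n * 5} = {expected_trials(n):,} trials")
```

Matching an eight-character prefix is a 40-bit search, far short of the
work factor the full fingerprint is meant to provide.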

There is IETF precedent for this, as Christian Huitema (cc'd) can explain.
There is unfortunately a patent held by Microsoft. It is not clear that the
MSFT patent covers my approach, however, and since they allowed a royalty
free etc. license in the past, they might well do so in this instance.

Having developed a spec for content digests, I quickly realized that my
code was also generating a number of ASCII outputs that looked like UDF
fingerprints but were not content digests. So I decided to assign code
points for the following:

* Nonces (N)
* Encryption keys (E)
* MAC results (A)
* Shamir Secret Share (S)
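
Since the initial letter is fixed by the top five bits of the type octet,
the range of octet values that can present as each of these letters can be
worked out directly. The sketch below assumes the RFC 4648 Base32 alphabet;
the function name octets_for_letter is hypothetical, and the output shows
only the possible ranges, not the code points actually assigned.

```python
B32_ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZ234567"

def octets_for_letter(letter: str) -> range:
    """Return the eight type octets whose Base32 form starts with letter."""
    index = B32_ALPHABET.index(letter)  # value of the top five bits
    start = index << 3                  # the low three bits are free
    return range(start, start + 8)

uses = [("Nonce", "N"), ("Encryption key", "E"),
        ("MAC result", "A"), ("Shamir secret share", "S")]
for name, letter in uses:
    octets = octets_for_letter(letter)
    print(f"{name}: {letter} -> {octets.start}..{octets.stop - 1}")
```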

These are the features I use in the Mesh and so I have a fairly good
understanding of at least a partial set of requirements. But it is not
necessarily a complete list.

The two use cases that I am most tempted to add are representations for
public and private key literals.

PRO: One spec fits all.
CON: Even 25519 keys are long; do we read them out over the phone? Perhaps
we should consider Base64?
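
To put numbers on the length concern, here is a small calculation of the
unpadded text length of a 32-octet 25519 key in each encoding, with and
without a single leading type octet. The one-octet prefix and the function
name encoded_length are assumptions for the example; a full PKIX structure
would be longer.

```python
import math

def encoded_length(n_octets: int, bits_per_char: int) -> int:
    """Characters needed to encode n_octets without padding."""
    return math.ceil(n_octets * 8 / bits_per_char)

RAW_KEY_OCTETS = 32  # size of a raw X25519 or Ed25519 public key
for label, n in (("raw key", RAW_KEY_OCTETS),
                 ("type octet + key", RAW_KEY_OCTETS + 1)):
    b32 = encoded_length(n, 5)  # Base32: 5 bits per character
    b64 = encoded_length(n, 6)  # Base64: 6 bits per character
    print(f"{label}: Base32 {b32} chars, Base64 {b64} chars")
```

Base64 saves roughly nine characters here, at the cost of case sensitivity
when the key is read aloud.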

I have not decided on this yet and I don't have a use case. But I might
when I get to thinking about TLS SNI encryption type things. I currently
have two possible ways forward.

The first is to add a Type indicator PKIX (112 = P), which would be
followed by a PKIX KeyInfo block. This would be encoded in Base32. This
would look something like this for an Ed25519 key:


The second would be to do the above in Base64 but use a type indicator that
gives an initial letter in Base64:


I have no strong views on this either way.