Re: [openpgp] Fingerprints

Phillip Hallam-Baker <phill@hallambaker.com> Fri, 17 April 2015 16:59 UTC

In-Reply-To: <553117AF.9070205@iang.org>
References: <CAMm+LwhbB+-MnGRBCvprgAGOuu+5CJ2rgod7EBGOQR5UNVrspQ@mail.gmail.com> <87y4m0ozlt.fsf@vigenere.g10code.de> <20150415135105.GJ3106@singpolyma-liberty> <FE2717DC-3950-4536-B83D-BD005D2F26A6@callas.org> <1429128262.1702.41.camel@scientia.net> <E07D3736-038C-4C97-B96B-77284A5A9B02@jabberwocky.com> <1429131461.1702.52.camel@scientia.net> <sjmegnkccau.fsf@securerf.ihtfp.org> <CAMm+LwjtuogtN1on_zzckOMxAcCKBbKPQeTFvmWq-TLmXMibZQ@mail.gmail.com> <553117AF.9070205@iang.org>
Date: Fri, 17 Apr 2015 12:59:43 -0400
X-Google-Sender-Auth: htm7lL6BuT3t2Qz49logWcF1gDA
Message-ID: <CAMm+Lwi0_0kdtBa8OjxwK3MOVcjMWRtrr8MPwb5bSgjvweiGPQ@mail.gmail.com>
From: Phillip Hallam-Baker <phill@hallambaker.com>
To: ianG <iang@iang.org>
Content-Type: text/plain; charset="UTF-8"
Archived-At: <http://mailarchive.ietf.org/arch/msg/openpgp/SJwFAp8AlCXxNsy1B5uI-aYXQro>
Cc: IETF OpenPGP <openpgp@ietf.org>
Subject: Re: [openpgp] Fingerprints

On Fri, Apr 17, 2015 at 10:24 AM, ianG <iang@iang.org> wrote:
> On 16/04/2015 18:46 pm, Phillip Hallam-Baker wrote:
>
>> <Fingerprint-ID>
>>
>> At the moment the consensus proposal seems to be that Fingerprint-ID
>> is a numeric code that has exactly two entries.
>
> I don't know why we'd do both.  I suppose it's because hashes are like
> mountains and seeing them, we have to walk up them.  If there are two, we
> have to walk up and down twice...

It is very much the settled consensus now that proliferating crypto
algorithms is a bad thing. The security of a system is determined by
its weakest algorithm, not the strongest. So we need the number of
algorithms to be as small as possible.

The number of algorithms has to be greater than zero.

Systems that have a single algorithm that cannot be changed have also
resulted in real problems. People tend to hard code code paths in the
expectation that they will never change.

So for all those reasons, the approach that seems to work best as the
general rule is exactly one mandatory to implement and exactly one
recommended alternative as a backup.

Given that we only have two hash algorithms that are likely
candidates, the only real discussion is which of the two should be
mandatory to implement for use in OpenPGP. Which is a discussion we
can and should defer to later. Right now I can make a fairly good
argument that SHA-2 is the better pick based on the facts as they
stand today. I do not expect those facts to be the same in 18 months
time.

But right now I don't have a SHA-3 implementation on my platform, and
neither do many others. And I really don't want to spend time writing a
library or using code from other sources when I know that I am certain
to have a BSD/MIT licensed alternative from a well resourced vendor in
a short space of time.


>> I suggest:
>>
>> 96: SHA-2-512
>> 144: SHA-3-512
>
> In the unfortunate event that we allocate multiple hashes + numbers, then I
> suggest we also allocate an X that is to be used for closed, internal
> trials.  This way, people are less likely to homestead spots and then come
> to us with arguments about how they're using ABC and they don't want people
> to change and bla bla.

I don't think this is a problem. I very much doubt anyone will want to
argue for an algorithm that is not SHA-2 or SHA-3. One of the main
reasons to choose the 512-bit versions and truncate is that doing so
removes the incentive to argue for one of the shorter versions.

The reason to pre-allocate the two spots now is precisely to remove
the incentive for homesteading. Right now I don't have SHA-3 but I
need something to test. We are definitely going to see homesteading if
we don't declare a spot for SHA-2.


>> These numbers are not completely random. While the codes themselves
>> don't matter, using 0x60 and 0x90 has the pleasing and convenient
>> effect that SHA-2-512 fingerprints will always start with the letter M
>> (for Merkle-Damgard) and SHA-3-512 fingerprints will always start with
>> the letter S (for Spongeworthy).
>
> OK, cautious nod to the letters - although it would be pleasing if you could
> point to a web calculator that could lay out the conversions for those of us
> who've forgotten how to do hex-b32-ascii-dec in our head ;)

OK, using the Windows calculator in programmer mode:

96 decimal = 0x60 = 0110,0000
144 decimal = 0x90 = 1001,0000

Base32 requires us to start with the most significant bits. So the
first characters are

01100 = 12 = 'M'
10010 = 18 = 'S'

The base32 encoding table is taken from RFC 3548 (since obsoleted by
RFC 4648, which keeps the same table): https://tools.ietf.org/html/rfc3548

This is not just the IETF encoding; I believe it is the same one that Phil Z.
proposed, and likely for the same reason: take the 26 letters of the Latin
alphabet first, then add the digits 2 to 7, leaving out 0 and 1, which are
easily confused with O and I.

I verified the analysis using these tools:
http://www.binaryhexconverter.com/decimal-to-binary-converter
http://tomeko.net/online_tools/hex_to_base32.php?lang=en
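
The same check can also be reproduced offline. The following is a minimal
Python sketch using only the standard library. The helper name first_letter,
the sample input, and the 14-byte truncation length are illustrative
assumptions and are not part of the proposal.

```python
import base64
import hashlib


def first_letter(code: int, digest: bytes, keep: int = 14) -> str:
    """Base32-encode (code byte + truncated digest) and return the first character.

    `keep` is an arbitrary truncation length chosen for this sketch.
    """
    fingerprint = bytes([code]) + digest[:keep]
    return base64.b32encode(fingerprint).decode("ascii")[0]


data = b"example key material"  # placeholder input, not a real key
print(first_letter(0x60, hashlib.sha512(data).digest()))    # M
print(first_letter(0x90, hashlib.sha3_512(data).digest()))  # S
```

The first character depends only on the top five bits of the code byte, so
the result is the same whatever the digest contains.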


Note that I am not certain yet that we want to use Base32 encoding without
modification. We might well want to use a parity scheme so that the
fingerprint can be verified as it is typed.

Let's say we are using a fingerprint with six blocks of five characters:

aaaaa-bbbbb-ccccc-ddddd-eeeee-fffff

We could make the least significant bit of block A a parity check
on the other bits of block A, the least significant bit of block B a parity
check on AB, the lsb of C a check on ABC, etc.

This adds quite a bit of robustness and allows the value to be checked
as it is typed. It is not a perfect check of course. But it is something. And
it might just be that six blocks with parity checking have higher
user acceptability than five blocks without. So this might well strengthen
the system on net.
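
One possible reading of this parity scheme can be sketched in Python as
follows. Several details here are assumptions made for illustration only:
each five-character block carries 24 data bits plus one parity bit, the
parity bit of a block is the XOR of all data bits in that block and every
earlier block (parity bits themselves excluded), and the names
encode_with_parity and blocks_ok are hypothetical.

```python
import hashlib

ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZ234567"  # RFC 3548 base32 table


def encode_with_parity(data: bytes) -> str:
    """Encode 3 data bytes per block; the lsb of each block is a running parity."""
    if not data or len(data) % 3 != 0:
        raise ValueError("data length must be a positive multiple of 3 bytes")
    blocks = []
    parity = 0
    for i in range(0, len(data), 3):
        chunk = int.from_bytes(data[i:i + 3], "big")  # 24 data bits
        parity ^= bin(chunk).count("1") & 1           # parity of all data so far
        value = (chunk << 1) | parity                 # 25 bits = 5 characters
        blocks.append("".join(ALPHABET[(value >> s) & 31]
                              for s in (20, 15, 10, 5, 0)))
    return "-".join(blocks)


def blocks_ok(text: str) -> bool:
    """Check every complete block typed so far against the running parity."""
    parity = 0
    for block in text.upper().split("-"):
        if len(block) != 5 or any(c not in ALPHABET for c in block):
            return False
        value = 0
        for c in block:
            value = (value << 5) | ALPHABET.index(c)
        parity ^= bin(value >> 1).count("1") & 1
        if (value & 1) != parity:
            return False
    return True


digest = hashlib.sha512(b"example key material").digest()  # placeholder input
fp = encode_with_parity(digest[:18])   # 6 blocks x 24 bits = 144 data bits
print(blocks_ok(fp))        # True
print(blocks_ok(fp[:11]))   # True: the first two blocks check out on their own
```

Because the parity is cumulative, a block can be checked as soon as it has
been typed. A single parity bit only catches an odd number of flipped bits,
which matches the caveat that the check is not perfect.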


> What is the extension strategy for when we've exhausted the 256
> possibilities in a byte?
>
> (Yes I realise you didn't specify a byte, but I guess that's part of the
> question.)

The general answer is 'reserve half the code points in the initial registry
to extension schemes'. So I see two options:

1) Prepend the identifier to the hash value. This obviously requires that
the identifiers be issued in byte-aligned increments.

Fingerprint = ID + Hash

If Fingerprint[0] < 128, the first byte is the algorithm identifier.
Otherwise, if Fingerprint[0] < 192, the lower 14 bits of the first two
   bytes are the algorithm identifier.

This gives 128 possible single-byte identifiers and 16384 two-byte
values. I do not feel the need to specify additional expansion capability,
since we never came close to exhausting a 16-bit algorithm registry for
a single algorithm when we encouraged multiple algorithms (suites
are a different issue).

2) Make the topmost 5 bits the identifier.

This is actually simpler to implement on many platforms, as it is easier
to overwrite the first byte of a buffer than to prepend another buffer.
This discards data from the hash value of course, but that is inevitable
when a fingerprint is used.
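
Both options can be sketched in a few lines of Python. This is an
illustration under stated assumptions, not a definitive format: the
two-byte rule is read here as a test on the first byte (values 128 to 191,
i.e. a 10xxxxxx prefix, which is what yields 14 identifier bits), first
bytes of 192 and above are treated as unassigned, and the function names
split_prefixed and tag_top5 are hypothetical.

```python
import base64
import hashlib
from typing import Tuple


def split_prefixed(fingerprint: bytes) -> Tuple[int, bytes]:
    """Option 1: read a one- or two-byte identifier from the front."""
    if not fingerprint:
        raise ValueError("empty fingerprint")
    first = fingerprint[0]
    if first < 128:                       # 0xxxxxxx: one-byte identifier
        return first, fingerprint[1:]
    if first < 192:                       # 10xxxxxx: two-byte identifier
        if len(fingerprint) < 2:
            raise ValueError("truncated two-byte identifier")
        ident = ((first << 8) | fingerprint[1]) & 0x3FFF  # lower 14 bits
        return ident, fingerprint[2:]
    raise ValueError("first byte >= 192 is unassigned in this sketch")


def tag_top5(digest: bytes, ident5: int) -> bytes:
    """Option 2: overwrite the top 5 bits of the first byte with the identifier."""
    if not digest:
        raise ValueError("empty digest")
    if not 0 <= ident5 < 32:
        raise ValueError("identifier must fit in 5 bits")
    return bytes([(ident5 << 3) | (digest[0] & 0x07)]) + digest[1:]


digest = hashlib.sha512(b"example key material").digest()[:15]  # placeholder
ident, rest = split_prefixed(bytes([96]) + digest)
print(ident)                                               # 96
tagged = tag_top5(digest, 0x60 >> 3)                       # top 5 bits of 0x60
print(base64.b32encode(tagged).decode("ascii")[0])         # M
```

Option 2 keeps the fingerprint the same length as the truncated hash and
loses five bits of hash output, whereas option 1 keeps every hash bit at
the cost of one or two extra bytes.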


> We ourselves don't want more than a handful.  But if we open up the
> fingerprint standard to a wider audience, then austerity will be out the
> window.  What's our approach if the TLS group decides they want to add a few
> hundred?

The TLS group have run into problems because suites are a bad idea.

Choosing the strongest versions of the best of class Merkle-Damgard and
Spongeworthy algorithms should remove the incentive to proliferate.
I have never been in a situation where someone has been saying 'we
need a weaker algorithm'.

The CoAP folk might want to use a rubbish algorithm 'for speed' of
course. But the only algorithm they are likely to agree on is SHA-1,
and that isn't a lot faster, and there are all sorts of reasons why they
are going to be unable to make it their only algorithm.