[openpgp] Crowdsourcing Base214

Phillip Hallam-Baker <phill@hallambaker.com> Wed, 29 April 2015 14:15 UTC

From: Phillip Hallam-Baker <phill@hallambaker.com>
To: "Neal H. Walfield" <neal@walfield.org>
Content-Type: text/plain; charset="UTF-8"
Archived-At: <http://mailarchive.ietf.org/arch/msg/openpgp/-tviqA1oVrCcsyaiuwsaDLZHouw>
Cc: Alessandro Barenghi <alessandro.barenghi.polimi@gmail.com>, IETF OpenPGP <openpgp@ietf.org>
Subject: [openpgp] Crowdsourcing Base214

On Wed, Apr 29, 2015 at 6:08 AM, Neal H. Walfield <neal@walfield.org> wrote:

> I wonder if less is not more.
>
> If you look at the diceware list, it has "easy to remember words" like
> "aaaa", "abner" and "adair".  And, this list is just 7776 words long.
> These are hard to memorize not only for a native speaker, but also for
> those who speak English as a second language.
>
> If we are going to make a new word list, I would recommend using
> something based on the Voice of America simple word list.  This
> includes 1500 simple words, which all English speakers with basic
> proficiency are familiar with.
>
> Alternatively, there is the PGP Biometric word list [1], whose words
> aren't as simple, but are phonetically distinct.
>
> [1] https://en.wikipedia.org/wiki/Biometric_word_list


The larger the alphabet, the shorter the fingerprint. Since there is
no need to keep the images/words on the device, the size of the
dictionary is not that critical.

Fingerprints with the PGP biometric list are rather too long. Looking
at the options, it seems like somewhere between 13 and 16 bits per
symbol (inclusive) is the sweet spot. Above 64K entries, curating the
list is just too hard.

Back in 1995, memory constraints were very different.


I would very much like to keep the size of the fingerprint within the
7+/-2 working memory limit and provide at least 100 effective bits.
That requires each glyph to encode at least 14 bits: eight glyphs at
14 bits each give 112 bits.
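
To make the arithmetic concrete, here is a minimal sketch that counts
how many symbols are needed to reach 100 bits for a given dictionary
size. The function name is my own, the 1500 and 7776 figures are the
list sizes quoted above, and 256 is the size of each PGP word list
(one word per byte).

```python
import math

def symbols_needed(dictionary_size: int, target_bits: int = 100) -> int:
    """Smallest number of symbols whose combined entropy reaches target_bits."""
    bits_per_symbol = math.log2(dictionary_size)
    return math.ceil(target_bits / bits_per_symbol)

# Dictionary sizes discussed in this thread, plus the 13..16 bit range.
candidates = [
    ("PGP word list", 256),
    ("1500-word list", 1500),
    ("Diceware", 7776),
    ("13 bits", 2**13),
    ("14 bits", 2**14),
    ("15 bits", 2**15),
    ("16 bits", 2**16),
]

for name, size in candidates:
    print(f"{name:15} {size:6d} entries -> {symbols_needed(size)} symbols")
```

On these numbers a 256-entry list needs 13 words, while anything from
13 to 16 bits per symbol lands at seven or eight.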

Presenting images in two sets of four seems to work quite well on an
Apple Watch. And a smartphone seems to be able to present eight at
once without too much hassle.

The big advantage to 14 bits is that it then allows a direct mapping
to the CJK unified characters in Unicode.
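
As a rough illustration of that mapping, the sketch below splits a
fingerprint into 14-bit groups and adds each group to U+4E00, the
first code point of the CJK Unified Ideographs block. The choice of
U+4E00 as the offset, the zero padding of the last group, and the
function names are my assumptions, not an agreed encoding.

```python
import hashlib

CJK_BASE = 0x4E00  # first code point of the CJK Unified Ideographs block
BITS = 14          # bits carried by each glyph

def encode(data: bytes) -> str:
    """Encode bytes as a string of CJK glyphs, 14 bits per glyph."""
    total_bits = len(data) * 8
    count = -(-total_bits // BITS)           # ceiling division
    n = int.from_bytes(data, "big")
    n <<= count * BITS - total_bits          # zero-pad the final group
    mask = (1 << BITS) - 1
    return "".join(
        chr(CJK_BASE + ((n >> ((count - 1 - i) * BITS)) & mask))
        for i in range(count)
    )

def decode(text: str, length: int) -> bytes:
    """Invert encode(); length is the original byte count."""
    n = 0
    for ch in text:
        value = ord(ch) - CJK_BASE
        if not 0 <= value < (1 << BITS):
            raise ValueError(f"glyph out of range: {ch!r}")
        n = (n << BITS) | value
    n >>= len(text) * BITS - length * 8      # drop the padding bits
    return n.to_bytes(length, "big")

# Example: a 112-bit (14-byte) fingerprint becomes exactly eight glyphs.
fingerprint = hashlib.sha256(b"example key material").digest()[:14]
glyphs = encode(fingerprint)
print(glyphs, len(glyphs))
```

With this offset the 2^14 values occupy U+4E00 through U+8DFF, which
sits inside the block.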



This looks to me to be an excellent opportunity to engage the wider
community and to crowdsource parts of the process. There are hundreds
of people willing to help. Give each person a part of the image space
to curate and we can have the process done pretty quickly.

So let's say someone has 'road motor transport' for 256 entries. She
then breaks that down into 'cars', 'trucks', 'buses', 'motorcycles'
and then within each category finds 64 distinctly different examples.
Someone else does the same for 'unpowered transport', 'marine
transport', etc.
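
The bookkeeping for that split could look something like the sketch
below. It assumes a 2^14 index space cut into 64 blocks of 256, with
each block divided evenly among its subcategories. The function name
and the contiguous layout are hypothetical.

```python
TOTAL_ENTRIES = 2**14
BLOCK_SIZE = 256                           # entries handed to one curator
NUM_BLOCKS = TOTAL_ENTRIES // BLOCK_SIZE   # 64 curators in total

def allocate(categories):
    """Return (category, subcategory, index_range) for every subcategory.

    categories maps a top-level category to its list of subcategories.
    """
    if len(categories) > NUM_BLOCKS:
        raise ValueError("more categories than available blocks")
    plan = []
    for block, (category, subcats) in enumerate(categories.items()):
        if BLOCK_SIZE % len(subcats) != 0:
            raise ValueError(f"{category}: block does not divide evenly")
        share = BLOCK_SIZE // len(subcats)
        start = block * BLOCK_SIZE
        for k, sub in enumerate(subcats):
            low = start + k * share
            plan.append((category, sub, range(low, low + share)))
    return plan

plan = allocate({
    "road motor transport": ["cars", "trucks", "buses", "motorcycles"],
})
for category, sub, indices in plan:
    print(f"{category} / {sub}: {indices.start}..{indices.stop - 1}")
```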

A wiki is probably sufficient for the necessary collaboration.


The purpose of this isn't just to get the best result. Engage the
community and they become advocates and early adopters. And we need
advocates who are not from the crypto community.


For the word lists, I am thinking that the best approach is to start
off with a fairly large dictionary and filter it by putting it through
Google Translate and seeing which distinct words survive translation
from English to French and back.
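
A minimal sketch of that filter follows. It does not call any real
translation service: the two toy lookup tables stand in for the
machine translation step and are purely illustrative, as is the
function name.

```python
# Toy stand-ins for the translation step (hypothetical data).
EN_TO_FR = {
    "house": "maison", "home": "maison",
    "dog": "chien", "cat": "chat",
    "big": "grand", "large": "grand",
}
FR_TO_EN = {"maison": "house", "chien": "dog", "chat": "cat", "grand": "big"}

def round_trip_filter(words, forward, backward):
    """Keep words that translate back to themselves via a unique foreign word."""
    kept = []
    used_foreign = set()
    for word in words:
        foreign = forward(word)
        if foreign is None or backward(foreign) != word:
            continue              # lost or changed in translation
        if foreign in used_foreign:
            continue              # collides with a word already kept
        used_foreign.add(foreign)
        kept.append(word)
    return kept

survivors = round_trip_filter(
    ["house", "home", "dog", "cat", "big", "large"],
    EN_TO_FR.get,
    FR_TO_EN.get,
)
print(survivors)
```

Here 'home' and 'large' drop out because they come back as 'house'
and 'big'.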

Then take the dictionary and machine translate it into 16-odd
different languages as a starting point and compute Merkle trees over
each individual corpus.
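
For the Merkle tree part, a sketch under stated assumptions: SHA-256
as the hash, a 0x00/0x01 prefix to keep leaf and interior hashes
apart, and the last node duplicated when a level has an odd count.
None of these choices is settled; they are just one common way to
build such a tree.

```python
import hashlib

def _sha256(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def merkle_root(words):
    """Compute a Merkle root over an ordered word list."""
    if not words:
        raise ValueError("word list is empty")
    # Leaf hashes: prefix 0x00 distinguishes leaves from interior nodes.
    level = [_sha256(b"\x00" + w.encode("utf-8")) for w in words]
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])    # assumption: duplicate the odd node
        level = [
            _sha256(b"\x01" + level[i] + level[i + 1])
            for i in range(0, len(level), 2)
        ]
    return level[0]

english = ["house", "dog", "cat", "big", "river"]
print(merkle_root(english).hex())
```

Changing or reordering any word changes the root, so each language's
list could be identified by a single hash.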


Probably the thing to do is to begin with a Base 2^14 scheme, which
could be expanded to 2^16 if desired.