Re: [openpgp] User ID conventions (it's not really a RFC2822 name-addr)

Daniel Kahn Gillmor <> Wed, 06 November 2019 17:52 UTC

From: Daniel Kahn Gillmor <>
To: "Neal H. Walfield" <>
Date: Wed, 06 Nov 2019 01:37:14 -0500

Hi Neal--

Thanks for this thoughtful writeup!

On Tue 2019-11-05 23:35:11 +0100, Neal H. Walfield wrote:

> Beyond being more fleshed out, this grammar is different from the
> grammar in dkg's second proposal in a few ways.
> First, it matches comments.  dkg made this a non-goal.  Given that
> people who add comments intend them as comments and not as part of
> their name, it seems reasonable to me to not display comments in
> places where only the user's name is desired.  And, since it turns out
> that matching non-nested comments is relatively straightforward, why
> not?  Note: doing this might actually help deprecate comments, because
> they won't be shown as often.

User IDs are full UTF-8 strings.  The idea that any part of that string
would be hidden from the user is pretty disturbing to me.  Consider the
situation where someone is certifying a user ID on an OpenPGP
certificate.  If the comment is hidden, do they know what identity
assertion they're making?

I'd much rather have comments be deprecated *because they are weird and
show up in places where you'd think your name should go* rather than
have them be some vestigial thing that people don't even notice any
more.

I would recommend dropping the comment from your grammar and letting the
"name" part subsume it, when you're splitting out e-mail address from
the rest of the user ID.
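
A minimal Python sketch of that kind of split, where everything before the final angle-bracketed part is treated as the name and any parenthetical stays inside it. The function name and the example address are hypothetical, and the splitter is deliberately lenient; it is not the grammar under discussion.

```python
def split_user_id(uid):
    """Split 'name <addr>' into (name, addr); return None if there is no
    trailing angle-bracketed part. No separate comment field is produced."""
    if not uid.endswith(">"):
        return None
    # Split at the last "<" so the name keeps everything before it.
    name, sep, addr = uid[:-1].rpartition("<")
    if not sep or not addr:
        return None
    # Only the ASCII space between name and address is dropped.
    return name.rstrip(" "), addr


print(split_user_id("bob (joe) <bob@example.org>"))
print(split_user_id("Daniel Kahn Gillmor"))
```

With this approach a person certifying the User ID sees "bob (joe)" as the name, so nothing in the string is hidden from them.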

Furthermore, because you've allowed "(" and ")" in atext-specials, it
looks to me like your proposed grammar is ambiguous:

    bob (joe) <>

is either:

    name: "bob (joe)"
    comment: None
    addr-spec: ""

or:

    name: "bob"
    comment: "joe"
    addr-spec: ""

I don't think this is helpful to anyone.
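
The ambiguity can be reproduced with a toy model. The sketch below is not the proposed grammar; it assumes a simplified name production that allows parentheses, an optional non-nested comment, and a made-up address, then collects every alternative that matches the whole string.

```python
import re

# Toy productions (assumptions for illustration only).
ADDR = r"<(?P<addr>[^<>]*)>"
NAME = r"(?P<name>[^<>]*[^<> ])"          # "(" and ")" are allowed in a name
COMMENT = r"\((?P<comment>[^()]*)\)"      # non-nested comment

WITHOUT_COMMENT = re.compile(rf"{NAME} +{ADDR}")
WITH_COMMENT = re.compile(rf"{NAME} +{COMMENT} +{ADDR}")


def all_parses(uid):
    """Return one (name, comment, addr) tuple per alternative that matches."""
    parses = []
    for pattern in (WITHOUT_COMMENT, WITH_COMMENT):
        m = pattern.fullmatch(uid)
        if m:
            groups = m.groupdict()
            parses.append((groups["name"], groups.get("comment"), groups["addr"]))
    return parses


for parse in all_parses("bob (joe) <bob@example.org>"):
    print(parse)
```

Both alternatives accept the same input, so a parser has to pick one by some rule that the grammar itself does not state.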

> The grammar more carefully handles whitespace.  It ignores whitespace
> at the beginning of the User ID (this is what motivates the
> name-char-start production) and between the individual components in
> the pgp-uid-convention production.  As is, the grammar only ignores
> the 0x20 space character.  We may also want to include the tab
> character, unicode's NO-BREAK SPACE (U+00A0) character and its
> IDEOGRAPHIC SPACE (U+3000) character for thoroughness.  But, since
> software will normally concatenate the individual components, just
> recognizing the ASCII space character here is probably fine.  Whatever
> the case, I think we can safely ignore the rest of unicode's
> whitespace characters:

I'm fine with being judicious about selecting whitespace characters.  In
addition to tab (U+0009, ascii "HT"), i note that you've declined to
include U+000A and U+000D (ascii "LF" and "CR") in the grammar at all.

I like that kind of opinionated decision, as unprintable symbols like
this are likely to be problematic in many ways (hard for users to
distinguish at least!)

I also think that whitespace at the beginning of a user ID is asking for
trouble, and would be happy with a grammar that considers that user ID
non-conventional.  Is there a use case for leading whitespace in a user
ID?
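
A small Python sketch of such a check, assuming a User ID is flagged as non-conventional if it starts with one of the whitespace characters named in the quoted text or contains CR or LF anywhere. The function name and the exact character sets are assumptions, not part of either proposal.

```python
# Space, tab, NO-BREAK SPACE and IDEOGRAPHIC SPACE, as listed in the quoted text.
LEADING_WHITESPACE = {" ", "\t", "\u00a0", "\u3000"}
# LF and CR, which the grammar leaves out entirely.
EXCLUDED_CONTROLS = {"\n", "\r"}


def whitespace_problems(uid):
    """Return a list of reasons why the User ID looks non-conventional."""
    problems = []
    if uid[:1] in LEADING_WHITESPACE:
        problems.append("leading whitespace")
    if any(ch in EXCLUDED_CONTROLS for ch in uid):
        problems.append("contains CR or LF")
    return problems


print(whitespace_problems("Alice <alice@example.org>"))
print(whitespace_problems(" Alice <alice@example.org>"))
print(whitespace_problems("Alice\n<alice@example.org>"))
```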

> My pgp-uid-convention production also matches user ids without email
> addresses, e.g., "Daniel Kahn Gillmor".  This is convenient.  Instead
> of having to figure out why parsing failed (is it not valid UTF-8? is
> it just missing an addr-spec?), we explicitly cover this common
> pattern in the grammar.  I think this will significantly simplify code
> that uses this interface: if there is an error, then the code can just
> assume the User ID is trash and can be ignored.

I should be clear that i intended my earlier proposal specifically to
match OpenPGP User ID conventions *that have an e-mail address in
them*.  There are indeed other User ID conventions (like "Daniel Kahn
Gillmor", or "ssh://foo.example") that aren't covered by this, and i
thought i would be doing folks a favor by focusing on the e-mail address
side of things specifically.  My thought was that common interfaces
would allow for matching against a User ID that has an e-mail address,
and then they would have other matchers for other common conventions
that they could try applying if this convention didn't match.

This is probably an implementation detail, though.
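
One way such an interface could look, as a hedged Python sketch: a list of matchers tried in order, with the e-mail convention first. All function names and the simplified patterns are hypothetical and far looser than a real grammar would be.

```python
import re


def match_email_convention(uid):
    # Very rough stand-in for "name <addr-spec>".
    m = re.fullmatch(r"(?P<name>[^<>]*?) *<(?P<addr>[^<>@ ]+@[^<>@ ]+)>", uid)
    return ("email", m.group("name"), m.group("addr")) if m else None


def match_uri_convention(uid):
    # Rough stand-in for User IDs such as "ssh://foo.example".
    return ("uri", uid) if re.fullmatch(r"[a-z][a-z0-9+.-]*://\S+", uid) else None


def match_bare_name(uid):
    # A plain name with no address-like characters.
    return ("name", uid) if uid and not any(c in uid for c in "<>@") else None


MATCHERS = (match_email_convention, match_uri_convention, match_bare_name)


def classify(uid):
    """Return the result of the first matcher that accepts the User ID."""
    for matcher in MATCHERS:
        result = matcher(uid)
        if result is not None:
            return result
    return None


for uid in ("Alice <alice@example.org>", "ssh://foo.example", "Daniel Kahn Gillmor"):
    print(classify(uid))
```

Keeping each convention in its own matcher means the e-mail grammar does not need to grow extra productions for every other kind of User ID.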

> In RFC 2822, "specials" are only allowed in a display name if they are
> quoted.  dkg removes this requirement.  I think this is mostly
> sensible, but it means that we can have User IDs like:
> "<> <>" where the first
> <> is the display name and the second is the addr-spec.
> I think we should exclude angle brackets from the display name.  In my
> grammar, I have an "atext-specials" which is just RFC 2822 specials
> without the angle brackets.

I totally agree with this constraint.  If you're doing away with
comments (as i recommend above) then you would have to prohibit angle
brackets in comments too, which seems fine to me.

Even if you decide to go ahead with splitting out comments, I would go
so far as to ban them in comments too.  Is there any plausible reason
for including angle brackets in a comment?  Simplify simplify :)
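
A short Python sketch of that constraint, assuming the only angle brackets allowed are the pair around the final address. The function name and the example addresses are hypothetical.

```python
def display_part_has_angle_brackets(uid):
    """True if anything before the final '<addr>' contains '<' or '>'."""
    head, sep, _ = uid.rpartition("<")
    return bool(sep) and ("<" in head or ">" in head)


print(display_part_has_angle_brackets("Alice <alice@example.org>"))
print(display_part_has_angle_brackets("<a@example.org> <b@example.org>"))
print(display_part_has_angle_brackets("Bob (<x>) <bob@example.org>"))
```

Because the check looks at everything before the last "<", it covers a parenthetical comment just as it covers the name.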

> I'm a bit concerned about allowing the backslash character: with this
> grammar, it is just a normal character, but for an RFC 2822 parser,
> it's an escape character.  Since User IDs may be used in contexts
> where RFC 2822 things are expected, we should be careful.  But, I fear
> that if we reject it, we'll end up gratuitously rejecting some
> emojis.  ¯\_(ツ)_/¯.

There are all kinds of things that will break if implementations
casually stick OpenPGP user IDs into an e-mail header, not just
backslashes.  For example, commas are likely to cause a problem.
Consider trying to mail two people whose OpenPGP certificates have these
User IDs:

    Lucy Hernandez, MD <>
    Chuck Wilson, Jr. <>

A simple concatenation with commas yields the disastrous:

To: Lucy Hernandez, MD <>, Chuck Wilson, Jr. <>

and DQUOTE is just as bad if not worse :)

So i have no problem with including backslash in the display name area.
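
The comma problem can be reproduced with Python's standard `email.utils` module. This is only a sketch: the addresses are made up, and it assumes the implementation has already split each User ID into a display name and an address. `formataddr` quotes a display name that contains specials such as commas, and escapes backslashes and double quotes inside it.

```python
from email.utils import formataddr, getaddresses

# Hypothetical (display name, address) pairs taken from two User IDs.
recipients = [
    ("Lucy Hernandez, MD", "lucy@example.org"),
    ("Chuck Wilson, Jr.", "chuck@example.org"),
]

# Naive approach: paste the raw strings together with commas.
naive_header = ", ".join(f"{name} <{addr}>" for name, addr in recipients)

# Header-aware approach: let formataddr quote each display name.
quoted_header = ", ".join(formataddr(pair) for pair in recipients)

naive_parsed = getaddresses([naive_header])
quoted_parsed = getaddresses([quoted_header])

print(quoted_header)
print("naive round-trips: ", naive_parsed == recipients)
print("quoted round-trips:", quoted_parsed == recipients)
```

The point is that the User ID grammar does not need to protect e-mail headers; whatever builds the header has to quote the display name anyway.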