Re: [openpgp] User ID conventions (it's not really a RFC2822 name-addr)

Daniel Kahn Gillmor <dkg@fifthhorseman.net> Wed, 06 November 2019 17:52 UTC

From: Daniel Kahn Gillmor <dkg@fifthhorseman.net>
To: "Neal H. Walfield" <neal@walfield.org>
Cc: openpgp@ietf.org
In-Reply-To: <87v9rydk9s.wl-neal@walfield.org>
References: <87woe7zx7o.fsf@fifthhorseman.net> <87v9rydk9s.wl-neal@walfield.org>
Date: Wed, 06 Nov 2019 01:37:14 -0500
Message-ID: <878soto6hx.fsf@fifthhorseman.net>
MIME-Version: 1.0
Content-Type: multipart/signed; boundary="=-=-="; micalg="pgp-sha256"; protocol="application/pgp-signature"
Archived-At: <https://mailarchive.ietf.org/arch/msg/openpgp/guulh9vGTsKM84ww4VV2rsgh9Rs>
Subject: Re: [openpgp] User ID conventions (it's not really a RFC2822 name-addr)
Precedence: list

Hi Neal--

Thanks for this thoughtful writeup!

On Tue 2019-11-05 23:35:11 +0100, Neal H. Walfield wrote:

> Beyond being more fleshed out, this grammar is different from the
> grammar in dkg's second proposal in a few ways.
>
> First, it matches comments.  dkg made this a non-goal.  Given that
> people who add comments intend them as comments and not as part of
> their name, it seems reasonable to me to not display comments in
> places where only the user's name is desired.  And, since it turns out
> that matching non-nested comments is relatively straightforward, why
> not?  Note: doing this might actually help deprecate comments, because
> they won't be shown as often.

User IDs are full UTF-8 strings.  The idea that any part of that string
would be hidden from the user is pretty disturbing to me.  Consider the
situation where someone is certifying a user ID on an OpenPGP
certificate.  If the comment is hidden, do they know what identity
assertion they're making?

I'd much rather have comments be deprecated *because they are weird and
show up in places where you'd think your name should go* rather than
have them be some vestigial thing that people don't even notice any
longer.

I would recommend dropping the comment from your grammar and letting the
"name" part subsume it, when you're splitting out e-mail address from
the rest of the user ID.

Furthermore, because you've allowed "(" and ")" in atext-specials, it
looks to me like your proposed grammar is ambiguous:

    bob (joe) <bob@example.net>

is either:

    name: "bob (joe)"
    comment: None
    addr-spec: "bob@example.net"

or:

    name: "bob"
    comment: "joe"
    addr-spec: "bob@example.net"

I don't think this is helpful to anyone.
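
To make the ambiguity concrete, here is a minimal sketch in Python.  The two regular expressions are hypothetical, simplified stand-ins for the two readings above (they are not the grammar from Neal's message); the point is only that both accept the same string and disagree about the result.

```python
import re

# Illustrative patterns only; names and structure are assumptions.
ADDR = r"<(?P<addr>[^<>\s]+@[^<>\s]+)>"

# Reading 1: parentheses are ordinary name characters.
NAME_SUBSUMES_PARENS = re.compile(r"^(?P<name>[^<>]+?) *" + ADDR + r"$")

# Reading 2: a parenthesised run before the address is a comment.
NAME_THEN_COMMENT = re.compile(
    r"^(?P<name>[^<>()]+?) *\((?P<comment>[^()<>]*)\) *" + ADDR + r"$"
)


def parse_name_only(uid):
    """Parse with reading 1; the comment field is always None."""
    m = NAME_SUBSUMES_PARENS.match(uid)
    if m is None:
        return None
    return {"name": m["name"], "comment": None, "addr-spec": m["addr"]}


def parse_with_comment(uid):
    """Parse with reading 2; the comment is split out of the name."""
    m = NAME_THEN_COMMENT.match(uid)
    if m is None:
        return None
    return {"name": m["name"], "comment": m["comment"], "addr-spec": m["addr"]}


uid = "bob (joe) <bob@example.net>"
first = parse_name_only(uid)
second = parse_with_comment(uid)
```

Both calls succeed on the same input, and `first` and `second` differ in their "name" and "comment" fields, which is exactly the situation a single grammar should not allow.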

> The grammar more carefully handles whitespace.  It ignores whitespace
> at the beginning of the User ID (this is what motivates the
> name-char-start production) and between the individual components in
> the pgp-uid-convention production.  As is, the grammar only ignores
> the 0x20 space character.  We may also want to include the tab
> character, unicode's NO-BREAK SPACE (U+00A0) character and its
> IDEOGRAPHIC SPACE (U+3000) character for thoroughness.  But, since
> software will normally concatenate the individual components, just
> recognizing the ASCII space character here is probably fine.  Whatever
> the case, I think we can safely ignore the rest of unicode's
> whitespace characters:
>
>   https://en.wikipedia.org/wiki/Whitespace_character

I'm fine with being judicious about selecting whitespace characters.  In
addition to tab (U+0009, ascii "HT"), i note that you've declined to
include U+000A and U+000D (ascii "LF" and "CR") in the grammar at all.

I like that kind of opinionated decision, as unprintable characters like
these are likely to be problematic in many ways (hard for users to
distinguish, at least!)

I also think that whitespace at the beginning of a user ID is asking for
trouble, and would be happy with a grammar that considers that user ID
non-conventional.  Is there a use case for leading whitespace in a user
ID?

> My pgp-uid-convention production also matches user ids without email
> addresses, e.g., "Daniel Kahn Gillmor".  This is convenient.  Instead
> of having to figure out why parsing failed (is it not valid UTF-8? is
> it just missing an addr-spec?), we explicitly cover this common
> pattern in the grammar.  I think this will significantly simplify code
> that uses this interface: if there is an error, then the code can just
> assume the User ID is trash and can be ignored.

I should be clear that i intended my earlier proposal specifically to
match OpenPGP User ID conventions *that have an e-mail address in
them*.  There are indeed other User ID conventions (like "Daniel Kahn
Gillmor", or "ssh://foo.example") that aren't covered by this, and i
thought i would be doing folks a favor by focusing on the e-mail address
side of things specifically.  My thought was that common interfaces
would allow for matching against a User ID that has an e-mail address,
and then they would have other matchers for other common conventions
that they could try applying if this convention didn't match.

This is probably an implementation detail, though.
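
For what it's worth, the "one matcher per convention" idea could look roughly like the following Python sketch.  Everything here is an assumption made for illustration: the matcher names, the patterns, and the order in which they are tried are hypothetical and much looser than a real grammar would be.

```python
import re

# Hypothetical patterns, deliberately simplified.
EMAIL_UID = re.compile(r"^(?P<name>[^<>]*?) *<(?P<addr>[^<>\s@]+@[^<>\s@]+)>$")
URI_UID = re.compile(r"^(?P<scheme>[a-z][a-z0-9+.-]*)://(?P<rest>\S+)$")


def match_email(uid):
    """Convention with an e-mail address: 'name <addr-spec>'."""
    m = EMAIL_UID.match(uid)
    if m is None:
        return None
    return ("email", {"name": m["name"], "addr-spec": m["addr"]})


def match_uri(uid):
    """Convention like 'ssh://foo.example'."""
    m = URI_UID.match(uid)
    if m is None:
        return None
    return ("uri", {"scheme": m["scheme"], "rest": m["rest"]})


def match_bare_name(uid):
    """A plain name: no angle brackets, control characters, or leading whitespace."""
    if uid and not uid[0].isspace() and not re.search(r"[<>\x00-\x1f]", uid):
        return ("name", {"name": uid})
    return None


MATCHERS = [match_email, match_uri, match_bare_name]


def classify(uid):
    """Return the first convention that matches, or None if none does."""
    for matcher in MATCHERS:
        result = matcher(uid)
        if result is not None:
            return result
    return None
```

With this shape, `classify("ssh://foo.example")` falls through the e-mail matcher to the URI matcher, and a string that matches nothing (for example one with leading whitespace) comes back as `None`, so the caller can treat it as non-conventional.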

> In RFC 2822, "specials" are only allowed in a display name if they are
> quoted.  dkg removes this requirement.  I think this is mostly
> sensible, but it means that we can have User IDs like:
> "<foo@example.org> <foo@example.org>" where the first
> <foo@example.org> is the display name and the second is the addr-spec.
> I think we should exclude angle brackets from the display name.  In my
> grammar, I have an "atext-specials" which is just RFC 2822 specials
> without the angle brackets.

I totally agree with this constraint.  If you're doing away with
comments (as i recommend above) then you would have to prohibit angle
brackets in comments too, which seems fine to me.

Even if you decide to go ahead with splitting out comments, I would go
so far as to ban them in comments too.  Is there any plausible reason
for including angle brackets in a comment?  Simplify simplify :)

> I'm a bit concerned about allowing the backslash character: with this
> grammar, it is just a normal character, but for an RFC 2822 parser,
> it's an escape character.  Since User IDs may be used in contexts
> where RFC 2822 things are expected, we should be careful.  But, I fear
> that if we reject it, we'll end up gratuitously rejecting some
> emojis.  ¯\_(ツ)_/¯.

There are all kinds of things that will break if implementations
casually stick OpenPGP user IDs into an e-mail header, not just
backslashes.  For example, commas are likely to cause a problem.
Consider trying to mail two people whose OpenPGP certificates have these
User IDs:

    Lucy Hernandez, MD <lucy@example.com>
    Chuck Wilson, Jr. <chuck@example.net>

A simple concatenation with commas yields the disastrous:

To: Lucy Hernandez, MD <lucy@example.com>, Chuck Wilson, Jr. <chuck@example.net>

and DQUOTE is just as bad if not worse :)
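
Here is a small Python sketch of that failure, using the standard library's `email.utils`.  The `split_uid` helper is hypothetical and only handles the simple "name <addr>" shape; the quoting step is shown purely for contrast and is not something proposed in this thread.

```python
from email.utils import formataddr, getaddresses

uids = [
    "Lucy Hernandez, MD <lucy@example.com>",
    "Chuck Wilson, Jr. <chuck@example.net>",
]


def split_uid(uid):
    """Hypothetical helper: split 'name <addr>' at the last ' <'."""
    name, _, rest = uid.rpartition(" <")
    return name, rest.rstrip(">")


# Naive: join the raw User IDs with commas, as in the To: line above.
naive_header = ", ".join(uids)
naive = getaddresses([naive_header])

# Contrast: formataddr() double-quotes display names containing specials.
quoted_header = ", ".join(formataddr(split_uid(uid)) for uid in uids)
quoted = getaddresses([quoted_header])

# The naive header does not parse back into the two intended mailboxes.
naive_ok = [addr for _, addr in naive] == ["lucy@example.com", "chuck@example.net"]
quoted_ok = [addr for _, addr in quoted] == ["lucy@example.com", "chuck@example.net"]
```

Here `naive_ok` ends up false and `quoted_ok` true: an RFC 2822-style parser treats the commas inside the names as mailbox separators unless the display names are quoted.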

So i have no problem with including backslash in the display name area.

    --dkg

Attachment: signature.asc
