Re: [openpgp] User ID conventions (it's not really a RFC2822 name-addr)

Daniel Kahn Gillmor <dkg@fifthhorseman.net> Tue, 17 September 2019 01:44 UTC

From: Daniel Kahn Gillmor <dkg@fifthhorseman.net>
To: openpgp@ietf.org
In-Reply-To: <87woe7zx7o.fsf@fifthhorseman.net>
References: <87woe7zx7o.fsf@fifthhorseman.net>
Date: Mon, 16 Sep 2019 21:43:54 -0400
Message-ID: <87muf3zoh1.fsf@fifthhorseman.net>
Archived-At: <https://mailarchive.ietf.org/arch/msg/openpgp/KZulSRMD4xjAUp60zqaENEmRBNQ>
Subject: Re: [openpgp] User ID conventions (it's not really a RFC2822 name-addr)

As usual, when you try to implement something, you find the missing
pieces. :P

On Mon 2019-09-16 18:35:07 -0400, Daniel Kahn Gillmor wrote:
>     pgp-uid-prefix-char    = atext / specials

The above line appears in both proposals, but it contains a mistake.  It
should read instead:

     pgp-uid-prefix-char    = atext / specials / SPACE
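To see why the SPACE alternative matters, here is a minimal python3
sketch (not part of either proposal) that compiles only the
angle-bracket form of the convention, once without and once with the
space in the prefix.  The helper name build_wrapped_re and the example
User ID are made up for illustration.

```python
import re

# Character classes for the convention (atext here also admits non-ASCII).
specials = r'[()<>\[\]:;@\\,."]'
atext = "[-A-Za-z0-9!#$%&'*+/=?^_`{|}~\x80-\U0010ffff]"
dot_atom_text = atext + r"+(?:\." + atext + "+)*"
pgp_addr_spec = dot_atom_text + "@" + dot_atom_text


def build_wrapped_re(allow_space):
    """Hypothetical helper: compile only the angle-bracket form of the pattern."""
    alternatives = [atext, specials] + ([" "] if allow_space else [])
    prefix_char = "(?:" + "|".join(alternatives) + ")"
    return re.compile("^" + prefix_char + "*<(" + pgp_addr_spec + ")>$")


without_space = build_wrapped_re(allow_space=False)
with_space = build_wrapped_re(allow_space=True)

uid = "Alice Example <alice@example.org>"
print(without_space.search(uid))        # no match: the prefix contains spaces
print(with_space.search(uid).group(1))  # the address inside the brackets
```

Without SPACE, any User ID whose name part contains a space fails to
match, which would rule out most User IDs that pair a name with an
address.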

The following python3 code implements proposal 2 using python's built-in
re module (i have not tested it with python2, given python2's clunky
unicode support).

    import re

    specials = r'[()<>\[\]:;@\\,."]'
    atext = "[-A-Za-z0-9!#$%&'*+/=?^_`{|}~\x80-\U0010ffff]"
    dot_atom_text = atext + r"+(?:\." + atext + "+)*"
    pgp_addr_spec = dot_atom_text + "@" + dot_atom_text
    pgp_uid_prefix_char = "(?:" + atext + "|" + specials + "| )"
    addr_spec_raw = "(?P<addr_spec_raw>" + pgp_addr_spec + ")"
    addr_spec_wrapped = pgp_uid_prefix_char + "*<(?P<addr_spec_wrapped>" + pgp_addr_spec + ")>"
    pgp_uid_convention = "^(?:" + addr_spec_raw + "|" + addr_spec_wrapped + ")$"

    pgp_uid_convention_re = re.compile(pgp_uid_convention, re.UNICODE)

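    # uid is the User ID string to check; this value is only an example
    uid = "Alice Example <alice@example.org>"
    m = pgp_uid_convention_re.search(uid)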

If there's a resultant match object (that is, if m is not None), then
the pgp-addr-spec can be found in either m["addr_spec_raw"] or
m["addr_spec_wrapped"], depending on whether there are angle-brackets
involved or not; the group that did not take part in the match is None.
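
A minimal sketch of how that lookup might be wrapped up, assuming the
regex above.  The function name addr_spec_from_uid and the sample User
IDs are invented for illustration.

```python
import re

specials = r'[()<>\[\]:;@\\,."]'
atext = "[-A-Za-z0-9!#$%&'*+/=?^_`{|}~\x80-\U0010ffff]"
dot_atom_text = atext + r"+(?:\." + atext + "+)*"
pgp_addr_spec = dot_atom_text + "@" + dot_atom_text
pgp_uid_prefix_char = "(?:" + atext + "|" + specials + "| )"
addr_spec_raw = "(?P<addr_spec_raw>" + pgp_addr_spec + ")"
addr_spec_wrapped = pgp_uid_prefix_char + "*<(?P<addr_spec_wrapped>" + pgp_addr_spec + ")>"
pgp_uid_convention_re = re.compile("^(?:" + addr_spec_raw + "|" + addr_spec_wrapped + ")$")


def addr_spec_from_uid(uid):
    """Hypothetical helper: return the pgp-addr-spec found in uid, or None."""
    m = pgp_uid_convention_re.search(uid)
    if m is None:
        return None
    # Only one of the two named groups takes part in a match;
    # the other one is None.
    return m.group("addr_spec_raw") or m.group("addr_spec_wrapped")


for uid in ["alice@example.org",
            "Alice Example <alice@example.org>",
            "Alice Example (no address)"]:
    print(repr(uid), "->", addr_spec_from_uid(uid))
```

The last sample has no address in either form, so the helper returns
None for it.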

Note that the atext definition above is the extended form of atext: the
usual ASCII atext characters plus every code point from U+0080 through
U+10FFFF.

anyway, i hope this clarification is useful.

     --dkg

PS if the above is useful, anyone should feel free to reuse any of the
   above code in any context under any license.  If you do so, and you
   want to provide an attribution so that complaints can be directed
   properly, that's fine, but i have no need for credit if you don't
   want to bother.