[dane] [fyi] BCP proposal: regular expressions for Internet Mail identifiers

Sean Leonard <dev+ietf@seantek.com> Tue, 22 March 2016 22:57 UTC

References: <56F1CD23.2040002@seantek.com>
To: "pkix@ietf.org" <pkix@ietf.org>, dane@ietf.org
From: Sean Leonard <dev+ietf@seantek.com>
X-Forwarded-Message-Id: <56F1CD23.2040002@seantek.com>
Message-ID: <56F1CE3D.6010609@seantek.com>
Date: Tue, 22 Mar 2016 15:59:09 -0700
User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:38.0) Gecko/20100101 Thunderbird/38.6.0
MIME-Version: 1.0
In-Reply-To: <56F1CD23.2040002@seantek.com>
Content-Type: text/plain; charset="windows-1252"; format="flowed"
Content-Transfer-Encoding: 7bit
Archived-At: <http://mailarchive.ietf.org/arch/msg/dane/TW7oTcogz3dmHBkU7y-hR25Js9I>
Subject: [dane] [fyi] BCP proposal: regular expressions for Internet Mail identifiers

[+PKIX and DANE]
To those not on the lists below, this is an FYI about a new work 
proposal for regular expressions for email address and Message-ID 
identifiers. The original thread is just on ietf-smtp and dispatch, to 
avoid excessive cross-posting. The draft is 
draft-seantek-mail-regexen-00. Thank you.

-------- Forwarded Message --------
Subject: 	[dispatch] BCP proposal: regular expressions for Internet Mail 
identifiers
Date: 	Tue, 22 Mar 2016 15:54:27 -0700
To: 	ietf-smtp@ietf.org, dispatch@ietf.org



Greetings IETF-SMTP Gods and Denizens (and dispatch):

Over the winter I worked on a new Internet-Draft that I would like to
propose that the IETF adopt: Regular Expressions for Internet Mail. The
draft focuses on two identifiers: email addresses and Message-IDs.

The purpose of this standard (proposed as a Best Current Practice) is to
have *IETF-vetted* expressions that implementers and non-mail standards
authors can plug-and-chug without futzing with trying to interpret 40
years of (occasionally conflicting and arcane) RFCs and implementation
lore. There are many non-mail systems out there (read: nearly every web
app, reservation system, customer database, etc. on Earth) that use or
consume email addresses as identifiers, and their inability to accept
the most obvious valid characters (like "+" or even "-"; I have used
apps that do not even accept "-") is a great source of interoperability
problems. (This document is also relevant to some other threads about
the nature of email address identifiers in security artifacts such as
certificates, PGP keys, and DNS records: anyone who is vouching for an
email address ought to be sure that they are recording something that
actually is a valid email address in the first place.) We should get
this right now, before Unicode/EAI makes interoperability issues 50000x
more expensive to correct.

The document is not meant to modify the mail standards, but merely to
reflect and track them as they are updated over time.

As a first draft, the document is in rough shape and has extensive notes
about issues that came up during R&D but have yet to be addressed.
Significant areas that need adequate treatment include:
1. the impact of Unicode (EAI) on identifiers.
2. handling domain names, which comprise 50% of an email address, but
perhaps 85% of the complexity when Unicode gets involved.
3. "deliverable email address" (complying with the modern SMTP
infrastructure) vs. other kinds of email addresses (Internet Message
Format, historic forms).
4. regular expression engines and grammars (i.e., which grammars to use,
which are widely used and produce uniform results).
5. efficiency of the regular expressions.
6. different expressions for validation (testing), part extraction
(capturing groups), decoding, encoding, and searching through text.
7. test vectors.
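
To make the distinction between validation, part extraction (capturing
groups), and test vectors in the list above concrete, here is a minimal
Python sketch. It is not the pattern from the draft. It assumes only the
dot-atom form of the local part from RFC 5322 and letter-digit-hyphen
domain labels with at least one dot, and it ignores quoted strings, domain
literals, and EAI entirely. The names VALIDATE, EXTRACT, is_valid,
split_address, and TEST_VECTORS are hypothetical, chosen for illustration.

```python
import re

# Assumption: simplified building blocks, NOT the draft's expressions.
# ATEXT is the RFC 5322 "atext" character set; it includes "+" and "-".
ATEXT = r"[A-Za-z0-9!#$%&'*+/=?^_`{|}~-]"
LOCAL = rf"{ATEXT}+(?:\.{ATEXT}+)*"          # dot-atom: no leading or doubled dots
LABEL = r"[A-Za-z0-9](?:[A-Za-z0-9-]{0,61}[A-Za-z0-9])?"  # 1 to 63 characters
DOMAIN = rf"{LABEL}(?:\.{LABEL})+"           # assumption: at least two labels

# Validation: a yes/no test, no capturing groups needed.
VALIDATE = re.compile(rf"{LOCAL}@{DOMAIN}")

# Part extraction: same grammar, with named capturing groups.
EXTRACT = re.compile(rf"(?P<local>{LOCAL})@(?P<domain>{DOMAIN})")


def is_valid(addr: str) -> bool:
    """Return True if the whole string matches the simplified grammar."""
    return VALIDATE.fullmatch(addr) is not None


def split_address(addr: str):
    """Return (local, domain), or None if the string does not match."""
    m = EXTRACT.fullmatch(addr)
    return (m.group("local"), m.group("domain")) if m else None


# Hypothetical test vectors: (input, expected validity).
TEST_VECTORS = [
    ("dev+ietf@seantek.com", True),     # "+" in the local part
    ("first-last@example.org", True),   # "-" in the local part
    ("a..b@example.com", False),        # doubled dot
    (".a@example.com", False),          # leading dot
    ("user@-example.com", False),       # label starts with a hyphen
    ("no-at-sign.example.com", False),  # missing "@"
]

if __name__ == "__main__":
    for addr, expected in TEST_VECTORS:
        status = "ok" if is_valid(addr) == expected else "MISMATCH"
        print(f"{status}: {addr!r}")
```

Keeping the validation and extraction patterns built from the same
fragments is one way to stop the two from drifting apart; whether the
draft takes that approach is not something this sketch claims.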

Hopefully the adoption of this work as an IETF item, coupled with input
from those with extensive experience

(Thanks to John Levine, Pete Resnick, and others for taking initial
questions and discussion on the topic.)
Discussion welcome. Thanks.

Sean


-------- Forwarded Message --------
Subject: 	New Version Notification for draft-seantek-mail-regexen-00.txt
Date: 	Mon, 21 Mar 2016 16:55:53 -0700
From: 	internet-drafts@ietf.org



A new version of I-D, draft-seantek-mail-regexen-00.txt
has been successfully submitted by Sean Leonard and posted to the
IETF repository.

Name:		draft-seantek-mail-regexen
Revision:	00
Title:		Regular Expressions for Internet Mail
Document date:	2016-03-21
Group:		Individual Submission
Pages:		24
URL:            https://www.ietf.org/internet-drafts/draft-seantek-mail-regexen-00.txt
Status:         https://datatracker.ietf.org/doc/draft-seantek-mail-regexen/
Htmlized:       https://tools.ietf.org/html/draft-seantek-mail-regexen-00


Abstract:
    Internet Mail identifiers are used ubiquitously throughout computing
    systems as building blocks of online identity. Unfortunately,
    incomplete understandings of the syntaxes of these identifiers have
    led to interoperability problems and poor user experiences. Many
    users use specific characters in their addresses that are not
    properly accepted on various systems. This document prescribes
    normative regular expression (regex) patterns for all Internet-
    connected systems to use when validating or parsing Internet Mail
    identifiers, with special attention to regular expressions that work
    with popular languages and platforms.

