Re: [dispatch] [ietf-smtp] BCP proposal: regular expressions for Internet Mail identifiers
Sean Leonard <dev+ietf@seantek.com> Tue, 29 March 2016 18:40 UTC
Return-Path: <dev+ietf@seantek.com>
X-Original-To: dispatch@ietfa.amsl.com
Delivered-To: dispatch@ietfa.amsl.com
Received: from localhost (localhost [127.0.0.1]) by ietfa.amsl.com (Postfix) with ESMTP id C3B5912DFFA; Tue, 29 Mar 2016 11:40:24 -0700 (PDT)
X-Virus-Scanned: amavisd-new at amsl.com
X-Spam-Flag: NO
X-Spam-Score: -2.602
X-Spam-Level:
X-Spam-Status: No, score=-2.602 tagged_above=-999 required=5 tests=[BAYES_00=-1.9, RCVD_IN_DNSWL_LOW=-0.7, RCVD_IN_MSPIKE_H2=-0.001, SPF_HELO_PASS=-0.001] autolearn=ham autolearn_force=no
Received: from mail.ietf.org ([4.31.198.44]) by localhost (ietfa.amsl.com [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id 2qBaF6xuek2X; Tue, 29 Mar 2016 11:40:22 -0700 (PDT)
Received: from mxout-08.mxes.net (mxout-08.mxes.net [216.86.168.183]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ietfa.amsl.com (Postfix) with ESMTPS id 5353E12E0EC; Tue, 29 Mar 2016 11:10:35 -0700 (PDT)
Received: from [192.168.123.7] (unknown [75.83.2.34]) (using TLSv1.2 with cipher DHE-RSA-AES128-SHA (128/128 bits)) (No client certificate requested) by smtp.mxes.net (Postfix) with ESMTPSA id BE0CE509B6; Tue, 29 Mar 2016 14:10:33 -0400 (EDT)
To: John C Klensin <john-ietf@jck.com>, dispatch@ietf.org, ietf-smtp <ietf-smtp@ietf.org>
References: <87a8lp10i2.fsf@hobgoblin.ariadne.com> <56F30A52.50305@seantek.com> <CAL0qLwagOOByZXsLcRN9CC0aARSGSh9kCGoO7hSMUhSdkHtssw@mail.gmail.com> <0AC7C26B5A969CA50015ACFB@JcK-HP8200.jck.com>
From: Sean Leonard <dev+ietf@seantek.com>
Message-ID: <56FAC574.1010503@seantek.com>
Date: Tue, 29 Mar 2016 11:12:04 -0700
User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:38.0) Gecko/20100101 Thunderbird/38.7.1
MIME-Version: 1.0
In-Reply-To: <0AC7C26B5A969CA50015ACFB@JcK-HP8200.jck.com>
Content-Type: text/plain; charset="windows-1252"; format="flowed"
Content-Transfer-Encoding: quoted-printable
Archived-At: <http://mailarchive.ietf.org/arch/msg/dispatch/o08827YNRLMsz806okTChIHrDkI>
Subject: Re: [dispatch] [ietf-smtp] BCP proposal: regular expressions for Internet Mail identifiers
X-BeenThere: dispatch@ietf.org
X-Mailman-Version: 2.1.17
Precedence: list
List-Id: DISPATCH Working Group Mail List <dispatch.ietf.org>
List-Unsubscribe: <https://www.ietf.org/mailman/options/dispatch>, <mailto:dispatch-request@ietf.org?subject=unsubscribe>
List-Archive: <https://mailarchive.ietf.org/arch/browse/dispatch/>
List-Post: <mailto:dispatch@ietf.org>
List-Help: <mailto:dispatch-request@ietf.org?subject=help>
List-Subscribe: <https://www.ietf.org/mailman/listinfo/dispatch>, <mailto:dispatch-request@ietf.org?subject=subscribe>
X-List-Received-Date: Tue, 29 Mar 2016 18:40:25 -0000
On 3/28/2016 6:18 AM, John C Klensin wrote:
>
> --On Sunday, March 27, 2016 22:41 -0700 "Murray S. Kucherawy"
> <superuser@gmail.com> wrote:
>
>> ...
>> And if what you're really producing is regular expressions
>> that match anything that the ABNFs in the mail RFCs will
>> legitimately produce, you might want to do a standards track
>> document that explicitly updates those documents where those
>> ABNFs are listed.
> Murray,
>
> That captures my concern about this effort. Based on prior
> experience (including RFC 3696 and even the effort to make
> RFCs 2821 and 5321 internally consistent), it is _really_ easy
> to express a requirement in two different ways and have them be
> _almost_ the same. That is a problem because different people
> will read different docs.
>
> It seems to me that it would be much better to either do this as
> an Informational document that is clearly identified as Sean's
> opinion about regular expressions that impose the same
> requirements as 5321/5322 but that those continue to control or
> to do a standards-track document that contains both the regular
> expressions and ABNF, makes clear which one is primary, and
> updates the syntax requirements of the base specs.

As Dale expressed (thanks!), "BCPs are *standards* not for protocols
but for *things that people do*. So in regard to
[draft-seantek-mail-regexen], the "thing that people do" is "write code
that validates e-mail addresses for further processing". And the point
[...] is that people need to write correct code for validating e-mail
addresses."

Sean's opinion about regular expressions for Mail Identifiers (email
addresses, Message-IDs) is not interesting. If my opinion were all that
interesting, I would just publish it on Stack Overflow and call it a day
(see SO Questions [46155] and [201323]). What is interesting is the
IETF's vetted and (rough)-consensus view on the topic.

This topic is a favorite pet project of programmers.
It tends to go:

1) "oh, I know what an email address is! It has dots and alphas and
   maybe a hyphen" (WRONG),
2) "oh, I'll just read RFC 5322 and roll my own" (also wrong, but in
   more subtle ways...for one, RFC 5322 has distinct syntax from
   RFC 5321), or
3) "I'm lazy, let's just copy whatever regex shows up on Google first"
   (pragmatic, usually not right).

Wouldn't it be better if programmers could uniformly go:

4) "Given my email address recognition problem, I'll just copy the
   regex from BCP xyz",

rather than spending dozens if not hundreds of hours poring over email
standards documents and testing them against millions of arcane email
address combinations?

The current draft-seantek-mail-regexen is pretty clear that it does not
attempt to change the Mail standards. If folks want to change those
documents, may I suggest a separate Standards Track document that does
exactly that.

Just because a document is labeled "BCP" (or, for that matter,
"Standards Track") does not mean that every last single statement in
the document is normative and error-free. Otherwise, the RFC 3280 and
RFC 5280 PKIX standards that say that you are supposed to compare an
entire email address case-insensitively (Section 4.1.2.6 of RFC 3280,
Section 4.2.1.6 of RFC 5280) would have overridden RFCs 5322, 5321,
2822, 2821, etc.

We have an errata process. Basically if the regular expressions are
wrong, they need to be made right. One can complain about problems, or
one can fix them.

Turns out that regular expressions and ABNF are homomorphic under
certain conditions. As shown in draft-seantek-mail-regexen,
"deliverable email addresses" (RFC 5321 + RFC 6531) certainly fall
within that definition, as they can be expressed in a regular language
(i.e., recognized by a finite state automaton). Therefore, translating
between the two is basically computationally verifiable. The results
may not look pretty but they will work.
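
To make the ABNF-to-regex translation concrete, here is a minimal
sketch for one small piece of the grammar: the Dot-string form of the
local part from RFC 5321, Section 4.1.2, with atext from RFC 5322,
Section 3.2.3. It is an illustration only and is not the expression
from draft-seantek-mail-regexen. The names ATEXT, ATOM, DOT_STRING and
is_dot_string are made up for this example, and Quoted-string, domains
and the RFC 6531 extensions are deliberately left out.

```python
import re

# atext (RFC 5322, Section 3.2.3) written as a regex character class.
# The hyphen is placed last so it is taken literally.
ATEXT = r"[A-Za-z0-9!#$%&'*+/=?^_`{|}~-]"

# ABNF (RFC 5321, Section 4.1.2):
#   Dot-string = Atom *("." Atom)
#   Atom       = 1*atext
# Each ABNF operator maps onto a regex operator:
#   1*x  -> x+      *x -> (?:x)*      concatenation -> concatenation
ATOM = ATEXT + "+"
DOT_STRING = ATOM + r"(?:\." + ATOM + ")*"

DOT_STRING_RE = re.compile(DOT_STRING)


def is_dot_string(s: str) -> bool:
    """Return True if s, taken as a whole, matches the Dot-string rule."""
    return DOT_STRING_RE.fullmatch(s) is not None


if __name__ == "__main__":
    for candidate in ["john.smith", "a+b", "john..smith", ".john", "john."]:
        print(repr(candidate), is_dot_string(candidate))
```

Because the rule uses only repetition, alternation and concatenation
(no recursion), the translation is mechanical, which is the property
the paragraph above relies on.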

Perhaps a bigger problem is one's view as to how normative ABNF is in
the context of IETF standards documents. It is possible to have ABNF
that says

    somename = *(ALPHA / DIGIT)

but then have normative text that says that <somename> is limited to 31
characters and MUST start with an alphabetic character. Moreover, some
ABNF (RFC 5321 / RFC 5322 in particular) has "obsolete syntax"; whether
to admit such syntax is a highly context-sensitive engineering decision.
Addressing all of these points requires rubbing more than two brain
cells together.

[46155]: http://stackoverflow.com/questions/46155/validate-email-address-in-javascript
[201323]: http://stackoverflow.com/questions/201323/using-a-regular-expression-to-validate-an-email-address

>
> Perhaps a BCP that recommends use of strings that are clearly a
> proper subset of what the standard allows would be ok, but it
> needs to be frightfully clear that it is a recommended subset,
> not a requirement.

I am not really interested in subsets, except those subsets driven by
the standards themselves. (ASCII-only vs. EAI is a reasonable subset,
provided that both expressions are provided. I would rather do EAI-only
but we can be pragmatic about that.)

Best regards,

Sean
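
As a footnote to the <somename> example in the message above, the gap
between "what the ABNF says" and "what the specification says" can be
sketched in a few lines. The two function names are hypothetical, and
the reading of the prose constraints is an assumption: "MUST start with
an alphabetic character" is taken here to also rule out the empty
string, which the bare ABNF would accept.

```python
import re

# Grammar only: somename = *(ALPHA / DIGIT)
ABNF_ONLY = re.compile(r"[A-Za-z0-9]*")

# Grammar plus the prose constraints (assumed reading): first character
# is ALPHA, total length is between 1 and 31 characters.
ABNF_PLUS_PROSE = re.compile(r"[A-Za-z][A-Za-z0-9]{0,30}")


def matches_abnf(s: str) -> bool:
    """True if s is produced by the ABNF rule alone."""
    return ABNF_ONLY.fullmatch(s) is not None


def matches_spec(s: str) -> bool:
    """True if s satisfies the ABNF and the prose constraints together."""
    return ABNF_PLUS_PROSE.fullmatch(s) is not None


if __name__ == "__main__":
    for candidate in ["abc123", "9lives", "a" * 31, "a" * 32, ""]:
        print(repr(candidate), matches_abnf(candidate), matches_spec(candidate))
```

A validator built from the ABNF alone accepts "9lives" and a
32-character name; one built from the full specification rejects both.
Which of the two a published regular expression should encode is the
question of how normative the ABNF is.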