Re: [ietf-smtp] Address transformations

Sean Leonard <dev+ietf@seantek.com> Mon, 01 August 2016 22:11 UTC

From: Sean Leonard <dev+ietf@seantek.com>
In-Reply-To: <C89024BE60C83A703228B419@JcK-HP8200>
Date: Mon, 01 Aug 2016 15:11:02 -0700
Content-Transfer-Encoding: quoted-printable
Message-Id: <5F2C40DB-38FC-409A-971D-A710DCF5A51E@seantek.com>
References: <20160731133547.45914.qmail@ary.lan> <96F384E1DAAADB3587A45240@JcK-HP8200> <alpine.OSX.2.11.1607311215550.79626@ary.lan> <C4C7CFAD81036E3F1095F39F@JcK-HP8200> <53AEDA75-99FC-491A-8B1B-028A8B773D8B@seantek.com> <C89024BE60C83A703228B419@JcK-HP8200>
To: John C Klensin <john-ietf@jck.com>
X-Mailer: Apple Mail (2.3124)
Archived-At: <https://mailarchive.ietf.org/arch/msg/ietf-smtp/-bcLs7jJ4JdgR57ECTktilBWVUM>
Cc: Tony L Hansen <tony@att.com>, ietf-smtp@ietf.org, Joe Hildebrand <jhildebr@cisco.com>
Subject: Re: [ietf-smtp] Address transformations

> On Aug 1, 2016, at 6:08 AM, John C Klensin <john-ietf@jck.com> wrote:
> 
> --On Sunday, July 31, 2016 19:25 -0700 Sean Leonard
> <dev+ietf@seantek.com> wrote:
> 
>> (RFC 5321)
>> All this stuff brings up a very interesting point about
>> deliverable email addresses in draft-seantek-mail-regexen:
>> 
>> What are the domain part limitations on deliverable email
>> addresses, that are not encapsulated by the RFC 5321 ABNF?
>> ...
>> The string "1.2.3.4" is a valid domain production.
>> However, it is not (or *should not* be) considered a
>> deliverable email address, because when passed to the famous
>> function "gethostbyname", that function will certainly
>> return 1.2.3.4; it will not perform a lookup of the domain
>> record with a top-level label of "4". 
>> ...
> 
> The "fame" of gethostbyname doesn't make it a standard of any
> type.  In fact, it has been widely suggested on IETF and other
> lists that it is broken.

Actually, getaddrinfo (the successor to gethostbyname) *is* in the POSIX standard: IEEE Std 1003.1, 2013 Edition. <http://pubs.opengroup.org/onlinepubs/9699919799/> And it does specifically call out support for IPv4 and IPv6 address literals.

That being said, I do not wish to devolve into a standards-baiting discussion about that API call. I am interested, however, in how mail systems actually implement the parsing of the domain production. If you give any of the popular, widely deployed SMTP servers RCPT TO:<foo@11.22.33.44>, will it query DNS for “11.22.33.44”, or will it attempt to contact the server at the IPv4 address 11.22.33.44? Implementers can answer this question.
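
To make the numeric-literal behaviour concrete, here is a minimal Python sketch of the classic BSD-style inet_aton parsing that resolver calls such as gethostbyname and getaddrinfo traditionally apply before any DNS lookup. It is an illustration under stated assumptions, not the code of any particular stack: the names parse_part and classic_inet_aton are hypothetical, and real implementations may differ in edge cases.

```python
from typing import Optional

HEX_DIGITS = set("0123456789abcdef")
OCT_DIGITS = set("01234567")
DEC_DIGITS = set("0123456789")


def parse_part(text: str) -> Optional[int]:
    """Parse one component the way strtoul(..., base 0) would:
    a 0x prefix means hex, a leading 0 means octal, anything else decimal."""
    lowered = text.lower()
    if lowered.startswith("0x"):
        digits, base, allowed = lowered[2:], 16, HEX_DIGITS
    elif len(lowered) > 1 and lowered.startswith("0"):
        digits, base, allowed = lowered[1:], 8, OCT_DIGITS
    else:
        digits, base, allowed = lowered, 10, DEC_DIGITS
    if not digits or not set(digits) <= allowed:
        return None
    return int(digits, base)


def classic_inet_aton(text: str) -> Optional[str]:
    """Return the dotted-quad form if text parses as a classic IPv4
    numeric literal (a, a.b, a.b.c or a.b.c.d), otherwise None."""
    parts = text.split(".")
    if not 1 <= len(parts) <= 4:
        return None
    values = [parse_part(part) for part in parts]
    if any(value is None for value in values):
        return None
    *leading, last = values
    # Every component except the last is a single byte.
    if any(value > 255 for value in leading):
        return None
    # The last component fills all remaining bytes.
    if last >= 1 << (8 * (4 - len(leading))):
        return None
    number = last
    for index, value in enumerate(leading):
        number |= value << (8 * (3 - index))
    return ".".join(str((number >> shift) & 0xFF) for shift in (24, 16, 8, 0))


for candidate in ("11.22.33.44", "411", "1.2.3.0xe", "1.2.3.256"):
    print(candidate, "->", classic_inet_aton(candidate))
```

Under these rules "411" and "1.2.3.0xe" are consumed as IPv4 literals, while "1.2.3.256" is not and would fall through to a name lookup.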

> 
>> ... 
>> I have tried "foo@1.2.3.256" -- Windows and Unix/Linux
>> stacks will try to query the DNS for that string and not parse
>> it as IPv4. So, it is syntactically a valid, deliverable email
>> address.
> 
> It is syntactically a valid email address.  It is unlikely to be
> deliverable.  And what Windows and Unix/Linux do is not
> sufficient to justify "So".
> 
>> But foo@411 will get IPv4-parsed to 0.0.1.155, which
>> means that "411" should not be considered a valid domain
>> name for a deliverable email address.
> 
> Actually, if foo@411 is treated as containing an address
> literal, rather than a domain name, that is an error
> (independent of what was done with the address literal once
> that determination was made).  It is, of course, not a valid
> domain name either, and that is independent of whether it is
> invalid because it is all digits, because it is numeric and at
> the root, or simply because it is not a label that appears in
> the root zone.  Of course, it could also be a local name that
> should really be interpreted as foo@411.local,
> foo@411.example.com, etc., depending on local configuration
> setup in, e.g., the submission server.

Ok. However, Section 2.3.5 of RFC 5321 says:
   The domain name, as described in this document and in RFC 1035 [2],
   is the entire, fully-qualified name (often referred to as an "FQDN").
   A domain name that is not in FQDN form is no more than a local alias.
   Local aliases MUST NOT appear in any SMTP transaction.

The ABNF of RFC 5321 shows that you cannot have a single bare label
with a trailing dot. It seems to me that a single bare label therefore
has to be an FQDN, and never a local alias; being a top-level label,
it must not be all-numeric.
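
As a small illustration of that reading of the ABNF, the Domain production of RFC 5321 (Domain = sub-domain *("." sub-domain)) can be sketched as a Python regular expression. The function name is_rfc5321_domain is hypothetical, and this is a syntax check only, with no DNS rules applied.

```python
import re

# sub-domain = Let-dig [Ldh-str]
# Ldh-str    = *( ALPHA / DIGIT / "-" ) Let-dig
SUB_DOMAIN = r"[A-Za-z0-9](?:[A-Za-z0-9-]*[A-Za-z0-9])?"

# Domain = sub-domain *("." sub-domain)
DOMAIN = re.compile(rf"{SUB_DOMAIN}(?:\.{SUB_DOMAIN})*")


def is_rfc5321_domain(text: str) -> bool:
    """True if text matches the RFC 5321 Domain production in full."""
    return DOMAIN.fullmatch(text) is not None


for candidate in ("example.com", "example.com.", "localhost", "411", "1.2.3.4"):
    print(candidate, "->", is_rfc5321_domain(candidate))
```

The trailing-dot form is rejected, while a bare label such as "411" and the string "1.2.3.4" both pass, which is why any all-numeric restriction has to come from an additional rule.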


> 
>> ...
> 
> More generally, it seems to me that, if you take the questions
> up a level, either the answers to your several questions
> become clear or the questions are trivial.
> 
> First, my recollection is that there was a strong belief when
> RFC 821 was written that the distinction between "address
> literal" and "domain" in the domain part was to be made by
> obvious, no-lookahead, syntax, not by either heuristics or "is
> this a number" tests.  So, as far as SMTP is concerned,
> foo@[1.2.3.4] is a mailbox containing an address literal while
> foo@1.2.3.4 is a mailbox containing a domain.  I/we carried the
> view forward when we had to deal with IPv6 addresses.  That led
> to the decision to explicitly designate the address family
> associated with an address literal rather than leaving it to
> heuristics on, e.g., what syntax is used in the address.
> Whether something that SMTP identifies lexically as an address
> literal is valid or not (and whether a particular SMTP client or
> server knows what to do with it) is largely not an SMTP problem.
> Similarly, whether something that SMTP lexically identifies as a
> domain is valid (or can ever be valid) is another question and
> not really an SMTP issue except that...
> 
> 	(i)  There is a long history of SMTP (and other email)
> 	clients (including submission servers) trying to do
> 	sanity checks on putative domain names.  Some of that
> 	history is rooted in systems that were not directly
> 	connected to the Internet, and hence unable to do DNS
> 	lookups themselves, wanting to eliminate (reject,
> 	bounce, or, in the case of submission servers and
> 	gateways, fix) messages with bogus mailbox addresses as
> 	early in the transmission processes as possible.  At one
> 	time, it was very common for SMTP clients (and even some
> 	MUAs) to have lists of valid TLD labels or valid TLD
> 	labels of other than two characters in length and/or for
> 	them to "know" that there was only one four-letter TLD
> 	label and no TLD label longer than four characters long.
> 	Some of those tests became much more fragile when ICANN
> 	started introducing gTLDs at a relatively high rate
> 	including many with labels other than three characters
> 	in length and were further complicated by IDN TLDs and
> 	are usually considered poor practices, especially for
> 	hosts directly connected to the Internet, today.
> 	
> 	(ii) RFC 1123 (in a section not affected by the
> 	"replacing the mail transport..." comment in 5321) says
> 	that TLD names must be alphabetic.  At least so far and
> 	with the exception for the form of IDNA labels, ICANN
> 	has not violated that rule.  So, 1.2.3.123 is still
> 	clearly invalid as a domain name.


Ok, I found the text: Section 2.1 of RFC 1123. Thanks.

This can be implemented by checking that the final sub-domain is not
all-numeric:

((?&sub_domain)\.)+(?![0-9]+(?![\-0-9A-Za-z]))(?&sub_domain)

This will permit the following:

foo@1.2.3.0xe

which getaddrinfo will consume as an IPv4 address literal.
However, as long as we at the IETF are okay with that (or don’t care),
then we are all good.
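
For anyone who wants to try the check outside PCRE, here is a rough Python sketch. Python's re module has no (?&name) subroutine calls, so the sub-domain pattern is inlined, and the lookahead is written so that it also rejects an all-numeric label at the very end of the input. Both choices are assumptions of this sketch, and the function name has_non_numeric_final_label is hypothetical.

```python
import re

# sub-domain = Let-dig [Ldh-str], inlined because Python lacks (?&name).
SUB_DOMAIN = r"[A-Za-z0-9](?:[A-Za-z0-9-]*[A-Za-z0-9])?"

# Fail if the final label is digits only, i.e. a run of digits that is
# not followed by another letter, digit or hyphen.
NOT_ALL_NUMERIC = r"(?![0-9]+(?![A-Za-z0-9-]))"

DELIVERABLE_DOMAIN = re.compile(
    rf"(?:{SUB_DOMAIN}\.)+{NOT_ALL_NUMERIC}{SUB_DOMAIN}"
)


def has_non_numeric_final_label(domain: str) -> bool:
    """True for a dotted domain whose final sub-domain is not all digits."""
    return DELIVERABLE_DOMAIN.fullmatch(domain) is not None


for candidate in ("example.com", "1.2.3.4", "1.2.3.256", "1.2.3.0xe", "a.b.45a"):
    print(candidate, "->", has_non_numeric_final_label(candidate))
```

As noted above, "1.2.3.0xe" is accepted by this pattern because its final label contains letters.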

> 
> In both of the above cases, the issues are DNS ones, or ones in
> the interface between an SMTP system and the DNS, not SMTP
> issues.

Yes. The patterns for *deliverable* email addresses
are intended to incorporate DNS rules. Generic email addresses
(RFC 5322) do not need to be deliverable; therefore, they
do not necessarily need to incorporate DNS rules.

There is probably a way to express the “last sub-domain cannot be all-numeric” rule in ABNF,
and therefore to translate it into an integrated regular expression (as opposed to a regular expression with a negative lookahead). I am not sure whether it would be more efficient.

I think it would be:

Domain           = *(sub-domain ".") final-sub-domain

sub-domain       = Let-dig [Ldh-str]

AH               = ALPHA / "-"

final-sub-domain = ALPHA [Ldh-str] /
                   DIGIT *(ALPHA / DIGIT / "-") ALPHA /
                   DIGIT 1*(*DIGIT AH *DIGIT) DIGIT
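
A hedged Python sketch of what the integrated (lookahead-free) regular expression could look like, translated alternative by alternative from the ABNF above. The third alternative is written as "digits, then the first letter or hyphen, then any LDH characters, then a digit", which should accept the same strings as DIGIT 1*(*DIGIT AH *DIGIT) DIGIT. The name is_deliverable_domain is hypothetical.

```python
import re

LET_DIG = "[A-Za-z0-9]"
LDH = "[A-Za-z0-9-]"

# sub-domain = Let-dig [Ldh-str]
SUB_DOMAIN = rf"{LET_DIG}(?:{LDH}*{LET_DIG})?"

FINAL_SUB_DOMAIN = (
    "(?:"
    rf"[A-Za-z](?:{LDH}*{LET_DIG})?"   # ALPHA [Ldh-str]
    rf"|[0-9]{LDH}*[A-Za-z]"           # DIGIT *(ALPHA / DIGIT / "-") ALPHA
    rf"|[0-9]+[A-Za-z-]{LDH}*[0-9]"    # DIGIT 1*(*DIGIT AH *DIGIT) DIGIT
    ")"
)

# Domain = *(sub-domain ".") final-sub-domain
DOMAIN = re.compile(rf"(?:{SUB_DOMAIN}\.)*{FINAL_SUB_DOMAIN}")


def is_deliverable_domain(text: str) -> bool:
    """True if text matches the Domain rule with a non-numeric final label."""
    return DOMAIN.fullmatch(text) is not None


for candidate in ("example.com", "1.2.3.4", "1.2.3.0xe", "411", "a.1-2", "a.12-"):
    print(candidate, "->", is_deliverable_domain(candidate))
```

Note that, unlike the lookahead pattern, this ABNF also admits a single label with no dot, so "localhost" matches while "411" does not.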

> 
> Second, coming back to address literals, if SMTP identifies
> something as an address literal (i.e., the string "@[" appears)
> then, while syntax is given in 5321 for IPv4 address literals,
> all of the others (present and future) are ultimately 
>   Tag ":" String
> 
> For IPv6, the Tag is "IPv6" and the string specified by
> standards for presentation forms of IPv6 addresses, not SMTP.
> For anything else, the Tag is required to be standardized and
> that standard or the related one is expected to specify the
> syntax of the String (but there are few character restrictions
> on it).    It was, and is, quite intentional that, if some set
> of systems, by prior agreement, invented the "foozy" address
> format and associated protocols but did not standardize it
> through the IETF, 
> 
>    usermailbox@[Foozy:666]
> could work perfectly well within their collective environment
> without causing interoperability problems with
> standards-conforming systems.

I want to be sure about this. If someone enters an email address of

user@[23.34.45.56]

into an input field (e.g., web browser or mail client) that *purports to accept all deliverable email addresses*, shall it be accepted as valid input, or not?

Section 2.3.4 of RFC 5321 says:
   Hosts are known by names
   (see the next section); they SHOULD NOT be identified by numerical
   addresses, i.e., by address literals as described in Section 4.1.2.

Well, that is a SHOULD NOT, not a MUST NOT. I just want to be sure that we are collectively sure that an input field that *purports to accept all deliverable email addresses* *needs* to accept address-literal syntax.

If the answer is yes, it needs to accept address-literal syntax, shall the input field accept General-address-literal? Shall it accept IPv6-address-literal (which, by the way, will always get matched as General-address-literal, and therefore, is superfluous if we permit General-address-literal)?

My current position is that if address-literal syntax is needed, it should only accept IPv4 and IPv6 syntaxes; general address literals are not “deliverable” using the modern Internet & SMTP infrastructure. If a new address literal becomes globally routable on the public Internet and is standardized by the IETF, then the regular expression should be updated (along with billions of instances of email and Internet software). I do not have a position on whether address-literal syntax itself is needed, but I am a little skeptical that such an email address fits a working engineering definition of “deliverable” using the modern Internet & SMTP infrastructure.
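
To illustrate that position, here is a minimal Python sketch that sorts a bracketed domain part into IPv4, IPv6 or general address literals and accepts only the first two. It leans on Python's ipaddress module as a stand-in for the RFC 5321 IPv4 and IPv6 ABNF, which is an approximation, and the names classify_address_literal and accepts_as_deliverable are hypothetical.

```python
import ipaddress
import re

IPV6_TAG = "ipv6:"

# General-address-literal = Standardized-tag ":" 1*dcontent
# Standardized-tag = Ldh-str; dcontent = %d33-90 / %d94-126
GENERAL_LITERAL = re.compile(r"[A-Za-z0-9-]*[A-Za-z0-9]:[\x21-\x5A\x5E-\x7E]+")


def classify_address_literal(domain_part: str) -> str:
    """Classify the text after '@' as 'ipv4', 'ipv6', 'general',
    'invalid' (bracketed but unparseable) or 'not-a-literal'."""
    if not (domain_part.startswith("[") and domain_part.endswith("]")):
        return "not-a-literal"
    inner = domain_part[1:-1]
    if inner.lower().startswith(IPV6_TAG):
        try:
            ipaddress.IPv6Address(inner[len(IPV6_TAG):])
        except ValueError:
            return "invalid"
        return "ipv6"
    try:
        ipaddress.IPv4Address(inner)
        return "ipv4"
    except ValueError:
        pass
    if GENERAL_LITERAL.fullmatch(inner):
        return "general"
    return "invalid"


def accepts_as_deliverable(domain_part: str) -> bool:
    """Accept only IPv4 and IPv6 address literals."""
    return classify_address_literal(domain_part) in ("ipv4", "ipv6")


for candidate in ("[23.34.45.56]", "[IPv6:2001:db8::1]", "[Foozy:666]", "example.com"):
    print(candidate, "->", classify_address_literal(candidate))
```

The IPv6 tag is tested before the general pattern because the text of an IPv6-address-literal also matches General-address-literal.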

> 
> Finally, while the email specs were quite careful to make a
> lexical distinction between a domain and an address literal in a
> mailbox name, URLs/URIs did not follow that lead and appear to
> rely on heuristics or a subtle understanding of the strings
> involved instead.  That is one of many cases in which an
> identifier that looks more or less like an email address may not
> actually be an email address and certainly doesn't imply
> constraints on email addresses or email systems.  If you want to
> discuss such identifiers, this is almost certainly not the right
> list.

Yep, not talking about URIs here.

Regards,

Sean