Re: [ietf-smtp] Address transformations

John C Klensin <john-ietf@jck.com> Mon, 01 August 2016 13:22 UTC

Date: Mon, 01 Aug 2016 09:08:40 -0400
From: John C Klensin <john-ietf@jck.com>
To: Sean Leonard <dev+ietf@seantek.com>, ietf-smtp@ietf.org

--On Sunday, July 31, 2016 19:25 -0700 Sean Leonard
<dev+ietf@seantek.com> wrote:

> (RFC 5321)
> All this stuff brings up a very interesting point about
> deliverable email addresses in draft-seantek-mail-regexen:
> 
> What are the domain part limitations on deliverable email
> addresses, that are not encapsulated by the RFC 5321 ABNF?
>...
> The string "1.2.3.4" is a valid domain production.
> However, it is not (or *should not*) be considered a
> deliverable email address, because when passed to the famous
> function "gethostbyname", that function will certainly
> return 1.2.3.4; it will not perform a lookup of the domain
> record with a top-level label of "4". 
>...

The "fame" of gethostbyname doesn't make it a standard of any
type.  In fact, it has been widely suggested on IETF and other
lists that it is broken.

>... 
> I have tried "foo@1.2.3.256" -- Windows and Unix/Linux
> stacks will try to query the DNS for that string and not parse
> it as IPv4. So, it is syntactically a valid, deliverable email
> address.

It is syntactically a valid email address.  It is unlikely to be
deliverable.  And what Windows and Unix/Linux do is not
sufficient to justify "So".

> But foo@411 will get IPv4-parsed to 0.0.1.155, which
> means that "411" should not be considered a valid domain
> name for a deliverable email address.

Actually, if foo@411 is treated as containing an address
literal, rather than a domain name, that is an error
(independent of what is done with the address literal once
that determination is made).  It is, of course, not a valid
domain name either, and that is independent of whether it is
invalid because it is all digits, because it is numeric and at
the root, or simply because it is not a label that appears in
the root zone.  Of course, it could also be a local name that
should really be interpreted as foo@411.local,
foo@411.example.com, etc., depending on local configuration
setup in, e.g., the submission server.
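
For readers wondering where the quoted 0.0.1.155 comes from: it is
simply 411 read as a single 32-bit number, which is how legacy
resolver-style parsing treats a bare integer.  A minimal sketch of
that arithmetic follows, using Python's standard ipaddress module.
The function name is hypothetical, and this illustrates the
heuristic being criticized, not anything SMTP specifies.

```python
import ipaddress


def legacy_single_number_to_ipv4(value: int) -> str:
    """Render a bare integer as a dotted quad, the way a
    resolver-style heuristic would interpret "411".

    This is the behavior the text above calls an error when it is
    applied to the domain part of a mailbox.
    """
    return str(ipaddress.IPv4Address(value))


# 411 == 1 * 256 + 155, so the low two octets are 1 and 155.
print(legacy_single_number_to_ipv4(411))
```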

>...

More generally, it seems to me that, if you take the questions
up a level, either the answers to your several questions become
clear or the questions turn out to be trivial.

First, my recollection is that there was a strong belief when
RFC 821 was written that the distinction between "address
literal" and "domain" in the domain part was to be made by
obvious, no-lookahead, syntax, not by either heuristics or "is
this a number" tests.  So, as far as SMTP is concerned,
foo@[1.2.3.4] is a mailbox containing an address literal while
foo@1.2.3.4 is a mailbox containing a domain.  I/we carried the
view forward when we had to deal with IPv6 addresses.  That led
to the decision to explicitly designate the address family
associated with an address literal rather than leaving it to
heuristics on, e.g., what syntax is used in the address.
Whether something that SMTP identifies lexically as an address
literal is valid or not (and whether a particular SMTP client or
server knows what to do with it) is largely not an SMTP problem.
Similarly, whether something that SMTP lexically identifies as a
domain is valid (or can ever be valid) is another question and
not really an SMTP issue except that...

	(i)  There is a long history of SMTP (and other email)
	clients (including submission servers) trying to do
	sanity checks on putative domain names.  Some of that
	history is rooted in systems that were not directly
	connected to the Internet, and hence unable to do DNS
	lookups themselves, wanting to eliminate (reject,
	bounce, or, in the case of submission servers and
	gateways, fix) messages with bogus mailbox addresses as
	early in the transmission process as possible.  At one
	time, it was very common for SMTP clients (and even some
	MUAs) to have lists of valid TLD labels, or of valid TLD
	labels other than two characters in length, and/or for
	them to "know" that there was only one four-letter TLD
	label and no TLD label longer than four characters.
	Some of those tests became much more fragile when ICANN
	started introducing gTLDs at a relatively high rate,
	including many with labels other than three characters
	in length, and were further complicated by IDN TLDs.
	Such tests are usually considered poor practice today,
	especially for hosts directly connected to the Internet.
	
	(ii) RFC 1123 (in a section not affected by the
	"replacing the mail transport..." comment in 5321) says
	that TLD names must be alphabetic.  At least so far and
	with the exception for the form of IDNA labels, ICANN
	has not violated that rule.  So, 1.2.3.123 is still
	clearly invalid as a domain name.

In both of the above cases, the issues are DNS ones, or ones in
the interface between an SMTP system and the DNS, not SMTP
issues.
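
The two layers described above can be sketched roughly as follows.
The first function makes the SMTP-level decision purely lexically
(does the domain part start with "["?).  The second applies the RFC
1123 alphabetic-TLD rule as a separate, DNS-side sanity check.  Both
function names are hypothetical, the input is assumed to contain a
single unquoted "@", and the "xn--" test is only a crude stand-in
for the IDNA-label exception mentioned in (ii).

```python
def classify_domain_part(mailbox: str) -> str:
    """Return "address-literal" or "domain" using syntax alone.

    No lookups and no "is this a number" test: only the leading
    "[" matters.  Assumes one unquoted "@" in the mailbox.
    """
    _local, sep, domain_part = mailbox.rpartition("@")
    if not sep or not domain_part:
        raise ValueError("mailbox has no domain part")
    if domain_part.startswith("["):
        return "address-literal"
    return "domain"


def tld_looks_alphabetic(domain: str) -> bool:
    """DNS-side check, not an SMTP one: is the last label alphabetic?

    Labels starting with "xn--" are let through as an approximation
    of the IDNA-label exception.
    """
    last_label = domain.rstrip(".").rsplit(".", 1)[-1]
    if last_label.lower().startswith("xn--"):
        return True
    return last_label.isascii() and last_label.isalpha()


print(classify_domain_part("foo@[1.2.3.4]"))  # address-literal
print(classify_domain_part("foo@1.2.3.4"))    # domain
print(tld_looks_alphabetic("1.2.3.123"))      # False
```

Note that foo@1.2.3.4 still classifies as a domain.  It then fails
the second check, which is the point: the failure belongs to the
DNS side, not to SMTP syntax.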

Second, coming back to address literals, if SMTP identifies
something as an address literal (i.e., the string "@[" appears)
then, while syntax is given in 5321 for IPv4 address literals,
all of the others (present and future) are ultimately 
   Tag ":" String

For IPv6, the Tag is "IPv6" and the String is specified by the
standards for presentation forms of IPv6 addresses, not by SMTP.
For anything else, the Tag is required to be standardized and
that standard or the related one is expected to specify the
syntax of the String (but there are few character restrictions
on it).  It was, and is, quite intentional that, if some set
of systems, by prior agreement, invented the "foozy" address
format and associated protocols but did not standardize it
through the IETF, 

    usermailbox@[Foozy:666]

could work perfectly well within their collective environment
without causing interoperability problems with
standards-conforming systems.
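
A minimal sketch of that Tag ":" String split follows, assuming the
bracketed literal has already been isolated from the mailbox.  The
function names are hypothetical.  Python's ipaddress module is used
only as a convenient validity check, and it is not guaranteed to
accept exactly the same strings as the RFC 5321 grammar.

```python
import ipaddress
from typing import Optional, Tuple


def split_address_literal(literal: str) -> Tuple[Optional[str], str]:
    """Split "[Tag:String]" into (tag, string).

    An untagged literal such as "[1.2.3.4]" yields (None, "1.2.3.4").
    The tag cannot contain ":", so splitting on the first colon is
    enough even when the String itself has colons (as IPv6 does).
    """
    if not (literal.startswith("[") and literal.endswith("]")):
        raise ValueError("not an address literal")
    body = literal[1:-1]
    tag, sep, rest = body.partition(":")
    if not sep:
        return None, body
    return tag, rest


def describe_address_literal(literal: str) -> str:
    """Name the address family; raises ValueError on a bad IP string."""
    tag, string = split_address_literal(literal)
    if tag is None:
        ipaddress.IPv4Address(string)  # validity is not SMTP's problem
        return "IPv4"
    if tag.lower() == "ipv6":
        ipaddress.IPv6Address(string)
        return "IPv6"
    # Anything else is opaque here; only parties that agreed on the
    # tag know what the String means.
    return "general (" + tag + ")"


print(split_address_literal("[Foozy:666]"))
print(describe_address_literal("[IPv6:::1]"))
```

A system that has never heard of "Foozy" can still parse the
literal and decline to handle it, which is why the private format
does not interfere with standards-conforming systems.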

Finally, while the email specs were quite careful to make a
lexical distinction between a domain and an address literal in a
mailbox name, URLs/URIs did not follow that lead and appear to
rely on heuristics or a subtle understanding of the strings
involved instead.  That is one of many cases in which an
identifier that looks more or less like an email address may not
actually be an email address and certainly doesn't imply
constraints on email addresses or email systems.  If you want to
discuss such identifiers, this is almost certainly not the right
list.

best,
    john