Re: [ietf-smtp] Are A-label and U-label addresses supposed to be equivalent ?

John C Klensin <john-ietf@jck.com> Tue, 14 July 2020 05:16 UTC

Date: Tue, 14 Jul 2020 01:16:33 -0400
From: John C Klensin <john-ietf@jck.com>
To: yaojk <yaojk@cnnic.cn>, John R Levine <johnl@taugh.com>, ietf-smtp <ietf-smtp@ietf.org>
Archived-At: <https://mailarchive.ietf.org/arch/msg/ietf-smtp/8B_Kvodp9plOWx0i_rf-G9WC9gY>


--On Monday, 13 July, 2020 10:46 +0800 Jiankang Yao
<yaojk@cnnic.cn> wrote:

>...
>> > --On Saturday, July 11, 2020 22:30 -0400 John R Levine
>> <johnl@taugh.com> wrote:
> 
>> > Let's say I had these two addresses, which are the same
>> > except that one has a U-label and the other has the
>> > equivalent A-label.  They're both non-ASCII addresses since
>> > the mailbox name is in Chinese
>> > 
>> > 用户1@后缀.services.net
>> > 用户1@xn--fqr621h.services.net
>> > 
>> > Presumably if you're sending mail to those addresses, you
>> > should send them to the same place.
> 
>> Given the way the interactions between RFC 6531, RFC 5321, and
>> IDNA work, you have little choice: the sending MTA needs to go
>> through IDNA to get an FQDN that can actually get looked up
>> and, after that, the place where it sends (or tries to send)
>> the message is all about the MX records. 
>> 
> 
>> >   But on the final delivery
>> > MTA, is that one address or two?  Different mail software
>> > implement it differently.
> 
> I think that they SHOULD implement it in the same way, that is,
> treat these two addresses as the same.
> 
> 用户1@后缀.services.net
> 用户1@xn--fqr621h.services.net
> 
> 
> Besides John's example,
> there is another example:
> 
> Do we regard these two addresses as the same?
> user@EXAMPLE.com
> user@example.com
> 
> 
> I think the answer is yes.
> user@EXAMPLE.com is another form of user@example.com.

It is yes, but that is irrelevant because the DNS specifications
require case insensitivity for ASCII domain names.  IDNA does not
permit upper-case non-ASCII (e.g., EXÁMPLE.com is invalid, is not
a U-label, and has no U-label equivalent).
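
To make the distinction concrete, here is a minimal sketch (the
function names are hypothetical, not taken from any spec or library).
It compares all-ASCII domains case-insensitively, as the DNS does, and
applies one necessary condition for a U-label: at least one non-ASCII
character and no upper-case characters.  Passing that check does not
make a label a valid U-label; full IDNA2008 validation involves much
more.

```python
def same_ascii_domain(a: str, b: str) -> bool:
    """Case-insensitive comparison, meaningful only for all-ASCII names."""
    if not (a.isascii() and b.isascii()):
        raise ValueError("only defined here for all-ASCII domain names")
    return a.lower() == b.lower()


def could_be_u_label(label: str) -> bool:
    """Necessary (not sufficient) condition for a U-label.

    Assumption for this sketch: a U-label contains at least one
    non-ASCII character and no upper-case characters at all.
    """
    if label.isascii():
        return False
    return not any(ch.isupper() for ch in label)
```

With these definitions, EXAMPLE.com and example.com compare equal,
while the label EXÁMPLE fails the U-label check outright rather than
being treated as a variant of exámple.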

> For the same reason, 
> 用户1@后缀.services.net is another form of 
> 用户1@xn--fqr621h.services.net

For DNS purposes, yes.  But the only time an SMTP-client is
supposed to be checking the DNS is at resolution time.
Otherwise, absent specific language in the SMTPUTF8 specs (which
I couldn't find when I looked back through them yesterday), the
RFC 5321 rule that MTAs other than the final delivery one are
not supposed to be trying to figure out what addresses "mean"
almost certainly applies.
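
As an illustration of "only at resolution time", the sketch below
derives an ASCII name for the DNS lookup from the domain-part while
leaving the address itself untouched.  The function names are
hypothetical.  It uses Python's standard "punycode" codec, which only
performs the RFC 3492 encoding; it does no IDNA2008 validation or
mapping, so a real MTA would want a proper IDNA library here.

```python
def to_lookup_name(domain: str) -> str:
    """Return an ASCII form of `domain` for DNS lookup only.

    Non-ASCII labels get the "xn--" prefix plus their Punycode
    encoding; ASCII labels pass through unchanged.  No IDNA
    validity checks are made (an assumption of this sketch).
    """
    out = []
    for label in domain.split("."):
        if label.isascii():
            out.append(label)
        else:
            out.append("xn--" + label.encode("punycode").decode("ascii"))
    return ".".join(out)


def next_hop_query(address: str) -> tuple[str, str]:
    """Return (address as received, name to use for the MX lookup)."""
    _local, _, domain = address.rpartition("@")
    return address, to_lookup_name(domain)
```

The point of returning the original address alongside the lookup name
is that the conversion feeds the MX query and nothing else; the
mailbox passed along in the SMTP transaction is not rewritten.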
 
>> > Looking at RFCs 6530 and 6531 I get the impression the
>> > authors assumed they'd be the same but I don't see anywhere
>> > it explicitly says so.
> 
>> Speaking as one of the authors, I don't think we discussed it
>> (rather than assuming anything).  There is a specific reason
>> why we probably didn't, which goes to the "what questions are
>> you really asking" issue.  The core SMTPUTF8 specs rather
>> strongly discourage using anything but native character UTF-8
>> strings anywhere other than in the SMTP client when it is
>> trying to figure the next hop out.  
>> 
> 
> 
> Besides John's explanation,
> RFC 6531 has some clarification.
> 
> Section 3.7.  Additional ESMTP Changes and Clarifications
> 
>    The information carried in the mail transport process involves
>    addresses ("mailboxes") and domain names in various contexts in
>    addition to the MAIL and RCPT commands and extended alternatives to
>    them.  In general, the rule is that, when RFC 5321 specifies a
>    mailbox, this SMTP extension requires UTF-8 form to be used for the
>    entire string.  When RFC 5321 specifies a domain name, the
>    internationalized domain name SHOULD be in U-label form if the
>    SMTPUTF8 extension is supported; otherwise, it SHOULD be in A-label
>    form.
> 
> So in an email address, the U-label and the A-label are just
> different forms of the internationalized domain name.

That is not what the above says.   

What it says is that 

(1) When SMTPUTF8 is in use with a non-ASCII local-part, UTF-8
is required to be used in both the local-part and the
domain-part.    That makes 用户1@xn--fqr621h.services.net a
violation of the spec, regardless of what it matches.

(2) When SMTPUTF8 is in use with a non-ASCII domain-part, a
domain-part that contains non-ASCII labels SHOULD be in U-label
form.  For example, user@后缀.services.net is, at least,
strongly preferred to user@xn--fqr621h.services.net

(3) When SMTPUTF8 is not supported (and therefore not in use),
A-labels SHOULD be used.  For that case, the rules fall back to
those of RFC 5321, which prohibits non-ASCII characters in
mailboxes, so both of your examples are invalid.  The SHOULD is
actually out of scope for RFC 6531 and someone should file an
erratum.
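
The three points above can be sketched as a small classifier.  This is
one reading of Section 3.7 expressed as code, not anything the RFC
defines; the function name and the returned strings are made up for
illustration, and A-labels are detected simply by the "xn--" prefix.

```python
def classify(address: str, smtputf8: bool) -> str:
    """Classify the form of `address` under the three points above."""
    local, _, domain = address.rpartition("@")
    has_a_label = any(
        label.lower().startswith("xn--") for label in domain.split(".")
    )
    non_ascii = not address.isascii()

    if not smtputf8:
        # Point (3): plain RFC 5321 rules, so no non-ASCII anywhere.
        return "invalid under RFC 5321" if non_ascii else "ok"
    if not local.isascii() and has_a_label:
        # Point (1): UTF-8 form is required for the entire mailbox.
        return "non-conforming"
    if has_a_label:
        # Point (2): allowed by the SHOULD, but U-label form is preferred.
        return "allowed, U-label preferred"
    return "ok"


for addr in ("用户1@后缀.services.net",
             "用户1@xn--fqr621h.services.net",
             "user@后缀.services.net",
             "user@xn--fqr621h.services.net"):
    print(addr, "->", classify(addr, smtputf8=True))
```

Note that the function only reports which forms are allowed or
preferred.  It deliberately says nothing about whether two forms
match, because the quoted text does not either.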

It doesn't say a word about matching of different forms, only
what forms are allowed (or preferred).  And, for the examples
you and John have used, use of the A-label form is already
non-conforming (modulo the deliberate loophole around the
SHOULD, but that does not apply to the first case), so it may be
the question itself is improperly formed, at least before the
robustness principle is applied.  But I can find no
justification for asserting the two mailbox forms MUST be
treated as matching.

best,
    john