Re: [I18ndir] HTML, email addresses, etc

John C Klensin <john-ietf@jck.com> Thu, 11 June 2020 10:45 UTC

Date: Thu, 11 Jun 2020 06:45:00 -0400
From: John C Klensin <john-ietf@jck.com>
To: "Martin J. Dürst" <duerst@it.aoyama.ac.jp>, Nico Williams <nico@cryptonector.com>
cc: i18ndir@ietf.org
Message-ID: <3C128AC017D0F3435EAA0DC3@PSB>
In-Reply-To: <ae686ffc-c09f-610a-9395-71808e7b497f@it.aoyama.ac.jp>
References: <B7D61128A7109785BD555955@jkacere15> <20200610211834.GG3100@localhost> <ae686ffc-c09f-610a-9395-71808e7b497f@it.aoyama.ac.jp>
Archived-At: <https://mailarchive.ietf.org/arch/msg/i18ndir/rgOJDT_LAxpckKIRwsNWjO7q9mw>
Subject: Re: [I18ndir] HTML, email addresses, etc
List-Id: Internationalization Directorate <i18ndir.ietf.org>


--On Thursday, June 11, 2020 16:25 +0900 "Martin J. Dürst"
<duerst@it.aoyama.ac.jp> wrote:

> On 11/06/2020 06:18, Nico Williams wrote:
>> On Sun, Jun 07, 2020 at 11:46:31PM -0400, John C Klensin
>> wrote:
>>> Hi.
>>> 
>>> The following comments by John Levine and myself might be
>>> relevant to people on this list.
>>> 
>>> https://github.com/whatwg/html/issues/4562#issuecomment-640070477
>> 
>> I'm not sure that we need a new input type for EAI addresses.
>> If the user types in non-ASCII, then presumably *they* know
>> that the domain can handle EAI.
> 
> They know that the receiving domain can handle EAI. But before
> the mail is received, it actually has to be sent. That happens
> from an MTA associated (in some way) with the Web page where
> the user types in the address.

It may be a fine distinction, but while they presumably know that
the delivery server (receiving domain) can handle the address
(otherwise, they would not have that address), they cannot know
whether there is a path from whatever submission server is
accessed from that web page to the delivery server.  Equally or
more important, we need to remember that, because email
addresses are also used as identifiers, there may be systems
that reject identifier strings that are perfectly acceptable
when actually used in email.
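For what it is worth, the capability in question is visible in SMTP itself: a server that can relay EAI mail lists the SMTPUTF8 extension (RFC 6531) in its EHLO reply. The following is a minimal sketch, assuming the EHLO reply lines are already in hand (`advertises_smtputf8` is a hypothetical helper name), of the check a sending site could make. It only covers the first hop, which is exactly why the user cannot know about the rest of the path.

```python
def advertises_smtputf8(ehlo_lines: list[str]) -> bool:
    """Return True if a multi-line EHLO reply lists the SMTPUTF8 keyword.

    Lines look like "250-KEYWORD params" or, for the last one,
    "250 KEYWORD". The first line carries the server's greeting,
    not an extension, so it is skipped.
    """
    for line in ehlo_lines[1:]:
        # Drop the "250-" / "250 " prefix, then look at the keyword.
        text = line[4:].strip()
        if text and text.split()[0].upper() == "SMTPUTF8":
            return True
    return False


reply = [
    "250-mail.example.com Hello",
    "250-PIPELINING",
    "250-8BITMIME",
    "250 SMTPUTF8",
]
print(advertises_smtputf8(reply))  # True
```

Python's `smtplib` exposes the same information after `ehlo()` through `SMTP.has_extn("smtputf8")`. A True result says nothing about later hops toward the delivery server.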

We have a very common example of that which doesn't even involve
i18n issues.  Assume you have an email server or service that is
perfectly ok with alice+bob@example.com (an address style that
is perfectly acceptable under RFC 5321/5322) and that allows
such addresses.  Whether it treats it as "same user, different
folder from alice+jane@example.com" (the subaddress case), as a
completely different mailbox (the "'+' is just another character"
case), or discards the "+" and treats it internally as
"alicebob@example.com" (which inevitably leads to different
mailboxes) is up to it -- no one else has any control.  But when
some web page that is being used for data entry decides that "+"
does not belong in local parts and hence, when the user types
alice+bob@example.com, says "invalid address" and will not let
the user proceed without providing an address that it likes, it
just does not make any difference what the delivery server will
accept.
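To make that failure mode concrete, here is a minimal sketch contrasting an over-strict form check (the pattern is a hypothetical example of the kind of rule some pages apply, not any particular site's code) with a check that only requires something on each side of the last "@". Splitting at the last "@" is deliberate: a quoted local part may itself contain "@", but the domain cannot.

```python
import re

# Hypothetical over-strict pattern: letters, digits, '.', '_' and '-' only.
STRICT = re.compile(r"[A-Za-z0-9._-]+@[A-Za-z0-9.-]+")


def strict_ok(addr: str) -> bool:
    return STRICT.fullmatch(addr) is not None


def permissive_ok(addr: str) -> bool:
    # Only insist on a non-empty local part and domain around the last '@';
    # what the local part means is left to the delivery server.
    local, sep, domain = addr.rpartition("@")
    return bool(sep and local and domain)


for addr in ["alice@example.com", "alice+bob@example.com"]:
    print(addr, strict_ok(addr), permissive_ok(addr))
# alice@example.com True True
# alice+bob@example.com False True
```

The strict check rejects alice+bob@example.com whatever the delivery server would have done with it.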


>> Another way to put it is that if we did add a new input type,
>> should developers build websites where the user must pick
>> whether to enter an EAI or not?
> 
> No. The distinction doesn't apply to a single Web site.
> [Except for your point about browser differences and for cases
> where a Web site has different parts where mail gets routed
> differently. E.g. imagine a trading web site that sends mails
> to sellers via mail system A and mail to buyers via mail
> system B.]
> 
> But it applies to the Web as a whole. Most current web sites
> send their mail via an MTA that is not (yet) set up for EAI.
> And many users may have both an ASCII-only and an EAI email
> address. It helps a lot if users get told by the browser that
> the Web site they are on (more specifically, the associated
> outgoing MTA) can't handle their EAI address.

Yes.

>> Or should developers just always use the new input type?
>> 
>> (Answer to the second question: No! First the page must
>> detect whether the browser supports the new input type!)
>> 
>> If we add a new input type, then after a while that will be
>> all that is used.  Ergo it's not needed.

Unfortunately, when whatever is there (the older type or
keyword) is widely deployed and well-entrenched, "a while" can
be rather long.  For examples, see the number of SMTP senders
that still use HELO rather than EHLO or, if you prefer a more
painful example, IPv6.

>> The mere fact that the user typed in an internationalized
>> mailbox name is all the evidence we need that it is likely
>> that the domain supports EAI.
>> 
>> You could say that because *your* system doesn't support EAI
>> for outbound email, you won't accept it as input.  _This_
>> could be a new option on the input.
> 
> That's exactly what this option is about.

Exactly.
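The input option discussed here can be sketched as a per-form check. This is a minimal sketch, assuming a hypothetical `outbound_eai` flag that the page author sets to say whether the site's outgoing MTA can handle SMTPUTF8; `check_address` is likewise a hypothetical name, not a proposed HTML API. Only the local part is tested for non-ASCII characters, because a non-ASCII domain can in principle be sent as A-labels (IDNA) without EAI support.

```python
from typing import Optional


def check_address(addr: str, outbound_eai: bool) -> Optional[str]:
    """Return a message to show the user, or None if the address is accepted.

    outbound_eai is a hypothetical per-form flag meaning "the MTA this site
    submits through can relay SMTPUTF8 (EAI) mail".
    """
    local, sep, domain = addr.rpartition("@")
    if not (sep and local and domain):
        return "Please enter an email address."
    # A non-ASCII domain could be converted to A-labels, so only a
    # non-ASCII local part forces a need for EAI here.
    if not local.isascii() and not outbound_eai:
        return "This site cannot yet send mail to internationalized addresses."
    return None


print(check_address("alice+bob@example.com", outbound_eai=False))  # None
print(check_address("josé@example.com", outbound_eai=False))
print(check_address("josé@example.com", outbound_eai=True))        # None
```

Making this an explicit option, rather than inferring it from what the user typed, reflects the split above: the user knows about the delivery side, while only the site knows its own submission path.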

And, to address your other comment:

>> (2) As to whether we should give them a rule or regular
>> expression to associate with the new type that goes much
>> beyond
>>     bunch-of-octets-for-local-part@bunch-of-octets-for-domain-part
>> I rather doubt it.
> 
> Please let's make that *at least*
> 
> bunch-of-characters-for-local-part@bunch-of-characters-for-domain-part

I assume that, in the above, "character" means "Unicode code
point in UTF-8".  Fine with me.  I used "octets" only because,
following a variation on what I think was the argument Nico was
making, I didn't want to impose a requirement for validation on
the browser (or other HTML interpreter).
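The practical difference between the two minimal rules can be sketched as follows. This is a sketch only; both function names are hypothetical. An "octets" rule accepts any byte string with something on each side of the last "@", while a "characters" rule additionally requires the bytes to be well-formed UTF-8, i.e. a sequence of Unicode code points.

```python
def octets_rule(raw: bytes) -> bool:
    # Non-empty byte runs on each side of the last '@' (0x40).
    local, sep, domain = raw.rpartition(b"@")
    return bool(sep and local and domain)


def characters_rule(raw: bytes) -> bool:
    # Same shape, but the bytes must first decode as UTF-8 code points.
    try:
        text = raw.decode("utf-8")
    except UnicodeDecodeError:
        return False
    local, sep, domain = text.rpartition("@")
    return bool(sep and local and domain)


good = "josé@example.com".encode("utf-8")
bad = b"jos\xe9@example.com"  # Latin-1 byte for 'é', not valid UTF-8
print(octets_rule(good), characters_rule(good))  # True True
print(octets_rule(bad), characters_rule(bad))    # True False
```

The second rule is still far short of full address validation; it only adds the requirement that the input be UTF-8.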

best,
   john