Re: [EAI] [IETF] Display of Email Addresses [was: Internationalized Email Internet Draft]

John C Klensin <> Fri, 14 October 2016 16:34 UTC

Date: Fri, 14 Oct 2016 12:34:33 -0400
From: John C Klensin <>
To:, "HANSEN, TONY L" <>,
Message-ID: <DB82BB41C548C1D17CDA7BEE@JcK-HP8200>
Cc: Harish Chowdhary <>
Subject: Re: [EAI] [IETF] Display of Email Addresses [was: Internationalized Email Internet Draft]

--On Friday, October 14, 2016 14:49 +0000 wrote:

> John / Tony,
> Continuing my splitting of topics!   Hope this makes some
> kind of sense to others.
>> By contrast, Section 1.1 talks about display of email
>> addresses, including the local part ("in Punycode" [2]). 
>> While a mail delivery server is free to create whatever
>> aliases for a mailbox local part it likes, including
>> "xn-t2bmh3a" or "123456", "george" or "example", in general
>> converting a local part using the Punycode algorithm and
>> displaying the result is prohibited by the EAI standards
>> (and, incidentally, RFC5321).  More important, it will
>> often lose information and is potentially very dangerous.

> This is a very interesting problem.   We are hoping to do
> some kind of spreadsheet or other visual where we can show
> what happens with a number of mail servers.  For example,
> what does Yahoo mail do, what does gmail do, why some clients
> fail, etc. 
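To make the quoted warning about Punycode-converting local parts concrete, here is a minimal sketch, assuming Python's built-in "idna" codec (IDNA 2003: nameprep plus Punycode) and "punycode" codec as stand-ins. The helper names are hypothetical, and nothing in EAI defines such a conversion for local parts; the point is only to show how information gets lost.

```python
# Sketch only: EAI defines NO Punycode/IDNA conversion for local parts.
# Python's built-in codecs are used here purely to illustrate the loss.


def idna_style_label(local_part: str) -> str:
    """Hypothetical helper: apply the IDNA 2003 domain-label conversion
    (nameprep case folding + Punycode + "xn--" prefix) to a local part."""
    return local_part.encode("idna").decode("ascii")


def raw_punycode(local_part: str) -> str:
    """Hypothetical helper: apply bare Punycode (RFC 3492), no case mapping."""
    return local_part.encode("punycode").decode("ascii")


# Two local parts a receiving server may legitimately treat as distinct
# mailboxes (RFC 5321 treats local parts as case-sensitive).
upper, lower = "Jos\u00e9", "jos\u00e9"

# nameprep case-folds before encoding, so both collapse to the same string.
print(idna_style_label(upper), idna_style_label(lower))  # xn--jos-dma xn--jos-dma

# Bare Punycode keeps case, but yields an opaque string the user never
# chose and the delivery server need not recognize as a mailbox name.
print(raw_punycode(upper), raw_punycode(lower))  # Jos-dma jos-dma
```

The first line of output is the information loss: two potentially different mailboxes become indistinguishable once the DNS-oriented mapping is applied.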

I am glad to see another such survey, but note that one was done,
with what was then available, before the Beijing workshop, that
something similar has been done (or claimed to be done) in the
"Universal Acceptance" group (see below), and maybe elsewhere.

You also need to be very careful about the tests you run.  For
example, on delivery and when referring to its own (virtual)
mailbox names, gmail apparently discards some or all ASCII
delimiter characters in local parts, treating local parts
containing such characters as equivalent to ones with the
characters dropped.  I have no idea what they do with delimiter
characters from other scripts, but some other systems would
consider that delimiter-dropping behavior pathological or worse.
A different way of looking at that is that they are preserving
the local parts but dropping certain characters in mapping from
mailbox name to actual storage.  Either is fine with the
protocols as long as the computational aliasing is done only at
or beyond the delivery server.  
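As a rough illustration of "computational aliasing done only at or beyond the delivery server", the sketch below maps an address to a storage key, rewriting only addresses in the server's own domains. The dot-dropping rule, the LOCAL_DOMAINS set, and the storage_key name are all hypothetical; this is loosely modelled on the behavior described above and is not any provider's documented algorithm.

```python
# Hypothetical sketch: aliasing applied ONLY by the delivery server, and
# only for domains it actually delivers for. Not any provider's real rule.

LOCAL_DOMAINS = {"example.com"}  # hypothetical: domains this server delivers for


def storage_key(address: str) -> str:
    """Map an address to the key used for mailbox storage on delivery.

    Addresses outside LOCAL_DOMAINS pass through unchanged: a relay or
    client has no business reinterpreting someone else's local parts.
    """
    local, sep, domain = address.rpartition("@")  # split at the LAST "@"
    if not sep:
        raise ValueError(f"not an address: {address!r}")
    if domain.lower() not in LOCAL_DOMAINS:  # domain names are case-insensitive
        return address  # not ours: leave it exactly as received
    # Hypothetical local policy: ignore ASCII "." when choosing storage.
    return local.replace(".", "") + "@" + domain.lower()


print(storage_key("john.smith@example.com"))       # johnsmith@example.com
print(storage_key("john.smith@elsewhere.example"))  # john.smith@elsewhere.example
```

The design choice that matters is the early return: the same rule applied by a sending client or an intermediate relay would be exactly the kind of behavior the protocols do not permit.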

There are also a large number of issues at the boundaries
between email and HTML/HTTP that affect many implementations
that fail to account for the differences.  That problem, and to
some extent the gmail behavior above, are examples of another
situation we see quite often: the problems are there even in
all-ASCII strings and identifiers; i18n simply amplifies them
and makes them more obvious.
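One all-ASCII example of the email/HTTP boundary problem, offered as a hedged illustration rather than anything from the original discussion: "+" is legal in a local part, but HTML form and query-string decoding turns it into a space. The sketch uses only Python's standard urllib.parse; the address is a made-up example.

```python
# Illustration (all-ASCII): "+" is valid in an email local part, but
# form/query-string decoding treats it as an encoded space.
from urllib.parse import parse_qs, urlencode

address = "user+tag@example.com"  # hypothetical example address

# Naive: paste the address into a query string without escaping it.
naive_query = "to=" + address
print(parse_qs(naive_query)["to"][0])  # user tag@example.com  (wrong mailbox)

# Form-encode it: "+" becomes %2B and "@" becomes %40, so it round-trips.
safe_query = urlencode({"to": address})
print(safe_query)                      # to=user%2Btag%40example.com
print(parse_qs(safe_query)["to"][0])   # user+tag@example.com
```

Nothing here involves non-ASCII characters; internationalized local parts add percent-encoded UTF-8 to the same pipeline and make such mismatches more frequent and harder to spot.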

>  I am in the process of setting up a demo system
> for all this.  Let me tell you, I have learned quite a bit.

Sadly, I am not surprised.  I am also not surprised if a good
deal of what you have learned has been painful.  It certainly
has been for many of the rest of us.  I think it is also
important to keep in mind that most of the underlying issues are
ultimately the result of variations and evolution in human
language and writing systems.   Some of the decisions we have
made in the IETF, and that the Unicode Consortium has made, have
made some issues harder and some easier but, as long as we want
to deal with the full range of human languages and writing
systems, most of the problems are inherent and the choices end
up being about compromises (or, if you prefer, winners and
losers).  Some of the relationships are rather old.  For
example, rather explicit decisions were made many centuries ago
that Latin and Chinese characters should be easily
distinguishable by people with some familiarity with them but
without necessarily being literate.  Latin ended up with a lower
familiarity requirement by having very few characters; Chinese
ended up with many characters and the advantages of a single
writing system that could express the meaning of many languages,
even mutually incomprehensible ones, while Rome took the
approach that everyone should simply learn Latin.  That makes
the two scripts special, even though some evolution of both in
more recent centuries has muddied the distinguishability.
>   Including about DNS queries which don't resolve properly.
>  Sigh.  That is still ANOTHER topic.   As I say, I want to
> get organized and have a good way to show this.   Not quite
> there yet!  

See Andrew's comment (and my earlier one) about the Universal
Acceptance effort.   But I suggest you need to go further than
reaching out to them.  The sad reality is that there are very
few people in the Internet (not just IETF) community with a good
understanding of email protocols and operations, DNS and IDNA
protocols and operations, writing systems and character coding
including the many writing system issues that have nothing to do
with character coding, and Unicode operations.  On many days,
I'm not even sure I'm part of that rather small group.   Most of
us are willing to try to explain issues to others, but only to
those who want to put the energy into listening and learning,
not just saying things that amount to "as long as I think it
works for my language, everything else is fine" or "just tell me
what to do because I have no intention of learning or thinking
about the issues".  That expertise is spread sufficiently thin
that answering a question or reviewing a document in one place
causes something else to be delayed or not happen (e.g., as
mentioned earlier, responding to your notes, which I considered
important, has retarded progress on both PRECIS and URNBIS work
for several days).  So there are extremely pragmatic reasons for
you to try to either merge your work into the Universal
Acceptance effort (or vice versa) or to establish really clear
boundaries between the two.

And that is probably all the time I can put in on this today.