[EAI] Mixed addresses (was: Re: Comments on draft-ietf-eai-frmwrk-4952bis-01 (and -02))

John C Klensin <klensin@jck.com> Mon, 12 July 2010 22:39 UTC

Date: Mon, 12 Jul 2010 18:39:43 -0400
From: John C Klensin <klensin@jck.com>
To: Jason Nelson <jason@exchange.microsoft.com>, Shawn Steele <Shawn.Steele@microsoft.com>, Joseph Yee <jyee@ca.afilias.info>
Cc: ima@ietf.org

--On Monday, July 12, 2010 18:58 +0000 Jason Nelson
<jason@exchange.microsoft.com> wrote:

>...
> In the current design, with a message destined to two
> recipients, A and B, A supports UTF8 and B does not - what
> should B's P2 content look like?  I get that the RFC
> specifically deals with addresses in transit, and that we
> should not be mucking about with messages when not the
> submission server, but resolving how the mail goes to both the
> guys in the US and your business partner in Taiwan with
> both being able to reply across the board, that's giving me a
> few headaches.

I wonder how many others on this mailing list are familiar
enough with that other Standard messaging architecture to know
what "P2 Content" is?   :-)

In any event, it is these mixed cases that are the reason why
many of us hoped that we could devise an in-transit downgrading
mechanism that would be easy to implement and work universally
in a predictable and interoperable way.   There are, I believe,
three such cases, each with a slightly different scenario:

	(i) Mixed ASCII and non-ASCII destination
	(forward-pointing) addresses with the ASCII address
	sites presumably being non-EAI-capable, as in your
	example.
	
	(ii) ASCII-only destination addresses, some of which are
	EAI-capable, and a backward-pointing ("sender")
	non-ASCII address.
	
	(iii) Mailing lists with mixed ASCII and non-ASCII
	addresses.

Even if it is fully deliverable, with all information preserved
in headers, replying to a message in one of these categories can
easily create one of the others -- and a less tractable situation.

For purposes of that classification,
ASCII-local-part@Punycoded-IDN is an ASCII-only address.  If one
had code that immediately converted IDN A-labels into U-label
form, there would be an additional set of cases having to do with
systems that are IDNA-aware but perhaps not EAI-aware.
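
As a rough illustration of that classification (a sketch only, not
taken from any EAI specification), the check below treats an address
as ASCII-only when every character in it is ASCII, so an A-label
domain counts as ASCII.  The function names and the returned labels
are hypothetical, and the code cannot tell whether an ASCII-addressed
site is EAI-capable, which the cases above also depend on.

```python
def is_ascii_only(address: str) -> bool:
    """True if every character of the address is ASCII.

    An address such as user@xn--bcher-kva.example (ASCII local part,
    A-label domain) counts as ASCII-only under this classification.
    """
    return address.isascii()


def classify(sender: str, recipients: list[str]) -> str:
    """Label a message by its mix of ASCII and non-ASCII addresses.

    The labels are hypothetical names used only in this sketch.
    """
    ascii_rcpts = [r for r in recipients if is_ascii_only(r)]
    non_ascii_rcpts = [r for r in recipients if not is_ascii_only(r)]
    if ascii_rcpts and non_ascii_rcpts:
        return "mixed-recipients"        # case (i), or (iii) for a list
    if not non_ascii_rcpts and not is_ascii_only(sender):
        return "non-ascii-sender"        # case (ii)
    if non_ascii_rcpts:
        return "all-non-ascii-recipients"
    return "all-ascii"
```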

Anyway, downgrading failed.  We could review/ debate degrees of
failure (and waste a lot of time), but it was clear that we
weren't going to get the hoped-for predictability as soon as we
discovered messages for which alternate ASCII addresses
existed for some of the non-ASCII ones but not all of them.
Relative to your example, please note the relationship between
"one ASCII destination, one non-ASCII destination" and "two
non-ASCII destinations, one with an alternate address available
and the other without one".
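
To make the "some but not all" problem concrete, here is a minimal
sketch.  It assumes a hypothetical `alternates` mapping from a
non-ASCII address to its alternate ASCII address; that mapping stands
in for whatever alternate-address information a message carries and
is not the mechanism from any of the drafts.

```python
def downgrade_plan(recipients, alternates):
    """Return (plan, stuck) for a set of recipient addresses.

    plan maps each deliverable recipient to the ASCII address to use;
    stuck lists non-ASCII recipients with no alternate available.
    """
    plan = {}
    stuck = []
    for rcpt in recipients:
        if rcpt.isascii():
            plan[rcpt] = rcpt              # already ASCII, use as-is
        elif rcpt in alternates:
            plan[rcpt] = alternates[rcpt]  # substitute the alternate
        else:
            stuck.append(rcpt)             # nothing to downgrade to
    return plan, stuck
```

Whenever `stuck` is non-empty, the message cannot be downgraded for
every recipient, and that outcome cannot be predicted from the
ASCII/non-ASCII split of the addresses alone.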

But the problem really isn't new.  Ignore the addresses for a
moment and assume that A is not only UTF8-capable but prefers
message content in Lower Slobbovian.  B can handle only ASCII
and English.  The sender, miraculously, can speak and write both
Lower Slobbovian and English.  Now, what is to be sent?

	(i) If the message content is sent in English, A will be
	unhappy and may reply in Lower Slobbovian with parts of
	the original message copied.   B will receive the
	message from the sender with everything intact and
	comprehensible.  When A replies, B will not be able to read
	the reply message (even though it can be received), and
	it may be displayed as a row of question marks (since B
	presumably doesn't have Lower Slobbovian fonts installed
	and might not even be able to handle
	"text/plain; charset=utf-8").
	
	(ii) If the message content is sent in Lower Slobbovian,
	B is out of luck.  The original message will be received
	but will be incomprehensible -- only the "From" and "To"
	header fields and their envelope counterparts are likely
	to be useful and readable.  And any reply from A won't
	even contain traces of English/ASCII.
	
	(iii) If the message is sent with both charsets (ASCII
	and UTF8) and languages (English and Lower Slobbovian)
	using multipart/alternative...  Well, we had high hopes
	for that one, but it was never as useful as expected
	(and is not supported well, unlike the use of
	multipart/alternative for text and HTML body parts).
	But assume it worked as intended and think about what
	happens to A and B.  Each sees the message in her
	preferred language and characters, is only vaguely aware
	(if at all) that the other language and characters are
	there.  So far, so good.  Now A replies to the message
	she is seeing (a Lower Slobbovian one), in the language
	she sees.  And B responds in the language she is seeing
	(English).  In a way, this is the
	worst scenario of all because a slight change in the
	example results in A and B seeing content in languages
	that the other one can't read at all.
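
For anyone who wants to see what case (iii) looks like on the wire,
the following is a minimal sketch using Python's standard `email`
package.  The function name, the subject text, and the use of a
Content-Language field on each part are assumptions made for
illustration; nothing here says how well a given mail client will
choose between the two parts.

```python
from email.message import EmailMessage, MIMEPart


def build_bilingual(subject, english, other, other_lang):
    """Build a multipart/alternative message with two text/plain parts.

    english    -- ASCII-only body text
    other      -- the same content in another language, sent as UTF-8
    other_lang -- language tag for the second part (caller-supplied)
    """
    en_part = MIMEPart()
    en_part.set_content(english, charset="us-ascii")
    en_part["Content-Language"] = "en"

    other_part = MIMEPart()
    other_part.set_content(other, charset="utf-8")
    other_part["Content-Language"] = other_lang

    msg = EmailMessage()
    msg["Subject"] = subject
    msg.make_alternative()
    # multipart/alternative lists parts in increasing order of
    # preference, so the ASCII fallback goes first.
    msg.attach(en_part)
    msg.attach(other_part)
    return msg
```

Calling `build_bilingual("Meeting", "Hello", "你好", "zh")` (Chinese
standing in for Lower Slobbovian) yields one message with both
bodies, and the reply problem described above starts from exactly
that point.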

The address case affects whether messages that might be
unreadable (or that would lead to unreadable replies) can be
delivered or reflected properly in headers, but it doesn't
change things as compared to those nasty content examples.  And,
while there are some advantages of having messages the user
can't read delivered (as long as they can be rendered, one might
be able to find a translator), they are pretty marginal in most
cases... especially when they cannot be rendered.

I think the bottom line is:

(1) Internationalization is a messy business that really is not
going to be helpful other than for communication within language
communities.  EAI is no exception in that regard.

(2) A decision to send messages in a possibly-mixed environment
wrt addressing, like a decision to send messages in a language
that some recipients might not be able to read (or even render),
is high-risk -- especially in a world in which a popular source
of high spam likelihood scores is "message in a language or
script I don't normally read or can't read at all".

(3) If someone is communicating internationally -- with people
who use multiple languages and writing systems -- the
conservative approach is going to be to use ASCII addresses and
least-common-denominator languages and scripts.  EAI is going to
work best within a language community where all (or almost all)
of the systems are EAI-aware, capable of rendering the relevant
script, and all of the addresses are in that script too,
regardless of how they are encoded.  

While we may need to do some education, those principles are
user-level common sense even if they are lots better understood
about content than about addresses today.  We might or might not
be able to reinforce the education with automated tools (and
might or might not want to).  But those users who violate the
principles will sooner or later get burned with messages that
either don't reach all recipients or replies to those messages
that are incomprehensible.  Like small children and hot stoves,
few of them will need to repeat the experience very many times.

I wish I had a more optimistic answer.  But we need to remember
that every step we take toward a non-ASCII/ non-English (or at
least non-Western-European-language) Internet is a step that
enables some users and language communities at the price of
reducing the degree to which everyone connected to the Internet
can communicate with all the others.  IMO, that is a good
tradeoff, but not one that is without costs.

FWIW, we've had variations on the above conversation/ comments/
analysis enough times that maybe it should be written into a
more public and permanent place.  If there is WG consensus that
it should go into 4952bis, I'm ok with that although I'd
appreciate it if someone else would suggest text.   If not, we
should think about other places and opportunities.

    john