[EAI] [IETF] Content Issues [ was: Internationalized Email Internet Draft]

<nalini.elkins@insidethestack.com> Fri, 14 October 2016 13:30 UTC

Date: Fri, 14 Oct 2016 13:30:36 +0000
From: nalini.elkins@insidethestack.com
To: John C Klensin <klensin@jck.com>, "HANSEN, TONY L" <tony@att.com>, "ima@ietf.org" <ima@ietf.org>
Message-ID: <489025644.216489.1476451836537@mail.yahoo.com>
In-Reply-To: <E125B6AC26988823306936BF@JcK-HP5.jck.com>
References: <20161006055447.32573.qmail@pro-236-157.rediffmailpro.com> <9EC0EB65-9C58-43ED-9A80-1DA32C58E3E0@att.com> <E125B6AC26988823306936BF@JcK-HP5.jck.com>
MIME-Version: 1.0
Content-Type: multipart/alternative; boundary="----=_Part_216488_699762746.1476451836529"
Archived-At: <https://mailarchive.ietf.org/arch/msg/ima/8iNSGT6R5LUpqiKlG7ePXfd9qQs>
Cc: Harish Chowdhary <harish@nixi.in>
Subject: [EAI] [IETF] Content Issues [ was: Internationalized Email Internet Draft]

John / Tony,
I am going to split your comments into separate threads so that I can keep track of each.   The first is about co-mingling content vs. headers.
> (1) The so-called EAI standards, as listed in the Introduction, are about email envelope and header information presented directly (e.g., in UTF-8) as non-ASCII characters.  A good deal of the document appears to address mail content information such as textual message bodies, in other scripts.  With the possible exception of language selection when a message is sent with the same basic text in several languages (multipart/alternative was designed with that case in mind but has been used in other ways), we thought we solved that content problem with MIME in 1992.  If MIME is inadequate, the authors or others should produce a document explaining the issues and not confuse them with EAI / SMTPUTF8.  If it is adequate, then, like Tony although perhaps for different reasons, I don't see what Section 1.2 is doing here, what the relevance of Section 3.2 is, and several other statements should be examined carefully to be sure they are talking about addresses and/or headers and not content.

Yes, I see your point.  Let me say first that the basic thing we are trying to do is discuss the holistic user experience of internationalized email from an operational point of view.  In doing so, content and header issues became co-mingled.  We could write a second draft for content issues, or change the abstract of this one to better state what our real goal is.
Secondly, as you guys know well, there are lots of other issues with IDNs, browser support, etc.  What we were actually hoping for is a forum (perhaps like DNSOps or v6Ops) where we could come together to define and discuss such problems and move towards best practices (or workarounds!  Not that I like that, but it happens).  We have not even started on problems that we see, such as search algorithm ranking of IDNs.  We were hoping that others would step up to author drafts on those topics.
 Thanks,
Nalini Elkins
Inside Products, Inc.
www.insidethestack.com
(831) 659-8360

      From: John C Klensin <klensin@jck.com>
 To: "HANSEN, TONY L" <tony@att.com>; ima@ietf.org 
 Sent: Sunday, October 9, 2016 7:57 PM
 Subject: Re: [EAI] [IETF] Internationalized Email Internet Draft
   


--On Thursday, October 06, 2016 4:39 PM +0000 "HANSEN, TONY L"
<tony@att.com> wrote:

> I think getting deployment feedback from EAI is important, and
> this draft is an excellent start.
> 
> I'm not convinced that section 1.2 describes a real problem.
> People do this all the time today with various combinations of
> languages. Why is the combination of Russian and Chinese any
> different? If you think it is, then please expand on the
> aspect that does make it more difficult.
> 
> I forwarded a number of nits to the authors.

Hi.  I was going to hold off until some later and more mature
version of this draft, but since Tony has commented, while I
believe the issues with EAI deployment are important, I see
several problems with this draft, some of which were actually
discussed in the WG but appear to be ignored here.  Perhaps more
important, it is seriously incomplete relative to issues that
have been discussed at great length in the EAI WG, at the APEC
meeting on internationalized email in Beijing in October 2014,
the May 2015 workshop in Thailand, and elsewhere.  I strongly
suggest that, if there is going to be a discussion in Seoul,
this document is in need of a great deal of work first.  Some of
those issues are:

(1) The so-called EAI standards, as listed in the Introduction,
are about email envelope and header information presented
directly (e.g., in UTF-8) as non-ASCII characters.  A good deal
of the document appears to address mail content information such
as textual message bodies, in other scripts.  With the possible
exception of language selection when a message is sent with the
same basic text in several languages (multipart/alternative was
designed with that case in mind but has been used in other
ways), we thought we solved that content problem with MIME in
1992.  If MIME is inadequate, the authors or others should
produce a document explaining the issues and not confuse them
with EAI / SMTPUTF8.  If it is adequate, then, like Tony
although perhaps for different reasons, I don't see what Section
1.2 is doing here, what the relevance of Section 3.2 is, and
several other statements should be examined carefully to be sure
they are talking about addresses and/or headers and not content.

(2) Within an address, there is, as the I-D points out and
consistent with RFC 5321, a local part and a domain part.  RFCs
6530 and 6531 make it quite clear (at least we thought they did)
that they are handled differently.  For the domain part, the
rules are laid out in the IDNA2008 specs (RFC 5890ff).  Issues
about look-alike characters have been extensively discussed and
written about (even though some of us have questioned the
quality of some of that work).  It does not seem useful to me to
revisit those issues here, especially without reference to the
prior work and discussions or if some of the discussion here is
wrong or contains obvious omissions.  As an example from the
first paragraph of Section 6.1, Latin "c" (U+0063) and Cyrillic
"с" (U+0441) are typically written with identical graphemes, but
are not on the list.  More important, while the "paypal"
example with U+0430 substituted for "a" (U+0061) has been used
repeatedly, including in a careful study in an article that is
not cited in this draft, it is possible to write "раура1"
with the first five characters in Cyrillic and the last one a
digit (which is script independent)
(\u'0440'\u'0430'\u'0443'\u'0440'\u'0430'\u'0031' [1]), therefore
not even violating conventions prohibiting mixed-script labels.
There is, of course, no ambiguity in the A-label form, although
the authors quite properly point out that it is not
user-friendly.
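
The single-script point above can be checked with a short Python
sketch.  The helper name scripts_in_label is hypothetical, and
taking the first word of the Unicode character name is only a rough
stand-in for a real script property lookup.  Python's built-in
"idna" codec follows the older IDNA2003 rules rather than IDNA2008;
it is used here only to show that the encoded forms differ.

```python
import unicodedata

def scripts_in_label(label: str) -> set[str]:
    """Return the script names of the non-digit characters in a label.

    The script is approximated by the first word of the Unicode
    character name. Digits are skipped because they are treated
    here as script independent.
    """
    scripts = set()
    for ch in label:
        if ch.isdigit():
            continue
        scripts.add(unicodedata.name(ch).split()[0])
    return scripts

# U+0440 U+0430 U+0443 U+0440 U+0430 U+0031: renders like "paypal1"
spoof = "\u0440\u0430\u0443\u0440\u0430\u0031"
genuine = "paypal1"

print(scripts_in_label(spoof))    # {'CYRILLIC'}
print(scripts_in_label(genuine))  # {'LATIN'}

# The two labels are different strings and encode differently,
# even though they may be displayed identically.
print(spoof == genuine)           # False
print(spoof.encode("idna"))       # an "xn--" form (IDNA2003 codec)
print(genuine.encode("idna"))     # b'paypal1'
```

A mixed-script check alone would accept both labels, which is the
gap the paragraph above describes.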

By contrast, Section 1.1 talks about display of email addresses,
including the local part ("in Punycode" [2]).  While a mail
delivery server is free to create whatever aliases for a mailbox
local part it likes, including "xn-t2bmh3a" or "123456",
"george" or "example", in general converting a local part using
the Punycode algorithm and displaying the result is prohibited
by the EAI standards (and, incidentally, RFC5321).  More
important, it will often lose information and is potentially
very dangerous.
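
One way information could be lost is sketched below in Python.  The
message does not spell out the mechanism, so this is an assumption:
it supposes the conversion goes through a domain-name style mapping
such as Python's built-in "idna" codec (IDNA2003 rules), which
case-folds before applying Punycode.  The sample names are
hypothetical.

```python
# Two local parts that differ only in case. Only the delivery
# server is entitled to decide whether they name the same mailbox.
upper = "J\u00f6rg"   # "Jörg"
lower = "j\u00f6rg"   # "jörg"

print(upper == lower)  # False: distinct strings

# The IDNA2003 mapping folds case before Punycode is applied,
# so both strings collapse to the same encoded form.
encoded_upper = upper.encode("idna")
encoded_lower = lower.encode("idna")
print(encoded_upper == encoded_lower)  # True

# Decoding cannot recover the original capitalization.
print(encoded_upper.decode("idna") == upper)  # False
```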

(3) Arabic should not be confused with a strictly right-to-left
writing system.  I am not aware of any such systems in wide use
for contemporary languages today.  The problem is that numerals,
whether written in European digits, Arabic or Arabic-Indic
digits, Chinese (Han) digits, or many others, have been written
left to right since that type of positional notation was
invented and became widely used.  As a result, the scripts are
referred to (in Unicode-speak) as "bidirectional" or "bidi" [3].
Their implications for domain names and IDNA are the subject of
RFC 5893.
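
As a small illustration of why such scripts are called
bidirectional, the Python sketch below prints the Unicode
bidirectional class of a few characters using the standard
unicodedata module.  The helper name bidi_classes and the sample
characters are chosen for illustration only.

```python
import unicodedata

def bidi_classes(text: str) -> list[str]:
    """Return the Unicode bidirectional class of each character."""
    return [unicodedata.bidirectional(ch) for ch in text]

samples = {
    "Arabic letter alef": "\u0627",
    "Hebrew letter alef": "\u05d0",
    "Arabic-Indic digit one": "\u0661",
    "European digit one": "1",
    "Latin letter a": "a",
}

for description, ch in samples.items():
    print(f"{description:24} U+{ord(ch):04X} {unicodedata.bidirectional(ch)}")

# Letters in these scripts are right-to-left (AL or R), but the
# digits carry separate number classes (AN, EN), so a string that
# mixes letters and digits has runs in both directions.
print(bidi_classes("\u0627\u0661"))  # ['AL', 'AN']
```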

(4) Multiple addresses for one user (and Section 4).  Keeping in
mind that many people maintain a number of identities, and even
multiple email addresses, for different purposes, I don't
understand what point you are trying to make with this section.
Many of us believe that users who have mailboxes whose names
involve non-ASCII local parts and who engage in communications
outside their primary language group will find it necessary to
maintain either separate all-ASCII mailboxes or all-ASCII
aliases to their primary mailboxes and to do so for a very long
time.  That issue has been extensively analyzed and discussed
but this document avoids that work, which is both a problem and
an opportunity.

(5) Section 2.1 asserts that email servers (implying all of
them) store data (messages?) in relational databases.  That is
simply false.  Some do; others don't.  Even for those that do,
there may be a difference between Unicode-capable data storage
and Unicode-capable keys or indexes.  There is also absolutely
no requirement that any such system store Unicode strings
encoded in UTF-8; many do not.  
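
The distinction between a Unicode string and its stored encoding,
and between storing a string and using it as a key, can be seen in
a short Python sketch.  The sample name is hypothetical, and the
byte-wise comparison stands in for whatever key or index a
particular mail store might use.

```python
import unicodedata

name = "Jos\u00e9"  # "José" with precomposed U+00E9

# The same Unicode string stored in two different encodings.
as_utf8 = name.encode("utf-8")
as_utf16 = name.encode("utf-16-le")
print(len(as_utf8), len(as_utf16))  # 5 8

# The same visible text as a different code point sequence
# (decomposed: "e" followed by U+0301 COMBINING ACUTE ACCENT).
decomposed = unicodedata.normalize("NFD", name)
print(len(name), len(decomposed))   # 4 5

# A key comparison that works on raw code points or bytes treats
# the two forms as different unless both are normalized first.
print(name == decomposed)                                # False
print(unicodedata.normalize("NFC", decomposed) == name)  # True
```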

(6) There is a necessary difficulty with SMTPUTF8, which is that
one cannot transmit a message with non-ASCII characters in
addresses or headers to a system that does not support them.
Final delivery systems should probably not accept messages
unless they have reason to predict that the mail store will
handle them _and_ that the user associated with the target
mailbox will be able to retrieve them.  Since a user with an
all-ASCII mailbox name might still receive a message with, e.g.,
a non-ASCII backward-pointing address in the envelope or
headers, making that decision is not straightforward.  That
leads to a strong case that, if one wants broad deployment of
SMTPUTF8, the place to start is with the MUAs (including the
Webmail systems) and associated POP and IMAP servers and
clients.  The "to various extents" list in the first part of
Section 3 is not particularly helpful in that regard.
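
The transmission constraint described above can be sketched in
Python as follows.  The function names and sample addresses are
hypothetical.  In practice the extension list would come from the
server's EHLO response (with smtplib, for instance, via
SMTP.has_extn("smtputf8")).  The sketch covers only the transport
check, not the harder question of whether the mail store and the
recipient's client can handle the message.

```python
def needs_smtputf8(envelope_addresses, header_values):
    """True if any envelope address or header value is non-ASCII."""
    return any(
        not value.isascii()
        for value in [*envelope_addresses, *header_values]
    )

def may_transmit(envelope_addresses, header_values, server_extensions):
    """True if the message can be sent to a server with these extensions."""
    if not needs_smtputf8(envelope_addresses, header_values):
        return True
    advertised = {ext.lower() for ext in server_extensions}
    return "smtputf8" in advertised

# An all-ASCII recipient can still be sent a message whose
# backward-pointing (sender) address is non-ASCII.
envelope = ["user@example.com", "\u7528\u6237@\u4f8b\u5b50.example"]
headers = ["Subject: hello"]

print(needs_smtputf8(envelope, headers))                        # True
print(may_transmit(envelope, headers, ["8BITMIME"]))            # False
print(may_transmit(envelope, headers, ["8BITMIME", "SMTPUTF8"]))  # True
```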

(7) Finally, this is an internationalization (i18n) problem as
much as it is an email problem.  Terminology (and, where
characters or code points are referred to, their precise
identification) is very important because the alternative is
typically a good deal of user confusion about what you are
talking about and other impediments to making progress.  Saying
"English" where you mean "Basic Latin Script" or "ASCII" is not
helpful, especially given that 5321 local parts can include any
ASCII character and that ASCII is not sufficient to write
English.  Conversely, it appears that there are a few places
where, correctly or incorrectly, you really do mean "English"
when you say that.  Similarly, talking about one particular
encoding when you mean "Unicode" is confusing and may be
misleading.  RFC 6365 may give you a start on some of the issues.

regards,
    john


  -------------
[1] I recommend the authors have a look at RFC 5137.

[2] Punycode is an encoding method, not a display format.  See
RFC 5890, Section  2.3.4. 

[3] http://unicode.org/reports/tr9/
