Re: [EAI] [IETF] Content Issues [ was: Internationalized Email Internet Draft]

Franck Martin <fmartin@linkedin.com> Fri, 14 October 2016 16:53 UTC

In-Reply-To: <489025644.216489.1476451836537@mail.yahoo.com>
References: <20161006055447.32573.qmail@pro-236-157.rediffmailpro.com> <9EC0EB65-9C58-43ED-9A80-1DA32C58E3E0@att.com> <E125B6AC26988823306936BF@JcK-HP5.jck.com> <489025644.216489.1476451836537@mail.yahoo.com>
From: Franck Martin <fmartin@linkedin.com>
Date: Fri, 14 Oct 2016 09:53:17 -0700
Message-ID: <CANyRh9-dag0j4KjE8_h7KmH=chFGrbn24=9c6Hyw+1JdiN79Vg@mail.gmail.com>
To: nalini.elkins@insidethestack.com
Content-Type: multipart/alternative; boundary=94eb2c070a8453df7c053ed61105
Archived-At: <https://mailarchive.ietf.org/arch/msg/ima/JHy1F7Jn0R7Bn49a39fEhoIBroA>
Cc: Harish Chowdhary <harish@nixi.in>, "ima@ietf.org" <ima@ietf.org>
Subject: Re: [EAI] [IETF] Content Issues [ was: Internationalized Email Internet Draft]

I tried to subscribe to this list using my email address 弗兰克@互联网.公司 but
mailman replied:

Your subscription is not allowed because the email address you gave is
insecure.
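
For illustration only, here is one plausible way an ASCII-era address check ends up rejecting an address like this. It is a sketch under assumptions, not Mailman's actual code: the byte pattern and the function name `looks_hostile` are hypothetical.

```python
import re

# Hypothetical legacy check (an assumption, not Mailman's code): flag
# shell-sensitive punctuation, control characters, and any byte outside
# 7-bit ASCII.
_BAD_BYTES = re.compile(rb"[\[\]()<>|;^,\x00-\x1f\x7f-\xff]")


def looks_hostile(address: str) -> bool:
    """Return True if the UTF-8 bytes of the address hit the legacy pattern."""
    return _BAD_BYTES.search(address.encode("utf-8")) is not None


print(looks_hostile("ima@ietf.org"))
print(looks_hostile("弗兰克@互联网.公司"))
```

Under a check of this kind, every UTF-8 address is indistinguishable from a hostile one, because all non-ASCII characters encode to bytes above 0x7F.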

On Fri, Oct 14, 2016 at 6:30 AM, <nalini.elkins@insidethestack.com> wrote:

> John / Tony,
>
> I am going to split your comments into separate threads so that I can keep
> track of each.   The first is about co-mingling content vs. headers.
>
> >(1) The so-called EAI standards, as listed in the Introduction, are
> about email envelope and header information presented directly (e.g., in
> UTF-8) as non-ASCII characters.  A good deal of the document appears to
> address mail content information such as textual message bodies, in
> other scripts.  With the possible exception of language selection when a
> message is sent with the same basic text in several languages
> (multipart/alternative was designed with that case in mind but has been
> used in other ways), we thought we solved that content problem with MIME
> in 1992.  If MIME is inadequate, the authors or others should produce a
> document explaining the issues and not confuse them with EAI /
> SMTPUTF8.  If it is adequate, then, like Tony although perhaps for
> different reasons, I don't see what Section 1.2 is doing here, what the
> relevance of Section 3.2 is, and several other statements should be
> examined carefully to be sure they are talking about addresses and/or
> headers and not content.
>
> Yes.  I see your point.   Let me say first the basic thing that we are
> trying to do is to discuss the holistic user experience of
> internationalized emails from an operational point of view.   In so doing,
> the co-mingling happened.  We could do a second draft for content issues or
> change the abstract of this one to better state what our real goal is.
>
> Secondly, as you guys know well, there are lots of other issues with IDN,
> browser support, etc.   What we were actually hoping is that we could have
> a forum (perhaps like DNSOps or v6Ops) where we could come together to
> define and discuss such problems, move towards best practices (or
> workarounds! Not that I like that, but it happens.)   Because we have not even
> started on problems that we see such as search algorithm ranking of IDNs
> and so on.   We were hoping that others would step up to author such other
> drafts.
>
>
> Thanks,
>
> Nalini Elkins
> Inside Products, Inc.
> www.insidethestack.com
> (831) 659-8360
>
>
> ------------------------------
> *From:* John C Klensin <klensin@jck.com>
> *To:* "HANSEN, TONY L" <tony@att.com>; ima@ietf.org
> *Sent:* Sunday, October 9, 2016 7:57 PM
> *Subject:* Re: [EAI] [IETF] Internationalized Email Internet Draft
>
>
>
> --On Thursday, October 06, 2016 4:39 PM +0000 "HANSEN, TONY L"
> <tony@att.com> wrote:
>
> > I think getting deployment feedback from EAI is important, and
> > this draft is an excellent start.
> >
> > I'm not convinced that section 1.2 describes a real problem.
> > People do this all the time today with various combinations of
> > languages. Why is the combination of Russian and Chinese any
> > different? If you think it is, then please expand on the
> > aspect that does make it more difficult.
> >
> > I forwarded a number of nits to the authors.
>
> Hi.  I was going to hold off until some later and more mature
> version of this draft, but since Tony has commented, while I
> believe the issues with EAI deployment are important, I see
> several problems with this draft, some of which were actually
> discussed in the WG but appear to be ignored here.  Perhaps more
> important, it is seriously incomplete relative to issues that
> have been discussed at great length in the EAI WG, at the APEC
> meeting on internationalized email in Beijing in October 2014,
> the May 2015 workshop in Thailand, and elsewhere.  I strongly
> suggest that, if there is going to be a discussion in Seoul,
> this document is in need of a great deal of work first.  Some of
> those issues are:
>
> (1) The so-called EAI standards, as listed in the Introduction,
> are about email envelope and header information presented
> directly (e.g., in UTF-8) as non-ASCII characters.  A good deal
> of the document appears to address mail content information such
> as textual message bodies, in other scripts.  With the possible
> exception of language selection when a message is sent with the
> same basic text in several languages (multipart/alternative was
> designed with that case in mind but has been used in other
> ways), we thought we solved that content problem with MIME in
> 1992.  If MIME is inadequate, the authors or others should
> produce a document explaining the issues and not confuse them
> with EAI / SMTPUTF8.  If it is adequate, then, like Tony
> although perhaps for different reasons, I don't see what Section
> 1.2 is doing here, what the relevance of Section 3.2 is, and
> several other statements should be examined carefully to be sure
> they are talking about addresses and/or headers and not content.
>
> (2) Within an address, there is, as the I-D points out and
> consistent with RFC 5321, a local part and a domain part.  RFCs
> 6530 and 6531 make it quite clear (at least we thought they did)
> that they are handled differently.  For the domain part, the
> rules are laid out in the IDNA2008 specs (RFC 5890ff).  Issues
> about look-alike characters have been extensively discussed and
> written about (even though some of us have questioned the
> quality of some of that work).  It does not seem useful to me to
> revisit those issues here, especially without reference to the
> prior work and discussions or if some of the discussion here is
> wrong or contains obvious omissions.  As an example from the
> first paragraph of Section 6.1, Latin "c" (U+0063) and Cyrillic
> "c" (U+0441) are typically written with identical graphemes, but
> are not on the list.    More important, while the "paypal"
> example with U+0430 substituted for "a" (U+0061) has been used
> repeatedly, including in a careful study in an article that is
> not cited in this draft, it is possible to write "раура1"
> with the first five characters in Cyrillic and the last one a
> digit (which is script independent)
> (\u'0440'\u'0430'\u'0443'\u'0440'\u'0430'\u'0031' [1]), therefore
> not even violating conventions prohibiting mixed-script labels.
> There is, of course, no ambiguity in the A-label form, although
> the authors quite properly point out that it is not
> user-friendly.
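
The single-script point above can be checked mechanically. The sketch below is a rough approximation, not an IDNA or UTS #39 implementation: it guesses each letter's script from the first word of its Unicode character name, and the helper name `letter_scripts` is hypothetical.

```python
import unicodedata


def letter_scripts(label: str) -> set:
    """Approximate each letter's script by the first word of its Unicode name."""
    return {
        unicodedata.name(ch).split()[0]
        for ch in label
        if ch.isalpha()  # digits are script independent, so skip them
    }


# Five Cyrillic letters followed by the digit one; renders much like "paypa1".
spoof = "\u0440\u0430\u0443\u0440\u0430\u0031"

print([f"U+{ord(ch):04X}" for ch in spoof])
print(letter_scripts(spoof))
print(letter_scripts("paypal"))
```

A label that passes a "no mixed scripts" test this way can still be confusable, which is why such a test alone does not settle the look-alike question.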
>
> By contrast, Section 1.1 talks about display of email addresses,
> including the local part ("in Punycode" [2]).  While a mail
> delivery server is free to create whatever aliases for a mailbox
> local part it likes, including "xn-t2bmh3a" or "123456",
> "george" or "example", in general converting a local part using
> the Punycode algorithm and displaying the result is prohibited
> by the EAI standards (and, incidentally, RFC 5321).  More
> important, it will often lose information and is potentially
> very dangerous.
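
One way information can be lost, offered as an illustration rather than as the specific case meant above: the preparation step that normally accompanies Punycode for domain labels folds case, while RFC 5321 leaves the interpretation of a local part, including its case, to the delivery host. The sketch assumes Python's built-in `idna` codec, which implements the older IDNA2003 rules; the sample local part and the helper name `idna_round_trip` are hypothetical.

```python
def idna_round_trip(text: str) -> str:
    """Encode with Python's built-in IDNA (2003) codec, then decode again."""
    return text.encode("idna").decode("idna")


local_part = "Müller"  # hypothetical mailbox name with an upper-case letter

print(idna_round_trip(local_part))
print(idna_round_trip(local_part) == local_part)
```

After the round trip the upper-case letter is gone, so two mailboxes that differ only in case would collapse into one displayed form.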
>
> (3) Arabic should not be confused with a strictly right-to-left
> writing system.  I am not aware of any such systems in wide use
> for contemporary languages today.  The problem is that numerals,
> whether written in European digits, Arabic or Arabic-Indic
> digits, Chinese (Han) digits, or many others, have been written
> left to right since that type of positional notation was
> invented and became widely used.  As a result, the scripts are
> referred to (in Unicode-speak) as "bidirectional" or "bidi" [3].
> Their implications for domain names and IDNA are the subject of
> RFC 5893.
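
The bidirectional classes behind this point can be inspected directly. This is a small sketch using Python's `unicodedata` module; the sample characters are chosen here for illustration and do not come from the draft.

```python
import unicodedata

# Sample characters chosen for illustration.
samples = {
    "Arabic letter beh": "\u0628",
    "Hebrew letter alef": "\u05d0",
    "Arabic-Indic digit one": "\u0661",
    "European digit one": "1",
    "Latin letter a": "a",
}

for description, ch in samples.items():
    # bidirectional() returns the class used by the Unicode bidi algorithm.
    print(f"U+{ord(ch):04X} {description}: {unicodedata.bidirectional(ch)}")
```

Letters in Arabic and Hebrew carry right-to-left classes, while both kinds of digits carry number classes of their own, which is what makes a run of mixed letters and numerals bidirectional rather than strictly right-to-left.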
>
> (4) Multiple addresses for one user (and Section 4).  Keeping in
> mind that many people maintain a number of identities, and even
> multiple email addresses, for different purposes, I don't
> understand what point you are trying to make with this section.
> Many of us believe that users who have mailboxes whose names
> involve non-ASCII local parts and who engage in communications
> outside their primary language group will find it necessary to
> maintain either separate all-ASCII mailboxes or all-ASCII
> aliases to their primary mailboxes and to do so for a very long
> time.  That issue has been extensively analyzed and discussed
> but this document avoids that work, which is both a problem and
> an opportunity.
>
> (5) Section 2.1 asserts that email servers, implying all of
> them, store data (messages?) in relational databases.  That is
> simply false.  Some do; others don't.  Even for those that do,
> there may be a difference between Unicode-capable data storage
> and Unicode-capable keys or indexes.  There is also absolutely
> no requirement that any such system store Unicode strings
> encoded in UTF-8; many do not.
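
To make the last point concrete: "Unicode" names the character repertoire, not the bytes a store writes to disk. A minimal sketch, with a sample string picked here for illustration:

```python
text = "弗兰克"  # three characters, all in the Basic Multilingual Plane

utf8 = text.encode("utf-8")
utf16 = text.encode("utf-16-le")

# Same characters, different byte sequences and lengths.
print(len(text), len(utf8), len(utf16))
print(utf8.decode("utf-8") == utf16.decode("utf-16-le"))
```

Both encodings round-trip to the same string, so a system that stores UTF-16 internally is no less Unicode-capable than one that stores UTF-8.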
>
> (6) There is a necessary difficulty with SMTPUTF8, which is that
> one cannot transmit a message with non-ASCII characters in
> addresses or headers to a system that does not support them.
> Final delivery systems should probably not accept messages
> unless they have reason to predict that the mail store will
> handle them _and_ that the user associated with the target
> mailbox will be able to retrieve them.  Since a user with an
> all-ASCII mailbox name might still receive a message with, e.g.,
> a non-ASCII backward-pointing address in the envelope or
> headers, making that decision is not straightforward.  That
> leads to a strong case that, if one wants broad deployment of
> SMTPUTF8, the place to start is with the MUAs (including the
> Webmail systems) and associated POP and IMAP servers and
> clients.  The "to various extents" list in the first part of
> Section 3 is not particularly helpful in that regard.
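
The transmission constraint in (6) can be sketched as a simple gate. This is a hedged illustration, not code from any MTA: the function names and the policy are hypothetical, and a real system would also have to consider whether the mail store and the recipient's client can handle the message.

```python
def needs_smtputf8(addresses, header_values=()):
    """True if any envelope address or header value contains non-ASCII text."""
    return any(not value.isascii() for value in (*addresses, *header_values))


def may_relay(addresses, header_values, next_hop_extensions):
    """Hypothetical policy: pass non-ASCII envelopes or headers only to a
    next hop that advertised SMTPUTF8 in its EHLO response."""
    advertised = {name.upper() for name in next_hop_extensions}
    return not needs_smtputf8(addresses, header_values) or "SMTPUTF8" in advertised


print(may_relay(["弗兰克@互联网.公司"], [], ["8BITMIME", "PIPELINING"]))
print(may_relay(["弗兰克@互联网.公司"], [], ["8BITMIME", "SMTPUTF8"]))
```

Note that an all-ASCII recipient does not make the message safe: a non-ASCII sender address or header value still trips the gate.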
>
> (7) Finally, this is an internationalization (i18n) problem as
> much as it is an email problem.  Terminology (and, where
> characters or code points are referred to, their precise
> identification) is very important because the alternative is
> typically a good deal of user confusion about what you are
> talking about and other impediments to making progress.  Saying
> "English" where you mean "Basic Latin Script" or "ASCII" is not
> helpful, especially given that 5321 local parts can include any
> ASCII character and that ASCII is not sufficient to write
> English.  Conversely, it appears that there are a few places
> where, correctly or incorrectly, you really do mean "English"
> when you say that.  Similarly, talking about one particular
> encoding when you mean "Unicode" is confusing and may be
> misleading.  RFC 6365 may give you a start on some of the issues.
>
> regards,
>     john
>
>
>   -------------
> [1] I recommend the authors have a look at RFC 5137.
>
> [2] Punycode is an encoding method, not a display format.  See
> RFC 5890, Section  2.3.4.
>
> [3] http://unicode.org/reports/tr9/
>
>
> _______________________________________________
> IMA mailing list
> IMA@ietf.org
> https://www.ietf.org/mailman/listinfo/ima
>
>
>