Re: [EAI] mailto

John C Klensin <klensin@jck.com> Thu, 04 November 2010 18:16 UTC

Date: Thu, 04 Nov 2010 14:16:06 -0400
From: John C Klensin <klensin@jck.com>
To: "\"Martin J. Dürst\"" <duerst@it.aoyama.ac.jp>
Message-ID: <2BCA27C2130A1C6CD0691A6F@JcK-eee10>
In-Reply-To: <4CD29DD3.4070103@it.aoyama.ac.jp>
References: <E14011F8737B524BB564B05FF748464A109F3625@TK5EX14MBXC137.redmond.corp.microsoft.com> <4CCFA3F3.9070305@it.aoyama.ac.jp> <DAB0CCE35AA1AF33C35E209A@PST.JCK.COM> <4CD29DD3.4070103@it.aoyama.ac.jp>
X-Mailer: Mulberry/4.0.8 (Win32)
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: quoted-printable
Content-Disposition: inline
Cc: jwz@jwz.org, Shawn Steele <Shawn.Steele@microsoft.com>, ima@ietf.org, "Larry Masinter \\\\(masinter@adobe.com\\\\)" <masinter@adobe.com>
Subject: Re: [EAI] mailto

Hi, Martin,

--On Thursday, 04 November, 2010 20:49 +0900 "Martin J. Dürst"
<duerst@it.aoyama.ac.jp> wrote:

>...
>> (1) EAI thinks an email address is, in Shawn's notation,
>> unicode@unicode (or possibly unicode@string-with-A-labels).
>> It does not permit %-escapes in the domain name and many mail
>> systems will interpret %-signs in the local part as something
>> else entirely, e.g., routing information.
> 
> Yes. Except that it was ascii@ascii, that has been the same
> since ages.

And there have been email implementations for a decade or two
(even if not "ages") that, in the name of "8-bit-clean" or
interpretations of the robustness principle (with which I'm not
personally sympathetic), have been accepting non-ASCII (and often
non-Unicode) strings and trying to make them work.  And they
have worked in local environments and by prior agreement -- the
discussion in draft-iab-idn-encoding applies here too.

>> (2) For an IRI, mailto:unicode@unicode is perfectly
>> reasonable. However, the mapping to a URI, unless it is
>> scheme/protocol dependent, is likely to produce either
>>     mailto:%-escapes@%-escapes
>> or
>>     mailto:%-escapes@string-with-A-labels
>> 
>> Both of which are really bad news if one tries to get from
>> mailto:String to an email address by dropping the mailto and
>> leaving String.
> 
> Yes. But please note that it has never been possible to get
> from "mailto:String" to the email address simply by dropping
> "mailto:". 

You mean "never valid", not "never possible", don't you?  I
could introduce you to several large financial and "e-business"
operations that do it many times a day.  

While I'm often an advocate of "let those who violated the
standard deal with their own messes when we do something that
tightens or enforces the rules" and think it applies here, we
need to understand what we are getting ourselves into with this.
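
To make the failure concrete, here is a minimal sketch in Python.
It assumes a scheme-independent IRI-to-URI mapping that simply
percent-encodes the UTF-8 bytes on both sides of the "@".  The
function names and the sample address are hypothetical and chosen
only for illustration, and the sketch handles a single address,
not a comma-separated list.

```python
from urllib.parse import quote, unquote


def generic_iri_to_uri(address):
    """Scheme-independent mapping: percent-encode the UTF-8 bytes of
    every non-ASCII character on both sides of the "@"."""
    local, _, domain = address.rpartition("@")
    return "mailto:" + quote(local, safe="") + "@" + quote(domain, safe="")


def naive_address(uri):
    """What the shortcut does: drop the scheme and use the rest as-is."""
    return uri[len("mailto:"):]


def decoded_address(uri):
    """Drop the scheme and any "?" header fields, then undo the
    %-escapes, rejecting bytes that are not well-formed UTF-8."""
    if not uri.lower().startswith("mailto:"):
        raise ValueError("not a mailto URI")
    to_part = uri[len("mailto:"):].split("?", 1)[0]
    return unquote(to_part, encoding="utf-8", errors="strict")


uri = generic_iri_to_uri("dürst@bücher.example")
print(uri)                   # the %-escapes@%-escapes form
print(naive_address(uri))    # still full of %-escapes
print(decoded_address(uri))  # the unicode@unicode form EAI expects
```

The string returned by naive_address() is what ends up handed to
the mail system by software that just drops "mailto:", and a mail
system is entitled to read those %-signs as something else
entirely.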

> In particular, RFC 2368 (which is more than 12
> years old and strictly and purely about ASCII-only mail
> addresses only) says, in Section 2:
>...
 
> So your troubling "parallel discussions" may be mostly due to
> some people having read the specs, and others not. I seriously
> recommend you to read RFC 2368; you may (but should not) be
> surprised at how much text comes directly from there.

I have read it and know what it says.  I am not only concerned
about those who have not but about those who deliberately have
ignored its provisions, secure in the belief that we will never
do anything that has bad consequences for them.  See above. 

>> I think that means that any sort of i18n MAILTO processor has
>> to get from those forms back to the non-escaped
>> Unicode-in-UTF-8 strings that EAI expects before passing an
>> internationalized address off to a mail-sending or processing
>> operation.
> 
> Yes. If it simply implements %-decoding, without looking at
> the byte values, it will easily get the UTF-8 strings
> (Unicode-in-UTF-8 is a pleonasm). 

Having run into a file about two weeks ago that had apparently
started life as BIG5, then had the same transformation that
UTF-8 uses applied to it, and then was shipped with a
content-type of "UTF-8" (according to the guilty party, because
there is no registration for "B5UTF-8" or equivalent), please
forgive me if I sometimes add detail that should not be
necessary.
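
As a rough sketch of why looking at the byte values can still
matter, the Python fragment below percent-decodes to raw bytes
first and only then insists on well-formed UTF-8.  The function
name is hypothetical.  Note the limit of the check: it catches
raw legacy-charset bytes hidden behind %-escapes, but it would
not catch a case like the one above, where the bytes are
well-formed UTF-8 and the code points are simply not Unicode.

```python
from urllib.parse import unquote_to_bytes


def percent_decode_utf8(escaped):
    """Undo %-escapes, then require the result to be well-formed UTF-8."""
    raw = unquote_to_bytes(escaped)
    try:
        return raw.decode("utf-8", errors="strict")
    except UnicodeDecodeError as exc:
        raise ValueError(
            f"%-escapes do not decode to UTF-8: {escaped!r}"
        ) from exc


print(percent_decode_utf8("D%C3%BCrst"))  # UTF-8 bytes for U+00FC

try:
    # 0xA4 0xA4 is a plausible two-byte legacy sequence, but 0xA4
    # cannot start a UTF-8 sequence, so this is rejected.
    percent_decode_utf8("%A4%A4")
except ValueError as err:
    print("rejected:", err)
```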

> So EAI will be happy. What
> may go wrong is where this UTF-8 string hits a non-EAI mail
> agent.

Yes.

>...
>> If one
>> reads the second paragraph of Section 5 of RFC 6068 (and the
>> warning in the third-to-last paragraph of Section 7),
> 
> I would also like to point to the many examples in Section 6,
> for example:
>...
> While the example section is written for people who generate
> mailto: URIs from mail addresses, it shouldn't be too
> difficult to turn these examples around, implicitly reading
> them e.g.:
>...

Right.  But the point was that some of the key examples are not
present for EAI purposes, which may create a false sense of
security, not that the existing examples are incorrect in any
way.

>...
> If you have any proposals to improve the language and make it
> easier to understand, I'll be happy to integrate them.

And I'll be happy to look at it _after_ EAI completes all of the
decision-making that might bear on it.

>...
>> Note that this is a key difference from web and web-like
>> applications, where the applications can be assumed to be able
>> to deal with decoding the %-escapes themselves.
> 
> No, there is no such key difference. URIs (and IRIs) can
> contain %-escapes, and any software (be that Web-like or not)
> taking apart an URI and using its components has to know how
> to deal with these. This is the same since ages, and RFC 6068
> didn't change anything.

But that was exactly the point.  Web-like applications take URIs
and use their components natively.  Mail applications generally
don't, so we are dealing with the need for a method-specific
decoding intermediary.  And, if the folks who write those
intermediaries are not familiar with both the MAILTO spec and the
mail ones (and have not read the MAILTO spec very carefully and
conformed to it exactly, even without knowing what they are
doing), or if they are in a hurry and don't care, bad things can
happen that are excluded in the "application deals with URIs
directly" cases.
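
For illustration only, such an intermediary might look roughly
like the Python sketch below.  It assumes the RFC 6068 shape of a
mailto URI (comma-separated addresses, then optional
"?name=value&name=value" header fields) and splits on the
delimiters before undoing the %-escapes, so that an escaped
delimiter inside an address is not mistaken for a separator.  The
function name and the sample URI are hypothetical, and no
validation of the resulting addresses is attempted.

```python
from urllib.parse import unquote


def parse_mailto(uri):
    """Return (recipients, headers) with all %-escapes decoded as UTF-8."""
    scheme, sep, rest = uri.partition(":")
    if not sep or scheme.lower() != "mailto":
        raise ValueError("not a mailto URI")

    to_part, _, query = rest.partition("?")

    # Split on the raw "," first; only then decode each address.
    recipients = [unquote(a, errors="strict")
                  for a in to_part.split(",") if a]

    headers = []
    for field in (query.split("&") if query else []):
        raw_name, _, raw_value = field.partition("=")
        name = unquote(raw_name, errors="strict").lower()
        if name == "to":
            # A "to" header field contributes further recipients.
            recipients.extend(unquote(a, errors="strict")
                              for a in raw_value.split(",") if a)
        else:
            # "+" is left alone: unlike HTML form decoding, it is
            # not turned into a space here.
            headers.append((name, unquote(raw_value, errors="strict")))
    return recipients, headers


print(parse_mailto(
    "mailto:D%C3%BCrst@example.org?subject=caf%C3%A9&cc=a%2Bb@example.org"
))
```

A generic query-string parser is deliberately avoided in this
sketch, since form-style parsers typically turn "+" into a space,
which would corrupt addresses and subjects that contain a literal
"+".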

>...

best,
   john