Re: [EAI] mailto: escaping

"Martin J. Dürst" <duerst@it.aoyama.ac.jp> Mon, 08 March 2010 07:03 UTC

Message-ID: <4B94A12E.7070308@it.aoyama.ac.jp>
Date: Mon, 08 Mar 2010 16:03:10 +0900
From: "\"Martin J. Dürst\"" <duerst@it.aoyama.ac.jp>
Organization: Aoyama Gakuin University
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 6.0; en-US; rv:1.9.1.1) Gecko/20090902 Eudora/3.0b3
MIME-Version: 1.0
To: Shawn Steele <Shawn.Steele@microsoft.com>
References: <mailman.13830.1247508102.4936.ima@ietf.org> <CAD7705D4A93814F97D3EF00790AF0B315FA6650@tk5ex14mbxc105.redmond.corp.microsoft.com> <4A5BABF8.4080900@isode.com> <CAD7705D4A93814F97D3EF00790AF0B315FA6B01@tk5ex14mbxc105.redmond.corp.microsoft.com>
In-Reply-To: <CAD7705D4A93814F97D3EF00790AF0B315FA6B01@tk5ex14mbxc105.redmond.corp.microsoft.com>
Content-Type: text/plain; charset="UTF-8"; format="flowed"
Content-Transfer-Encoding: 8bit
Cc: "ima@ietf.org" <ima@ietf.org>
Subject: Re: [EAI] mailto: escaping

Hello Shawn,

I'm not sure I ever replied to your mail. I think your comments were 
partially directed at draft-duerst-mailto-bis and partially at 
draft-ietf-eai-mailto.

On 2009/07/14 7:27, Shawn Steele wrote:
> I didn't see any discussion of my mailto: comments.
>
> I think that it's good for the mailto: doc to provide escaping of the UTF-8 code points, however I also think that many clients will be quite liberal in what they allow.  (Safari & IE on Windows already seem to allow mailto:café(at)example.org and I don't see any reason to break that)

The escaping is provided as part of the URI definition, because URIs
don't take anything other than US-ASCII. If escaping is based on UTF-8,
then the scheme automatically qualifies to be used with IRIs, and 
therefore you can automatically use cafe(at)éxample.org (note the 
position of the é, more on that below) under draft-duerst-mailto-bis as 
an IRI.

As for café(at)example.org or café(at)éxample.org, 
draft-duerst-mailto-bis doesn't allow that, because it addresses all the 
pre-EAI I18N stuff. But there is already some forward-compatible text in 
draft-duerst-mailto-bis:

 >>>>
    5. Percent-encoding of non-ASCII octets in the <local-part> of an
       <addr-spec> is reserved for the internationalization of the
       <local-part>. Non-ASCII characters MUST first be encoded
       according to UTF-8, and then each octet of the corresponding
       UTF-8 sequence MUST be percent-encoded to be represented as URI
       characters. Any other percent-encoding of non-ASCII characters is
       prohibited. When a <local-part> containing non-ASCII characters
       will be used to compose a message, the <local-part> MUST be
       transformed to conform to whatever encoding may be defined in a
       future specification for the internationalization of email
       addresses.

This is essentially just saying "don't even THINK about anything other
than UTF-8(-based percent-encoding) for the left hand side", and "we
don't know yet how or when that might actually work" (but you can now
just say "oh, I think I know how this works" and use EAI).
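
As an illustration of the rule quoted above, here is a minimal sketch in
Python, assuming the standard urllib.parse module. The function names
encode_local_part and decode_local_part are hypothetical and not taken
from either draft. The local part is first encoded as UTF-8, each octet
is percent-encoded, and decoding rejects any percent-encoding that is
not valid UTF-8.

```python
from urllib.parse import quote, unquote


def encode_local_part(local_part: str) -> str:
    """Encode as UTF-8 first, then percent-encode each octet."""
    # safe="" also escapes reserved ASCII characters such as "/" and "@";
    # that goes beyond the quoted rule, which only concerns non-ASCII.
    return quote(local_part, safe="", encoding="utf-8")


def decode_local_part(encoded: str) -> str:
    """Reverse the encoding; anything that is not UTF-8 is an error."""
    # errors="strict" raises UnicodeDecodeError for e.g. Latin-1 based "%E9".
    return unquote(encoded, encoding="utf-8", errors="strict")


print(encode_local_part("café"))        # caf%C3%A9
print(decode_local_part("caf%C3%A9"))   # café
```

A Latin-1 style escape such as "caf%E9" makes decode_local_part raise
UnicodeDecodeError, which matches the "any other percent-encoding is
prohibited" part of the text.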

> In other words in 2) I'd like to see the language:
>
> "Applications MAY process UTF-8 encoded values in mailbox and hvalue directly even if they aren't percent-encoded."

Applications can process IRIs if they choose to do so (and I very much
hope they will choose to do so if they don't already). Actually,
IRIs allow more than just processing raw UTF-8; it is also possible to
pick up an IRI encoded in, let's say, iso-2022-jp from a Japanese email,
convert it to UTF-8, and use it to send an email.
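
The iso-2022-jp case can be sketched roughly as follows, in Python. The
helper name iri_to_uri and the example address and subject are
assumptions made for illustration. The sketch only covers the step of
turning non-ASCII characters into UTF-8 based percent-encoding and
leaves ASCII characters untouched; it does not do any separate handling
of internationalized domain names.

```python
def iri_to_uri(iri: str) -> str:
    """Percent-encode every non-ASCII character via its UTF-8 octets."""
    out = []
    for ch in iri:
        if ord(ch) < 0x80:
            out.append(ch)  # ASCII is passed through unchanged
        else:
            out.extend("%{:02X}".format(b) for b in ch.encode("utf-8"))
    return "".join(out)


# Bytes as they might appear in the body of an iso-2022-jp email.
raw = "mailto:user@example.org?subject=日本語".encode("iso2022_jp")

iri = raw.decode("iso2022_jp")   # back to characters, independent of charset
uri = iri_to_uri(iri)            # UTF-8 based percent-encoding

print(uri)  # mailto:user@example.org?subject=%E6%97%A5%E6%9C%AC%E8%AA%9E
```

The point is that the IRI is a sequence of characters, so the charset of
the carrying document does not matter once it has been decoded.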

> Also it looks like hvalue SHOULD be percent encoded, but in the mailbox section it says "Percent-encoding can be used to denote non-ASCII characters" without any SHOULDs that I can see.

Because we are talking about URI syntax, it's assumed from the context 
that you cannot use non-ASCII characters directly (as long as you work 
with URIs rather than IRIs).

> I'm happy for hvalue (and mailbox) to have SHOULD be percent-encoded for IRI [RFC3987] compatibility, though I note that humans typing mailto values directly aren't likely to do this.

So humans aren't going to use URIs when IRIs are more convenient. Fine,
as long as the software handles them (which already seems to be true in
your case).

> 4.1 also says "angle brackets...are mandatory", but doesn't use MUST language.  I'd like it to be SHOULD:
>
>     Please note that if the left-hand side of the mail address contains
>     non-ASCII characters, the less-than and greater-than sign (angle
>     brackets, escaped as %3C and %3E) SHOULD be included.
>
> (Why are <> necessary?)

I think that was the direct conclusion from the fact that with EAI, it's 
impossible to just write
To: café(at)éxample.org
but one has to write
To: <café(at)éxample.org>

I think we can discuss this once we go back to draft-ietf-eai-mailto. It 
would definitely be great if we could get rid of some escaping clutter.
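
For reference, the escaped form with angle brackets mentioned in the
quoted 4.1 text would look roughly like this. This is only a sketch
using Python's urllib.parse.quote; the address is an example and the
name mailto_with_brackets is hypothetical, not something defined in
draft-ietf-eai-mailto.

```python
from urllib.parse import quote


def mailto_with_brackets(addr_spec: str) -> str:
    """Wrap the address in <...> and percent-encode, keeping "@" literal."""
    # "<" and ">" become %3C and %3E; non-ASCII becomes UTF-8 based %XX.
    return "mailto:" + quote("<" + addr_spec + ">", safe="@", encoding="utf-8")


print(mailto_with_brackets("café@example.org"))
# mailto:%3Ccaf%C3%A9@example.org%3E
```

The six extra characters for the brackets are the kind of escaping
clutter referred to above.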

Regards,   Martin.

> I'd also like similar language if the angle brackets are missing
>
> "Applications MAY process UTF-8 encoded mailbox values directly even if the RECOMMENDED angle brackets are not present"
>
> I don't think it's realistic to expect current systems (Browsers, OS, etc.) to be more restrictive than they already are.  Specifically UTF-8 domain names tend to work fine with mailto in IDN aware systems, regardless of percent encoding or angle brackets.
>
> Any comments this time? :)
>
> -Shawn
>
> _______________________________________________
> IMA mailing list
> IMA@ietf.org
> https://www.ietf.org/mailman/listinfo/ima

-- 
#-# Martin J. Dürst, Professor, Aoyama Gakuin University
#-# http://www.sw.it.aoyama.ac.jp   mailto:duerst@it.aoyama.ac.jp