Re: [EAI] IDN in Unicode

John C Klensin <klensin@jck.com> Fri, 24 April 2015 23:44 UTC

Date: Fri, 24 Apr 2015 19:44:10 -0400
From: John C Klensin <klensin@jck.com>
To: Shawn Steele <Shawn.Steele@microsoft.com>, ima@ietf.org
Message-ID: <D568FEFA9D358E7DAE05F84C@JcK-HP8200.jck.com>
In-Reply-To: <BLUPR03MB1378D012C16E1FD2A700B12D82EC0@BLUPR03MB1378.namprd03.prod.outlook.com>
References: <BLUPR03MB1378D012C16E1FD2A700B12D82EC0@BLUPR03MB1378.namprd03.prod.outlook.com>
Archived-At: <http://mailarchive.ietf.org/arch/msg/ima/IzTcpE2x9_Q6ucT4wLQGfDu5ASs>
Subject: Re: [EAI] IDN in Unicode


--On Friday, April 24, 2015 19:21 +0000 Shawn Steele
<Shawn.Steele@microsoft.com> wrote:

>> The basic purpose is to find a solution for IDN domains and
>> email addresses to be presented in their original scripts.
> 
> That seems like a completely different discussion than "let's
> use UTF-32".  

I think that is the point that Martin, Andrew, I, and maybe
others have been trying to make.

> Many applications handle domain names natively in either UTF-8
> or UTF-16, either of which are sufficient to describe the
> domains in their original scripts.  
> 
> Punycode is kind of a hack to provide a mechanism to resolve
> domain names using the preexisting infrastructure that is
> limited to the ASCII space.  However many applications, like
> EAI, prefer UTF-8.

I wouldn't characterize either the Punycode algorithm or its
results in quite that way.   It may be that one person's hack is
someone else's elegant solution.   It is also worth noting that
Punycode is part of a fairly complex system of rules for IDNA
that deal with a number of issues other than the
ASCII-compatibility one.  If it were not for those issues, it
might have been plausible to note that the DNS can support
labels consisting of a string of arbitrary octets, develop a
convention for identifying such strings as containing UTF-8 when
appropriate, and then simply tell people to wait until the
relevant protocols were upgraded.  There are multiple reasons
why IDNA (and hence the Punycode algorithm) was a better idea
for domain names, with two important ones being the symbolic
importance of claiming to have deployed IDNs and the desire of
some members of the ICANN community to start selling IDN labels
as soon as possible with usability as a secondary consideration.
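
To make the ASCII-compatibility piece concrete, here is a minimal
Python sketch of how a single non-ASCII label maps to and from its
"xn--" form. It assumes only the standard library's "punycode"
codec (RFC 3492). The function names are hypothetical, and the
sketch deliberately leaves out the other IDNA rules referred to
above (validity checks, mapping, contextual rules), so it should
not be read as an IDNA implementation.

```python
# Sketch only: raw Punycode plus the ACE prefix, one label at a time.
# No IDNA validity or mapping rules are applied here (assumption: the
# caller has already dealt with those).
ACE_PREFIX = "xn--"


def to_a_label_sketch(u_label):
    """Return an ASCII-compatible form of a single label."""
    if u_label.isascii():
        return u_label  # already ASCII, nothing to encode
    return ACE_PREFIX + u_label.encode("punycode").decode("ascii")


def to_u_label_sketch(a_label):
    """Reverse of to_a_label_sketch for a single label."""
    if not a_label.lower().startswith(ACE_PREFIX):
        return a_label  # not an ACE label, return unchanged
    return a_label[len(ACE_PREFIX):].encode("ascii").decode("punycode")


if __name__ == "__main__":
    print(to_a_label_sketch("bücher"))
    print(to_u_label_sketch("xn--bcher-kva"))
```

The point of the sketch is only that the encoding step itself is
small; the surrounding rules are where most of the complexity sits.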

> Our recommendations provide for IDN to be looked up with
> Punycode only in the resolution phase.  For many apps that is
> transparent to the app and they can just use Unicode with the
> system API set.  Some apps, however may need to recognize
> Punycode domains for whatever reason or additional protocol
> limitations.  However in those cases it is still recommended
> that they prefer Unicode and only use the Punycode when
> absolutely necessary.

That is in essence what RFC 6055 says although there are some
nuances that might be worth attention.   In addition, and
independently of when the Punycode algorithm is applied, there are
some significant advantages to applying the tests for U-label
validity as early in the process as possible.  If that is not
done, error reporting is likely to deteriorate into "you lose"
or complaining about strings the user has never seen rather than
being able to report problems to users in and with contexts that
make sense to them.
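
The "test early, convert late" ordering can be sketched roughly as
follows in Python. This is an illustration under loud assumptions:
it uses the standard library's "idna" codec, which implements the
older IDNA2003 rules (RFC 3490), as a stand-in for real U-label
validity tests; an IDNA2008 implementation would apply different
checks. The function names are hypothetical.

```python
# Sketch only: validate the user's Unicode input label by label, so
# that any error names the string the user actually typed, and defer
# the ASCII conversion to the resolution step.
# Assumption: Python's built-in "idna" codec (IDNA2003) stands in for
# proper U-label validity tests.


def check_labels_early(domain):
    """Return a list of problems, phrased in terms of the user's input."""
    if domain.endswith("."):
        domain = domain[:-1]  # tolerate a single trailing root dot
    problems = []
    for label in domain.split("."):
        if not label:
            problems.append("empty label in %r" % domain)
            continue
        try:
            label.encode("idna")  # raises UnicodeError on a bad label
        except UnicodeError as exc:
            problems.append("label %r rejected: %s" % (label, exc))
    return problems


def for_resolution(domain):
    """Convert to the ASCII form only when handing off for lookup."""
    return domain.encode("idna").decode("ascii")


if __name__ == "__main__":
    name = "bücher.example"
    issues = check_labels_early(name)
    if issues:
        print("\n".join(issues))
    else:
        print(for_resolution(name))
```

The application keeps and displays the Unicode form throughout; only
for_resolution() ever produces the "xn--" form, and by then the
input has already been checked in the form the user saw.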

    john