Re: [EAI] IDN in Unicode
John C Klensin <klensin@jck.com> Fri, 24 April 2015 23:44 UTC
Date: Fri, 24 Apr 2015 19:44:10 -0400
From: John C Klensin <klensin@jck.com>
To: Shawn Steele <Shawn.Steele@microsoft.com>, ima@ietf.org
Message-ID: <D568FEFA9D358E7DAE05F84C@JcK-HP8200.jck.com>
In-Reply-To: <BLUPR03MB1378D012C16E1FD2A700B12D82EC0@BLUPR03MB1378.namprd03.prod.outlook.com>
References: <BLUPR03MB1378D012C16E1FD2A700B12D82EC0@BLUPR03MB1378.namprd03.prod.outlook.com>
Archived-At: <http://mailarchive.ietf.org/arch/msg/ima/IzTcpE2x9_Q6ucT4wLQGfDu5ASs>
Subject: Re: [EAI] IDN in Unicode
--On Friday, April 24, 2015 19:21 +0000 Shawn Steele
<Shawn.Steele@microsoft.com> wrote:

>> The basic propose is to find solution for IDN domains and
>> email addresses be present in original scripts.
>
> That seems like a completely different discussion than "let's
> use UTF-32".

I think that is the point that Martin, Andrew, myself, and maybe
others have been trying to make.

> Many applications handle domain names natively in either UTF-8
> or UTF-16, either of which are sufficient to describe the
> domains in their original scripts.
>
> Punycode is kind of a hack to provide a mechanism to resolve
> domain names using the preexisting infrastructure that is
> limited to the ASCII space. However many applications, like
> EAI, prefer UTF-8.

I wouldn't characterize either the Punycode algorithm or its
results in quite that way. It may be that one person's hack is
someone else's elegant solution. It is also worth noting that
Punycode is part of a fairly complex system of rules for IDNA
that deal with a number of issues other than the
ASCII-compatibility one. If it were not for those issues, it
might have been plausible to note that the DNS can support
labels consisting of a string of arbitrary octets, develop a
convention for identifying such strings as containing UTF-8 when
appropriate, and then simply tell people to wait until the
relevant protocols were upgraded. There are multiple reasons why
IDNA (and hence the Punycode algorithm) was a better idea for
domain names, with two important ones being the symbolic
importance of claiming to have deployed IDNs and the desire of
some members of the ICANN community to start selling IDN labels
as soon as possible with usability as a secondary consideration.

> Our recommendations provide for IDN to be looked up with
> Punycode only in the resolution phase. For many apps that is
> transparent to the app and they can just use Unicode with the
> system API set.
> Some apps, however may need to recognize
> Punycode domains for whatever reason or additional protocol
> limitations. However in those cases it is still recommended
> that they prefer Unicode and only use the Punycode when
> absolutely necessary.

That is in essence what RFC 6055 says, although there are some
nuances that might be worth attention.

In addition, and independent of when the Punycode algorithm is
applied, there are some significant advantages to applying the
tests for U-label validity as early in the process as possible.
If that is not done, error reporting is likely to deteriorate
into "you lose" or complaining about strings the user has never
seen, rather than being able to report problems to users in and
with contexts that make sense to them.

john
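The quoted remark that UTF-8 or UTF-16 is each sufficient to carry domain names in their original scripts (the thread had raised UTF-32) can be checked with a small sketch. The example label and the function name `encoded_sizes` are illustrative assumptions, not anything from the thread; the sketch only shows that the three Unicode encoding forms carry the same code points and differ in size, not in which scripts they can represent.

```python
# A minimal sketch: one Unicode label in three encoding forms.
# The label and the helper name are hypothetical, chosen for illustration.

LABEL = "b\u00fccher"  # "bücher", written with precomposed U+00FC


def encoded_sizes(label: str) -> dict[str, int]:
    """Return the size in octets of `label` in each Unicode encoding form.

    The "-le" codecs are used so that no byte order mark is added.
    Decoding any of these byte strings gives back the identical label.
    """
    return {
        codec: len(label.encode(codec))
        for codec in ("utf-8", "utf-16-le", "utf-32-le")
    }


if __name__ == "__main__":
    for codec, size in encoded_sizes(LABEL).items():
        print(f"{codec:10} {size:3} octets")
```

For this label, UTF-8 needs 7 octets, UTF-16 needs 12, and UTF-32 needs 24; choosing among them is a storage and API question, separate from whether names appear in their original scripts.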
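The ordering discussed above (test U-label validity on the Unicode form first, and apply the Punycode algorithm only when a name is handed to the resolver) might be sketched as follows. This is a hedged illustration, not an implementation of RFC 5891: the function names are hypothetical, `check_u_label` covers only a handful of IDNA2008 tests (NFC form, hyphen placement, a leading combining mark) plus the 63-octet DNS label limit, and it uses Python's raw `punycode` codec (RFC 3492) rather than a complete IDNA2008 library. It also assumes input in Unicode form; names already written as "xn--" labels are outside its scope.

```python
import unicodedata

ACE_PREFIX = "xn--"      # prefix that marks an A-label
MAX_LABEL_OCTETS = 63    # DNS label length limit (RFC 1035)


def check_u_label(label: str) -> list[str]:
    """Return problems found in a candidate label, phrased for the user.

    Hypothetical helper covering only a few IDNA2008 tests; a real
    implementation must apply the full rules of RFC 5891 and RFC 5892.
    """
    if not label:
        return ["the label is empty"]
    problems = []
    if unicodedata.normalize("NFC", label) != label:
        problems.append("it is not in Unicode Normalization Form C")
    if label.startswith("-") or label.endswith("-"):
        problems.append("it begins or ends with a hyphen")
    if label[2:4] == "--":
        problems.append("it has hyphens in the third and fourth positions")
    if unicodedata.category(label[0]).startswith("M"):
        problems.append("it begins with a combining mark")
    return problems


def to_a_label(label: str) -> str:
    """Apply the Punycode algorithm to a non-ASCII label (the late step)."""
    if label.isascii():
        return label  # plain ASCII labels need no conversion
    return ACE_PREFIX + label.encode("punycode").decode("ascii")


def to_lookup_name(domain: str) -> str:
    """Validate every label in Unicode first, then convert for lookup."""
    ascii_labels = []
    for label in domain.split("."):
        problems = check_u_label(label)
        if problems:
            # Report against the string the user actually typed,
            # not an xn-- form the user has never seen.
            raise ValueError(f"{label!r}: " + "; ".join(problems))
        a_label = to_a_label(label)
        if len(a_label) > MAX_LABEL_OCTETS:
            raise ValueError(f"{label!r}: too long once encoded for the DNS")
        ascii_labels.append(a_label)
    return ".".join(ascii_labels)
```

Under these assumptions, `to_lookup_name("b\u00fccher.example")` yields `xn--bcher-kva.example`, while the decomposed spelling `"bu\u0308cher.example"` is rejected with a message that quotes the label as the user entered it. Everything before `to_a_label` works on Unicode strings, which keeps the ASCII-compatible form at the resolver boundary in the spirit of the recommendation quoted above.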