Re: [idn] process
"Adam M. Costello" <idn.amc+0@nicemice.net.RemoveThisWord.cnri.reston.va.us> Sat, 26 February 2005 08:23 UTC
Date: Sat, 26 Feb 2005 08:19:13 +0000
From: "Adam M. Costello" <idn.amc+0@nicemice.net.RemoveThisWord.cnri.reston.va.us>
To: idn@ops.ietf.org
Subject: Re: [idn] process
Doug Ewell <dewell@adelphia.net> wrote:

> Is it really possible that we spent a year and a half, two years on
> putting together an IDN architecture, and during all that time nobody
> ever gave the slightest thought to the possibility of someone using
> IDNs for spoofing purposes,

No, it was thought about, and it was decided that the IDNA protocol was
not the place to address those issues; that they should be addressed in
registries and user interfaces.

IDNA could have addressed the easier portion of the problem (prohibiting
punctuation and symbols) (and for a while I was arguing for that), but
it still would have left the harder part of the problem (dealing with
script mixtures and homographs among letters) for the registries and
user interfaces to deal with, so why not let them deal with the easier
part too?

(Of course, one could then ask why that argument doesn't apply to all
the invisible characters that IDNA does prohibit. I have no good answer
at the moment. Maybe invisibility was the only disqualifying attribute
that everyone could agree on.)

John C Klensin <klensin@jck.com> wrote:

> I hope that those who wrote the IDNA specs will agree with the
> statement of those principles I'm about to make, or at least that they
> are close... they may not.
>
> (1) To the extent possible, we should accommodate all Unicode
> characters, excluding as little as possible.

That (or something very similar) was a principle that went into the IDNA
spec. I personally was inclined to define both internationalized domain
names and internationalized host names, where the former would be
completely general (allowing *all* Unicode characters, even the
invisible ones), and the latter would be much narrower (excluding most
punctuation and symbols). This would be an analogy to traditional
domain names (which allow all ASCII characters, even control characters)
and traditional host names (which allow only the ASCII letters, digits,
and one punctuation mark, the hyphen-minus).

On the other hand, there was an argument that the traditional
distinction between domain names and host names was the source of
endless confusion and debate, and was a mistake that should not be
repeated with IDNs. I have some sympathy for that argument. In any
case, we ended up with just one set of non-ASCII characters for IDNs,
between the two extremes: only invisible characters are excluded. (I
think there's one exception--a visible space character that is also
excluded).

> (2) When code points had been identified by UTC as the same as, or
> equivalent to, others, we tended to map them together, rather than
> picking one and prohibiting the others.

This was more than a tendency; it was strictly followed.

> This has caused more problems than most of us expected, with people
> being surprised when they register or query using one character and
> the result that comes back uses another.

I think this happens only for the case-folding mappings. The
normalization mappings should not surprise anyone.

> It also creates a near-homograph problem that we haven't "discovered"
> in the last couple of weeks: If we have character X mapping to
> character Y, but X looks vaguely like Z, then there may be no Y-Z
> homograph, but there may be an X-Z one.

True. And again, I think it's just the case-folding mappings that do
this, not the normalization mappings.

> Curiously, if we followed existing precedents, we could even move
> IDNA from Proposed to Draft and change the tables to eliminate many
> mappings and characters: no change to the algorithm, just elimination
> of some features that didn't work in practice.

If we want to place further restrictions on the set of characters used
in IDNs, I think it would be pretty rude of us to simply add them to the
set of prohibited characters in Nameprep. What about the guy who
registered <not_equal>.com? What if people had already bookmarked that
site, and created links to it? Are we just going to break those links?

A less rude approach would be to recommend that domain labels containing
certain characters not be displayed. Their ACE forms could still be
displayed, and they could still be looked up. The domain holder in this
example could register a new displayable domain name, and could put an
HTTP redirector at the old site, and existing bookmarks and links would
continue to work.

Erik van der Poel <erik@vanderpoel.org> wrote:

> I believe it would be difficult to reach consensus on a relatively
> narrow extension of the LDH rule. Just for starters, the hyphen used
> to separate names and other strings in the Western world is not used
> in Japan for Katakana, because Katakana uses a middle dot (U+30FB) to
> separate 2 Katakana strings. In fact, this character is allowed in
> .jp.

But notice how seldom the hyphen-minus is actually used in domain names.
People prefer to just run words together, even in languages that
customarily use word breaks. Maybe the analogous characters in other
scripts (like the katakana middle dot) would likewise be very seldom
used in practice (especially in Japan where the lack of word breaks is
the norm), and would not be missed if they were deprecated.

> It may be possible to "tune" the tables, but nowhere in your email do
> I find any reference to the ACE prefix. I think that we should also
> figure out exactly which types of changes would absolutely require a
> new ACE prefix,

Coming up with the necessary and sufficient conditions will be tricky,
but now that you've got me thinking about it, I think I can supply one
sufficient condition: If the only changes you make are to add
characters to the prohibited table, I don't think you need to change the
ACE prefix. This would cause some valid IDN labels under the old spec
to become invalid under the new spec, and would cause some valid ACE
labels under the old spec to become bogo-ACE labels under the new spec.
(The bogo-ACE phenomenon already exists: there are labels that begin
with the ACE prefix but don't validate during ToUnicode and therefore
display as literal ASCII strings.) It would not cause anything to
encode or decode to something different than it used to.

But I don't advocate making such a change (see my argument above about
rudeness).

AMC
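
The sufficient condition above (only adding characters to the prohibited
table) can be illustrated with a rough Python sketch. It assumes that
Python's built-in "idna" codec, which implements the IDNA2003 rules, is
an acceptable stand-in for the existing spec. The names
EXTRA_PROHIBITED, to_ascii_old, and to_ascii_new are hypothetical, and
U+2260 NOT EQUAL TO is used only to echo the <not_equal>.com example; it
is not a proposal from this thread.

```python
# Sketch only: EXTRA_PROHIBITED and both helper names are hypothetical.
# Python's built-in "idna" codec applies Nameprep and Punycode (IDNA2003).

EXTRA_PROHIBITED = frozenset({"\u2260"})  # U+2260 NOT EQUAL TO, illustrative


def to_ascii_old(label: str) -> str:
    """Encode a single label to its ACE form under the existing rules."""
    return label.encode("idna").decode("ascii")


def to_ascii_new(label: str, extra=EXTRA_PROHIBITED) -> str:
    """Same encoding, but reject labels with newly prohibited characters."""
    ace = to_ascii_old(label)
    # Decode the ACE form to see the label as it looks after Nameprep.
    prepared = ace.encode("ascii").decode("idna")
    bad = [ch for ch in prepared if ch in extra]
    if bad:
        raise UnicodeError(f"newly prohibited character U+{ord(bad[0]):04X}")
    return ace


for label in ["b\u00fccher", "\u2260"]:
    old = to_ascii_old(label)
    try:
        new = to_ascii_new(label)
    except UnicodeError as exc:
        new = f"rejected ({exc})"
    print(f"{label!r}: old={old!r} new={new!r}")
```

In this sketch, any label accepted by both functions gets the same ACE
string from each, so nothing encodes or decodes differently; the only
observable change is that some previously valid labels are rejected.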