Re: [idn] process
John C Klensin <klensin@jck.com> Fri, 25 February 2005 17:34 UTC
Date: Fri, 25 Feb 2005 12:29:58 -0500
From: John C Klensin <klensin@jck.com>
To: Doug Ewell <dewell@adelphia.net>, idn@ops.ietf.org
cc: Erik van der Poel <erik@vanderpoel.org>
Subject: Re: [idn] process
Message-ID: <A574CA1BE87BFDA3C2A1AC0E@scan.jck.com>
In-Reply-To: <00a401c51af3$7863aae0$030aa8c0@DEWELL>
--On Thursday, 24 February, 2005 20:35 -0800 Doug Ewell
<dewell@adelphia.net> wrote:

> Erik van der Poel <erik at vanderpoel dot org> wrote:
>
>> 1. Is this the right time to start working on Internet Drafts
>> leading up to new version(s) of the IDNA RFC(s)? If not, when?
>> ...
>
> I don't know about anyone else, but something seems badly
> wrong here.
>
> Is it really possible that we spent a year and a half, two
> years on putting together an IDN architecture, and during all
> that time nobody ever gave the slightest thought to the
> possibility of someone using IDNs for spoofing purposes, and
> now that one or two well-publicized spoofing examples have
> appeared, we are ready to start all over again with a new and
> probably incompatible version of the architecture?

I certainly hope not. And certainly we knew about these issues.
Even the potential problem with symbols and box-drawing
characters was identified, although not in the lurid detail of
some of the recent examples. There was discussion around the WG
about what to do about those issues, and how completely to
describe them. I think the consensus at that time was to not
write a lot of these issues up in detail for fear of
discouraging IDNA implementations. That consensus was, IMO,
reached in a WG in which many of the participants were, for
various reasons, just anxious to get finished and not paying
much attention to the finer details.

While I advocated at least one radically different architecture
while the IDN WG work was going on, I think, personally, that
looking at a new and incompatible architecture would be pretty
close to insane. As I tried to explain in my remarks on Erik's
proposal that we reverse the presentation order of domain names,
I just don't think it is possible to go there.
And, even if we wanted to, there is no reason to believe that
any other architecture would work better: these homograph
problems are the inevitable consequence of the relationships
among the scripts themselves. It is unlikely that even dumping
Unicode and switching to something else would help very much.
And there isn't any "something else".

However, two philosophical principles, and one strong
assumption, went into the design of IDNA and its supporting
tables. The assumption has not turned out to be completely valid
and we may need to look harder at the implications of its
failure. I suggest that either or both of the philosophical
principles could be reviewed and, if necessary, changed in the
light of experience, and that neither change would be fatal to
IDNs or IDNA, or even especially disruptive if done fairly soon.
This is _not_ a suggestion that those changes should be made,
only that it would be plausible for us to review the decisions
and reach some conclusions about whether they are still
appropriate in the light of experience.

I hope that those who wrote the IDNA specs will agree with the
statement of those principles I'm about to make, or at least
that they are close... they may not.

(1) To the extent possible, we should accommodate all Unicode
characters, excluding as little as possible. This position was
reinforced by the view, at the time, that the Unicode
classifications of characters were a little soft, and by a
general conviction that the IETF should not be making
character-by-character decisions.

A counter-principle, now if not then, is that we should permit a
relatively narrow extension of the "letter-digit-hyphen" rule,
i.e., permitting only letters (in any alphabet or script),
perhaps local digits, and the hyphen, but no other punctuation,
symbols, drawing characters, or other non-letter characters.
Adam has argued for that revised principle recently; several
people argued for it when IDNA was being produced.
We could probably still impose it, and, in any event, it would
not require a change in the basic architecture (see below).

(2) When code points had been identified by UTC as the same as,
or equivalent to, others, we tended to map them together, rather
than picking one and prohibiting the others. This has caused
more problems than most of us expected, with people being
surprised when they register or query using one character and
the result that comes back uses another. It also creates a
near-homograph problem that we haven't "discovered" in the last
couple of weeks: if we have character X mapping to character Y,
but X looks vaguely like Z, then there may be no Y-Z homograph,
but there may be an X-Z one. That could make display decisions,
etc., quite critical and, unless applications got it entirely
right, we might end up with a new family of attacks.

Again, that decision could be reviewed. Perhaps there are groups
of characters that should be prohibited from being included in a
lookup or registration operation, not just mapped to something
more reasonable. And, again, this would be a tuning of tables,
not a change in the basic architecture.

The assumption I referred to above was that ICANN would take a
strong role in determining which characters were really
appropriate for registration and under what circumstances, that
they would institute and enforce appropriate rules, and that
everyone relevant would pay attention to whatever they said.
Every element of that assumption has turned out to be false:
they haven't taken that role; their guidelines are weak,
ambiguous, and at least partially wrong; and some registries
have just ignored the rules that do exist without any penalty.
If there is a problem, either we are going to need to solve it,
or we are going to risk different solutions in different
applications that, taken together, compromise interoperability.

> Is this sending the kind of "stability" message that was
> considered so important two or three years ago?

It is sending the "get it right and get it interoperable"
message that is supposed to dominate IETF decision-making,
especially with Proposed Standards.

> Is there even enough solid information to begin writing
> anything, or just a general feeling that Something Needs To Be
> Done?

I think it is time for us to ask the questions that are
suggested above, and to ask them explicitly. If doing so
produces the answer that it is time to make changes (table
changes, not architectural changes), I think we should do so.
Perhaps we could combine that table review process with an
upgrade to Unicode 4.x, which would accommodate several scripts
we can't handle today.

Could this be done compatibly? Not quite. For starters, we would
have to address more squarely the question that the first
principle identified above bypassed: does someone have the
_right_ to register a particular sequence of Unicode characters?
If the answer is that, because I can draw a symbol that
represents my business, or my religion, or my location, I have
the "right" to register it, then we are in trouble: someone out
there will organize the Church of the Holy Right-Slash and
prohibiting it will discriminate against that religion,
especially if left-slashes and vertical bars are permitted.

If we can get past "right to register", we need to look at the
experience of the browser implementers who have already
concluded that, registered or not, they really don't want to
recognize or process domain names containing such characters.
And then we need to present the transition problem of
eliminating any such domains that may exist to ICANN and say
"you were unable or ineffective at preventing these problems
from occurring, so, as a prize, you get to figure out how to
retire those names that are now prohibited by the updated
standard".

Curiously, if we followed existing precedents, we could even
move IDNA from Proposed to Draft and change the tables to
eliminate many mappings and characters: no change to the
algorithm, just elimination of some features that didn't work in
practice.

That is not a proposal, just an observation :-)

    john
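
Illustration (not part of the original message): the two table-level options discussed above, the narrower "letter-digit-hyphen" extension from (1) and prohibiting rather than mapping from (2), can be sketched roughly as follows. This is only a sketch under stated assumptions. The function names are hypothetical, Unicode general categories stand in for whatever character tables a revised specification would actually define, allowing combining marks is an added assumption, and NFKC normalization is used merely as a stand-in for the real Nameprep mapping tables.

```python
import unicodedata


def allowed_under_narrow_rule(label: str) -> bool:
    """Check a label against an extended letter-digit-hyphen rule.

    Assumption: "letter" means Unicode general category L*, "local
    digits" means category Nd, and combining marks (M*) are allowed
    because many scripts need them. Everything else (punctuation,
    symbols, box-drawing characters) is rejected.
    """
    for ch in label:
        if ch == "-":
            continue
        category = unicodedata.category(ch)
        if category[0] in ("L", "M") or category == "Nd":
            continue
        return False
    return True


def map_or_reject(label: str, prohibit_instead_of_map: bool = False) -> str:
    """Either map equivalent characters together or refuse them.

    Assumption: NFKC stands in for the mapping step; the real tables
    differ. With prohibit_instead_of_map=True, a label that would be
    changed by mapping is rejected instead of silently rewritten.
    """
    mapped = unicodedata.normalize("NFKC", label)
    if prohibit_instead_of_map and mapped != label:
        raise ValueError("label contains characters that would be mapped")
    return mapped


if __name__ == "__main__":
    # U+2044 FRACTION SLASH is a symbol, so the narrow rule rejects it.
    print(allowed_under_narrow_rule("a\u2044b"))
    # U+FF41 FULLWIDTH LATIN SMALL LETTER A is mapped under the first
    # policy and refused under the second.
    print(map_or_reject("\uff41bc"))
```

Both functions operate on tables of characters only; neither touches the encoding algorithm, which is the distinction the message draws between table changes and architectural changes.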