[idn] Stepping back and taking another look (long)
John C Klensin <klensin@jck.com> Tue, 29 May 2001 09:25 UTC
Received: from psg.com (exim@[147.28.0.62]) by ietf.org (8.9.1a/8.9.1a) with SMTP id FAA27770 for <idn-archive@lists.ietf.org>; Tue, 29 May 2001 05:25:09 -0400 (EDT)
Received: from lserv by psg.com with local (Exim 3.16 #1) id 154fVw-000I2H-00 for idn-data@psg.com; Tue, 29 May 2001 02:10:08 -0700
Received: from [209.187.148.217] (helo=P2) by psg.com with esmtp (Exim 3.16 #1) id 154fVv-000I1u-00 for idn@ops.ietf.org; Tue, 29 May 2001 02:10:07 -0700
Date: Tue, 29 May 2001 05:09:50 -0400
From: John C Klensin <klensin@jck.com>
To: idn@ops.ietf.org
Subject: [idn] Stepping back and taking another look (long)
Message-ID: <3270512440.991112990@localhost>
X-Mailer: Mulberry/2.0.8 (Win32)
MIME-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
Content-Disposition: inline
Sender: owner-idn@ops.ietf.org
Precedence: bulk
IDN WG participants,

Hi. I have been struggling with whether to send this note, or something like it, for several weeks now. Some of you have inferred parts of the content from remarks I have made earlier; it may come as a surprise to others.

The request from Marc and James for a consensus conclusion/straw poll on IDNA and ACE, and its results, precipitates the note: as discussed in detail below, I do not agree that they are the right way to go, but feel a need to offer an alternative view, however incomplete. It was also driven by the recent (and ongoing) "ACE versus UTF-8" discussion: as the text (and the two I-Ds) will, I hope, make clear, I see the long-term solution to the real problem as lying outside the DNS entirely, bringing a wider range of coding options with it, and making it less important what, if anything, we do to (in?) the DNS in the shorter term. Hence I also don't feel that the ACE versus UTF-8 discussion is very important, although I believe that many of those issues (although not necessarily the arguments) would be basically the same for any approach we take, even the more extreme ones suggested here.

I apologize in advance for the length of this note. I've attempted to thoroughly explain the issues as I see them, rather than engage in the many short notes and back-and-forth sniping we've seen about less complex topics before the WG. Think of it as nearly an I-D that doesn't deserve even that level of status, at least in this informal form.

One observation up front (to anticipate parts of section 3): Section 3 argues that the existence of a short-term solution that works much of the time for some fraction of the network will block a long-term, more effective, solution from ever deploying. Despite what appears there, that argument cannot be proven but is a matter of belief. If one believes it, then this document is an argument for radically reconsidering the course IDN is on today and then heading off in another direction.
If one does not, some of it, and much of another document [LSearch], is a roadmap for where we (IETF and the Internet community) should be going after we finish the current work (and perhaps an argument that we should get started on it even before the current work is finished). Some of it, too, can be taken as an argument for an extremely conservative approach to DNS changes (e.g., see [NEWCLASS] and a forthcoming document which I understand David Lawrence is working on and that should fill in many of the details my "new class" one leaves out), rather than attempting to sneak new functionality into the existing machinery.

Finally, the word "patent" doesn't appear below or in the new drafts that this note introduces. If the approach I'm suggesting permits getting around intellectual property rights claims, so much the better. But I don't think that should be (or needs to be) a major motivation.

The terminology in this note is consistent with the definitions in [DIRDEF].

1. Context of WG efforts and decision-making

Part of what has delayed these notes has been a debate, not about technical content, but about the behavior of members of the WG. It saddens me that the IETF has reached the point where decisions sometimes get made on that basis; but the issues here seem important enough to take the risks. So, before I get to substance, I want to make an appeal to the WG:

Since its inception, the work of the IDN WG has been made more complex by the presence and active (loud) participation of at least two factions:

(i) A small collection of parties with a strong financial investment in particular outcomes and more commitment to those outcomes than to finding a solution that works well for the Internet as a whole. It has appeared from time to time that a subset of these parties would prefer to see the WG fail completely than to have it adopt a solution other than their own -- at least, that way, they would not be competing against an agreed-upon standard.
(ii) A few individuals or groups who are taking what seems to be an "a simple approach works for me, and for people whose languages use the same code tables as mine (or ones I can predict), and therefore it is fine" position. Sometimes these are "80% solution" arguments instead, but the impact is the same: some portion of the present or future Internet user population is being written off in the interest of a simplistic and straightforward solution.

There have been several arguments made that I should not send this note because it will immediately play into the hands of those two groups: they will "win" and the rest of the Internet will "lose". I have more confidence in the ability of the WG membership to filter out the noise (and the impulses to get _some sort_ of solution, "right now") and address the real issues carefully and thoughtfully. I hope I'm right. If I'm not, I think we are all in serious trouble (whether or not the substantive comments below are correct).

2. Layering of solutions

It has been suggested that introducing a "directory" or "keywords" into, or above, the DNS could be used as a solution to the problem. Probing those statements often quickly demonstrates that their advocates don't agree on exactly what they mean. Some aspects of the desired solutions are clear: they permit matching that is at least somewhat imprecise, so that, unlike the DNS, it becomes possible to expose near-matches to users and let the human beings, rather than overly-precise computer systems, make the decisions. And many of them are intended to permit a certain amount of localization (and localization is very difficult, if not impossible, to provide in the DNS without creating worse problems).

The motivation for those approaches is discussed in a revised version of the "DNS Role" document [DNSROLE] and a way of integrating them is outlined in a new document [LSearch].
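
As a rough illustration of the distinction drawn in section 2 (it is not taken from [LSearch] or any of the other drafts), the Python sketch below contrasts an exact, DNS-style lookup with a search layer that returns near-matches for a person to choose among. The registry contents, the function names, the folding rule, and the use of difflib string similarity are all assumptions made for the example; a real system would need far more careful matching rules.

```python
import difflib
import unicodedata

# Hypothetical registry: human-facing names mapped to stable DNS identifiers.
REGISTRY = {
    "cafe-zurich": "host1.example.net",
    "caf\u00e9-z\u00fcrich": "host2.example.net",
    "cafe-zuerich": "host3.example.net",
}

def exact_lookup(name):
    """DNS-style lookup: the name matches exactly or nothing is returned."""
    return REGISTRY.get(name)

def fold(name):
    """Crude folding used only for comparison: lowercase, drop combining marks."""
    decomposed = unicodedata.normalize("NFKD", name.lower())
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

def search(query, limit=5, cutoff=0.6):
    """Search-style access: return candidate (name, identifier) pairs, best first."""
    folded = {}
    for name in REGISTRY:
        # Several registered names may fold to the same comparison key.
        folded.setdefault(fold(name), []).append(name)
    keys = difflib.get_close_matches(fold(query), list(folded), n=limit, cutoff=cutoff)
    return [(name, REGISTRY[name]) for key in keys for name in folded[key]]
```

With this sketch, exact_lookup("Caf\u00e9-Zurich") finds nothing because no registered name matches character for character, while search() with the same query returns all three registered names as candidates. The point is that the search layer presents those candidates to the user rather than silently picking one, and that the identifiers it returns remain ordinary, stable DNS names.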
Informal discussions with the relevant ADs seem to indicate that the technical strategy implied by those documents, and the one discussed in [NEWCLASS], are not IDN work items. But the IDN WG should be aware of them because they may --I believe should-- inform the IDN efforts. See section 4.

3. Short-term as a blocking mechanism for long-term and related issues

3.1 Internet applications history and its implications

Whether or not it is a good thing, there is one piece of Internet history that should be understood as IDN options are considered. Our history of replacing, rather than trying to incrementally improve, applications that turn out to be defective is just miserable. Basically, we have almost never done it successfully unless at least one of two conditions is met:

* The application being replaced is widely viewed as being incompetent or as having failed in very significant ways.

* The new application is perceived as filling a vacuum or niche where no application (or even the perceived need for one) existed before.

Based upon this, there is strong reason to fear that the deployment of a DNS-based solution that solves even a small fraction of the perceived problems will prevent a non-DNS solution, or a solution layered on top of the DNS, from ever being developed and effectively deployed, at least until the DNS-based solution is seen as failing in ways that cause serious problems that cannot be ignored.

On the other hand, there is reason to believe that some of the proposed (inside and outside the WG) solutions that rely on local conventions or that solve the wrong problem will meet that "fail significantly" criterion and that we will be able to apply a broader and more targeted approach some years down the line.
But, if a solution based on a searching-capable system were to be deployed eventually, some of the DNS modification options now being considered would become obsolete, leaving us with the problem of determining whether to remove them (and improve efficiency at the cost of backward compatibility) or leave them there and incur the inefficiencies and complexities forever (see 4.2 below).

3.2 "A directory is only a few years away"

In fairness, our history with variants on the proposed solution may be as bad as that outlined above for applications replacement. We have been hearing for a very long time now that good, directory-based solutions to one problem or another, including comprehensive "white pages" services, are nearly here and should take over the DNS problems in "a couple of years". Arguably, the problem has been a combination of bickering over small details (and too many options to bicker over) and insufficient real demand, but it may be more fundamental. I believe that a good-quality, worldwide, multilingual naming solution is important enough to create the needed demand (especially when combined with some of the other issues discussed in the "DNS role" document), but I know opinions differ on that subject.

3.3 The sense of time pressure

One of the things that is driving many people in the WG is the feeling that there are enough commercial, and often local, solutions being developed that, if the IETF does not produce a solution quickly, we will become irrelevant and people will just go their own ways. They may be right. On the other hand, we may have already lost that war: it is arguably already too late for "quickly". And producing a solution that does not really fix the problem, especially a late one, does not really help us --or the IETF's credibility in this area-- either.

All of this makes the "real" definition of the problem --something I said a few words about in Minneapolis-- very important.
3.4 The real problem

One of the often-hidden debates in the WG is about what problem we are trying to solve. I think there are at least three versions of the answer:

* "Just get non-ASCII names into the DNS; let the users figure out what matches and adopt adequate conventions". If this is the problem, then nameprep is probably unnecessary, or can be lightened considerably.

* "Just get non-ASCII names into the DNS, but make sure that identifiers that users will think match given that they know the relevant script, etc., do match". If this is the problem, then we need nameprep, and we can (and I think must) continue to argue about whether it is good enough.

* "People are really looking for support for names in natural languages, not just identifiers". The law of least astonishment applies and we must keep our expectations of the degree to which cultures and assumptions will change to meet the Internet's needs quite moderate. Mechanisms for dealing with ambiguity are necessary (probably even within the nameprep framework, e.g., to deal with look-alike characters). And DNS-based solutions are probably, inevitably, inadequate.

And those issues are, of course, independent of the ACE versus other things debate, the DNS class issues, etc.

4. Where we should go next

4.1 The question facing the IDN WG is what to do next. I believe that there is a case to be made (and I hope I'm making it) for more or less declaring success and stopping. That would involve

(i) Finishing and publishing the "requirements" and "nameprep" documents, the latter as a guide to canonicalization and matching in the context of Unicode/10646 strings. IMO, the "nameprep" group should, however, take a careful look at ISO/IEC 14651 to see if it provides a useful alternate or supplemental model. It would also be extremely useful to review the code points impacted by nameprep to permit moving away from a binary model and toward a "clearly map", "clearly do not map", and "ambiguous" (or "sometimes") one.
That third category could help focus our thinking and, to the extent that we move toward a search-based model, would identify important areas for interactive user choice or the requirement for additional semantics or heuristics.

(ii) Evaluating and selecting among the various ACE-based encoding techniques and then publishing the results. Even though I do not believe these should be deployed as changes in how the DNS itself is populated and used, the techniques are almost certain to be useful in other contexts.

And then handing either the "DNS searching" problem or the "new DNS class" one, or both, over to other working groups.

4.2 If we can deploy this type of multilevel base, should we still change the DNS?

If a search-based model is adopted, most of the multilingual "action" would occur above the DNS layer. Technically, there would be no requirement that any changes (actual or conceptual) at all be made to the DNS or its applications interfaces, i.e., we could go back to treating DNS labels as protocol elements with rather restrictive, applications-driven, format and content rules.

On the other hand, the marketing events and pressures of the last year may argue for making DNS changes to accommodate at least some non-ASCII strings, if only to provide mnemonic identifiers for languages not utilizing Roman-based alphabets. Of course, this runs some risk of increased complexity and unanticipated damage (at least unless a "new class" solution is adopted), but the tradeoff is worth considering.

As we consider it, it is also worth remembering that the most important function of the DNS is tied to long-term stable identifiers. We just don't have any other satisfactory way to do them and they are very important. Things that put that use at risk should, arguably, be dealt with in other ways.

4.4 Deployment against DNS base

As with the "new class" approach to DNS changes [NEWCLASS], the approach outlined here does not require any changes to the existing installed DNS base.
But, like all solutions to the multilingual name issues, it requires changes to all relevant applications. The notion of moving from lookup to searching does imply that we will need, not merely to change the code that calls the name resolution system, but to rethink the UIs of those applications.

References

(I-D names and numbers current or submitted at the time of this writing.)

[DIRDEF] Alvestrand, Harald, "Definitions for talking about directories", work in progress, draft-alvestrand-directory-defs-02.txt.

[DNSROLE] Klensin, John, "Role of the Domain Name System", work in progress, draft-klensin-dns-role-01.txt.

[LSearch] Klensin, John, "A Search-based access model for the DNS", work in progress, draft-klensin-dns-search-00.txt.

[NEWCLASS] Klensin, John, "Internationalizing the DNS -- A New Class", work in progress, draft-klensin-i18n-newclass-00.txt.