[idn] Stepping back and taking another look (long)

John C Klensin <klensin@jck.com> Tue, 29 May 2001 09:25 UTC

Date: Tue, 29 May 2001 05:09:50 -0400
From: John C Klensin <klensin@jck.com>
To: idn@ops.ietf.org
Subject: [idn] Stepping back and taking another look (long)
Message-ID: <3270512440.991112990@localhost>
X-Mailer: Mulberry/2.0.8 (Win32)
MIME-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
Content-Disposition: inline
Sender: owner-idn@ops.ietf.org
Precedence: bulk

IDN WG participants, 

Hi.  I have been struggling about whether to send this note,
or something like it, for several weeks now.  Some of you have
inferred parts of the content from remarks I have made
earlier; it may come as a surprise to others.  The request
from Marc and James for a consensus conclusion/straw poll on
IDNA and ACE and its results precipitates the note: as
discussed in detail below, I do not agree that they are the
right way to go, but feel a need to offer an alternative
view, however incomplete.

It was also driven by the recent (and ongoing) "ACE versus
UTF-8" discussion: as the text (and the two I-Ds) I hope make
clear, I see the long-term solution to the real problem as
lying outside the DNS entirely, bringing a wider range of
coding options with it, and its being less important what, if
anything, we do to (in?) the DNS in the shorter term.  Hence I
also don't feel that the ACE versus UTF-8 discussion is very
important, although I believe that many of those issues
(although not necessarily the arguments) would be basically
the same for any approach we take, even the more extreme ones
suggested here.

I apologize in advance for the length of this note.  I've
attempted to thoroughly explain the issues as I see them,
rather than engage in the many short notes and back-and-forth
sniping we've seen about less complex topics before the WG.
Think of it as nearly an I-D that doesn't deserve even that
level of status, at least in this informal form.

One observation up front (to anticipate parts of section 3):
Section 3 argues that the existence of a short-term solution
that works much of the time for some fraction of the network
will block a long-term, more effective, solution from ever
deploying.  Despite what appears there, that argument cannot
be proven but is a matter of belief.  If one believes it, then
this document is an argument for radically reconsidering the
course IDN is on today and then heading off in another
direction.  If one does not, some of it, and much of another
document [LSearch], is a roadmap for where we (IETF and the
Internet community) should be going after we finish the
current work (and perhaps an argument that we should get
started on it even before the current work is finished).

Some of it, too, can be taken as an argument for an extremely
conservative approach to DNS changes (e.g., see [NEWCLASS] and
a forthcoming document which I understand David Lawrence is
working on and that should fill in many of the details my "new
class" one leaves out), rather than attempting to sneak new
functionality into the existing machinery.

Finally, the word "patent" doesn't appear below or in the new
drafts which it introduces.  If the approach I'm suggesting
permits getting around intellectual property rights claims, so
much the better.  But I don't think that should be (or needs
to be) a major motivation.

The terminology in this note is consistent with the
definitions in [DIRDEF].



1. Context of WG efforts and decision-making

Part of what has delayed these notes has been a debate, not
about technical content, but about the behavior of members of
the WG.  It saddens me that the IETF has reached the point
where decisions sometimes get made on that basis; but the
issues here seem important enough to take the risks.  So,
before I get to substance, I want to make an appeal to the WG:

Since its inception, the work of the IDN WG has been made more
complex by the presence and active (loud) participation of at
least two factions:

   (i) A small collection of parties with a strong financial
   investment in particular outcomes and more commitment to
   those outcomes than to finding a solution that works well
   for the Internet as a whole.  It has appeared from time to
   time that a subset of these parties would prefer to see the
   WG fail completely than to have it adopt a solution other
   than their own -- at least, that way, they would not be
   competing against an agreed-upon standard.

   (ii) A few individuals or groups who are taking what seems
   to be an "a simple approach works for me, and for people
   whose languages use the same code tables as mine (or ones I
   can predict), and therefore it is fine" position.  Sometimes these
   are "80% solution" arguments instead, but the impact is the
   same: some portion of the present or future Internet user
   population is being written off in the interest of a
   simplistic and straightforward solution.

There have been several arguments made that I should not send
this note because it will immediately play into the hands of
those two groups: they will "win" and the rest of the Internet
will "lose".  I have more confidence in the ability of the WG
membership to filter out the noise (and the
impulses to get _some sort_ of solution, "right now") and
address the real issues carefully and thoughtfully.  I hope
I'm right.  If I'm not, I think we are all in serious trouble
(whether or not the substantive comments below are correct).


2. Layering of solutions

It has been suggested that introducing a "directory" or
"keywords" into, or above, the DNS could be used as a solution
to the problem.  Probing those statements often quickly
demonstrates that their advocates don't agree on exactly what
they mean.  Some aspects of the desired solutions are clear:
they permit matching that is at least somewhat imprecise, so
that, unlike the DNS, it becomes possible to expose
near-matches to users and let the human beings, rather than
overly-precise computer systems, make the decisions.  And many
of them are intended to permit a certain amount of
localization (and localization is very difficult, if not
impossible, to provide in the DNS without creating worse
problems).
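
The lookup-versus-search distinction above can be sketched in a
few lines of Python. This is only an illustration under stated
assumptions, not a proposal: the registry contents, the function
names, and the similarity cutoff are all hypothetical, and
difflib's string similarity merely stands in for whatever matching
rules a real directory layer would use.

```python
import difflib
import unicodedata

# Hypothetical registry; names and addresses are invented for illustration.
REGISTRY = {
    "example": "192.0.2.1",
    "exemple": "192.0.2.2",
    "examples": "192.0.2.3",
    "sample": "192.0.2.4",
}


def normalize(name):
    # Compatibility-normalize and case-fold so that trivially different
    # spellings of the same string compare equal.
    return unicodedata.normalize("NFKC", name).casefold()


def lookup(name):
    """DNS-style lookup: an exact match after normalization, or nothing."""
    return REGISTRY.get(normalize(name))


def search(name, limit=3, cutoff=0.7):
    """Directory-style search: ranked near-matches for a human to pick from."""
    return difflib.get_close_matches(
        normalize(name), list(REGISTRY), n=limit, cutoff=cutoff
    )
```

With this toy data, lookup("exampel") finds nothing, while
search("exampel") returns a short list of candidates headed by
"example", leaving the final decision to the user.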

The motivation for those approaches is discussed in a revised
version of the "DNS Role" document [DNSROLE] and a way of
integrating them is outlined in a new document [LSearch].
Informal discussions with the relevant ADs seem to indicate
that the technical strategy implied by those documents, and
the one discussed in [NEWCLASS], are not IDN work items.  But
the IDN WG should be aware of them because they may --I
believe should-- inform the IDN efforts.  See section 4.


3. Short-term as a blocking mechanism for long-term and
related issues

3.1 Internet applications history and its implications

Whether or not it is a good thing, there is one piece of
Internet history that should be understood as IDN options are
considered.  Our history of replacing, rather than trying to
incrementally improve, applications that turn out to be
defective is just miserable.  Basically, we have almost never
done it successfully unless one or both of two conditions are
met:

 * The application being replaced is widely viewed as being
 incompetent or as having failed in very significant ways.

 * The new application is perceived as filling a vacuum or
 niche where no application (or even the perceived need for
 one) existed before.

Based upon this, there is strong reason to fear that the
deployment of a DNS-based solution that solves even a small
fraction of the perceived problems will prevent a non-DNS
solution, or a solution layered on top of the DNS, from ever
being developed and effectively deployed, at least until the
DNS-based solution is seen as failing in ways that cause serious
problems that cannot be ignored.  On the other hand, there is
reason to believe that some of the proposed (inside and
outside the WG) solutions that rely on local conventions or
that solve the wrong problem will fall into that "fail
significantly" criterion and that we will be able to apply a
broader and more targeted approach some years down the line.

But, if a solution based on a searching-capable system were to
be deployed eventually, some of the DNS modification options
now being considered would become obsolete, facing us with the
problem of determining whether to remove them (and improve
efficiency at the cost of backward compatibility) or leave
them there and incur the inefficiencies and complexities
forever (see 4.2 below).

3.2 "A directory is only a few years away".

In fairness, our history with variants on the proposed
solution may be as bad as that outlined above for applications
replacement.  We have been hearing that good, directory-based
solutions to one problem or another, including comprehensive
"white pages" services, are nearly here and should take over
the DNS problems in "a couple of years", for a very long time
now.  Arguably, the problem has been a combination of
bickering over small details (and too many options to bicker
over) and insufficient real demand, but it may be more
fundamental.  I believe that a good-quality, worldwide,
multilingual naming solution is important enough to create the
needed demand (especially when combined with some of the other
issues discussed in the "DNS role" document), but I know
opinions differ on that subject.

3.3 The sense of time pressure

One of the things that is driving many people in the WG is the
feeling that there are enough commercial, and often local,
solutions being developed that, if the IETF does not produce a
solution quickly, we will become irrelevant and people will
just go their own ways.  They may be right.  On the other
hand, we may have already lost that war: it is arguably
already too late for "quickly".  And producing a solution that
does not really fix the problem, especially a late one, does
not really help us --or the IETF's credibility in this area--
either.  All of this makes the "real" definition of the
problem --something I said a few words about in Minneapolis--
very important.

3.4 The real problem

One of the often-hidden debates in the WG is about what problem
we are trying to solve.  I think there are at least three
versions of the answer:

* "Just get non-ASCII names into the DNS; let the users
  figure out what matches and adopt adequate conventions".  If
  this is the problem, then nameprep is probably unnecessary,
  or can be lightened considerably.

* "Just get non-ASCII names into the DNS, but make sure that
  identifiers that users will think match given that they know
  the relevant script, etc., do match".  If this is the
  problem, then we need nameprep, and we can (and I think
  must) continue to argue about whether it is good enough.

* "People are really looking for support for names in natural
  languages, not just identifiers".  The law of least
  astonishment applies and we must keep our expectations of
  the degree to which cultures and assumptions will change to
  meet the Internet's needs quite moderate.  Mechanisms for
  dealing with ambiguity are necessary (probably even within
  the nameprep framework, e.g., to deal with look-alike
  characters).  And DNS-based solutions are probably,
  inevitably, inadequate. 

And those issues are, of course, independent of the ACE versus
other things debate, the DNS class issues, etc.



4. Where we should go next

4.1 The question facing the IDN WG is what to do next.  I believe
that there is a case to be made (and I hope I'm making it) for
more or less declaring success and stopping.   That would
involve

 (i) Finishing and publishing the "requirements" and
    "nameprep" documents, the latter as a guide to
    canonicalization and matching in the context of Unicode/
    10646 strings.  IMO, the "nameprep" group should, however,
    take a careful look at ISO/IEC 14651 to see if it provides
    a useful alternate or supplemental model.

    It would also be extremely useful to review the code
    points impacted by nameprep to permit moving away from a
    binary model and toward a "clearly map", "clearly do not
    map", and "ambiguous" (or "sometimes") one.  That third
    category could help focus our thinking and, to the extent
    that we move toward a search-based model, would identify
    important areas for interactive user choice or the
    requirement for additional semantics or heuristics.

 (ii) Evaluating and selecting among the various ACE-based 
    encoding techniques and then publishing the results.  Even
    though I do not believe these should be deployed as
    changes in how the DNS itself is populated and used, the
    techniques are almost certain to be useful in other
    contexts.

And then handing either the "DNS searching" problem or the
"new DNS class" one, or both, over to other working groups.

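As a rough illustration of the three-way model suggested in item
(i) above, the following Python sketch classifies a pair of
characters as "clearly map", "clearly do not map", or "ambiguous".
It rests on assumptions: NFKC normalization plus case folding is
used as a stand-in for the mapping step (it is not nameprep
itself), and the look-alike table is a tiny, hand-picked,
hypothetical sample rather than a reviewed list.

```python
import unicodedata

# Hypothetical, hand-picked look-alike pairs; a real table would need
# careful expert review for each script.
LOOKALIKES = {
    frozenset({"a", "\u0430"}),  # Latin small a / Cyrillic small a
    frozenset({"o", "\u03bf"}),  # Latin small o / Greek small omicron
    frozenset({"1", "l"}),       # digit one / Latin small l
}


def fold(ch):
    # Stand-in for a canonicalization step: compatibility-normalize,
    # then case-fold.
    return unicodedata.normalize("NFKC", ch).casefold()


def classify(a, b):
    """Return one of the three categories for a pair of characters."""
    if fold(a) == fold(b):
        return "clearly map"
    if frozenset({fold(a), fold(b)}) in LOOKALIKES:
        return "ambiguous"
    return "clearly do not map"
```

In a search-based design, only pairs in the "ambiguous" category
would need to be surfaced to the user or resolved with additional
heuristics; the other two categories can be handled mechanically.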

4.2 If we can deploy this type of multilevel base, should we
still change the DNS?

If a search-based model is adopted, most of the multilingual
"action" would occur above the DNS layer.  Technically, there
would be no requirement that any changes (actual or
conceptual) at all be made to the DNS or its applications
interfaces, i.e., we could go back to treating DNS labels as
protocol elements with rather restrictive, applications-
driven, format and content rules.
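
For concreteness, the restrictive label rule alluded to above can
be sketched as follows, assuming the conventional
letter-digit-hyphen (LDH) hostname convention with labels of 1 to
63 octets. The function names are hypothetical and the check is
deliberately minimal (it does not enforce an overall name length).

```python
import re

# ASCII letters, digits, and hyphen; no leading or trailing hyphen;
# between 1 and 63 characters in total.
LDH_LABEL = re.compile(r"[A-Za-z0-9](?:[A-Za-z0-9-]{0,61}[A-Za-z0-9])?")


def is_ldh_label(label):
    """True if the label fits the restrictive LDH convention."""
    return LDH_LABEL.fullmatch(label) is not None


def is_ldh_name(name):
    """True if every dot-separated label of the name is an LDH label."""
    if name.endswith("."):
        name = name[:-1]  # tolerate a single trailing root dot
    return all(is_ldh_label(label) for label in name.split("."))
```

Under a search-based model, strings that fail such a check would
simply never reach the DNS; they would be handled by the layer
above it.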

On the other hand, the marketing events and pressures of the
last year may argue for making DNS changes to accommodate at
least some non-ASCII strings, if only to provide mnemonic
identifiers for languages not utilizing Roman-based alphabets.
Of course, this runs some risk of increased complexity and
unanticipated damage (at least unless a "new class" solution
is adopted), but the tradeoff is worth considering.

As we consider it, it is also worth remembering that the most
important function of the DNS is tied to long-term stable
identifiers.  We just don't have any other satisfactory way to
do them and they are very important.  Things that put that use
at risk should, arguably, be dealt with in other ways.

4.3 Deployment against DNS base

As with the "new class" approach to DNS changes [NEWCLASS], the
approach outlined here does not require any changes to the
existing installed DNS base.  But, like all solutions to the
multilingual name issues, it requires changes to all relevant
applications.  The notion of moving from lookup to searching
does imply that we will need, not merely to change the code
that calls the name resolution system, but to rethink the UIs
of those applications.


References

(I-D names and numbers current or submitted at the time of
this writing.)

[DIRDEF] Alvestrand, Harald. "Definitions for talking about
directories", work in progress,
draft-alvestrand-directory-defs-02.txt

[DNSROLE] Klensin, John. "Role of the Domain Name System",
work in progress, draft-klensin-dns-role-01.txt

[LSearch] Klensin, John. "A Search-based access model for the
DNS", work in progress, draft-klensin-dns-search-00.txt

[NEWCLASS] Klensin, John. "Internationalizing the DNS -- A New
Class", work in progress, draft-klensin-i18n-newclass-00.txt