Re: [idn] I fear I cannot use IDN in the next 10 years

"Eric A. Hall" <ehall@ehsco.com> Tue, 09 October 2001 01:57 UTC

Received: from psg.com (exim@psg.com [147.28.0.62]) by ietf.org (8.9.1a/8.9.1a) with ESMTP id VAA06573 for <idn-archive@lists.ietf.org>; Mon, 8 Oct 2001 21:57:03 -0400 (EDT)
Received: from lserv by psg.com with local (Exim 3.33 #1) id 15qlrG-000Gss-00 for idn-data@psg.com; Mon, 08 Oct 2001 18:38:58 -0700
Received: from goose.ehsco.com ([207.65.203.98]) by psg.com with esmtp (Exim 3.33 #1) id 15qlrE-000Gsm-00 for idn@ops.ietf.org; Mon, 08 Oct 2001 18:38:57 -0700
Received: from [24.252.219.84] (account ehall HELO ehsco.com) by goose.ehsco.com (CommuniGate Pro SMTP 3.4.8) with ESMTP-TLS id 46312 for idn@ops.ietf.org; Sun, 07 Oct 2001 20:38:36 -0500
Message-ID: <3BC25528.FC918BB4@ehsco.com>
Date: Mon, 08 Oct 2001 20:38:48 -0500
From: "Eric A. Hall" <ehall@ehsco.com>
Organization: EHS Company
X-Mailer: Mozilla 4.78 [en] (Windows NT 5.0; U)
X-Accept-Language: en
MIME-Version: 1.0
To: idn@ops.ietf.org
Subject: Re: [idn] I fear I cannot use IDN in the next 10 years
References: <200110040700.f9470bj18628@valinor.malmo.trab.se> <p0510030eb7e24eb9f210@[165.227.249.20]> <5.1.0.14.2.20011004213749.02478cd0@dcrocker.net> <5.1.0.14.2.20011004235936.023e1500@dcrocker.net> <5.1.0.14.2.20011005141903.04649150@dcrocker.net> <3BBE467C.A9797B21@ehsco.com> <p0510033ab7e3fc24f02c@[165.227.249.20]>
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
Sender: owner-idn@ops.ietf.org
Precedence: bulk

Paul,

First of all, I'd like to thank you for taking the time to enumerate your
many perceived technical issues with the UDNS model. These points deserve
to be fully debated and discussed.

However, I think that your comments also represent a larger issue which
needs to be resolved first. We need to come to a group consensus on the
question of whether or not a UTF-8 namespace is necessary, desirable, or
neither of those, and this needs to happen before the technical points can
be debated in appropriate context. Without consensus on this underlying
point, debates over relative costs will be debates over half-full vs
half-empty, and will get us nowhere.

For example:

> UDNS gives us all of these problems for the limited benefit that some
> applications don't have to implement one additional encoding for
> sending host names on the wire. Does that seem like a good balance?

The above is a comparison of UDNS' perceived cost relative to a benefit,
but we haven't fully discussed the benefits as of yet. Clearly, ACE has
many costs, some of which are quite high AND ongoing, although most of us
agree that some form of ACE is necessary, and is therefore worth whatever
cost may be required. What we do not agree on is the benefit that UDNS (or
a similar mechanism) would provide. Making a decision based on cost alone
is misrepresentative of the true value of UDNS, just as a decision on ACE
which was based solely on cost would have misrepresented the true value of
its benefits (being necessary, it can have almost any cost).

Personally, I believe that UDNS is also necessary, and is therefore worth
almost any cost (although as I will show, I think the cost is somewhat
lower than you do). I base this on a few key arguments:

 * BCP18 is "Official Internet Policy" which requires support for
   UTF-8 in all new protocols, and in modifications to all existing
   protocols: "lack of an ability to use UTF-8 is a violation of this
   policy; such a violation would need a variance procedure". Simply
   put, policy requires this WG to devise some kind of support for
   UTF-8, unless it can be proven unreasonable. Nobody has proven it
   to be unreasonable.

 * Without a UTF-8 DNS interface, no new protocols or applications
   can be developed that are UTF-8 clean. Instead, they will be
   UTF-8 for everything EXCEPT domain names, and in some cases this
   will be fatal. One example we have already discussed for this is
   mapping between LDAPv3 distinguished names and DNS domain names
   (mapping dc= RDNs to DNS). Failure to support UTF-8 is a heavy
   blow to such efforts, requiring tremendous amounts of development
   effort and infrastructure oversight, likely hindering the use
   and development of LDAP in the Internet.

 * Global public networking is only in its infancy; there will
   likely be many thousands of new protocols and applications which
   are developed over the next few decades (some in the IETF WGs,
   most in business, educational or personal networks, and most will
   be developed outside of the US). If those applications follow the
   Official Internet Policy, they will be UTF-8 only or use it as the
   preferred encoding. However, those applications will also be
   saddled with ACE conversion wherever they have to interact with
   domain names, meaning they cannot be UTF-8 clean. This will be
   extremely acute in non-US development environments.

 * Towards the above point, UDNS provides an optional, user-driven
   transition path from ASCII to UTF-8. UDNS allows applications to
   be written so that they only function in UTF-8 environments.
   While this may not seem a reasonable objective of this WG, this
   should be the long-term vision of what we are trying to achieve.
   We should be enabling the development of a truly international
   Internet infrastructure which is UTF-8 clean throughout, and
   UDNS provides this transitional path. Conversely, ACE does not
   provide this transition. In fact, the goal of seamless backwards
   compatibility actively hinders migration.

   Let me restate this point, since it is somewhat complex. Although
   UDNS supports a dual-mode model, once UDNS (or something similar)
   were approved, developers could begin to work on applications
   which ONLY used UDNS, without any code for ASCII or ACE. At first
   this would be small private apps, but over the course of a couple
   of decades, it would likely be most of the new apps. Without UDNS
   (or something similar), it would still be ASCII at the end of that
   timeframe. In short, we need UDNS if we are ever to drop ASCII.

 * The modernization and internationalization of legacy protocols,
   applications and formats will almost certainly require a UTF-8
   DNS eventually. It will not be possible to build and deploy
   UTF-8 extensions to SMTP without a UTF-8 DNS infrastructure. As
   with the above points, once a UTF-8 approach has gained some
   critical mass, these extensions become feasible. Without the
   infrastructure, SMTP will continue to be bound to ASCII.
  
 * UDNS is optional and transitional, on a per-domain basis. This
   allows a transition to occur as users see fit, without requiring
   a replacement of the existing DNS infrastructure with an
   alternate naming service. While this is technically not a
   "requirement" per se, the seamless user-driven transitional
   aspects should be requirements. Something very much like UDNS
   will be required for this. A "new naming service" is unlikely to
   reach critical mass without some kind of backwards compatibility,
   which UDNS provides.

   Furthermore, the cost of UDNS is incremental to that of ACE,
   since they share many common features and functions. If a
   developer is going to add ACE support to an application (which
   they must do in the short term), it is incremental to add UDNS
   support at the same time. On the other hand, adding ACE now and
   then going back to glue on something else is a second development
   effort, with greater costs.

 * UTF-8 is infinitely more manageable and serviceable than ACE.
   By being able to use UTF-8 tools and services, the Internet can
   be kept running much more easily than if users have to
   transliterate ACE operations. We should not be contributing to
   frailty. When a network is already broken, ACE obfuscates the
   problem by displaying aq--gobbledygook in traceroute, netstat,
   tcpdump and all of the other tools. ACE libs can certainly
   address this part of the issue, but the extent of the support
   for tasks like importing trace data into a spreadsheet or
   viewing it in an editor is not as compelling. The massive number
   of UTF-8 tools is extremely compelling in terms of general
   manageability and serviceability of the global DNS.

 * Finally, there are some problems that ACE cannot solve, which
   UDNS can. The clipboard problem practically goes away, if not
   directly then indirectly, simply because we can facilitate the
   use of UTF-8 everywhere, rather than having to transliterate
   between application- and protocol-specific encodings. E.g., if we
   ALLOW for the consistent use of UTF-8, then it is more likely
   to happen than if we actively PROHIBIT it by mandating yet
   another encoding which MUST be accounted for in every operation.
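
To make the manageability point in the list above concrete, here is a
minimal sketch of what the same name looks like to a UTF-8 aware tool and
to an ACE-only tool. It assumes Python's built-in "idna" codec, which
implements one particular ACE (Punycode with an "xn--" prefix); that is a
stand-in and not necessarily the ACE this WG ends up with. The name
"bücher.example" is purely illustrative.

```python
# Illustrative only: the "idna" codec is a stand-in for whichever ACE is
# eventually chosen; the domain name is a made-up example.
name = "b\u00fccher.example"           # "bücher.example"

utf8_wire = name.encode("utf-8")       # octets a UTF-8 clean path would carry
ace_wire = name.encode("idna")         # octets an ACE-only path would carry

# A UTF-8 aware tool (editor, spreadsheet, trace viewer) shows the name as-is.
print(utf8_wire.decode("utf-8"))

# An ASCII-only tool shows the encoded form unless it links an ACE library.
print(ace_wire.decode("ascii"))        # xn--bcher-kva.example

# Recovering the readable name from ACE requires an explicit decode step.
print(ace_wire.decode("idna"))
```

The last line is the extra step that every diagnostic tool would need in
an ACE-only world.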

So that's the "benefit" side of the argument. If we can agree that these
are important considerations, and that some kind of support for UTF-8 is a
requirement (as per item #1), then we can debate costs and features as
they compare to that benefit.

Here is my evaluation of your perceived costs.

> - UDNS causes some strings that are legal in one encoding to be
> illegal in the other, and vice versa, meaning that some host names
> will be illegal part of the time, unpredictably

This issue would have to be resolved if we were ever to move beyond ACE.
It is beneficial to design for such limits up front, rather than later
revoking names which turn out to be incompatible. This is only a cost if
ACE and some other service are developed at different times, and is NOT a
cost if they are developed together. This is, in fact, a motivating factor
for beginning work on UDNS *NOW*.

> - UDNS requires more work to be done by authoritative DNS servers

Yes, UDNS requires authoritative servers to maintain name mapping
information. However, I see this as being a transitional cost. In the
beginning, few clients will support UDNS, but over the course of several
years, most clients will support UDNS. At some point (two decades out?
shorter?) it should be possible to switch entirely to UDNS, meaning that
there will no longer be a need for servers to store both versions. The
longer we delay beginning such a transition, the longer it will be before
we can complete that transition. If we never start it, we will never
complete it.

> - UDNS UTF-8 queries that fail will cause more load on the DNS

Yes, it will cause one additional lookup. This is equivalent to requiring
one extra delegation server in the path. This is not a great cost to begin
with. Furthermore, seeing as how it is also transitional, it will only be
a cost for the short-term, and will not be a cost after a few years.
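
A back-of-the-envelope sketch of why I call this cost transitional. The
model is my own simplification, not part of any draft: it assumes a client
always sends the UTF-8 query first, that a failure costs exactly one ACE
retry, and that "udns_deployment" is the fraction of lookups that reach a
UDNS-capable server. The function name is hypothetical.

```python
def expected_queries(udns_deployment: float) -> float:
    """Average queries per name lookup for a UTF-8-first client.

    Assumed model: one UTF-8 query always, plus one ACE retry whenever
    the UTF-8 query reaches a server that does not support UDNS.
    """
    if not 0.0 <= udns_deployment <= 1.0:
        raise ValueError("deployment must be between 0 and 1")
    return 1.0 + (1.0 - udns_deployment)


for fraction in (0.0, 0.5, 1.0):
    print(fraction, expected_queries(fraction))
```

Under these assumptions the overhead is at most one extra query per
lookup and shrinks to zero as deployment approaches 100%.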

> - UDNS is not compatible with DNS security

I'm not sure I understand this point. Also, DNSSEC has many issues, and
appears to be destined for a redesign at this point.

> - UDNS requires that applications have the logic to send a new DNS query
> format

Yes, an alternative API is required to prevent clashes with legacy
applications (this was proven by DJB's pi test). However, this is not a
cost burden which is solely related to UDNS. For example, when hostnames
are extended beyond the RFC952/1123 rules, an alternative wide() API will
be required for several functions. Applications which transliterate ACE at
relatively deep levels (such as for DHCP) will also require this. For
these reasons, I consider this a cost of internationalization rather than
a cost for UDNS.

> - Applications that implement the UTF-8 part of UDNS have to handle
> the inevitable errors that come from queries that bounce, have to
> recast those queries in ACE (assuming that the name is even legal in
> ACE, which some won't be), and then have to emit those queries again,
> causing more DNS traffic

As with the other issues, this is a transitional cost which dissipates as
deployment goes up. It is a cost in the beginning, but it is not a
significant cost after a few years.
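
For reference, the retry logic being discussed can be sketched roughly as
follows. This is not the UDNS specification: "query_utf8" and "query_ace"
are hypothetical callables standing in for a real resolver, and Python's
"idna" codec is used as a stand-in ACE.

```python
from typing import Callable, Optional

# Hypothetical resolver interface: takes the wire-form name, returns an
# address string, or None when the query fails.
Resolver = Callable[[bytes], Optional[str]]


def lookup(name: str, query_utf8: Resolver, query_ace: Resolver) -> Optional[str]:
    """Try the UTF-8 query first; on failure, recast in ACE and retry once."""
    answer = query_utf8(name.encode("utf-8"))
    if answer is not None:
        return answer
    try:
        ace_name = name.encode("idna")   # stand-in ACE conversion
    except UnicodeError:
        return None                      # name is not legal in this ACE
    return query_ace(ace_name)


# Usage with dictionaries playing the role of servers (made-up data).
legacy_zone = {b"xn--bcher-kva.example": "192.0.2.1"}
udns_zone = {"b\u00fccher.example".encode("utf-8"): "192.0.2.2"}

print(lookup("b\u00fccher.example", udns_zone.get, legacy_zone.get))
print(lookup("b\u00fccher.example", {}.get, legacy_zone.get))
```

The second call is the bounce-and-retry case; it disappears once the
server side answers UTF-8 queries directly.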

> - The errors caused by UTF-8 queries are the same as other legitimate
> DNS errors, meaning that applications will have to have their own
> (probably-nonstandard) logic for differentiating between expected
> errors and real errors

Because UDNS is an application of EDNS, it uses the EDNS error codes. My
experience is that these codes are fairly simple to work with.

From my perspective, you have raised four transitional costs (one of which
is nearly permanent but which does have an end), no perpetual costs, and
one cost I don't grok. That is not very expensive considering the benefits
and requirements, in my opinion. Conversely, ACE-only does have at least
one ongoing cost which is quite high (client-side name mutations), and
several perpetual costs in terms of usability and internationalization of
the global Internet.

-- 
Eric A. Hall                                        http://www.ehsco.com/
Internet Core Protocols          http://www.oreilly.com/catalog/coreprot/