Re: [idn] I fear I cannot use IDN in the next 10 years

Paul Hoffman / IMC <phoffman@imc.org> Tue, 09 October 2001 03:55 UTC

Received: from psg.com (exim@psg.com [147.28.0.62]) by ietf.org (8.9.1a/8.9.1a) with ESMTP id XAA09714 for <idn-archive@lists.ietf.org>; Mon, 8 Oct 2001 23:55:38 -0400 (EDT)
Received: from lserv by psg.com with local (Exim 3.33 #1) id 15qnqh-000LsP-00 for idn-data@psg.com; Mon, 08 Oct 2001 20:46:31 -0700
Received: from above.proper.com ([208.184.76.39]) by psg.com with esmtp (Exim 3.33 #1) id 15qnqg-000LsJ-00 for idn@ops.ietf.org; Mon, 08 Oct 2001 20:46:30 -0700
Received: from [165.227.249.20] (165-227-249-20.client.dsl.net [165.227.249.20]) by above.proper.com (8.11.6/8.11.3) with ESMTP id f993kPD24417 for <idn@ops.ietf.org>; Mon, 8 Oct 2001 20:46:25 -0700 (PDT)
Mime-Version: 1.0
X-Sender: phoffman@mail.imc.org
Message-Id: <p05100303b7e8186c0b81@[165.227.249.20]>
In-Reply-To: <3BC25528.FC918BB4@ehsco.com>
References: <200110040700.f9470bj18628@valinor.malmo.trab.se> <p0510030eb7e24eb9f210@[165.227.249.20]> <5.1.0.14.2.20011004213749.02478cd0@dcrocker.net> <5.1.0.14.2.20011004235936.023e1500@dcrocker.net> <5.1.0.14.2.20011005141903.04649150@dcrocker.net> <3BBE467C.A9797B21@ehsco.com> <p0510033ab7e3fc24f02c@[165.227.249.20]> <3BC25528.FC918BB4@ehsco.com>
Date: Mon, 08 Oct 2001 20:44:43 -0700
To: idn@ops.ietf.org
From: Paul Hoffman / IMC <phoffman@imc.org>
Subject: Re: [idn] I fear I cannot use IDN in the next 10 years
Content-Type: text/plain; charset="us-ascii"; format="flowed"
Sender: owner-idn@ops.ietf.org
Precedence: bulk

At 8:38 PM -0500 10/8/01, Eric A. Hall wrote:
>First of all, I'd like to thank you for taking the time to enumerate your
>many perceived technical issues with the UDNS model. These points deserve
>to be fully debated and discussed.

They have been; I was repeating what had already been said.

>However, I think that your comments also represent a larger issue which
>needs to be resolved first. We need to come to a group consensus on the
>question of whether or not a UTF-8 namespace is necessary, desirable, or
>neither of those, and this needs to happen before the technical points can
>be debated in appropriate context.

First, what the heck is a "UTF-8 namespace"? We're talking about 
Internet protocols, not namespaces. Second, the question "is there an 
advantage to carrying UTF-8 in the DNS protocol" has already been 
debated extensively. It was decided that debating in the abstract was 
not likely to produce a tangible result (namely a protocol); 
discussing specific proposals might lead to a good result. To date, 
there have been no UTF8-based proposals without major technical flaws.

>  Without consensus on this underlying
>point, debates over relative costs will be debates over half-full vs
>half-empty, and will get us nowhere.

We weren't debating them; again, I was restating what had already been 
said over and over. If you really feel that there may be a way to do 
UTF-8 in the DNS protocol, write an Internet Draft and submit it. 
"I'm too busy to write a proposal, but I have lots of time to argue 
about the basis for it" won't sway many people.

>For example:
>
>>  UDNS gives us all of these problems for the limited benefit that some
>>  applications don't have to implement one additional encoding for
>>  sending host names on the wire. Does that seem like a good balance?
>
>The above is a comparison of UDNS' perceived cost relative to a benefit,
>but we haven't fully discussed the benefits as of yet.

Some people would disagree with your definition of "fully".

>  Clearly, ACE has
>many costs, some of which are quite high AND ongoing, although most of us
>agree that some form of ACE is necessary, and is therefore worth whatever
>cost may be required.

Could you be specific about which parts of IDNA have "high" costs? 
The conversion from whatever local encoding to ACE is not much higher 
than converting to UTF-8. That's a pretty minor cost for all 
protocols that allow anything other than UTF-8.
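
As a rough illustration of how small that conversion step is, here is a minimal Python sketch. It assumes Python's built-in "idna" codec as a stand-in for "ACE" (that codec implements the later Punycode-based encoding, not any of the ACE candidates discussed in this thread), and the name used is only a sample:

```python
# Sketch: converting a host name to UTF-8 versus to an ACE form.
# Assumption: Python's built-in "idna" codec stands in for "ACE" here.

name = "bücher.example"  # hypothetical sample name

utf8_form = name.encode("utf-8")  # raw UTF-8 octets
ace_form = name.encode("idna")    # ASCII-compatible encoding of the same name

print(utf8_form)
print(ace_form)

# Each conversion is a single call, and each one round-trips.
assert utf8_form.decode("utf-8") == name
assert ace_form.decode("idna") == name
```

Either way the application makes one conversion from its local representation before the name goes on the wire.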

>  What we do not agree on is the benefit that UDNS (or
>a similar mechanism) would provide.

That's one of the things. You skipped over the parts of my list that 
would make UDNS impossible to deploy for end users, namely the fact 
that some names would be illegal or legal in different parts of the 
Internet. That's a show-stopper.

>  Making a decision based on cost alone
>is misrepresentative of the true value of UDNS, just as a decision on ACE
>which was based solely on cost would have misrepresented the true value of
>its benefits (being necessary, it can have almost any cost).
>
>Personally, I believe that UDNS is also necessary, and is therefore worth
>almost any cost (although as I will show, I think the cost is somewhat
>lower than you do). I base this on a few key arguments:
>
>  * BCP18 is "Official Internet Policy" which requires support for
>    UTF-8 in all new protocols, and in modifications to all existing
>    protocols: "lack of an ability to use UTF-8 is a violation of this
>    policy; such a violation would need a variance procedure". Simply
>    put, policy requires this WG to devise some kind of support for
>    UTF-8, unless it can be proven unreasonable. Nobody has proven it
>    to be unreasonable.

This has been gone over many times; it is a red herring. Read the archives.

>  * Without a UTF-8 DNS interface, no new protocols or applications
>    can be developed that are UTF-8 clean. Instead, they will be
>    UTF-8 for everything EXCEPT domain names, and in some cases this
>    will be fatal. One example we have already discussed for this is
>    mapping between LDAPv3 distinguished names and DNS domain names
>    (mapping dc= RDNs to DNS). Failure to support UTF-8 is a heavy
>    blow to such efforts, requiring tremendous amounts of development
>    effort and infrastructure oversight, likely hindering the use
>    and development of LDAP in the Internet.

It is interesting that you are the only one who feels this way; none 
of the people active on LDAP protocol development or deployment have 
expressed the same concerns.

>  * Global public networking is only in its infancy; there will
>    likely be many thousands of new protocols and applications which
>    are developed over the next few decades (some in the IETF WGs,
>    most in business, educational or personal networks, and most will
>    be developed outside of the US). If those applications follow the
>    Official Internet Policy, they will be UTF-8 only or use it as the
>    preferred encoding. However, those applications will also be
>    saddled with ACE conversion wherever they have to interact with
>    domain names, meaning they cannot be UTF-8 clean. This will be
>    extremely acute in non-US development environments.

Again, this is based on a misreading of BCP18. Read the archives, 
particularly the messages from the author of BCP18. This is a dead 
horse, but feel free to beat it some more.

>  * Towards the above point, UDNS provides an optional, user-driven
>    transition path from ASCII to UTF-8.

"User-driven"? UDNS requires changes to name servers. How is that 
"user-driven"?

>  UDNS allows applications to
>    be written so that they only function in UTF-8 environments.

Not true at all. Every application would have to have an ACE fallback mode.

>    While this may not seem a reasonable objective of this WG, this
>    should be the long-term vision of what we are trying to achieve.
>    We should be enabling the development of a truly international
>    Internet infrastructure which is UTF-8 clean throughout, and
>    UDNS provides this transitional path.

Either you are assuming that all current protocols that allow user 
choice of charsets will disappear, or that they will require only 
UTF-8 in the future. Both of those do a severe disservice to the 
users.

>  Conversely, ACE does not
>    provide this transition. In fact, the goal of seamless backwards
>    compatibility actively hinders migration.

Hogwash. It has nothing to do with migration to UTF-8.

>
>    Let me restate this point, since it is somewhat complex. Although
>    UDNS supports a dual-mode model, once UDNS (or something similar)
>    were approved, developers could begin to work on applications
>    which ONLY used UDNS, without any code for ASCII or ACE.

You are not talking about the UDNS that is before the WG. That 
protocol *requires* an ACE fallback.

>  At first
>    this would be small private apps, but over the course of a couple
>    of decades, it would likely be most of the new apps. Without UDNS
>    (or something similar), it would still be ASCII at the end of that
>    timeframe. In short, we need UDNS if we are ever to drop ASCII.

That's too bad, because UDNS is fatally flawed, as described before.

>  * The modernization and internationalization of legacy protocols,
>    applications and formats will almost certainly require a UTF-8
>    DNS eventually. It will not be possible to build and deploy
>    UTF-8 extensions to SMTP without a UTF-8 DNS infrastructure. As
>    with the above points, once a UTF-8 approach has gained some
>    critical mass, these extensions become feasible. Without the
>    infrastructure, SMTP will continue to be bound to ASCII.

Some people reading that paragraph might assume that you are, or have 
been, active in the development of the protocols you are writing 
about. Of course, that isn't true. What is true is that most of the 
people who have been active in developing those protocols are pushing 
IDNA (or are at least rejecting UDNS as fatally flawed).

>    * UDNS is optional and transitional, on a per-domain basis. This
>    allows a transition to occur as users see fit, without requiring
>    a replacement of the existing DNS infrastructure with an
>    alternate naming service.

Again, this is simply wrong. Users don't control name servers.

>  While this is technically not a
>    "requirement" per se, the seamless user-driven transitional
>    aspects should be requirements. Something very much like UDNS
>    will be required for this. A "new naming service" is unlikely to
>    reach critical mass without some kind of backwards compatibility,
>    which UDNS provides.
>
>    Furthermore, the cost of UDNS is incremental to that of ACE,
>    since they share many common features and functions. If a
>    developer is going to add ACE support to an application (which
>    they must do in the short term),

So, if you admit that here, why did you contradict it above?

>  it is incremental to add UDNS
>    support at the same time. On the other hand, adding ACE now and
>    then going back to glue on something else is a second development
>    effort, with greater costs.

You haven't shown any need to go back and glue.

>  * UTF-8 is infinitely more manageable and serviceable than ACE.
>    By being able to use UTF-8 tools and services, the Internet can
>    be kept running much easier than users having to transliterate
>    ACE operations.

Users don't have to do that. Red herring.

>  We should not be contributing to frailty. When
>    a network is already broken, ACE obfuscates the problem by
>    displaying aq--gobbledygook in traceroute, netstat, tcpdump and
>    all of the other tools.

So, you are saying that those tools today correctly display UTF-8? Yeah, right.

>  ACE libs can certainly address this part
>    of the issue, the extent of the support for tasks like importing
>    trace data into a spreadsheet or viewing it in an editor is not
>    as compelling. The massive number of UTF-8 tools are extremely
>    compelling in terms of general manageability and serviceability
>    of the global DNS.

"Massive", eh? Could you catalog those?

>  * Finally, there are some problems that ACE cannot solve, which
>    UDNS can. The clipboard problem practically goes away, if not
>    directly then indirectly, simply because we can facilitate the
>    use of UTF-8 everywhere, rather than having to transliterate
>    between application- and protocol-specific encodings. EG, if we
>    ALLOW for the consistent use of UTF-8, then it is more likely
>    to happen than if we actively PROHIBIT it by mandating yet
>    another encoding which MUST be accounted for in every operation.

This again takes the view that all current protocols will eventually 
go to UTF-8 only. That is insulting to the people throughout the 
world who have a strong preference for their own character sets.

>So that's the "benefit" side of the argument. If we can agree that these
>are important considerations, and that some kind of support for UTF-8 is a
>requirement (as per item #1), then we can debate costs and features as
>they compare to that benefit.

So far, no agreement.

>> - UDNS causes some strings that are legal in one encoding to be
>>  illegal in the other, and vice versa, meaning that some host names
>>  will be illegal part of the time, unpredictably
>
>This issue would have to be resolved if we were ever to move beyond ACE.
>It is beneficial to design for such limits up front, rather than revoking
>names later which will be incompatible. This is only a cost if ACE and
>some other service are developed at different times, and is NOT a cost if
>they are developed together. This is, in fact, a motivating factor for
>beginning work on UDNS *NOW*.

Sorry, but I missed the part where you explain how to fix the 
problem. Without a fix, UDNS is dead in the water.
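
One way such a mismatch can arise is through label length. The following Python sketch is only an illustration under stated assumptions: that the 63-octet DNS label limit applies to whatever octets go on the wire, that the ACE form is "xn--" plus Punycode (a later encoding, used here as a stand-in), and that name preparation is skipped. The function names and sample labels are hypothetical.

```python
from typing import Tuple

# Sketch: a label can fit the 63-octet limit in one wire form but not the other.
# Assumptions: ACE is "xn--" + Punycode; name preparation is skipped.

MAX_LABEL_OCTETS = 63


def wire_lengths(label: str) -> Tuple[int, int]:
    """Return (UTF-8 length, ACE length) of a single label, in octets."""
    utf8_len = len(label.encode("utf-8"))
    if all(ord(ch) < 128 for ch in label):
        ace_len = len(label)  # pure ASCII labels are sent unchanged
    else:
        ace_len = len("xn--") + len(label.encode("punycode"))
    return utf8_len, ace_len


def legality(label: str) -> Tuple[bool, bool]:
    """Return (fits as UTF-8, fits as ACE) under the 63-octet limit."""
    utf8_len, ace_len = wire_lengths(label)
    return utf8_len <= MAX_LABEL_OCTETS, ace_len <= MAX_LABEL_OCTETS


cjk_label = "中" * 25          # 75 octets in UTF-8, much shorter in ACE
mostly_ascii = "a" * 60 + "ü"  # 62 octets in UTF-8, longer once the ACE prefix is added

for label in (cjk_label, mostly_ascii):
    print(wire_lengths(label), legality(label))
```

Under these assumptions the first label is legal only in the ACE form and the second only in the UTF-8 form, so which names work depends on which form a given path through the DNS happens to use.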

>> - UDNS requires more work to be done by authoritative DNS servers
>
>Yes, UDNS requires authoritative servers to maintain name mapping
>information. However, I see this as being a transitional cost.

And you demand that all of the other people running name servers will 
agree with you.

>  In the
>beginning, few clients will support UDNS, but over the course of several
>years, most clients will support UDNS. At some point (two decades out?
>shorter?) it should be possible to switch entirely to UDNS, meaning that
>there will no longer be a need for servers to store both versions. The
>longer we delay beginning such a transition, the longer it will be before
>we can complete that transition. If we never start it, we will never
>complete it.

In other words, you don't think that making the com/net/org servers 
and the root servers do normalization and mapping is a problem. I 
take it you don't run any of those.
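
To give a sense of what "normalization and mapping" means for a server that accepts raw UTF-8, here is a minimal Python sketch. The mapping rule (NFKC followed by case folding) and the function name are assumptions for illustration; they are not taken from the UDNS draft.

```python
import unicodedata

# Sketch: different octet strings that a server would have to treat as one name.
# Assumption: the mapping is NFKC normalization followed by case folding.


def canonical_label(label: str) -> str:
    """Hypothetical server-side mapping applied before comparing labels."""
    return unicodedata.normalize("NFKC", label).casefold()


composed = "caf\u00e9"     # 'é' as a single code point
decomposed = "cafe\u0301"  # 'e' followed by a combining acute accent
upper = "CAF\u00c9"        # upper-case spelling of the same label

# On the wire these are different octet strings...
assert composed.encode("utf-8") != decomposed.encode("utf-8")

# ...but after mapping they must compare equal.
assert canonical_label(composed) == canonical_label(decomposed) == canonical_label(upper)
```

With an ACE-based approach this work is done once by the client before the query is sent; with raw UTF-8 on the wire, every server that compares names has to do it.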

>> - UDNS UTF-8 queries that fail will cause more load on the DNS
>
>Yes, it will cause one additional lookup. This is equitable with requiring
>one extra delegation server in the path. This is not a great cost to begin
>with. Furthermore, seeing as how it is also transitional, it will only be
>a cost for the short-term, and will not be a cost after a few years.

You blithely double the load on the DNS during the "transition". Nice.
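
The arithmetic behind that can be sketched in a few lines of Python. The model is deliberately crude and its assumptions are mine: every lookup is for an internationalized name, the client always tries UTF-8 first, a failed UTF-8 query is followed by exactly one ACE retry, and caching is ignored.

```python
# Sketch: expected queries per lookup when UTF-8 is tried first with ACE fallback.
# Assumptions: one UTF-8 attempt, one ACE retry on failure, no caching.


def queries_per_lookup(upgraded_fraction: float) -> float:
    """Expected number of queries given the fraction of lookups that reach upgraded servers."""
    if not 0.0 <= upgraded_fraction <= 1.0:
        raise ValueError("upgraded_fraction must be between 0 and 1")
    # One query always; a second one whenever the UTF-8 attempt bounces.
    return 1.0 + (1.0 - upgraded_fraction)


for fraction in (0.0, 0.5, 1.0):
    print(fraction, queries_per_lookup(fraction))
```

At the start of deployment, when almost nothing is upgraded, this model gives two queries per lookup; the extra load only falls away as the upgraded fraction approaches one.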

>> - UDNS is not compatible with DNS security
>
>I'm not sure I understand this point. Also, DNSSEC has many issues, and
>appears to be destined for a redesign at this point.

Yeah, that's a popular argument.

>> - UDNS requires that applications have the logic to send a new DNS query
>>  format
>
>Yes, an alternative API is required to prevent clashes with legacy
>applications (this was proven by DJB's pi test). However, this is not a
>cost burden which is solely related to UDNS. For example, when hostnames
>are extended beyond the RFC952/1123 rules, an alternative wide() API will
>be required for several functions. Applications which transliterate ACE at
>relatively deep levels (such as for DHCP) will also require this. For
>these reasons, I consider this a cost of internationalization rather than
>a cost for UDNS.

IDNA internationalizes just as well as UDNS, at lower cost.

>> - Applications that implement the UTF-8 part of UDNS have to handle
>>  the inevitable errors that come from queries that bounce, have to
>>  recast those queries in ACE (assuming that the name is even legal in
>>  ACE, which some won't be), and then have to emit those queries again,
>>  causing more DNS traffic
>
>As with the other issues, this is a transitional cost which dissipates as
>deployment goes up.

Wrong. It is with us as long as there are current applications running.

>  It is a cost in the beginning, but it is not a
>significant cost after a few years.

First you said "twenty", now it's down to "a few". Some of us think 
there is a difference.

>> - The errors caused by UTF-8 queries are the same as other legitimate
>>  DNS errors, meaning that applications will have to have their own
>>  (probably-nonstandard) logic for differentiating between expected
>>  errors and real errors
>
>Because UDNS is an application of EDNS, it uses the EDNS error codes. My
>experience is that these codes are fairly simple to work with.

You have misread UDNS, then. The query errors from un-upgraded 
servers will not be EDNS errors: they will be format errors from the 
current DNS.
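
A small Python sketch of why the application cannot tell the two cases apart. It builds two bare DNS response headers by hand; the helper names and message IDs are hypothetical, and the premise that an un-upgraded server answers such a query with a format error is taken from the point above, not tested against any real server.

```python
import struct

# Sketch: the RCODE is the only error signal in a classic DNS response header.
# Assumption: an un-upgraded server answers the new query format with FORMERR.

FORMERR = 1  # RCODE 1, "Format error", in the classic DNS header


def make_response_header(msg_id: int, rcode_value: int) -> bytes:
    """Build a 12-octet DNS header with QR=1 (response) and the given RCODE."""
    flags = 0x8000 | (rcode_value & 0x000F)
    return struct.pack("!HHHHHH", msg_id, flags, 0, 0, 0, 0)


def rcode(response: bytes) -> int:
    """Extract the 4-bit RCODE from a DNS message header."""
    if len(response) < 12:
        raise ValueError("a DNS header is 12 octets")
    _msg_id, flags = struct.unpack("!HH", response[:4])
    return flags & 0x000F


legacy_rejects_utf8 = make_response_header(0x1234, FORMERR)  # old server, new-style query
truly_malformed = make_response_header(0x5678, FORMERR)      # genuinely broken query

# Both responses carry the same error code; nothing says "retry in ACE".
assert rcode(legacy_rejects_utf8) == rcode(truly_malformed) == FORMERR
```

Any logic for deciding which format errors mean "retry in ACE" would have to live in the application.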

--Paul Hoffman, Director
--Internet Mail Consortium