Re: [idn] Reality Check

Martin Duerst <duerst@w3.org> Tue, 10 July 2001 15:38 UTC

Message-Id: <4.2.0.58.J.20010710234410.05ed44e0@sh.w3.mag.keio.ac.jp>
Date: Wed, 11 Jul 2001 00:23:48 +0900
To: Dan Oscarsson <Dan.Oscarsson@trab.se>, idn@ops.ietf.org
From: Martin Duerst <duerst@w3.org>
Subject: Re: [idn] Reality Check
Cc: Marc Blanchet <Marc.Blanchet@viagenie.qc.ca>, James Seng/Personal <jseng@POBOX.ORG.SG>
In-Reply-To: <200107101101.f6AB1EP18634@malmo.trab.se>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"; format="flowed"
Sender: owner-idn@ops.ietf.org
Precedence: bulk

Hello Dan, others,

At 13:04 01/07/10 +0200, Dan Oscarsson wrote:
>Has this working group lost contact with the real world?
>
>All discussion is about ACE/nameprep/IDNA/backward compatibility!

>I have tried to think of how I as a programmer, developer, editor and
>user would handle DNS names.
>An ACE-solution will result in e-mails containing lines like:
>To: =?quoted-printable-encoded?= <ACE-like-user-name@ACE-domain.ACE-domain.com>
>A line with a mix of differently encoded forms. To handle this
>as a programmer you have to identify the different parts of the line, then
>for each part identify which encoding is used, and then decode each before the
>line can be displayed to a user or handled in a program.
>As of today I have fewer programs than I want that handle e-mail; this
>is because of the difficulty of handling MIME programmatically.
>With ACE in domain names and something like it in user names it will be
>even worse. Why could we not use something simple, like having the
>entire line encoded in UTF-8? Instead of struggling to
>parse data with a lot of different decoders, I could just decode UTF-8 into my
>local character set.

This is a subject I have partially discussed with Keith earlier.
I think it's indeed very serious. There are thousands of different
file formats and occasions where domain names turn up. If ACE leaks
into even a small percentage of them, a lot of people will have a lot
of problems. They may complain about ACE more than Keith is
complaining about NAT.
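
To make Dan's example concrete, here is a minimal sketch of what a
program has to do with such a mixed line, next to the all-UTF-8 case.
It is only an illustration under stated assumptions: the header value
and the names RAW_TO, UTF8_TO, decode_mixed and decode_utf8 are
hypothetical, and Python's built-in "idna" codec (Punycode) stands in
for the ACE candidates under discussion, which it does not implement.

```python
from email.header import decode_header, make_header
from email.utils import parseaddr

# Hypothetical header value: RFC 2047 encoded-word display name plus a
# Punycode ACE domain (a stand-in, not one of the 2001 ACE candidates).
RAW_TO = "=?iso-8859-1?q?J=F6rgen?= <jorgen@xn--malm-8qa.example>"

# The same information if the whole line were simply UTF-8.
UTF8_TO = "Jörgen <jorgen@malmö.example>".encode("utf-8")


def decode_mixed(header_value):
    """Split the line into parts and run a different decoder on each part."""
    name, addr = parseaddr(header_value)
    # Decoder 1: MIME encoded-words in the display name.
    name = str(make_header(decode_header(name)))
    # Decoder 2: ACE labels in the domain part.
    local, _, domain = addr.partition("@")
    domain = domain.encode("ascii").decode("idna")
    return name, local + "@" + domain


def decode_utf8(header_bytes):
    """With UTF-8 on the wire, one decoder handles the entire line."""
    return parseaddr(header_bytes.decode("utf-8"))


if __name__ == "__main__":
    print(decode_mixed(RAW_TO))
    print(decode_utf8(UTF8_TO))
```

Both functions yield the same name and address. The difference is
that the first one must know which decoder applies to which part of
the line, and every format that carries a domain name would need the
same kind of knowledge.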

Practical books about a programming language often contain examples
of how to process your mail, and so on. All these examples are based
on the fact that the characters can be parsed directly. Grepping
for a word in the subject headers in a mail spool file is easy as long
as the subject is in ASCII. For anything beyond ASCII, it just fails.
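
A small sketch of that grep problem, assuming a hypothetical Swedish
subject line stored once as raw UTF-8 and once as an RFC 2047
encoded-word. The names SUBJECT_UTF8, SUBJECT_MIME, naive_grep and
mime_aware_grep are made up for illustration, and the naive search
assumes the search word itself is typed in UTF-8.

```python
from email.header import decode_header, make_header

# Hypothetical data: the same subject, first as raw UTF-8 bytes, then as
# an RFC 2047 encoded-word the way it sits in a mail spool today.
SUBJECT_UTF8 = "Subject: Möte på tisdag".encode("utf-8")
SUBJECT_MIME = b"Subject: =?iso-8859-1?q?M=F6te_p=E5_tisdag?="

NEEDLE = "Möte"


def naive_grep(line, word):
    """Plain byte search, which is all that a simple grep does."""
    return word.encode("utf-8") in line


def mime_aware_grep(line, word):
    """Decode the encoded-words first, then search the decoded text."""
    value = line.decode("ascii").split(":", 1)[1].strip()
    decoded = str(make_header(decode_header(value)))
    return word in decoded


if __name__ == "__main__":
    print(naive_grep(SUBJECT_UTF8, NEEDLE))
    print(naive_grep(SUBJECT_MIME, NEEDLE))
    print(mime_aware_grep(SUBJECT_MIME, NEEDLE))
```

The plain search finds the word in the UTF-8 line and misses it in the
encoded line; only the version with an extra decoding step finds it
there. ACE in domain names would put addresses and URLs in the same
position as the encoded subject.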

I remember Ulrich Drepper, responsible for glibc, saying at a dinner
here in Japan that UTF-8 was the right way to go, because it would
allow people like him to provide the base for internationalization
(rather than just doing nothing at all), and would allow others,
more familiar with their own language, to build on it. In some sense,
RFC 2277 is based on a similar assumption. ACE severely breaks this
assumption.

Internationalization doesn't come for free. Most people who think
ACE is a good idea just don't see that work. In the short term,
having to update a DNS server (even if it's just with an 8-bit
clean version) seems like a lot of work. But it's actually extremely
easy compared to updating applications. Also, passing ACE in
application protocols seems extremely easy. But it just means
that these application protocols aren't really internationalized
yet, and that a lot of work is waiting out there.

>Many people edit their HTML files with a text editor, or write documents
>with embedded DNS names and URLs. The only way you can expect people
>to enter DNS names and URLs in those files is by using the same character
>set as the rest of the text; they will not convert them into
>ACE, %-encoding or another unnatural form.

Yes indeed. Please see
http://www.ietf.org/internet-drafts/draft-masinter-url-i18n-07.txt
for URIs (working on updating it) and
http://www.ietf.org/internet-drafts/draft-ietf-idn-uri-00.txt
for the DNS part in URIs (will resubmit it to keep it alive;
I don't expect this to be discussed in London, but I haven't
seen any other proposals for how to solve this part of the
problem, and I guess the WG should deal with it once the
important issues are dealt with, and it would look silly
to resubmit as a personal contribution and later again
as a WG draft [sorry for this lengthy sentence]).

>Does the IDNA (ACE in application) solution that appears to be the only
>focus of this working group match the real needs of people in the
>current and future world?

No. People will suffer from ACE for years to come. When MIME
was created, Unicode barely existed, so there was some excuse.
But there is no excuse for ACE. I wish I had patented it in
December 1996; all bad ideas should be patented ;-(.

Regards,   Martin.
