Re: [idn] Reality Check

Martin Duerst <duerst@w3.org> Tue, 10 July 2001 15:38 UTC

Message-Id: <4.2.0.58.J.20010710234410.05ed44e0@sh.w3.mag.keio.ac.jp>
Date: Wed, 11 Jul 2001 00:23:48 +0900
To: Dan Oscarsson <Dan.Oscarsson@trab.se>, idn@ops.ietf.org
From: Martin Duerst <duerst@w3.org>
Subject: Re: [idn] Reality Check
Cc: Marc Blanchet <Marc.Blanchet@viagenie.qc.ca>, James Seng/Personal <jseng@POBOX.ORG.SG>
In-Reply-To: <200107101101.f6AB1EP18634@malmo.trab.se>

Hello Dan, others,

At 13:04 01/07/10 +0200, Dan Oscarsson wrote:
>Has this working group lost contact with the real world?
>
>All discussion is about ACE/nameprep/IDNA/backward compatibility!

>I have tried to think of how I as a programmer, developer, editor and
>user would handle DNS names.
>An ACE-solution will result in e-mails containing lines like:
>To: =?quoted-printable-encoded?= 
><ACE-like-user-name@ACE-domain.ACE-domain.com>
>A line with a lot of different mixed encoded forms. To handle this
>as a programmer you have to identify the different parts of the line and then
>for each part identify what encoding is used, and then decode each before the
>line can be displayed to a user or handled in a program.
>As of today I have fewer programs than I want that handle e-mail; this
>is because of the difficulty of handling MIME programmatically.
>With ACE in domain names and something like it in user names it will be
>even worse. Why could we not use something simple like having the
>entire line encoded in UTF-8? Instead of having a difficult time parsing
>data with a lot of different decoders, I could just decode UTF-8 into my
>local character set.

This is a subject I have partially discussed with Keith earlier.
I think it's indeed very serious. There are thousands of different
file formats and occasions where domain names turn up. If ACE leaks
into only a small percentage of them, a lot of people have a lot
of problems. They may complain about ACE more than Keith is
complaining about NAT.
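The decoding chain Dan describes can be sketched in a few lines of Python. This is a minimal illustration, not code from either author: the header value and the helper name display_form are hypothetical, and the "xn--" labels use the Punycode-based IDNA that was standardized later (RFC 3490/3492) as a stand-in for the ACE proposals under discussion in 2001. Real address parsing (quoted local parts, groups, comments) is more involved than the regular expression used here.

```python
import re
from email.header import decode_header, make_header


def display_form(header_value):
    """Hypothetical helper: turn a header value containing MIME encoded-words
    and ACE domain labels into readable Unicode (simplified address parsing)."""
    # Step 1: decode RFC 2047 encoded-words such as =?UTF-8?Q?...?=
    text = str(make_header(decode_header(header_value)))

    # Step 2: find each <local@domain> address and decode its domain with
    # Python's built-in "idna" codec, which turns xn-- labels back into Unicode.
    def decode_domain(match):
        local, domain = match.group(1), match.group(2)
        return "<" + local + "@" + domain.encode("ascii").decode("idna") + ">"

    return re.sub(r"<([^@<>]+)@([^<>]+)>", decode_domain, text)


# Hypothetical To: value mixing an encoded-word with an ACE domain.
mixed = "=?UTF-8?Q?J=C3=BCrgen?= <juergen@xn--mller-kva.example>"
print(display_form(mixed))

# The alternative Dan asks for: the whole line is UTF-8, so one decode suffices.
utf8_line = "Jürgen <juergen@müller.example>".encode("utf-8")
print(utf8_line.decode("utf-8"))
```

Both calls print the same text; the difference is that the first needs two unrelated decoders plus code that knows where each encoded part begins and ends.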

Practical books about a programming language often contain examples
of how to process your mail, and so on. All these examples are based
on the assumption that the characters can be parsed directly. Grepping
for a word in the subject headers in a mail spool file is easy as long
as the subject is in ASCII. For anything beyond ASCII, it just fails.
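The grep problem can be reproduced in Python. The two Subject: lines and the helper names naive_grep and decoding_grep are hypothetical, and a UTF-8 Q-encoded word is only one of several ways a non-ASCII subject may appear in a spool file.

```python
from email.header import decode_header, make_header

# Hypothetical Subject: lines as they might appear in a raw mail spool file.
spool_lines = [
    "Subject: Reality Check",
    "Subject: =?UTF-8?Q?Gr=C3=BC=C3=9Fe_aus_Tokyo?=",
]


def naive_grep(word, lines):
    # What grep does on the raw file: plain substring matching.
    return [line for line in lines if word in line]


def decoding_grep(word, lines):
    # Decode RFC 2047 encoded-words first, then match on the decoded text.
    return [line for line in lines
            if word in str(make_header(decode_header(line)))]


print(naive_grep("Reality", spool_lines))   # finds the ASCII subject
print(naive_grep("Grüße", spool_lines))     # finds nothing
print(decoding_grep("Grüße", spool_lines))  # finds the encoded subject
```

The naive search is what the textbook examples do; it silently misses the second line because the word "Grüße" never occurs in the raw bytes.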

I remember Ulrich Drepper, responsible for glibc, saying at a dinner
here in Japan that UTF-8 was the right way to go, because it would
allow people like him to provide the base for internationalization
(rather than just doing nothing at all), and would allow others,
more familiar with their own language, to build on it. In some sense,
RFC 2277 is based on a similar assumption. ACE severely breaks this
assumption.
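One way to see why UTF-8 lets ASCII-oriented code serve as such a base: every byte of a multi-byte UTF-8 sequence has its high bit set, so byte-level code that looks only for ASCII delimiters such as '@' and '.' never cuts a character in half. The sketch below uses a hypothetical helper, split_address, and a made-up address, and contrasts UTF-8 with Shift_JIS, where a trailing byte can coincide with an ASCII character.

```python
from __future__ import annotations


def split_address(addr: bytes) -> tuple[bytes, list[bytes]]:
    """Hypothetical helper: split an address at ASCII '@' and '.' purely at
    the byte level, without knowing which encoding the bytes are in."""
    # Safe for UTF-8: every byte of a multi-byte UTF-8 sequence is >= 0x80,
    # so none can be mistaken for '@' (0x40) or '.' (0x2E).
    local, _, domain = addr.rpartition(b"@")
    return local, domain.split(b".")


utf8_addr = "jürgen@müller.example".encode("utf-8")
local, labels = split_address(utf8_addr)
print(local.decode("utf-8"), [label.decode("utf-8") for label in labels])

# Counter-example: in Shift_JIS the trailing byte of a double-byte character
# can fall in the ASCII range. KATAKANA LETTER SMALL A encodes as 0x83 0x40,
# and 0x40 is '@', so the same byte-level split would cut that character apart.
print("ァ".encode("shift_jis"))
```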

Internationalization doesn't come for free. Most people who think
ACE is a good idea just don't see that work. In the short term,
having to update a DNS server (even if it's just with an 8-bit
clean version) seems like a lot of work. But it's actually extremely
easy, compared to updating applications. Also, passing ACE in
application protocols seems extremely easy. But it just means
that these application protocols aren't really internationalized
yet, and that a lot of work is waiting out there.
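For reference, the DNS wire format (RFC 1035, section 3.1) stores each label as a length octet followed by that many octets, and RFC 2181 notes that the DNS itself restricts only label and name length, not label contents. The hypothetical helper below sketches what "8-bit clean" means at the packet level: UTF-8 octets go straight into the label. It deliberately ignores how a server would compare or case-fold non-ASCII labels.

```python
def encode_wire_name(name: str) -> bytes:
    """Hypothetical helper: encode a domain name in DNS wire format
    (RFC 1035, section 3.1), putting raw UTF-8 octets into each label."""
    out = bytearray()
    for label in name.rstrip(".").split("."):
        raw = label.encode("utf-8")
        # Each label must be 1..63 octets (empty labels are rejected here);
        # the 255-octet limit on the whole name is not checked in this sketch.
        if not 0 < len(raw) <= 63:
            raise ValueError(f"label length {len(raw)} is outside 1..63 octets")
        out.append(len(raw))  # length octet
        out += raw            # label octets, passed through unchanged
    out.append(0)             # zero-length root label ends the name
    return bytes(out)


# UTF-8 label: "müller" is 7 octets (ü takes two).
print(encode_wire_name("müller.example"))
# The same name in ACE form: "xn--mller-kva" is 13 ASCII octets.
print(encode_wire_name("xn--mller-kva.example"))
```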


>Many people edit their HTML files with a text editor, or write documents
>with embedded DNS names and URLs. The only way you can expect people
>to enter DNS names and URLs in those files is by using the same character
>set as the rest of the text; they will not convert them into
>ACE, %-encoding or other unnatural forms.

Yes indeed. Please see
http://www.ietf.org/internet-drafts/draft-masinter-url-i18n-07.txt
for URIs (working on updating it) and
http://www.ietf.org/internet-drafts/draft-ietf-idn-uri-00.txt
for the DNS part in URIs (will resubmit it to keep it alive;
I don't expect this to be discussed in London, but I haven't
seen any other proposals for how to solve this part of the
problem, and I guess the WG should deal with it once the
important issues are dealt with, and it would look silly
to resubmit as a personal contribution and later again
as a WG draft [sorry for this lengthy sentence]).
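To make the "ACE, %-encoding" forms concrete, here is a sketch of what a tool would have to do to a URI typed naturally into an HTML file before ASCII-only software could use it. The helper to_ascii_uri is hypothetical; it uses Python's built-in IDNA 2003 codec for the host and UTF-8 percent-encoding for the path, and it is not the mechanism proposed in either draft cited above.

```python
from urllib.parse import quote, urlsplit, urlunsplit


def to_ascii_uri(uri: str) -> str:
    """Hypothetical helper: rewrite a URI as typed in natural text into the
    ASCII-only form: ACE (IDNA) host plus %-encoded UTF-8 path."""
    parts = urlsplit(uri)
    # Python's "idna" codec implements IDNA 2003 (RFC 3490), which postdates
    # this message; it stands in for the ACE proposals discussed here.
    host = parts.hostname.encode("idna").decode("ascii")
    # Percent-encode the path as UTF-8, leaving the '/' separators alone.
    path = quote(parts.path, safe="/")
    # Simplification: port and userinfo are dropped; query and fragment
    # are passed through unchanged.
    return urlunsplit((parts.scheme, host, path, parts.query, parts.fragment))


print(to_ascii_uri("http://müller.example/straße"))
```

The output, http://xn--mller-kva.example/stra%C3%9Fe, is exactly the kind of string people writing a document in their own script will not type by hand.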


>Does the IDNA (ACE in applications) solution, which appears to be the only
>focus of this working group, match the real needs of people in the
>current and future world?

No. People will suffer from ACE for years to come. When MIME
was created, Unicode barely existed, so there was some excuse.
But there is no excuse for ACE. I wish I had patented it in
December 1996; all bad ideas should be patented ;-(.


Regards,   Martin.