Re: "Difficult Characters" draft (in URLs)

"Martin J. Duerst" <mduerst@ifi.unizh.ch> Mon, 12 May 1997 10:59 UTC

Received: from cnri by ietf.org id aa01100; 12 May 97 6:59 EDT
Received: from services.Bunyip.Com by CNRI.Reston.VA.US id aa06778; 12 May 97 6:59 EDT
Received: (from daemon@localhost) by services.bunyip.com (8.8.5/8.8.5) id GAA10809 for uri-out; Mon, 12 May 1997 06:23:58 -0400 (EDT)
Received: from mocha.bunyip.com (mocha.Bunyip.Com [192.197.208.1]) by services.bunyip.com (8.8.5/8.8.5) with ESMTP id GAA10804 for <uri@services.bunyip.com>; Mon, 12 May 1997 06:23:53 -0400 (EDT)
Received: from josef.ifi.unizh.ch (josef.ifi.unizh.ch [130.60.48.10]) by mocha.bunyip.com (8.8.5/8.8.5) with SMTP id GAA10008 for <uri@bunyip.com>; Mon, 12 May 1997 06:23:48 -0400 (EDT)
Received: from enoshima.ifi.unizh.ch by josef.ifi.unizh.ch with SMTP (PP) id <08481-0@josef.ifi.unizh.ch>; Mon, 12 May 1997 12:20:35 +0200
Date: Mon, 12 May 1997 12:20:19 +0200
From: "Martin J. Duerst" <mduerst@ifi.unizh.ch>
To: Larry Masinter <masinter@parc.xerox.com>
cc: Alain LaBont/e'/ <alb@sct.gouv.qc.ca>, URI mailing list <uri@bunyip.com>
Subject: Re: "Difficult Characters" draft (in URLs)
In-Reply-To: <337615DC.7C2F@parc.xerox.com>
Message-ID: <Pine.SUN.3.96.970512120323.245P-100000@enoshima>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset="US-ASCII"
Sender: owner-uri@bunyip.com
Precedence: bulk

On Sun, 11 May 1997, Larry Masinter wrote:

> Martin,
> 
> > "Keyboards exist" is not very helpful. If the market penetration of such
> > keyboards is 10%, we better leave out UCAL; if it is 95%, we don't
> > have to worry much.
> 
> It surprises me to see you fall into the same kind of position
> that -- in the larger scale -- was the argument for keeping
> URLs to "ASCII-only".

It shouldn't surprise you. URL transcribability is definitely
an issue. The main point is that when evaluating
transcribability, it should be weighted by the number of
potential users.

So saying that Chinese users will have difficulty entering
uppercase accented letters (UCAL) is irrelevant to whether
such letters should appear in URLs intended for a French
audience. Discussing whether and to what degree French users
will be able to input such letters, on the other hand, is
very relevant.


> What is the scope of "the market"? If "the market" is "Alain",
> then the market penetration is 100%. If "the market" is "all
> keyboards on the planet" then, of course, the "market
> penetration" of keyboards that can type anything other
> than simple ASCII is still quite small.

The "market" may be very different for different URLs.
But there will probably be a large number of URLs aimed
mainly at people in France (and other French-speaking
areas), simply because the corresponding resources are
written in French.

Except for those codepoint sequences normalized away by
the algorithms described in the draft, it is ultimately
the responsibility of the URL creator to consider his/her
market. The idea of the draft is to point out areas
where, for various reasons, there may be problems.


> We are really talking about "character entry method"
> rather than "keyboard" since, as has been pointed out,
> with the "right software" it's possible to enter almost
> any kind of character from almost any kind of terminal;

True. If that's pen-based input or whatever, never mind.
But there is a big difference between various methods
in entry speed, keyboard entry in many cases being the
fastest. If a French user took five minutes or more on
average to enter a UCAL (the information from Alain
indicates that the average is much lower), it would
definitely be better to warn against such letters in
French URLs. However, if it took a Japanese user an
average of five minutes to enter such a letter, that
wouldn't bother us much (except for the unnecessary
recommendation not to include UCAL in Japanese URLs).


> and "market penetration" might want to be clarified
> as to whether you're interested in the percentage of
> "things that are being sold in the marketplace now"
> or "existing, installed, usable", or at least some
> forecast of the latter.

It's definitely "existing, installed, usable" that
counts. For URLs that you expect to last longer, you
can also take future developments into account.
And for the draft, we should of course take future
developments into account, because the draft should
remain reasonably valid for some time.

Regards,	Martin.