Facts about URL Internationalization

Chris Newman <Chris.Newman@innosoft.com> Sat, 22 February 1997 02:32 UTC

Date: Fri, 21 Feb 1997 16:04:29 -0800
From: Chris Newman <Chris.Newman@innosoft.com>
Subject: Facts about URL Internationalization
To: IETF URI list <uri@bunyip.com>
Message-Id: <Pine.SOL.3.95.970221154233.4232E-100000@elvira.innosoft.com>
Mime-Version: 1.0
Content-Type: TEXT/PLAIN; charset="US-ASCII"
Sender: owner-uri@bunyip.com
Precedence: bulk

I think there are observable technical facts in this debate which we can
all agree on:

1) URLs are often distributed internationally in hardcopy form.  For
maximum global usability, such URLs must be restricted to the safe
characters of the US-ASCII character set.

2) Regardless of what the standard says, people do and will continue to
construct URLs containing unencoded octets above 0x7f.  (As evidence, look
at violations of 7-bit restrictions in RFC 822, SMTP, NNTP, etc.)

3) URLs may have a character mapping for octets above 0x7f already
defined by context.  For example, a URL in a MIME part labelled
"text/plain; charset=iso-8859-1" will have a character mapping.

4) URLs may also lack any character mapping for octets above 0x7f defined
by context.

5) A character mapping for octets represented with the %HH notation is
currently undefined.

6) One key purpose of Internet Standards is to maximize global 
interoperability.

7) Were the URL standard to specify an interpretation for octet values
above 0x7f, it should be an international solution.


Taking all of these into account, I believe Martin Duerst's proposal is
on the right track.
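
To illustrate facts 3 through 5, here is a minimal Python sketch of how the
same %HH octet reads differently depending on which character mapping the
reader assumes.  The function name interpretations and the sample string
"caf%E9" are made up for this illustration; only unquote_to_bytes from the
Python standard library is a real API.

```python
from urllib.parse import unquote_to_bytes


def interpretations(url_text, charsets):
    """Decode the %HH octets of url_text under each candidate charset.

    Hypothetical helper for illustration only.  Returns a dict mapping
    each charset name to the decoded text, or None when the octets are
    not valid in that charset.
    """
    # %HH notation only yields octets; it says nothing about characters.
    octets = unquote_to_bytes(url_text)
    results = {}
    for name in charsets:
        try:
            results[name] = octets.decode(name)
        except UnicodeDecodeError:
            results[name] = None
    return results


if __name__ == "__main__":
    candidates = ["iso-8859-1", "iso-8859-5", "utf-8"]
    for name, text in interpretations("caf%E9", candidates).items():
        print(name, repr(text))
```

Under iso-8859-1 the octet 0xE9 is a Latin letter, under iso-8859-5 it is a
Cyrillic letter, and as UTF-8 it is not a valid sequence at all.  Without a
mapping supplied by context or by the standard, a receiver cannot tell which
reading was intended.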