Chris Newman <Chris.Newman@innosoft.com> Fri, 02 May 1997 18:32 UTC
Date: Fri, 02 May 1997 11:07:04 -0700
From: Chris Newman <Chris.Newman@innosoft.com>
Subject: Re: draft-fielding-url-syntax-05.txt
To: Larry Masinter <email@example.com>
Cc: IETF URI list <firstname.lastname@example.org>
Content-type: TEXT/PLAIN; charset="US-ASCII"
Originator-Info: login-id=chris; server=thor.innosoft.com

On Fri, 2 May 1997, Larry Masinter wrote:
> 2. URL Characters and Escape Sequences
>
> URLs consist of a restricted set of characters, primarily chosen to
> aid transcribability and usability both in computer systems and in
> non-computer communications. Characters used conventionally as
> delimiters around URLs were excluded. The restricted set of
> characters consists of digits, letters, and a few graphic symbols
> were chosen from those common to most of the character encodings
> and input facilities available to Internet users.
>
> Within a URL, characters are either used as delimiters, or to
> represent strings of data (octets) within the delimited portions.
> Octets are either represented directly by a character (using the
> US-ASCII character for that octet) or by an escape encoding. This
> representation is elaborated below.
>
> 2.1 URLs and non-ASCII characters
>
> While URLs are sequences of characters and those characters are
> used (within delimited sections) to represent sequences of octets,
> in some cases those sequences of octets are used (via a 'charset'
> or character encoding scheme) to represent sequences of characters:
>
>    URL char. sequence <-> octet sequence <-> original char. sequence
>
> In cases where the original character sequence contains characters
> that are strictly within the set of characters defined in the
> US-ASCII character set, the mapping is simple: each original
> character is translated into the US-ASCII code for it, and
> subsequently represented either as the same character, or as an
> escape sequence.
>
> In general practice, many different character encoding schemes are
> used in the second mapping (between sequences of represented
> characters and sequences of octets) and there is generally no
> representation in the URL itself of which mapping was used. While
> there is a strong desire to provide for a general and uniform
> mapping between more general scripts and URLs, the standard for
> such use is outside of the scope of this document.

I find this much too wishy-washy. I think we should explicitly forbid
the use of 8-bit characters and hex-encoded 8-bit characters, except as
defined by the future I18N URL standard. We need to make it very clear
that programs sending 8-bit URLs over the wire are broken (unless they
use UTF8 according to the future standard).
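
To make the proposed restriction concrete, here is a rough sketch of what such a check might look like in Python. It is only an illustration: the function name find_8bit_violations and the example URLs are made up, and it makes no attempt to implement the exception for a future I18N URL standard, since that standard is not defined here. It flags raw characters above US-ASCII and %XX escapes whose value is in the range 80 to FF.

```python
import re
from typing import List

# Hypothetical helper, not part of any draft: matches a %XX escape whose
# first hex digit is 8-F, i.e. an escaped octet in the range 0x80-0xFF.
_HIGH_ESCAPE = re.compile(r"%[89A-Fa-f][0-9A-Fa-f]")


def find_8bit_violations(url: str) -> List[str]:
    """Return a description of each 8-bit character or escape in url."""
    problems = []
    # Raw characters outside the 7-bit US-ASCII range.
    for ch in url:
        if ord(ch) > 0x7F:
            problems.append("raw non-ASCII character U+%04X" % ord(ch))
    # Hex-encoded octets with the high bit set.
    for match in _HIGH_ESCAPE.finditer(url):
        problems.append("escaped 8-bit octet " + match.group(0))
    return problems


if __name__ == "__main__":
    for candidate in ("http://example.com/a%20b",
                      "http://example.com/caf%E9",
                      "http://example.com/caf\u00e9"):
        print(candidate.encode("unicode_escape").decode("ascii"),
              find_8bit_violations(candidate))
```

A URL that uses only US-ASCII characters and escapes below %80 yields an empty list; anything else would be treated as broken under the rule suggested above.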