Re: revised "generic syntax" internet draft

"Martin J. Duerst" <> Sat, 19 April 1997 16:16 UTC

Date: Sat, 19 Apr 1997 18:01:56 +0200
From: "Martin J. Duerst" <>
To: Chris Newman <>
Cc: "Roy T. Fielding" <>, IETF URI list <>
Subject: Re: revised "generic syntax" internet draft
In-Reply-To: <>
Message-Id: <Pine.SUN.3.96.970419175423.708Y-100000@enoshima>
Mime-Version: 1.0
Content-Type: TEXT/PLAIN; charset="US-ASCII"
Precedence: bulk

On Fri, 18 Apr 1997, Chris Newman wrote:

> That problem statement is a bit verbose, but accurate.

Sorry. Because I am a fast typist (Dvorak keyboard, you know),
I tend to be verbose.

> On Fri, 18 Apr 1997, Roy T. Fielding wrote:

> > I think there is a way to define UTF-8 preference for URL encoding
> > such that it won't break existing services, by forbidding transcoding
> > of already-encoded octets.  However, I won't bother to explain that
> > until there is broad agreement on what needs to be solved.
> Yes, if you forbid transcoding of %80-%FF, and that representation were
> actually used in the filesystem, then the charset (or lack thereof) in the
> filesystem isn't a problem.

Transcoding %80-%FF, i.e. changing %80 into %83 (or some other
value) for any reason, is definitely not part of the plan. Whenever
we see something like %HH, we know that we have to take it as an
encoded octet. What an application might do, for the user's
convenience, is to convert such octets into actual characters for
display. But in order for this to work, we have to agree on a single
(or at least a preferred) character->octet encoding.
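
To illustrate the point, here is a minimal sketch in Python. The
escaped string is an invented example, and UTF-8 and Latin-1 are only
stand-ins for "the agreed encoding" and "some other guess"; nothing
here is taken from the draft itself.

```python
from urllib.parse import unquote_to_bytes

# Hypothetical escaped URL component (the octets happen to be UTF-8).
escaped = "%E6%97%A5%E6%9C%AC"

# Step 1: %HH is always taken as an encoded octet, never transcoded.
octets = unquote_to_bytes(escaped)

# Step 2 (optional, for display only): turn the octets into characters.
# This only gives a useful result if both sides agree on the encoding.
as_utf8 = octets.decode("utf-8")      # the two characters U+65E5 U+672C
as_latin1 = octets.decode("latin-1")  # six unrelated characters

print(octets.hex(), ascii(as_utf8), ascii(as_latin1))
```

The octets are identical in both cases; only the characters shown to
the user differ, which is why a single preferred encoding is needed.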

Real characters, on the other hand, when transported in a document,
will always be transcoded with the document as a whole (e.g.
from EUC to JIS for mail in Japan), but they keep their character
identity. The same applies to "%", "8", "0", ... if we take
EBCDIC into account: the octets change, but the characters do not.
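
A small Python sketch of this distinction, assuming the standard
codecs "euc_jp", "iso2022_jp" and "cp037" (one EBCDIC code page,
chosen here only as an example) and an invented sample text:

```python
# Sample text: two Japanese characters, written as escapes to stay ASCII.
text = "\u65e5\u672c"

# Transcoding the document as a whole: EUC-JP on disk, JIS for mail.
euc = text.encode("euc_jp")
jis = euc.decode("euc_jp").encode("iso2022_jp")

# The octets differ, but the characters are the same.
print(euc != jis, jis.decode("iso2022_jp") == text)

# The same holds for the characters "%", "8", "0" of an escape sequence.
escape = "%80"
ascii_bytes = escape.encode("ascii")
ebcdic_bytes = escape.encode("cp037")

# Different octets on the wire, same three characters, and therefore
# the same encoded octet 0x80 once the escape is interpreted.
print(ascii_bytes.hex(), ebcdic_bytes.hex())
print(int(ebcdic_bytes.decode("cp037")[1:], 16) == 0x80)
```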

Regards,	Martin.