Re: [I18ndir] Fwd: Re: Working Group Last Call: Structured Headers for HTTP

John C Klensin <john-ietf@jck.com> Thu, 06 February 2020 01:23 UTC

Date: Wed, 05 Feb 2020 20:23:02 -0500
From: John C Klensin <john-ietf@jck.com>
To: Asmus Freytag <asmusf@ix.netcom.com>, i18ndir@ietf.org
Message-ID: <A942D88A37437ED525455FD6@PSB>
In-Reply-To: <a7652163-6815-457b-b6b4-96affe237a32@ix.netcom.com>
References: <fd66eb72-2777-3f34-026b-00f4084b88ea@ix.netcom.com> <a7652163-6815-457b-b6b4-96affe237a32@ix.netcom.com>
MIME-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
Content-Disposition: inline
Archived-At: <https://mailarchive.ietf.org/arch/msg/i18ndir/Dt46CZCUO88C1YTvtFhxQGDtl3I>
Subject: Re: [I18ndir] Fwd: Re: Working Group Last Call: Structured Headers for HTTP
Precedence: list

Asmus,

In at least my case, and I assume in Patrik's and John Levine's,
when I said "ASCII", I might better have said "the ASCII graphic
subset of Unicode", or "the Basic Latin repertoire of Unicode",
or "Unicode code points in the range U+0030 (or perhaps U+0020)
through U+007E (or maybe U+007A)", i.e., a repertoire, not a
specific coding standard.  Given that all of the documents in
this particular discussion are pushing UTF-8 and that a Unicode
string from that repertoire in UTF-8 is indistinguishable at the
octet level from an ASCII string encoded by right-justifying each
seven-bit ASCII code point in an eight-bit byte (octet) with a
leading zero bit, the shortcut should be obvious.  But you are
correct in that we were talking about repertoire, not encoding
or choice of character set.
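
For anyone who wants to check the octet-level claim, here is a minimal
sketch in Python.  It assumes the wider U+0020 through U+007E reading
of the range; the variable names are purely illustrative and nothing
in it comes from the drafts under discussion.

```python
# Build the repertoire as Unicode code points U+0020..U+007E
# (the wider of the ranges mentioned above; an assumption for this sketch).
ascii_graphic = "".join(chr(cp) for cp in range(0x20, 0x7F))

# Encode the same string once as UTF-8 and once as ASCII.
utf8_octets = ascii_graphic.encode("utf-8")
ascii_octets = ascii_graphic.encode("ascii")

# The two octet sequences are identical ...
same_octets = utf8_octets == ascii_octets

# ... and every octet has its leading bit clear.
high_bit_clear = all(octet < 0x80 for octet in utf8_octets)

print(same_octets, high_bit_clear)
```

The comparison holds only because the repertoire is restricted; the
moment a code point above U+007F is allowed, the encoding in use
starts to matter.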

That said, it may be worth remembering that, independent of what
operating systems may or may not do, early Web specifications
were written assuming ISO 8859-1 and that there are almost
certainly some applications out there that assume that CCS when
they see octets with the leading bit on.  The ASCII repertoire
as described above is still a proper subset, but, because 8859-1
(and 8859-x more generally) are not UTF-8-compatible, the
ability to define the CCS and encoding in use may still be
necessary even if the necessity is waning.
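
A small Python sketch of that incompatibility, again only as an
illustration: the sample string is my own choice, and the point is
just that the same octets read differently depending on which
encoding the receiver assumes.

```python
# "cafe" with an acute accent on the final letter; U+00E9 is in
# ISO 8859-1 but outside the ASCII repertoire.
text = "caf\u00e9"

latin1_octets = text.encode("iso-8859-1")  # one octet, 0xE9, for U+00E9
utf8_octets = text.encode("utf-8")         # two octets, 0xC3 0xA9, for U+00E9

# The ASCII-repertoire prefix is identical under both encodings.
shared_prefix = latin1_octets[:3] == utf8_octets[:3] == b"caf"

# A lone 0xE9 octet is not well-formed UTF-8, so a strict decoder rejects it.
try:
    latin1_octets.decode("utf-8")
    latin1_is_valid_utf8 = True
except UnicodeDecodeError:
    latin1_is_valid_utf8 = False

# The reverse direction never fails, but silently yields the wrong text.
misread = utf8_octets.decode("iso-8859-1")
```

An application that assumes 8859-1 gets no error at all in the last
step, which is why being able to state the CCS and encoding
explicitly can still matter.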

best,
  john

--On Wednesday, February 5, 2020 14:50 -0800 Asmus Freytag
<asmusf@ix.netcom.com> wrote:

> (didn't go out to the list when I first sent it)
> 
> When I read the word "exception" I always think of this:
> 
> When we built the first Unicode-enabled OS (Windows NT), we
> had a long discussion of which "strings" in the OS needed to
> be Unicode.
> 
> Some thought that there was a clear dividing line between data
> and what would be called "protocol values" in another context.
> 
> Some of the latter did look like they were easily limited to
> ASCII; but everywhere we found "exceptions". There might be a
> set of enumerable tokens, but it allowed extended values that
> were network or file identifiers.
> 
> After exhaustively researching everything, the conclusion was
> that every single string in the OS had to be Unicode (and
> making any exceptions was either not possible, or not worth
> the effort).
> 
> However, while all strings were encoded in Unicode, not all
> string values were allowed. While file names could be
> localized (within the limits of file system syntax), some of
> the enumerated strings were left limited to the ASCII set in
> repertoire (even if encoded in Unicode).
> 
> Reading this discussion (and I'm sorry I don't have the time
> right now to properly delve into the details) it seems that a
> natural recommendation would be to require Unicode for any
> native representation and, if necessary (or possible), limit
> the repertoire.
> 
> This also requires a definition of the matching protocol for
> all strings that are to be matched as part of the protocol (or
> should be searchable). For any format, that would cover issues
> of casing, white space handling etc., but for Unicode, by
> necessity, that also requires defining the normalization form
> to be used.
> 
> A./
> 
> PS: given how few systems these days natively operate in any
> character set other than Unicode, I am always astonished at
> the lengths to which people go to justify not making something
> native Unicode. They just pick up conversion issues when they
> use platform libraries to do any work or display.
>
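
As an illustration of the quoted point that matching Unicode strings
requires choosing a normalization form, here is a minimal Python
sketch.  The choice of NFC and the helper name match_key are
assumptions made for the example; the quoted message does not say
which form a protocol should pick.

```python
import unicodedata

composed = "\u00e9"     # LATIN SMALL LETTER E WITH ACUTE, one code point
decomposed = "e\u0301"  # "e" followed by COMBINING ACUTE ACCENT

# Code-point-by-code-point comparison treats the two as different.
raw_equal = composed == decomposed


def match_key(s: str) -> str:
    """Hypothetical matching rule: normalize to NFC, then case-fold."""
    return unicodedata.normalize("NFC", s).casefold()


# Under the assumed matching rule the two spellings compare equal.
normalized_equal = match_key(composed) == match_key(decomposed)

print(raw_equal, normalized_equal)
```

A real protocol would also have to say how white space is handled and
whether case folding applies at all; this sketch covers only the
normalization and casing steps.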
