Re: [apps-discuss] [Editorial Errata Reported] RFC6365 (4005)

This errata may represent the way you would like things to be, but it's
factually incorrect and doesn't represent the way things are.

"US-ASCII" is the registered charset name for what's defined in ANSI X3.4-1986.
Not "ASCII". It's the charset various important media types default to, and
it's the name that appears throughout RFC 2046 as well as many other RFCs. And
if you use any other name for this charset, including but not limited to plain
"ASCII", you're not going to like the results.  As such, there absolutely is
such a thing as "US-ASCII" and it's far more than an "IETF artifact" at this
point.
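
As a small illustration of the label in deployed software (a sketch, not
something taken from any RFC): Python's standard email package, used here
purely as an example implementation, chooses a charset label by itself when
it is handed pure seven-bit text. The variable names are arbitrary.

```python
from email.mime.text import MIMEText

# No charset is passed in; MIMEText chooses a label for pure ASCII text.
msg = MIMEText("plain seven-bit text")

# The label that ends up in the Content-Type header.
label = msg.get_content_charset()
print(msg["Content-Type"])
print(label)
```

The label printed is "us-ascii", not "ascii".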

Les jeux sont faits, rien ne va plus. (The bets are placed; no more bets.)

I note in passing that the text in RFC 6365 is incorrect in another way: It
effectively says that "charset" and "character encoding scheme" are equivalent.
They aren't. A charset is, as RFC 6365 correctly says, a mapping from a
sequence of octets to a sequence of characters. A character encoding scheme
(CES), OTOH, is a means of translating a sequence of integers into a sequence
of octets. And a coded character set (CCS) is a 1:1 mapping between a set of
characters and a set of integers.

When you combine one or more CCSes and a CES you obtain a means of translating
from characters to octets. This is the inverse of a charset, more or less.
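
The three terms can be sketched in code. This is only a toy model under
stated assumptions: Unicode code points stand in for the CCS, the CES is a
made-up fixed-width scheme of two big-endian octets per integer, and the
function names ccs, ces and charset are invented for the illustration.

```python
from typing import List

def ccs(text: str) -> List[int]:
    # CCS: characters <-> integers (Unicode code points, by assumption).
    return [ord(ch) for ch in text]

def ces(integers: List[int]) -> bytes:
    # CES: integers -> octets. Toy scheme: two big-endian octets each,
    # so it only handles integers up to 0xFFFF.
    return b"".join(n.to_bytes(2, "big") for n in integers)

def charset(octets: bytes) -> str:
    # Charset: octets -> characters. For the code points this toy CES can
    # represent without surrogates, UTF-16BE reads the same layout.
    return octets.decode("utf-16-be")

octets = ces(ccs("Ned"))          # characters -> octets (CCS + CES)
assert octets == b"\x00N\x00e\x00d"
assert charset(octets) == "Ned"   # octets -> characters (charset)
```

The CCS + CES pair runs from characters to octets; the charset runs the
other way, which is the direction a receiver actually needs.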

Charset was defined the way it was because we very consciously and deliberately
*rejected* the ISO approach to this stuff when MIME was specified. The main
problem with the CCS/CES thing is that it's inherently ambiguous: For example,
if even a subset of ISO 2022 is part of your CES, you now have a one-to-many
mapping - and an interoperability mess in the making.
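
A minimal sketch of the one-to-many problem, using Python's iso2022_jp
codec as a stand-in for "a subset of ISO 2022" (the choice of codec and the
sample octet strings are assumptions made for the illustration):

```python
# Several different octet sequences for the same single character.
candidates = [
    b"A",              # ASCII is the initial state, no escape needed
    b"\x1b(BA",        # a redundant "designate ASCII" escape first
    b"\x1b(B\x1b(BA",  # the same escape twice
]

# Interpreting octets is unambiguous: every sequence yields the same text.
decoded = {octets.decode("iso2022_jp") for octets in candidates}
assert decoded == {"A"}

# Creating octets is not: all three candidates are distinct, and an
# encoder has to pick one of them.
assert len(set(candidates)) == 3
assert "A".encode("iso2022_jp") == b"A"
```

Going from characters to octets there are many answers; going from octets
to characters there is one.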

Charset, on the other hand, is clean, simple and precise. It focuses on
interpretation rather than creation of the octet stream, which is what you want
when interoperability is your primary goal.

As always, the proof is in the pudding. There are any number of examples of the
suckage of the ISO approach to this problem - generaltext anyone? - but the one
I like to use is the combination of RFC 1468 and RFC 2047. In brief, RFC 1468
says that you can't have two adjacent double-byte segments. RFC 2047 then says
that any such segment must be entirely enclosed in an encoded-word. It also
says that spaces (which would count as a single-byte segment) between
encoded-words are to be removed. Which, unless you go to a lot of trouble to
remove the single-byte-seq double-byte-seq pairs, results in illegal
iso-2022-jp.
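
The byte pattern in question can be reproduced with a short sketch. The
helpers encoded_word and raw_payload are hypothetical, written only for
this example, and the sample text is arbitrary; Python's iso2022_jp codec
and base64 module do the real work.

```python
import base64

ESC_TO_ASCII = b"\x1b(B"    # switch to the single-byte (ASCII) set
ESC_TO_KANJI = b"\x1b$B"    # switch to the double-byte (JIS X 0208) set

def encoded_word(text: str) -> str:
    # Build a base64 encoded-word; each payload starts and ends in ASCII.
    payload = text.encode("iso2022_jp")
    return "=?ISO-2022-JP?B?" + base64.b64encode(payload).decode("ascii") + "?="

def raw_payload(word: str) -> bytes:
    # Strip the "=?" and "?=" delimiters and undo the base64 encoding.
    _charset, _encoding, data = word[2:-2].split("?")
    return base64.b64decode(data)

# Two encoded-words separated by a space, as in a folded header.
header = encoded_word("日本") + " " + encoded_word("語")

# Drop the space between encoded-words and concatenate the payloads.
joined = b"".join(raw_payload(word) for word in header.split(" "))

# A switch to single-byte immediately followed by a switch back to
# double-byte: the pair that has to be removed by hand.
assert ESC_TO_ASCII + ESC_TO_KANJI in joined
```

Nothing sits between the two escapes in the joined octets, so the two
double-byte segments end up adjacent unless the pair is stripped.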

This is what happens when you use an overly complex approach built on decades
of accreted bad design.

Now, as you might have already guessed, my sympathy level for the poor ISOers
who are confused by the approach the IETF took to this problem is low.
Actually, make that nonexistent. The character set space is a huge mess in
large part due to the way the ISO has handled it.

And like it or not, failure has consequences. The "US-ASCII" label is one of
them.

				Ned

> John, your argument may well be absolutely correct, but this is also
> absolutely *not* an error in the specification.  You and Paul
> certainly *meant* to say "US-ASCII", and, while in retrospect perhaps
> you shouldn't have, it's not an issue for an errata report.

> Am I wrong here?

> Barry

> On Wed, Jun 4, 2014 at 4:32 PM, RFC Errata System
> <rfc-editor@rfc-editor.org> wrote:
> > The following errata report has been submitted for RFC6365,
> > "Terminology Used in Internationalization in the IETF".
> >
> > --------------------------------------
> > You may review the report below and at:
> > http://www.rfc-editor.org/errata_search.php?rfc=6365&eid=4005
> >
> > --------------------------------------
> > Type: Editorial
> > Reported by: John Klensin <john=ietf@jck.com>
> >
> > Section: GLOBAL
> >
> > Original Text
> > -------------
> > US-ASCII
> >
> > Corrected Text
> > --------------
> > ASCII
> >
> > Notes
> > -----
> > The term "US-ASCII" is an IETF artifact, left over from some
> > misunderstandings about what "ASCII" referred to (and the complete
> > absence of CSCII or CASCII, MSCII or MXSCII, BRSCII, ARSCII, and other
> > "American" coded character sets).  It is a source of confusion for
> > people who come to IETF specifications with a background in coded
> > character sets and terminology from other areas or standards bodies and
> > has been warned against multiple times.  It should not have appeared in
> > this document except possibly with a warning against its use (and the
> > use of other bogus terms like "ASCII7").  The second author, who is
> > normally sensitive to the issue, has no idea how this got past him, even
> > in text picked up from other documents, but supposes this is what errata
> > are for.
> >
> > In any event, there is no such thing as "US-ASCII": the term is an
> > erroneous and misleading synonym/substitute for "ASCII".  The reference
> > for the latter is correct, but the citation anchor should probably be
> > corrected as well.
> >
> > Instructions:
> > -------------
> > This errata is currently posted as "Reported". If necessary, please
> > use "Reply All" to discuss whether it should be verified or
> > rejected. When a decision is reached, the verifying party (IESG)
> > can log in to change the status and edit the report, if necessary.
> >
> > --------------------------------------
> > RFC6365 (draft-ietf-appsawg-rfc3536bis-06)
> > --------------------------------------
> > Title               : Terminology Used in Internationalization in the IETF
> > Publication Date    : September 2011
> > Author(s)           : P. Hoffman, J. Klensin
> > Category            : BEST CURRENT PRACTICE
> > Source              : Applications Area Working Group
> > Area                : Applications
> > Stream              : IETF
> > Verifying Party     : IESG

> _______________________________________________
> apps-discuss mailing list
> apps-discuss@ietf.org
> https://www.ietf.org/mailman/listinfo/apps-discuss