Re: Status of RFC 20 (was: Re: Gen-ART and OPS-Dir review of draft-ietf-json-text-sequence-09)

John C Klensin <> Thu, 18 December 2014 16:09 UTC

Date: Thu, 18 Dec 2014 11:09:13 -0500
From: John C Klensin <>
To: Julian Reschke <>, Barry Leiba <>, Stephen Farrell <>
Subject: Re: Status of RFC 20 (was: Re: Gen-ART and OPS-Dir review of draft-ietf-json-text-sequence-09)
Cc: John Levine <>, IETF discussion list <>, Stewart Bryant <>

--On Thursday, December 18, 2014 15:16 +0100 Julian Reschke
<> wrote:

> So RFC 20 says it defines a "coded character set". However, in
> current specs (at least in APPS) we frequently talk about
> "character encoding schemes"
> (<>), in general
> mapping Unicode code points to octet sequences.
> So does RFC 20 define a CES as well? If it does not, should we
> have an additional document taking care of this?

With the understanding that RFC 20's being used successfully for
45 years continues to be a very strong argument that it doesn't
need changes or supplemental materials, and noting that many
IETF participants were not reading ANSI/USASI standards (or much
of anything else) 45 years ago,

(1) The terms "coded character set" and "[character] code for
information interchange" were in use long before Unicode and
its multiple encodings/representation forms started to redefine
it/them.  In this context, "long" is measured in decades, not years.

(2) Early versions of ASCII did not specify what we would now
call "encoding" information.  They just specified the repertoire and
the associated 7 bit CCS.  Later ones, IIRC, do specify encoding
information.  That type of difference is one of the reasons we
need to be careful about version numbers or dates when
referencing other people's standards (and why stable references
are important).  For ASCII, the result was that we ended up with
at least two different ways to put those 7 bit characters into
an 8 bit "byte" and at least two different ways to put them into
a 36 bit word.
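
To make the 36 bit word case concrete, here is a rough sketch of
two possible layouts.  The message does not say which layouts are
meant, so both are assumptions chosen for illustration: five 7 bit
characters left-justified with one spare low-order bit (as commonly
described for the PDP-10), and four characters in 9 bit fields (as
commonly described for Multics-style machines).  The function names
are hypothetical.

```python
WORD_BITS = 36


def pack_five_7bit(text):
    """Pack up to five ASCII characters into a 36 bit word.

    Assumed layout: five 7 bit fields, left-justified, with the
    low-order bit of the word left as zero.
    """
    codes = text.encode("ascii")  # raises UnicodeEncodeError on non-ASCII
    if len(codes) > 5:
        raise ValueError("at most five characters fit in this layout")
    word = 0
    for i in range(5):
        code = codes[i] if i < len(codes) else 0  # pad with NUL
        word = (word << 7) | code
    return word << 1  # shift past the unused low-order bit


def pack_four_9bit(text):
    """Pack up to four ASCII characters into a 36 bit word.

    Assumed layout: four 9 bit fields, each holding a 7 bit code
    with its two high-order bits zero.
    """
    codes = text.encode("ascii")
    if len(codes) > 4:
        raise ValueError("at most four characters fit in this layout")
    word = 0
    for i in range(4):
        code = codes[i] if i < len(codes) else 0  # pad with NUL
        word = (word << 9) | code
    return word


if __name__ == "__main__":
    # Same characters, same word size, different bit patterns.
    print(format(pack_five_7bit("ABCD"), "036b"))
    print(format(pack_four_9bit("ABCD"), "036b"))
```

The same four characters produce different 36 bit patterns under
the two layouts, which is why a CCS alone is not enough to
interoperate.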

(3) I assume partially because the encoding issues mentioned in
(2) forced most people working on anything resembling applications
on the network to be familiar with them, RFC 20 does
specify an on-the-wire encoding for ASCII.  That is one of the
things that makes it more useful than a reference to ASCII
alone: it specifies what we started calling a "charset" in the
early MIME days, i.e., a combination of a CCS and a CES.
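
As a minimal sketch of that on-the-wire form: RFC 20 suggests
standard 7 bit ASCII embedded in an 8 bit byte whose high-order
bit is always 0.  The helper names below are hypothetical and the
strict rejection of octets with the high bit set is my own choice
for illustration, not something the message states.

```python
def encode_rfc20(text):
    """Map each character to one octet with the high-order bit zero."""
    octets = bytearray()
    for ch in text:
        code = ord(ch)
        if code > 0x7F:
            raise ValueError("not in the 7 bit ASCII repertoire: %r" % ch)
        octets.append(code)  # 0xxxxxxx: the 7 bit code in an 8 bit byte
    return bytes(octets)


def decode_rfc20(octets):
    """Reverse the mapping, rejecting any octet with the high bit set."""
    for octet in octets:
        if octet & 0x80:
            raise ValueError("high-order bit set: 0x%02X" % octet)
    return bytes(octets).decode("ascii")


if __name__ == "__main__":
    wire = encode_rfc20("RFC 20")
    print(wire.hex(" "))
    print(decode_rfc20(wire))
```

The CCS part is the character-to-number table; the CES part is the
single rule "one octet per code, high bit zero".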

So, AFAICT, nothing else is needed in this area other than getting
on with it and ceasing to embarrass ourselves by needing to drag
out this discussion of a 45-year-old spec of something we've
used heavily and for which there has never been a problem for
what is now called the Basic Latin repertoire.