Re: Status of RFC 20 (was: Re: Gen-ART and OPS-Dir review of draft-ietf-json-text-sequence-09)

ned+ietf@mauve.mrochek.com Thu, 18 December 2014 16:02 UTC

From: ned+ietf@mauve.mrochek.com
Message-id: <01PG94604LNK00005K@mauve.mrochek.com>
Date: Thu, 18 Dec 2014 07:22:09 -0800 (PST)
Subject: Re: Status of RFC 20 (was: Re: Gen-ART and OPS-Dir review of draft-ietf-json-text-sequence-09)
In-reply-to: "Your message dated Thu, 18 Dec 2014 15:16:17 +0100" <5492E1B1.2040600@gmx.de>
References: <20141206170611.39377.qmail@ary.lan> <54833B14.7010104@cs.tcd.ie> <CAC4RtVC=yJs1p8Ei2AFSfSiqo9OwXkWhioXPWYU_JKRCSo0VKw@mail.gmail.com> <5492E1B1.2040600@gmx.de>
To: Julian Reschke <julian.reschke@gmx.de>
Archived-At: http://mailarchive.ietf.org/arch/msg/ietf/exMPJ5VCcyzoEMJcEfi7jYp_5Gg
Cc: Stewart Bryant <stbryant@cisco.com>, Barry Leiba <barryleiba@computer.org>, John Levine <johnl@taugh.com>, IETF discussion list <ietf@ietf.org>

> On 2014-12-07 01:19, Barry Leiba wrote:
> >> PS: If Barry or anyone else wants to do this instead that's
> >> fine by me.
> >
> > I already have; the status-change document is in Last Call Requested state.
> >
> > Barry

> So RFC 20 says it defines a "coded character set". However, in current
> specs (at least in APPS) we frequently talk about "character encoding
> schemes" (<http://tools.ietf.org/html/rfc6365#section-2>), in general
> mapping Unicode code points to octet sequences.

Actually, we try and talk about charsets, which are mappings from a series of
octets to a series of characters, and avoid all of the complicated ISO
bafflegab.

A CCS is a mapping from characters to integers. A CES is a mapping from one or
more sets of integers to octets. The combination of one or more CCSs with a CES
produces something that is usually (but not always) the same as a charset. The
distinction lies in the fact that a CCS/CES doesn't always fully specify the
meaning of all octet sequences, whereas a charset does.

The supposed utility of the complex CCS/CES approach lay in its ability to
accommodate very complex CESs. This was thought to be the right way to do things
back in the days when no universal charset existed and thus multiple CCSs had
to be allowed in a single stream of octets.

For example, in X.400 you had the generaltext body part, which used ISO 2022 as
the CES combined with an essentially arbitrary set of CCSs that were specified
both inline and out of band.

But instead we ended up with multiple CESs, which were either profiled subsets
of ISO 2022 or schemes where the high bit was essentially a CCS flag. It's much
more straightforward to handle such things as charsets, so that's what we did.
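
To illustrate the two families, here is a minimal Python sketch. It assumes
that ISO-2022-JP is a fair example of a profiled ISO 2022 subset and that
EUC-JP is a fair example of a high-bit scheme; both choices, and the helper
names, are mine for illustration. The codec names "iso2022_jp" and "euc_jp"
are the ones shipped with Python's standard library.

```python
# Illustrative only: the choice of EUC-JP and ISO-2022-JP as examples and
# the helper names below are assumptions, not taken from RFC 2978.

def encode_high_bit_flag(text):
    """Encode with EUC-JP, where octets with the high bit set belong to
    a different coded character set than plain 7-bit octets."""
    return text.encode("euc_jp")

def encode_iso2022_profile(text):
    """Encode with ISO-2022-JP, a 7-bit profile of ISO 2022 that switches
    coded character sets with inline escape sequences."""
    return text.encode("iso2022_jp")

def high_bits(data):
    """Return one boolean per octet: True if the high bit is set."""
    return [bool(octet & 0x80) for octet in data]

sample = "A\u3042"  # LATIN CAPITAL LETTER A, then HIRAGANA LETTER A

euc = encode_high_bit_flag(sample)
iso = encode_iso2022_profile(sample)

# In EUC-JP the high bit marks which octets come from the Japanese set.
print(euc, high_bits(euc))

# In ISO-2022-JP every octet is 7-bit; ESC sequences do the switching.
print(iso, high_bits(iso))
```

Either way, a receiver that treats each of these as a single named charset
only has to pick the right decoder, which is the simplification described
above.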

See RFC 2978 for additional details.

> So does RFC 20 define a CES as well?

No. Of course there's an obvious CES to associate with it: The mapping of the
128 integer values it defines to octets with the same value. Do that and you
essentially have the US-ASCII charset.
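
As a rough sketch of that composition in Python (the function names are
hypothetical, and using ord() for the character-to-integer step assumes
that Unicode code points 0 through 127 coincide with the RFC 20 values):

```python
# Hypothetical names throughout. RFC 20 supplies only the
# character-to-integer part; the octet mapping is the "obvious" one.

def ccs_us_ascii(ch):
    """CCS: map one character to its integer value in the range 0..127."""
    code = ord(ch)  # assumption: code points 0..127 match RFC 20
    if code > 127:
        raise ValueError("character %r is not in the 128-value set" % ch)
    return code

def ces_identity(codes):
    """CES: map each integer to a single octet with the same value."""
    return bytes(codes)

def charset_us_ascii(text):
    """CCS combined with CES: characters in, octets out."""
    return ces_identity(ccs_us_ascii(ch) for ch in text)

octets = charset_us_ascii("RFC 20")
print(octets)

# The result agrees with Python's built-in "ascii" codec.
print(octets == "RFC 20".encode("ascii"))
```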

> If it does not, should we have an
> additional document taking care of this?

I don't see why. The utility of RFC 20 lies in its specification of the meaning
of various characters. If you're using it as a specification for the US-ASCII
charset, you're using it incorrectly because, like it or not, it doesn't specify
that. RFC 2046 does that by referencing ANSI X3.4-1986. No doubt it would
have been better to reference RFC 20 for the CCS part, but it wasn't online
at the time.

				Ned