Re: [Json] Call for Consensus: Proposed Text for "8.1 Character Encoding"

Nico Williams <nico@cryptonector.com> Mon, 13 March 2017 21:51 UTC

Date: Mon, 13 Mar 2017 16:50:52 -0500
From: Nico Williams <nico@cryptonector.com>
To: Carsten Bormann <cabo@tzi.org>
Message-ID: <20170313215051.GF543@localhost>
References: <1fb5849e-8dbf-835d-65b7-2403686248f9@outer-planes.net> <3B3F2181-6C5D-43C0-BCD9-8D4BA05E6C03@tzi.org>
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"
Content-Disposition: inline
In-Reply-To: <3B3F2181-6C5D-43C0-BCD9-8D4BA05E6C03@tzi.org>
User-Agent: Mutt/1.5.24 (2015-08-30)
Content-Transfer-Encoding: quoted-printable
Archived-At: <https://mailarchive.ietf.org/arch/msg/json/BgWF1P40fj7KcopcOB4j7gQl4Zw>
Cc: draft-ietf-jsonbis-rfc7159bis.all@ietf.org, Matthew Miller <linuxwolf+ietf@outer-planes.net>, "json@ietf.org" <json@ietf.org>
Subject: Re: [Json] Call for Consensus: Proposed Text for "8.1 Character Encoding"

On Mon, Mar 13, 2017 at 10:12:09PM +0100, Carsten Bormann wrote:
> > 1) Does the working group think adding any text on how to detect the
> > encoding worthwhile?
> 
> No, that would be a regression into maintaining the fiction that
> UTF-16 and UTF-32 versions of JSON are being used in interchange.

I would support saying that JSON texts SHOULD be UTF-8.  I might even
support that they MUST be UTF-8, but I suspect there won't be consensus
for that, though IF there is, then I agree with you.

> Not sure if I’m allowed to note that after saying no above, but not

Why wouldn't you be?

> all JSON documents have four bytes.

True!  The empty string at the top level, with no whitespace around it,
is a two-byte text when encoded in UTF-8.

Indeed, just two bytes will do in some cases; at most four are needed.
Byte order can be detected as well, also in two to four bytes, but byte
order would be pointless to think about if UTF-8 were REQUIRED (which it
isn't [yet]).
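
For illustration only, here is a rough Python sketch of the kind of
detection being discussed.  It follows the null-byte pattern table from
RFC 4627 section 3, extended to inputs shorter than four bytes.  The
function name is made up for this example, and it assumes there is no
BOM and that the first character of the text is ASCII (true for any
JSON text, since a text starts with whitespace, a quote, a digit, a
minus sign, a bracket, a brace, or a literal name).

```python
def detect_json_encoding(data: bytes) -> str:
    """Guess the Unicode encoding of a JSON text from its first 2-4 bytes.

    Hypothetical helper: assumes no BOM and an ASCII first character,
    so the position of the zero bytes reveals both width and byte order.
    """
    if len(data) >= 4:
        # 00 00 00 xx -> UTF-32, big-endian
        if data[0] == 0 and data[1] == 0 and data[2] == 0 and data[3] != 0:
            return "utf-32-be"
        # xx 00 00 00 -> UTF-32, little-endian
        if data[0] != 0 and data[1] == 0 and data[2] == 0 and data[3] == 0:
            return "utf-32-le"
    if len(data) >= 2:
        # 00 xx -> UTF-16, big-endian
        if data[0] == 0 and data[1] != 0:
            return "utf-16-be"
        # xx 00 -> UTF-16, little-endian
        if data[0] != 0 and data[1] == 0:
            return "utf-16-le"
    # No zero bytes up front (or fewer than two bytes): treat as UTF-8.
    return "utf-8"
```

The two-byte checks are what let a short text such as a single digit
in UTF-16 be classified without four bytes of input.  None of this is
needed if the text is known to be UTF-8.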

Nico
--