Re: [Json] Call for Consensus: Proposed Text for "8.1 Character Encoding"

Julian Reschke <julian.reschke@gmx.de> Tue, 14 March 2017 06:44 UTC

To: Tim Bray <tbray@textuality.com>, Carsten Bormann <cabo@tzi.org>
References: <1fb5849e-8dbf-835d-65b7-2403686248f9@outer-planes.net> <b3cb2651-2d9f-d68d-2191-814e8dd5f5e2@gmx.de> <4B0A7371-9D85-4BEF-BC3C-14175E563178@tzi.org> <98ba10a0-6e44-9ff0-5993-f7ec9c66d74b@gmx.de> <E30CE52F-CE3E-4888-99D8-58899D3652EB@tzi.org> <CAHBU6ivb1meRgGZ8QPcicQY7awq1FSVCUNB2zkXGq2WJ6bsspQ@mail.gmail.com>
From: Julian Reschke <julian.reschke@gmx.de>
Message-ID: <9c0dee1b-341c-e783-a30b-1afeb841e693@gmx.de>
Date: Tue, 14 Mar 2017 07:44:40 +0100
In-Reply-To: <CAHBU6ivb1meRgGZ8QPcicQY7awq1FSVCUNB2zkXGq2WJ6bsspQ@mail.gmail.com>
Archived-At: <https://mailarchive.ietf.org/arch/msg/json/LwhozdFQtKJJSZXIFQ5Y8ModJfU>
Cc: draft-ietf-jsonbis-rfc7159bis.all@ietf.org, "Matthew A. Miller" <linuxwolf+ietf@outer-planes.net>, "json@ietf.org" <json@ietf.org>

On 2017-03-14 07:17, Tim Bray wrote:
> My position is identical to Carsten's. If you want guaranteed
> interoperability, use I-JSON. Let's not weaken this document with
> dubious handwaving that suggests anything but UTF-8 is a sane choice.
> ...

I'm sympathetic to the intent.

However, if the spec allows encodings other than UTF-8, it needs to state
clearly that the encoding has to be detected by inspecting the payload,
not out-of-band data. Carsten has demonstrated that detection is not
simple, so let's either write down the exact steps, or at least give
readers something to start with, such as:

"Character encoding detection can be done based on the fact that the 
first character is always US-ASCII, so the UTF-16/32 variants can be 
detected by inspecting the first octets for zeros."
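
To make the suggested wording concrete, here is a rough sketch of what such a detection could look like. It is only an illustration, not proposed spec text: the function name detect_json_encoding is made up, and it assumes the payload has no byte order mark and is otherwise valid JSON, so that the first character is a non-NUL US-ASCII character and the second character is never U+0000.

```python
def detect_json_encoding(data: bytes) -> str:
    """Guess the encoding of a JSON payload from the zero octets around
    its first character.

    Assumptions (not taken from any spec text): no byte order mark, and
    the payload is valid JSON, so the first character is non-NUL ASCII.
    """
    # UTF-32BE: an ASCII character is encoded as 00 00 00 xx.
    if len(data) >= 4 and data[:3] == b"\x00\x00\x00" and data[3] != 0:
        return "utf-32-be"
    # UTF-32LE: xx 00 00 00. This must be checked before UTF-16LE,
    # because both start with xx 00.
    if len(data) >= 4 and data[0] != 0 and data[1:4] == b"\x00\x00\x00":
        return "utf-32-le"
    # UTF-16BE: 00 xx.
    if len(data) >= 2 and data[0] == 0 and data[1] != 0:
        return "utf-16-be"
    # UTF-16LE: xx 00. Only the first code unit is inspected, since the
    # second character may be non-ASCII (for example inside a string).
    if len(data) >= 2 and data[0] != 0 and data[1] == 0:
        return "utf-16-le"
    # No zero octets next to the first character: treat as UTF-8.
    return "utf-8"
```

Note that the sketch looks only at the octets of the first character. A pattern over the first four octets that expects two ASCII characters would misclassify a UTF-16 text whose second character is not ASCII.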

Best regards, Julian