Re: [Json] Call for Consensus: Proposed Text for "8.1 Character Encoding"

Julian Reschke <julian.reschke@gmx.de> Mon, 13 March 2017 21:50 UTC

To: Matthew Miller <linuxwolf+ietf@outer-planes.net>, "json@ietf.org" <json@ietf.org>
References: <1fb5849e-8dbf-835d-65b7-2403686248f9@outer-planes.net>
From: Julian Reschke <julian.reschke@gmx.de>
Message-ID: <b3cb2651-2d9f-d68d-2191-814e8dd5f5e2@gmx.de>
Date: Mon, 13 Mar 2017 22:50:42 +0100
In-Reply-To: <1fb5849e-8dbf-835d-65b7-2403686248f9@outer-planes.net>
Archived-At: <https://mailarchive.ietf.org/arch/msg/json/IvhBgmgwfaML7VawgtOn_FZ9BZ4>
Cc: draft-ietf-jsonbis-rfc7159bis.all@ietf.org
Subject: Re: [Json] Call for Consensus: Proposed Text for "8.1 Character Encoding"

On 2017-03-13 22:06, Matthew Miller wrote:
> Hello JSONbis,
>
> The security directorate review discussion has raised the issue of
> encoding detection.  The original table from RFC 4627 was removed from
> RFC 7159 due to a lack of consensus.  In this latest round, a number of
> comments have been made for (and against) adding more guidance than is
> currently present.
>
> The chair asks for a call on the following from the working group:
>
> 1) Does the working group think adding any text on how to detect the
> encoding worthwhile?

Yes.

> 2a) If such text is worthwhile, is the following proposed text from Nico
> Williams acceptable (to be appended to Section 8.1)?
>
> """
>    Implementors MAY count the number of ASCII NULs in the first four

s/MAY/can/

>    bytes of any JSON text to detect which of UTF-8, UTF-16, or UTF-32
>    the text is encoded in:
>
>     - if the count is zero, then the text is encoded in UTF-8
>     - if the count is one or two, then the text is encoded in UTF-16
>     - if the count is three, then the text is encoded in UTF-32
>
>    This results from a) JSON texts having to start with an ASCII
>    character, b) no unescaped NULs being allowed in JSON strings, and c)
>    any type being allowed at the top-level, thus the first character may
>    be a double-quote and the second may be any permissible, unescaped
>    Unicode codepoint.  An ASCII character requires a NUL-valued byte in
>    UTF-16 encoding, three in UTF-32, and none in UTF-8.

...and add that if the number of octets is less than 4, it's UTF-8.

> """
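A minimal Python sketch of the NUL-counting rule quoted above, including the suggested fallback to UTF-8 for texts shorter than four octets. The function name `detect_by_nul_count` is hypothetical; byte order marks are not discussed here and are ignored, and the result names only the encoding family, not the byte order.

```python
def detect_by_nul_count(data: bytes) -> str:
    """Guess the encoding family of a JSON text (no BOM) by counting
    NUL octets in its first four bytes, following the proposed rule."""
    if len(data) < 4:
        # Suggested addendum: fewer than four octets means UTF-8.
        return "utf-8"
    nuls = data[:4].count(0)  # bytes.count accepts an int 0..255
    if nuls == 0:
        return "utf-8"
    if nuls in (1, 2):
        return "utf-16"
    if nuls == 3:
        return "utf-32"
    # Four NULs cannot begin a JSON text under any of the three encodings.
    raise ValueError("first four octets are all NUL")


if __name__ == "__main__":
    # A top-level string whose second character is non-ASCII (rule c).
    sample = '"\u4e2d"'
    for enc in ("utf-8", "utf-16-be", "utf-16-le", "utf-32-be", "utf-32-le"):
        print(enc, "->", detect_by_nul_count(sample.encode(enc)))
```

Counting NULs rather than matching fixed positions is what lets the rule cope with a non-ASCII second character, at the cost of not reporting endianness.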
> 2b) If such text is worthwhile but Nico's proposal is not worthwhile,
> what would be acceptable?
>
> Please respond by March 16.
> ...

I personally would prefer the table that I posted, as it directly
identifies the byte order as well:

>            00 00 00 xx  UTF-32BE
>            00 xx 00 xx  UTF-16BE
>            xx 00 00 00  UTF-32LE
>            xx 00 xx 00  UTF-16LE
>            xx xx xx xx  UTF-8

Best regards, Julian