Re: [Json] Call for Consensus: Proposed Text for "8.1 Character Encoding"

Pete Cordell <> Tue, 18 April 2017 13:01 UTC

On 18/04/2017 06:22, Martin J. Dürst wrote:
> On 2017/04/18 05:47, Carsten Bormann wrote:
>> On Apr 17, 2017, at 19:56, Nico Williams <> wrote:
>>>> Thinking about this more, putting an encoding detection algorithm as an
>>>> appendix seems like a reasonable compromise to me.  To start, how about
>>>> removing the detection text from Section 8.1 and have an appendix that
>>>> starts with that text plus the table?
>>> Or we could even just assert that such an algorithm is possible, and
>>> that implementors MAY implement one.
>> Indeed.
>> Broken record mode:
>> - writing up the algorithm sounds like encouraging implementation.
>>   We *don’t* want people to implement this!
>>   (The whole interminable non-UTF-8 saga probably just was a nod from
>> the RFC 4627 authors to the remnants of UTF-16 land, which mostly have
>> died off since.  Why resurrect?)
>> - there have been about 15 attempts to define this algorithm on the
>> mailing list.
>>   All were wrong.
>>   An Internet Standard should contain tried and true material, not
>> errata fodder.
>> - an implementer is in a much better position to get this right than
>> the standard, because they can write unit tests.
> I completely agree with Carsten. As far as I know, and as far as we have
> been told on this list, if some JSON isn't in UTF-8, then it simply will
> not interoperate.


If we do do this, I think we could add some example test messages to
help with development, e.g.:

     "U+0100" (where U+0100 stands for the character itself in its UTF-encoded form, not the ASCII text "U+0100")
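
As a rough sketch of what such test messages could look like on the wire, 
the Python below prints the bytes of that one-string JSON text in each 
Unicode encoding form.  Assumptions on my part: the message is just the 
quoted string holding U+0100, no BOM is written, and the names JSON_TEXT, 
ENCODINGS and test_vectors are made up for illustration only.

```python
# Illustrative test vectors only, not proposed text for the draft.
# JSON_TEXT is a JSON string whose sole character is U+0100,
# written as the real character rather than an ASCII \u escape.
JSON_TEXT = '"\u0100"'

# Encoding forms under discussion; the explicit -le/-be codecs emit no BOM.
ENCODINGS = ["utf-8", "utf-16-le", "utf-16-be", "utf-32-le", "utf-32-be"]


def test_vectors(text=JSON_TEXT):
    """Map each encoding name to the hex dump of the encoded JSON text."""
    return {name: text.encode(name).hex() for name in ENCODINGS}


if __name__ == "__main__":
    for name, hexbytes in test_vectors().items():
        print(f"{name:10} {hexbytes}")
```

An implementer could feed each byte sequence to their parser in a unit 
test and check that it either decodes to the same one-character string or 
is rejected cleanly.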

> In my view, the only reason to still have a MAY for UTF-16/32 is that
> this will avoid questions like: "I have a JSON parser in language FOO,
> it can take a string or an input stream as an argument. In FOO, strings
> are UTF-16, but the JSON RFC doesn't seem to allow this. What should I do."

IMO the answer to that is, "that's why it says 'JSON text _SHOULD_ be 
encoded in UTF-8'".

I agree with John Cowan that use of UTF-16/32 is purely for internal
scenarios, not on the Internet.  As such, I believe the IETF is going
beyond its remit to say you can use UTF-16/32 for your internal purposes
that I know nothing about and care nothing about, but you're not allowed
to use other encodings that may be more natural for your system.

Pete Cordell
Codalogic Ltd