Re: [Json] Call for Consensus: Proposed Text for "8.1 Character Encoding"

Julian Reschke <> Tue, 28 March 2017 05:34 UTC

To: Tim Bray <>, "Matthew A. Miller" <>
From: Julian Reschke <>
Date: Tue, 28 Mar 2017 07:33:52 +0200

On 2017-03-28 06:48, Tim Bray wrote:
> First of all, let me say that I’m delighted with, and fully support, the
> promotion of the status of UTF-8 in the JSON RFC to MUST.  I suspect
> this steps way outside the JSONbis charter, but that’s a problem for
> chairs and ADs, not yr humble editor.
> Comments on Matt's proposed text:
> 1. How about a very short historical note, along the lines of: “Previous
> specifications of JSON, including the predecessor RFCs, have not
> required the use of UTF-8 for use with the application/json media type.
> However, implementors of JSON-based software have overwhelmingly chosen
> to use the UTF-8 encoding, to the extent that it is the only realistic
> way to achieve interoperability in software which generates or consumes
> JSON.”
> ... moving on...

If we do this, we'll have to add it to the "changes from 7159" section.

> ...
>     Recipients that wish to support Unicode encodings other than UTF-8
>     can do this using a detection mechanism that is based on the fact
>     that the first character will always have a Unicode code point
>     greater than 0 and less than 128, thus the UTF-16/32 variants can
>     be detected by inspecting the first octets for nulls.
> 3. Is it just me, or does it feel really dorky to talk mysteriously
> about this detection mechanism without providing details?  On top of
> which, anyone who's writing the kind of software that might lead one to
> consult an RFC first shouldn't bloody well use anything but UTF-8.  If
> people really want to have this, I think we owe the world an outline of
> the algorithm, maybe in an appendix. I'll volunteer to make my best
> effort to draft it and try to get consensus that it's correct.  If we
> can't, that's a powerful symbol that we shouldn't have this language.
> But that's my fallback position; my real request to the group is that we
> just take this out.

That was proposed before; it seems some participants are opposed to
saying "too much" about the detection, out of concern that this would
lead to it being implemented more than before.
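
For illustration only, the null-inspection idea in the quoted text could
be sketched roughly as below. This is not agreed text and not a proposed
appendix. The function name detect_json_encoding is hypothetical, and the
sketch assumes the input carries no byte order mark and that its first
character is in the range U+0001..U+007F, as the quoted paragraph states.

```python
def detect_json_encoding(data: bytes) -> str:
    """Guess the Unicode encoding of a JSON text from its leading octets.

    Assumptions (not from any RFC text): no byte order mark is present,
    and the first character has a code point between 1 and 127, so the
    position of the null octets reveals the encoding form.
    Returns a Python codec name.
    """
    if len(data) >= 4:
        # 00 00 00 xx: one ASCII character as a big-endian 32-bit unit.
        if data[0] == 0 and data[1] == 0 and data[2] == 0 and data[3] != 0:
            return "utf-32-be"
        # xx 00 00 00: one ASCII character as a little-endian 32-bit unit.
        if data[0] != 0 and data[1] == 0 and data[2] == 0 and data[3] == 0:
            return "utf-32-le"
    if len(data) >= 2:
        # 00 xx: one ASCII character as a big-endian 16-bit unit.
        if data[0] == 0 and data[1] != 0:
            return "utf-16-be"
        # xx 00: one ASCII character as a little-endian 16-bit unit.
        if data[0] != 0 and data[1] == 0:
            return "utf-16-le"
    # No nulls among the leading octets (or too little input): treat as UTF-8.
    return "utf-8"
```

A recipient could then decode with data.decode(detect_json_encoding(data))
before parsing. The UTF-32 checks have to run before the UTF-16 ones,
because a little-endian UTF-32 text also starts with "xx 00".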

Best regards, Julian