Re: [Json] Call for Consensus: Proposed Text for "8.1 Character Encoding"

Julian Reschke <> Sun, 19 March 2017 10:29 UTC

To: "Matthew A. Miller" <>, "" <>
Cc: The IESG <>
From: Julian Reschke <>
Date: Sun, 19 Mar 2017 11:29:34 +0100

...and here is a concrete proposal:

Original text:

> 8.1.  Character Encoding
>    JSON text SHALL be encoded in UTF-8, UTF-16, or UTF-32 [UNICODE]
>    (Section 3).  The default encoding is UTF-8, and JSON texts that are

Change 1:

Say "MUST" instead of "SHALL", as it's the more common form of 
expressing this requirement.

Change 2:

Replace "[UNICODE] (Section 3)" with "Section 3 of [UNICODE]".

That said, this citation isn't as stable as it should be: [UNICODE]
refers to <> and, unless I'm missing something, there's no guarantee
that future versions will keep the relevant bits in Section 3.

>    encoded in UTF-8 are interoperable in the sense that they will be
>    read successfully by the maximum number of implementations; there are
>    many implementations that cannot successfully read texts in other
>    encodings (such as UTF-16 and UTF-32).

Change 3:

Add "Text encoded in character encodings other than UTF-8, UTF-16, or
UTF-32 cannot be used with the media type "application/json"."

(This spells out the implications of the SHALL/MUST.)

>    Implementations MUST NOT add a byte order mark (U+FEFF) to the
>    beginning of a JSON text.  In the interests of interoperability,
>    implementations that parse JSON texts MAY ignore the presence of a
>    byte order mark rather than treating it as an error.
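
To illustrate the quoted paragraph, here is a minimal Python sketch of a
lenient reader and a conforming writer. The function names
loads_tolerating_bom and dumps_without_bom are made up for this example,
and only the UTF-8 case is handled; it is one possible reading of the
text, not something the draft prescribes.

```python
import codecs
import json


def loads_tolerating_bom(data: bytes):
    """Parse UTF-8 encoded JSON, skipping one leading byte order mark.

    Hypothetical helper: the draft only says parsers MAY ignore the BOM.
    """
    if data.startswith(codecs.BOM_UTF8):
        # Drop the three octets EF BB BF instead of treating them as an error.
        data = data[len(codecs.BOM_UTF8):]
    return json.loads(data.decode("utf-8"))


def dumps_without_bom(value) -> bytes:
    """Serialize to UTF-8 and never prepend a byte order mark."""
    return json.dumps(value).encode("utf-8")
```

A producer would only ever call dumps_without_bom; the leniency sits
entirely on the receiving side.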

Finally, change 4:

Add a new paragraph:

"Recipients that wish to support Unicode encodings other than UTF-8 can 
do this using a detection mechanism that is based on the fact that the 
first character will always have a Unicode code point less than or equal to
127, thus the UTF-16/32 variants can be detected by inspecting the first 
octets for nulls."
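
For what it's worth, the detection described in that paragraph could be
sketched in Python roughly as follows. The function name
detect_json_encoding is invented for this example; the sketch assumes
the input carries no byte order mark and relies on the fact that a JSON
text cannot contain an unescaped U+0000.

```python
def detect_json_encoding(data: bytes) -> str:
    """Guess the Unicode encoding of a JSON text from its first octets.

    Hypothetical helper. Because the first character is always ASCII,
    the position of the null octets around it identifies the encoding.
    """
    head = data[:4]
    if len(head) >= 4:
        if head[0] == 0 and head[1] == 0 and head[2] == 0 and head[3] != 0:
            return "utf-32-be"  # 00 00 00 xx
        if head[0] != 0 and head[1] == 0 and head[2] == 0 and head[3] == 0:
            return "utf-32-le"  # xx 00 00 00
    if len(head) >= 2:
        if head[0] == 0 and head[1] != 0:
            return "utf-16-be"  # 00 xx
        if head[0] != 0 and head[1] == 0:
            return "utf-16-le"  # xx 00
    return "utf-8"              # no nulls in the first octets
```

A recipient would then decode with data.decode(detect_json_encoding(data))
before handing the text to its parser. Only the first character is
inspected, so the check also works for very short texts such as a bare
number.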

I believe none of these changes affects anything normative, but that 
they absolutely clarify the spec. In particular, having them in the spec 
would have avoided this whole discussion we just had.

Best regards, Julian