Re: [Json] Call for Consensus: Proposed Text for "8.1 Character Encoding"

Pete Cordell <petejson@codalogic.com> Tue, 28 March 2017 08:57 UTC

To: Tim Bray <tbray@textuality.com>, "Matthew A. Miller" <linuxwolf+ietf@outer-planes.net>
References: <1fb5849e-8dbf-835d-65b7-2403686248f9@outer-planes.net> <0E32A94D-CE12-4F52-9ED6-8743C49751B4@vpnc.org> <4d2f0fb3-a729-0c17-2394-bc1e005dd612@gmx.de> <d09f9a59-2411-45a0-470c-ea95072fe4fd@outer-planes.net> <dad91b19-e774-e239-36d2-9d086cca8e0d@gmx.de> <ac432615-ee84-3cdf-6b37-480626bd18c1@gmx.de> <804f9930-26a5-a565-0607-452b386cfeb5@outer-planes.net> <D89BCFAA-B81F-4EEB-8B3A-180BAAB9D16C@att.com> <e69d7c21-85cb-45f4-c0c2-34c624e63049@outer-planes.net> <14252631-AD76-4537-89BF-6368F4A8CDF4@att.com> <7e6af21f-16ea-a3bc-9c01-595ae8acebba@gmx.de> <05100401-88D4-4158-A3FF-3EF144D85449@att.com> <CAD2gp_T0bfpnsCA_t4BAMtEhr7p8JkZggjnY4F+m9-M2hWLfmw@mail.gmail.com> <1e94516c-9c82-8b0e-0d2d-7dbaa83b21bd@outer-planes.net> <40e3207f-e047-c898-1f0c-4422de1d597a@it.aoyama.ac.jp> <1b3ec14a-927a-8d46-e3d3-9807a9588437@outer-planes.net> <CAHBU6ivsq8+Z=MMkUH+=Q0uwc5NCtaJLYw5cp0Qg8eX2hQQ6sA@mail.gmail.com>
Cc: "json@ietf.org" <json@ietf.org>
From: Pete Cordell <petejson@codalogic.com>
Message-ID: <76b51f4f-42e5-19a5-f872-76536737a462@codalogic.com>
Date: Tue, 28 Mar 2017 09:57:25 +0100
User-Agent: Mozilla/5.0 (Windows NT 10.0; WOW64; rv:45.0) Gecko/20100101 Thunderbird/45.8.0
MIME-Version: 1.0
In-Reply-To: <CAHBU6ivsq8+Z=MMkUH+=Q0uwc5NCtaJLYw5cp0Qg8eX2hQQ6sA@mail.gmail.com>
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit
Archived-At: <https://mailarchive.ietf.org/arch/msg/json/e5No81l5zym8Yp5eZy211LzT4lU>
Subject: Re: [Json] Call for Consensus: Proposed Text for "8.1 Character Encoding"

On 28/03/2017 05:48, Tim Bray wrote:
> First of all, let me say that I’m delighted with, and fully support, the
> promotion of the status of UTF-8 in the JSON RFC to MUST.  I suspect
> this steps way outside the JSONbis charter, but that’s a problem for
> chairs and ADs, not yr humble editor.
>
> Comments on Matt's proposed text:
>
> 1. How about a very short historical note, along the lines of: “Previous
> specifications of JSON, including the predecessor RFCs, have not
> required the use of UTF-8 for use with the application/json media type.
> However, implementors of JSON-based software have overwhelmingly chosen
> to use the UTF-8 encoding, to the extent that it is the only realistic
> way to achieve interoperability in software which generates or consumes
> JSON.”
>
> ... moving on...


If I ruled the world, I'd strip it down even more.  Merging with Matt's 
text, I'd go with something like:

"""
JSON text SHOULD be encoded in UTF-8 (Section 3 of [UNICODE]).  When
used with the media type "application/json", the JSON text MUST be
encoded as UTF-8.

     Previous specifications of JSON, including the predecessor RFCs,
     have not required the use of UTF-8 for use with the
     application/json media type. However, implementors of JSON-based
     software have overwhelmingly chosen to use the UTF-8 encoding, to
     the extent that it is the only realistic way to achieve
     interoperability in software which generates or consumes JSON.

Implementations MUST NOT add a byte order mark (U+FEFF) to the
beginning of a JSON text.  In the interests of interoperability,
implementations that parse JSON texts MAY ignore the presence of a
byte order mark rather than treating it as an error.
"""

But I'm very happy with Matt's proposal, so please consider this more 
towards the "editorial typo" class than a blocker :-)

On:
> 2. Seriously, ... we shouldn't waste RFC space talking about
> practices that are not remotely interoperable.  The I in IETF stands
> for Internet, and JSON on the Internet is UTF-8, end of story.

The title of the RFC is "_The_ JavaScript Object Notation (JSON) Data
Interchange Format".  To me that sounds like a fairly definitive
definition of JSON.  As the IETF often writes stuff that's more general
than just usage on the Internet, I think it's reasonable to believe that
this text covers all JSON usages, and people would be justified in
arguing so.  If it is just for the Internet, we could tweak the title,
add something to that effect to the Introduction, and simply say MUST
be UTF-8.

Thanks,

Pete Cordell
Codalogic Ltd