Re: [Json] Call for Consensus: Proposed Text for "8.1 Character Encoding"

"Matthew A. Miller" <linuxwolf+ietf@outer-planes.net> Mon, 17 April 2017 17:41 UTC

On 17/03/27 22:48, Tim Bray wrote:
> First of all, let me say that I’m delighted with, and fully support, the
> promotion of the status of UTF-8 in the JSON RFC to MUST.  I suspect
> this steps way outside the JSONbis charter, but that’s a problem for
> chairs and ADs, not yr humble editor.
> 
> Comments on Matt's proposed text:
> 
> 1. How about a very short historical note, along the lines of: “Previous
> specifications of JSON, including the predecessor RFCs, have not
> required the use of UTF-8 for use with the application/json media type. 
> However, implementors of JSON-based software have overwhelmingly chosen
> to use the UTF-8 encoding, to the extent that it is the only realistic
> way to achieve interoperability in software which generates or consumes
> JSON.”
> 
> ... moving on...
> 
> On Mon, Mar 27, 2017 at 1:04 PM, Matthew A. Miller
> <linuxwolf+ietf@outer-planes.net> wrote:
> 
>     JSON text SHOULD be encoded in UTF-8 (Section 3 of [UNICODE]); JSON
>     text MAY be encoded in UTF-16 or UTF-32 if the generator is certain
>     the intended recipients can process it. JSON text MUST NOT be encoded
>     in any encoding other than UTF-8, UTF-16, or UTF-32. When used with
>     media type "application/json" the JSON text MUST be encoded as UTF-8.
> 
> 
> 2. Seriously, why the “JSON text MAY be encoded in… can process it”
> phrase?  It’s a distraction, and if people want to do that, we can’t
> stop them, but we shouldn't waste RFC space talking about practices that
> are not remotely interoperable.  The I in IETF stands for Internet, and
> JSON on the Internet is UTF-8, end of story.
>
> 
>     Recipients that wish to support Unicode encodings other than UTF-8
>     can do this using a detection mechanism that is based on the fact
>     that the first character will always have a Unicode code point
>     greater than 0 and less than 128, thus the UTF-16/32 variants can
>     be detected by inspecting the first octets for nulls.
> 
> 
> 3. Is it just me, or does it feel really dorky to talk mysteriously
> about this detection mechanism without providing details?  On top of
> which, anyone who's writing the kind of software that might lead one to
> consult an RFC first shouldn't bloody well use anything but UTF-8.  If
> people really want to have this, I think we owe the world an outline of
> the algorithm, maybe in an appendix. I'll volunteer to make my best
> effort to draft it and try to get consensus that it's correct.  If we
> can't, that's a powerful symbol that we shouldn't have this language.
> But that's my fallback position; my real request to the group is that we
> just take this out.
> 

[ /me doffs hat ]

Thinking about this more, putting an encoding detection algorithm in an
appendix seems like a reasonable compromise to me.  To start, how about
removing the detection text from Section 8.1 and adding an appendix that
starts with that text plus the table?
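
As a rough illustration only (the table itself is not reproduced in this
message), the null-octet check described in the quoted detection text
could be sketched in Python as below. The function name
detect_json_encoding and the exact octet patterns are assumptions made
for this sketch, not agreed text. It relies on the stated premise that
the first character has a code point greater than 0 and less than 128,
and it assumes the input carries no byte order mark.

```python
def detect_json_encoding(data: bytes) -> str:
    """Guess the encoding of a BOM-less JSON text from its first octets.

    Hypothetical helper: relies only on the first character being in the
    range U+0001..U+007F, so its encoded form is one non-zero octet
    padded with nulls in UTF-16 and UTF-32.
    """
    if not data:
        raise ValueError("empty input is not a JSON text")

    head = data[:4]

    # UTF-32 needs four octets: three nulls plus one non-zero octet.
    if len(head) == 4:
        if head[:3] == b"\x00\x00\x00" and head[3] != 0:
            return "utf-32-be"
        if head[0] != 0 and head[1:] == b"\x00\x00\x00":
            return "utf-32-le"

    # UTF-16 needs two octets: one null plus one non-zero octet.
    if len(head) >= 2:
        if head[0] == 0 and head[1] != 0:
            return "utf-16-be"
        if head[0] != 0 and head[1] == 0:
            return "utf-16-le"

    # No leading null pattern: treat the text as UTF-8.
    return "utf-8"
```

The UTF-32 patterns are tested before the UTF-16 ones because a
little-endian UTF-32 text also begins with a non-zero octet followed by
a null.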

Assuming the above, what does everyone think of the following for
Section 8.1?

"""
JSON text SHOULD be encoded in UTF-8 (Section 3 of [UNICODE]). JSON
text MUST NOT be encoded in any encoding other than UTF-8, UTF-16,
or UTF-32. When used with media type "application/json" the JSON
text MUST be encoded as UTF-8.

Previous specifications of JSON have not required the use of UTF-8
with the "application/json" media type. However, the vast majority
of JSON-based software implementations have chosen to use the UTF-8
encoding, to the extent that it is the only encoding that achieves
interoperability.

Implementations MUST NOT add a byte order mark (U+FEFF) to the
beginning of a JSON text.  In the interests of interoperability,
implementations that parse JSON texts MAY ignore the presence of a
byte order mark rather than treating it as an error.
"""
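
To make the byte order mark paragraph concrete, here is a minimal
Python sketch of one way an implementation might behave under the
proposed text, assuming UTF-8 input. The names parse_json_utf8 and
serialize_json_utf8 are hypothetical; only the standard json module is
used. The parser takes the MAY-ignore option, and the generator never
emits a BOM.

```python
import json

UTF8_BOM = b"\xef\xbb\xbf"  # U+FEFF encoded as UTF-8


def parse_json_utf8(data: bytes):
    """Parse UTF-8 JSON, ignoring a leading BOM instead of failing."""
    if data.startswith(UTF8_BOM):
        data = data[len(UTF8_BOM):]
    return json.loads(data.decode("utf-8"))


def serialize_json_utf8(value) -> bytes:
    """Serialize to UTF-8 JSON; no BOM is ever prepended."""
    return json.dumps(value, ensure_ascii=False).encode("utf-8")
```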


- m&m

Matthew A. Miller