Re: [Json] Call for Consensus: Proposed Text for "8.1 Character Encoding"

"Matthew A. Miller" <linuxwolf+ietf@outer-planes.net> Wed, 10 May 2017 17:13 UTC

Sender: Matthew Miller <linuxwolf@outer-planes.net>
To: Pete Cordell <petejson@codalogic.com>, Julian Reschke <julian.reschke@gmx.de>, "json@ietf.org" <json@ietf.org>
References: <e69d7c21-85cb-45f4-c0c2-34c624e63049@outer-planes.net> <1e94516c-9c82-8b0e-0d2d-7dbaa83b21bd@outer-planes.net> <40e3207f-e047-c898-1f0c-4422de1d597a@it.aoyama.ac.jp> <1b3ec14a-927a-8d46-e3d3-9807a9588437@outer-planes.net> <CAHBU6ivsq8+Z=MMkUH+=Q0uwc5NCtaJLYw5cp0Qg8eX2hQQ6sA@mail.gmail.com> <b74cb31b-8e04-17d0-548a-fc164ce07c05@outer-planes.net> <20170417175627.GK23461@localhost> <10B651F1-7FE0-484D-BD2E-FD146BC5FB04@tzi.org> <eabbccb0-8d15-d595-7cd0-37acc0621c57@it.aoyama.ac.jp> <6eb23f90-6623-7888-bc1c-6640a9dababc@codalogic.com> <61bfad2b-850d-a11f-e80b-d5ed9ccb4dc9@codalogic.com> <08a88696-65ef-da05-0d77-1a07d04ebfc8@outer-planes.net> <bb9fead6-23e7-8c1d-bc80-b60c81c4b89a@codalogic.com> <6f047d01-ad72-59ab-9d34-20a8177ab3af@outer-planes.net> <be4d9f12-a4be-3723-e52a-56a60722a75f@gmx.de> <a3805f67-620b-67f0-9c06-c865b71029e7@codalogic.com> <bb1ef6a8-506c-344b-b903-980ed50ad2d3@gmx.de> <44b4523a-5e4b-ccad-af96-931d8b9ad1c2@codalogic.com>
From: "Matthew A. Miller" <linuxwolf+ietf@outer-planes.net>
Message-ID: <ac1d1b68-67e7-c19f-a556-280df73f465b@outer-planes.net>
Date: Wed, 10 May 2017 11:13:06 -0600
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.12; rv:53.0) Gecko/20100101 Thunderbird/53.0
MIME-Version: 1.0
In-Reply-To: <44b4523a-5e4b-ccad-af96-931d8b9ad1c2@codalogic.com>
Content-Type: multipart/signed; micalg=pgp-sha512; protocol="application/pgp-signature"; boundary="QqjApumpEovQe0FCdGwlDUDuAj7OcM16T"
Archived-At: <https://mailarchive.ietf.org/arch/msg/json/Bu10Fq9l_kZJY8ecBnacX4EK0OU>
Subject: Re: [Json] Call for Consensus: Proposed Text for "8.1 Character Encoding"

On 5/10/17 8:04 AM, Pete Cordell wrote:
> On 10/05/2017 14:08, Julian Reschke wrote:
>> I believe we should have separate names for JSON represented as a
>> sequence of characters (such as in a string variable in a programming
>> language) and for a JSON-shaped octet sequence inside a
>> "application/json"-typed (HTTP) message. For the latter, enforcing UTF-8
>> IMHO is attractive.
>>
>> I think it's ok for the spec to talk about both, but it really needs to
>> be clear what we are talking about in each section.
> 
> It's an interesting thought on JSON in programs.  It would be strange to
> be able to say it was not valid JSON if it was encoded in a string
> inside a Shift-JIS encoded Ruby program for example.
> 
> My ISO layers are rusty, but it looks like we can talk about JSON
> character sequences somewhere above the transport layer, and JSON
> encoded messages somewhere below the transport layer.  The latter
> possibly being transport specific.
> 
> To me it would seem discussion of "JSON" (without any further
> refinement) ought to be independent of the lower layer transport
> encoding aspects.
> 
> "application/json" would be one transport specific encoding (for which I
> think most are happy with only UTF-8).  JSON inside a Shift-JIS encoded
> Ruby program is in effect another form of transport for a JSON message.
> 
> Cheers,
> 
> Pete.
> 


I believe in essence it is within our purview to set expectations of
what transits a wire protocol.

That phrasing is broad and likely vague.  To be more specific, I
believe it is within scope to cover instances where a media type is
specified and that media type is "application/json" (e.g. HTTP bodies),
but I think it is also within scope for this document to essentially say
"where a protocol says 'use JSON here' then it is encoded as".  I don't
believe it is within scope to dictate JSON be encoded in any particular
manner when placed into a storage medium, or when embedded within other
content (e.g., the Shift-JIS encoded Ruby program even if said program
were transmitted over a network protocol).

Assuming the Working Group finds that scope acceptable and finds a
UTF-8-only requirement acceptable, here is a starting proposal for text:

"""
8.1.  Character Encoding

When transmitting over a network protocol, JSON text MUST be
encoded in UTF-8 (Section 3 of [UNICODE]).

Previous specifications of JSON have not required the use of UTF-8
when transmitting JSON text. However, the vast majority of
JSON-based software implementations have chosen to use the UTF-8
encoding, to the extent that it is the only encoding that achieves
interoperability.

Implementations MUST NOT add a byte order mark (U+FEFF) to the
beginning of a JSON text.  In the interests of interoperability,
implementations that parse JSON texts MAY ignore the presence of a
byte order mark rather than treating it as an error.
"""

If you find this acceptable, please indicate that.  Otherwise, please
provide suggested changes.
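
As a rough illustration only (not part of the proposed text), here is how
a sender and a receiver might behave under these rules.  This is a minimal
Python sketch using the standard json module; the function names
encode_json_for_wire and decode_json_from_wire are hypothetical, and
stripping a leading byte order mark is the optional (MAY) behavior, not a
requirement.

```python
import json

# U+FEFF (byte order mark) as it appears when encoded in UTF-8.
BOM = b"\xef\xbb\xbf"


def encode_json_for_wire(value):
    """Serialize a value to JSON text as UTF-8 octets, never adding a BOM."""
    # ensure_ascii=False keeps non-ASCII characters as literal UTF-8
    # instead of \uXXXX escapes; either form is valid JSON.
    return json.dumps(value, ensure_ascii=False).encode("utf-8")


def decode_json_from_wire(octets):
    """Parse UTF-8 JSON octets, ignoring one leading BOM if present."""
    if octets.startswith(BOM):
        # Lenient receiver: skip the BOM rather than treating it as an error.
        octets = octets[len(BOM):]
    # Octets that are not valid UTF-8 raise UnicodeDecodeError here.
    return json.loads(octets.decode("utf-8"))
```

A stricter receiver could reject a leading byte order mark instead; the
proposed text permits either choice on the parsing side.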



- m&m

Matthew A. Miller