Re: [Json] Call for Consensus: Proposed Text for "8.1 Character Encoding"

"Matthew A. Miller" <linuxwolf+ietf@outer-planes.net> Thu, 27 April 2017 16:23 UTC

I see consensus for text in Section 8.1 pending an Appendix on encoding
detection, but nothing for the Appendix itself.

Looking deeper in the threads again, it appears to me that:

* There is consensus to say "just use UTF-8" in many (most) scenarios
* There is rough consensus to say "always use UTF-8", but there is
concern that this change goes beyond the charter
* There is no consensus on a detection algorithm

While the charter calls for "absolute minimal changes", it calls out RFC
7159 (and its -bis) as documenting "interoperability concerns when
exchanging JSON over a network".  Documenting this interoperability is
one of its primary goals.

Therefore, the argument can be made that a change to the effect of "JSON
text MUST be encoded as UTF-8" qualifies as an "absolute minimal
change" needed to achieve the goal of network interoperability.
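
As a rough illustration of what a UTF-8-only rule would mean for a
receiver, here is a minimal Python sketch. It is not proposed text, and
the helper names parse_utf8_json and is_utf8_json are hypothetical. It
assumes the receiver gets the JSON text as raw bytes and rejects
anything that is not valid UTF-8, with no detection step at all.

```python
import json


def parse_utf8_json(data: bytes):
    """Hypothetical helper: parse JSON text that must be UTF-8 encoded."""
    # Decode explicitly and strictly. Passing bytes straight to json.loads
    # would let the library guess among UTF-8, UTF-16, and UTF-32.
    text = data.decode("utf-8", errors="strict")
    return json.loads(text)


def is_utf8_json(data: bytes) -> bool:
    """Return True only if data is valid UTF-8 and parses as JSON."""
    try:
        parse_utf8_json(data)
    except ValueError:
        # UnicodeDecodeError and json.JSONDecodeError both subclass ValueError.
        return False
    return True


if __name__ == "__main__":
    print(is_utf8_json(b'{"Example":1}'))
    print(is_utf8_json('{"Example":1}'.encode("utf-16")))
```

Note that UTF-16 or UTF-32 input without a byte order mark can still be
a valid UTF-8 byte sequence (the NUL bytes decode fine). In this sketch
such input is rejected by the JSON parser rather than by the decoder.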

To the working group:

* Is there strong objection to mandating only UTF-8?
* Does anyone have suggested text to that effect?


- m&m

Matthew A. Miller
JSONbis Chair

On 4/27/17 3:15 AM, Pete Cordell wrote:
> I seem to have killed this thread off again.  Sorry about that.
> 
> Any conclusions?
> 
> Cheers,
> 
> Pete.
> 
> On 18/04/2017 14:01, Pete Cordell wrote:
>> On 18/04/2017 06:22, Martin J. Dürst wrote:
>>> On 2017/04/18 05:47, Carsten Bormann wrote:
>>>> On Apr 17, 2017, at 19:56, Nico Williams <nico@cryptonector.com> wrote:
>>>>>
>>>>>> Thinking about this more, putting an encoding detection algorithm
>>>>>> as an appendix seems like a reasonable compromise to me.  To
>>>>>> start, how about removing the detection text from Section 8.1 and
>>>>>> having an appendix that starts with that text plus the table?
>>>>>
>>>>> Or we could even just assert that such an algorithm is possible, and
>>>>> that implementors MAY implement one.
>>>>
>>>> Indeed.
>>>>
>>>> Broken record mode:
>>>>
>>>> - writing up the algorithm sounds like encouraging implementation.
>>>>   We *don’t* want people to implement this!
>>>>   (The whole interminable non-UTF-8 saga probably just was a nod from
>>>>   the RFC 4627 authors to the remnants of UTF-16 land, which mostly
>>>>   have died off since.  Why resurrect?)
>>>>
>>>> - there have been about 15 attempts to define this algorithm on the
>>>>   mailing list.
>>>>   All were wrong.
>>>>   An Internet Standard should contain tried and true material, not
>>>>   errata fodder.
>>>>
>>>> - an implementer is in a much better position to get this right than
>>>>   the standard, because they can write unit tests.
>>>
>>> I completely agree with Carsten. As far as I know, and as far as we have
>>> been told on this list, if some JSON isn't in UTF-8, then it simply will
>>> not interoperate.
>>
>> +1
>>
>> If we do do this, I think we could add some example test messages to
>> help with development, e.g.:
>>
>>     {"Example":1}
>>     {}
>>     "Example"
>>     ""
>>     "U+0100" (where U+0100 stands for the encoded character itself,
>>     not the ASCII text "U+0100")
>>     1
>>
>>> In my view, the only reason to still have a MAY for UTF-16/32 is that
>>> this will avoid questions like: "I have a JSON parser in language FOO,
>>> it can take a string or an input stream as an argument. In FOO, strings
>>> are UTF-16, but the JSON RFC doesn't seem to allow this. What should I
>>> do."
>>
>> IMO the answer to that is, "that's why it says 'JSON text _SHOULD_ be
>> encoded in UTF-8'".
>>
>> I agree with John Cowan that use of UTF-16/32 is purely for internal
>> scenarios; not on the Internet.  As such, I believe the IETF is going
>> beyond its remit to say you can use UTF-16/32 for your internal purposes
>> that I know nothing about and care nothing about, but you're not allowed
>> to use other encodings that may be more natural for your system.
>>
>> Pete Cordell
>> Codalogic Ltd
>> C++ tools for C++ programmers, http://codalogic.com
>> Read & write XML in C++, http://www.xml2cpp.com