Re: [Json] Call for Consensus: Proposed Text for "8.1 Character Encoding"

Pete Cordell <petejson@codalogic.com> Thu, 27 April 2017 09:15 UTC

To: =?UTF-8?Q?Martin_J._D=c3=bcrst?= <duerst@it.aoyama.ac.jp>, Carsten Bormann <cabo@tzi.org>, "Matthew A. Miller" <linuxwolf+ietf@outer-planes.net>
References: <e69d7c21-85cb-45f4-c0c2-34c624e63049@outer-planes.net> <14252631-AD76-4537-89BF-6368F4A8CDF4@att.com> <7e6af21f-16ea-a3bc-9c01-595ae8acebba@gmx.de> <05100401-88D4-4158-A3FF-3EF144D85449@att.com> <CAD2gp_T0bfpnsCA_t4BAMtEhr7p8JkZggjnY4F+m9-M2hWLfmw@mail.gmail.com> <1e94516c-9c82-8b0e-0d2d-7dbaa83b21bd@outer-planes.net> <40e3207f-e047-c898-1f0c-4422de1d597a@it.aoyama.ac.jp> <1b3ec14a-927a-8d46-e3d3-9807a9588437@outer-planes.net> <CAHBU6ivsq8+Z=MMkUH+=Q0uwc5NCtaJLYw5cp0Qg8eX2hQQ6sA@mail.gmail.com> <b74cb31b-8e04-17d0-548a-fc164ce07c05@outer-planes.net> <20170417175627.GK23461@localhost> <10B651F1-7FE0-484D-BD2E-FD146BC5FB04@tzi.org> <eabbccb0-8d15-d595-7cd0-37acc0621c57@it.aoyama.ac.jp> <6eb23f90-6623-7888-bc1c-6640a9dababc@codalogic.com>
Cc: "json@ietf.org" <json@ietf.org>
From: Pete Cordell <petejson@codalogic.com>
Message-ID: <61bfad2b-850d-a11f-e80b-d5ed9ccb4dc9@codalogic.com>
Date: Thu, 27 Apr 2017 10:15:11 +0100
User-Agent: Mozilla/5.0 (Windows NT 10.0; WOW64; rv:45.0) Gecko/20100101 Thunderbird/45.8.0
MIME-Version: 1.0
In-Reply-To: <6eb23f90-6623-7888-bc1c-6640a9dababc@codalogic.com>
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 8bit
Archived-At: <https://mailarchive.ietf.org/arch/msg/json/jHxZPsdOAKu9EdIPNTA1CkfhETk>
Subject: Re: [Json] Call for Consensus: Proposed Text for "8.1 Character Encoding"

I seem to have killed this thread off again.  Sorry about that.

Any conclusions?

Cheers,

Pete.

On 18/04/2017 14:01, Pete Cordell wrote:
> On 18/04/2017 06:22, Martin J. Dürst wrote:
>> On 2017/04/18 05:47, Carsten Bormann wrote:
>>> On Apr 17, 2017, at 19:56, Nico Williams <nico@cryptonector.com> wrote:
>>>>
>>>>> Thinking about this more, putting an encoding detection algorithm
>>>>> in an appendix seems like a reasonable compromise to me.  To start,
>>>>> how about removing the detection text from Section 8.1 and having an
>>>>> appendix that starts with that text plus the table?
>>>>
>>>> Or we could even just assert that such an algorithm is possible, and
>>>> that implementors MAY implement one.
>>>
>>> Indeed.
>>>
>>> Broken record mode:
>>>
>>> - writing up the algorithm sounds like encouraging implementation.
>>>   We *don’t* want people to implement this!
>>>   (The whole interminable non-UTF-8 saga probably just was a nod from
>>> the RFC 4627 authors to the remnants of UTF-16 land, which mostly have
>>> died off since.  Why resurrect?)
>>>
>>> - there have been about 15 attempts to define this algorithm on the
>>> mailing list.
>>>   All were wrong.
>>>   An Internet Standard should contain tried and true material, not
>>> errata fodder.
>>>
>>> - an implementer is in a much better position to get this right than
>>> the standard, because they can write unit tests.
>>
>> I completely agree with Carsten. As far as I know, and as far as we have
>> been told on this list, if some JSON isn't in UTF-8, then it simply will
>> not interoperate.
>
> +1
>
> If we do do this, I think we could add some example test messages to
> help with the development, e.g.:
>
>     {"Example":1}
>     {}
>     "Example"
>     ""
>     "U+0100" (where U+0100 stands for the encoded character itself, not the ASCII text "U+0100")
>     1
>
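
For illustration only, here is a minimal Python sketch of how those example
messages could drive unit tests for a detector. It is not proposed spec text
and not anyone's agreed algorithm. It assumes the input has no BOM, that the
first character of a JSON text is always a non-NUL ASCII character, and that
NUL never appears unescaped in valid JSON. The names detect_json_encoding,
check_samples, SAMPLES and ENCODINGS are hypothetical.

```python
import json

# Example messages from the list above; "\u0100" is the raw character U+0100.
SAMPLES = ['{"Example":1}', '{}', '"Example"', '""', '"\u0100"', '1']

# Python codec names that encode without a BOM.
ENCODINGS = ["utf-8", "utf-16-le", "utf-16-be", "utf-32-le", "utf-32-be"]


def detect_json_encoding(data: bytes) -> str:
    """Guess the encoding of a BOM-less JSON text (hypothetical helper).

    Assumption: the first character is non-NUL ASCII, so the position of
    the zero bytes in the first code unit identifies the encoding.
    """
    b = data[:4]
    if len(b) == 4:
        # UTF-32: three zero bytes around one ASCII byte.
        if b[0] == 0 and b[1] == 0 and b[2] == 0 and b[3] != 0:
            return "utf-32-be"
        if b[0] != 0 and b[1] == 0 and b[2] == 0 and b[3] == 0:
            return "utf-32-le"
    if len(b) >= 2:
        # UTF-16: one zero byte next to one ASCII byte.
        if b[0] == 0 and b[1] != 0:
            return "utf-16-be"
        if b[0] != 0 and b[1] == 0:
            return "utf-16-le"
    # No zero bytes in the first code unit (or too short to tell).
    return "utf-8"


def check_samples() -> int:
    """Encode every sample in every encoding and verify the guess."""
    checked = 0
    for text in SAMPLES:
        for enc in ENCODINGS:
            raw = text.encode(enc)
            guess = detect_json_encoding(raw)
            assert guess == enc, (text, enc, guess)
            # The decoded text must parse to the same JSON value.
            assert json.loads(raw.decode(guess)) == json.loads(text)
            checked += 1
    return checked


if __name__ == "__main__":
    print(check_samples(), "sample/encoding pairs checked")
```

The short samples ("1", "") and the U+0100 sample matter because a detector
that inspects the first two characters rather than the first code unit tends
to trip over them.
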
>> In my view, the only reason to still have a MAY for UTF-16/32 is that
>> this will avoid questions like: "I have a JSON parser in language FOO,
>> it can take a string or an input stream as an argument. In FOO, strings
>> are UTF-16, but the JSON RFC doesn't seem to allow this. What should I
>> do."
>
> IMO the answer to that is, "that's why it says 'JSON text _SHOULD_ be
> encoded in UTF-8'".
>
> I agree with John Cowan that use of UTF-16/32 is purely for internal
> scenarios; not on the Internet.  As such, I believe the IETF is going
> beyond its remit to say you can use UTF-16/32 for your internal purposes
> that I know nothing about and care nothing about, but you're not allowed
> to use other encodings that may be more natural for your system.
>
> Pete Cordell
> Codalogic Ltd
> C++ tools for C++ programmers, http://codalogic.com
> Read & write XML in C++, http://www.xml2cpp.com