Re: [Json] Allow any JSON value at the top level

"Martin J. Dürst" <duerst@it.aoyama.ac.jp> Wed, 12 June 2013 01:25 UTC

Message-ID: <51B7CE0F.90104@it.aoyama.ac.jp>
Date: Wed, 12 Jun 2013 10:25:35 +0900
From: "Martin J. Dürst" <duerst@it.aoyama.ac.jp>
Organization: Aoyama Gakuin University
To: Tatu Saloranta <tsaloranta@gmail.com>
Cc: Carsten Bormann <cabo@tzi.org>, "Manger, James H" <James.H.Manger@team.telstra.com>, "json@ietf.org" <json@ietf.org>
Subject: Re: [Json] Allow any JSON value at the top level

On 2013/06/12 3:41, Tatu Saloranta wrote:
> On Tue, Jun 11, 2013 at 2:25 AM, Carsten Bormann <cabo@tzi.org> wrote:

>> This is irrelevant in practice as JSON is used with UTF-8 in practice.
>>
>
> My main concern is with UTF-16. My understanding is that for "Big 5"
> languages its use makes sense, from an efficiency perspective. I do not have
> data on this; in XML space document test sets had non-trivial amount of
> content in various encodings.
>
> If UTF-16 is not widely used then I can see why this would be considered of
> little significance.

There are many scripts (and therefore languages) where a character 
takes 3 bytes in UTF-8 but only 2 bytes in UTF-16. In particular, this 
includes all the languages of East, Southeast, and South Asia, a vast 
region with an enormous population. This is the reason why UTF-16 was 
made mandatory for XML.
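As a rough illustration of these byte counts, the following Python sketch (not part of the original message; the helper name `encoded_sizes` and the sample characters are arbitrary choices) encodes a few characters and reports their sizes. It uses `utf-16-le` so that Python does not prepend a 2-byte byte order mark.

```python
# Compare per-character encoded sizes in UTF-8 and UTF-16.
# "utf-16-le" is used so that no 2-byte BOM is added to the count.

def encoded_sizes(ch: str) -> tuple[int, int]:
    """Return (UTF-8 bytes, UTF-16 bytes) for the given text."""
    return len(ch.encode("utf-8")), len(ch.encode("utf-16-le"))

# Illustrative sample characters (chosen for this example only).
samples = {
    "A": "ASCII letter (U+0041)",
    "日": "CJK ideograph (U+65E5)",
    "ก": "Thai letter (U+0E01)",
}

for ch, label in samples.items():
    u8, u16 = encoded_sizes(ch)
    print(f"{label}: UTF-8={u8} bytes, UTF-16={u16} bytes")
```

The ASCII letter is smaller in UTF-8, while the CJK and Thai characters take 3 bytes in UTF-8 versus 2 in UTF-16.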

However, as some predicted and as practice has widely confirmed, most 
actual data (including XML and of course JSON) contains a significant 
number of characters from the ASCII repertoire (for JSON, syntax 
characters such as []{}" and spaces, plus many if not most names and 
many values). These characters take only one byte in UTF-8, but two 
bytes in UTF-16. As a result, content is very often shorter in UTF-8, 
and when it happens to be longer, it's usually not much longer.
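A minimal sketch of this effect, using Python's standard `json` module. The sample document is invented for illustration and `json_sizes` is a hypothetical helper; `ensure_ascii=False` keeps non-ASCII characters unescaped, as they would appear in raw UTF-8 or UTF-16 JSON text.

```python
import json

# Hypothetical helper: serialize a value to JSON text and measure it
# in UTF-8 and in UTF-16 (little-endian, no BOM).
def json_sizes(value) -> tuple[int, int]:
    # ensure_ascii=False keeps non-ASCII characters as-is instead of
    # escaping them as \uXXXX sequences.
    text = json.dumps(value, ensure_ascii=False)
    return len(text.encode("utf-8")), len(text.encode("utf-16-le"))

# Invented sample data: ASCII keys and syntax mixed with Japanese values.
doc = {"id": 12345, "name": "山田太郎", "city": "東京", "tags": ["日本語", "test"]}
# A value made only of CJK text, the best case for UTF-16.
cjk_only = "日本語のテキスト"

for label, value in [("mixed document", doc), ("CJK-only string", cjk_only)]:
    u8, u16 = json_sizes(value)
    print(f"{label}: UTF-8={u8} bytes, UTF-16={u16} bytes")
```

With this sample, the mixed document is smaller in UTF-8, because the ASCII syntax and keys outweigh the Japanese values; the CJK-only string is smaller in UTF-16, but only by a few bytes.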

Overall, the advantage of occasionally shorter data is clearly 
outweighed by the simplicity of a single encoding (if not on the 
receiving side, then at least on the generating side).

As a result, virtually the whole Web ecosystem is moving toward using 
UTF-8 only for public interchange, at a surprising speed. (Of course, 
many other encodings will remain in use for years, but in 
ever-decreasing numbers.)

Regards,   Martin.