Re: [Json] Scope: Wire format or runtime format?

"Martin J. Dürst" <duerst@it.aoyama.ac.jp> Mon, 17 June 2013 04:18 UTC

Message-ID: <51BE8DEA.7030307@it.aoyama.ac.jp>
Date: Mon, 17 Jun 2013 13:17:46 +0900
From: "Martin J. Dürst" <duerst@it.aoyama.ac.jp>
Organization: Aoyama Gakuin University
To: Norbert Lindenberg <ietf@lindenbergsoftware.com>
References: <6FC6B441-B74D-4B9F-B883-065C05890880@lindenbergsoftware.com>
In-Reply-To: <6FC6B441-B74D-4B9F-B883-065C05890880@lindenbergsoftware.com>
Cc: json@ietf.org
Subject: Re: [Json] Scope: Wire format or runtime format?

Hello Norbert,

On 2013/06/14 7:47, Norbert Lindenberg wrote:
> In looking over older messages on this list, I found a message that made clear to me why we're having this endless discussion about Unicode surrogates - it's because we're not clear whether we're designing a wire format or a format that also for use at runtime:
> http://www.ietf.org/mail-archive/web/json/current/msg00355.html

This is more or less the first time I have heard the term "format for use
at runtime" in this context. Of course there are formats used at runtime
(internal representations of numbers and strings, for example), but I
think the term "runtime format" only confuses our discussion. In my
understanding, JavaScript does not have or use JSON as a runtime format.
If that has changed, please tell us.


> Some people are coming from the runtime point of view, especially ECMAScript, where it's accepted practice to use ill-formed UTF-16 or even non-text in strings. At least the ill-formed UTF-16 is legitimized by section 2.7 of the Unicode standard.

"Accepted practice" is probably going a bit too far, and gives the
wrong impression. My understanding is that the Unicode standard accepts
this for efficiency reasons, not because it's in any way inherently
useful. For ECMAScript, we have to add history to efficiency, but I
still hope it's considered bad practice to actually use ill-formed UTF-16.

> Other people are coming from the wire protocol point of view, where clean formats are expected, in particular well-formed Unicode code unit sequences according to section 3.9 of the Unicode standard.
>
> So which one shall it be?

> If we adopt the wire protocol point of view

To me, it's very clear that we are describing (we are not designing;
that happened a long time ago, and very implicitly) a wire format,
or something close to a wire format (somebody mentioned JSON embedded in
an (e.g.) EUC-JP HTML page (*)).
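To make the wire-format point concrete, here is a small JavaScript sketch of what happens when an ECMAScript string holding a lone surrogate is put on the wire as UTF-8. It assumes a modern engine (e.g. Node.js with the global TextEncoder/TextDecoder, and the ES2019 "well-formed JSON.stringify" behavior); the variable names are purely illustrative. The raw string cannot be encoded faithfully, while JSON text can carry the lone surrogate only as a \ud800 escape.

```javascript
// Sketch: a lone surrogate crossing a UTF-8 "wire".
const lone = "\ud800"; // one UTF-16 code unit, not a complete character

// Encoding the raw string as UTF-8 cannot represent the lone surrogate;
// TextEncoder substitutes U+FFFD REPLACEMENT CHARACTER (bytes EF BF BD).
const rawBytes = Array.from(new TextEncoder().encode(lone));

// JSON.stringify (ES2019 and later) escapes a lone surrogate as \ud800,
// so the JSON text is pure ASCII and survives UTF-8 encoding unchanged.
const jsonText = JSON.stringify(lone);
const jsonBytes = Array.from(new TextEncoder().encode(jsonText));

// Decoding the bytes and parsing the JSON text restores the lone surrogate.
const roundTrip = JSON.parse(new TextDecoder().decode(new Uint8Array(jsonBytes)));

console.log(rawBytes);           // [ 239, 191, 189 ]
console.log(jsonText);           // "\ud800"  (8 ASCII characters incl. quotes)
console.log(roundTrip === lone); // true
```

In other words, whether lone surrogates "exist" on the wire depends entirely on the escape syntax, not on the encoded characters themselves.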


> and require well-formed code unit sequences,

For practical reasons, I think we shouldn't go there. But we should make 
it very clear (in the base spec, not relegated to some "best practices" 
document) that using lone surrogates is a bad idea, and that senders and 
receivers MAY reject such data (e.g. because of security concerns).
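A receiver that chooses to reject such data could look roughly like the following JavaScript sketch. The names findLoneSurrogate and parseStrictJson are hypothetical, not part of any specification or library. The check runs on strings after JSON.parse has resolved the \uXXXX escapes, since that is where lone surrogates become visible as unpaired UTF-16 code units.

```javascript
// Hypothetical helper: index of the first lone (unpaired) surrogate in a
// string, or -1 if the string is well-formed UTF-16.
function findLoneSurrogate(str) {
  for (let i = 0; i < str.length; i++) {
    const cu = str.charCodeAt(i); // one UTF-16 code unit
    if (cu >= 0xd800 && cu <= 0xdbff) {
      // High (leading) surrogate: must be followed by a low surrogate.
      const next = i + 1 < str.length ? str.charCodeAt(i + 1) : -1;
      if (next >= 0xdc00 && next <= 0xdfff) {
        i++; // well-formed pair; skip the low surrogate
        continue;
      }
      return i;
    }
    if (cu >= 0xdc00 && cu <= 0xdfff) {
      return i; // low (trailing) surrogate with no preceding high surrogate
    }
  }
  return -1;
}

// Hypothetical strict receiver: parses JSON text, but rejects it if any
// member name or string value contains a lone surrogate.
function parseStrictJson(text) {
  return JSON.parse(text, function (key, value) {
    if (findLoneSurrogate(key) !== -1) {
      throw new SyntaxError("lone surrogate in member name");
    }
    if (typeof value === "string" && findLoneSurrogate(value) !== -1) {
      throw new SyntaxError("lone surrogate in string value");
    }
    return value;
  });
}
```

Plain JSON.parse('"\\ud800"') happily returns a one-code-unit string, whereas parseStrictJson throws on the same input and still accepts escaped surrogate pairs such as "\ud83d\ude00". Engines implementing ES2024 also provide String.prototype.isWellFormed(), which performs the same test; the manual loop is shown only to make the code-unit logic explicit.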


> then ECMAScript will have to define its own extension of JSON

I very much hope we can avoid that. I very much hope that ECMAScript can
accept that lone surrogates are often, if not always, a bad idea, even
if they may sometimes occur for historical and efficiency reasons.


> (which it has already by allowing JSON values at the top level).

At least on the IETF side, as far as I know, this is still under discussion.

> If we adopt the runtime point of view and allow all code points as in RFC 4627, then there probably should be separate verbiage defining a restricted version for use over the wire.

I'm still at a loss about what you mean by "runtime point of view". It
seems very clear to me that neither the IETF version nor the ECMAScript
version (in case they end up differing) will in any way describe
ECMAScript strings (or other datatypes) at runtime.

Regards,   Martin.

(*) It's not impossible for me to imagine something like JSON embedded 
in an (e.g.) EUC-JP HTML page, but I'm seriously wondering how 
widespread this is in practice. A simple case would be that a script 
says "take the content of this element and interpret it as JSON". But 
how widespread is that? Of course, we have a lot of stuff inside 
ECMAScript, but as soon as I'm in ECMAScript, something like
measurement = { 'unit': 'mm', 'amount': 15 }
is no longer JSON, it's just ECMAScript literal notation.
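To illustrate the distinction: the literal above is evaluated by the ECMAScript engine as source code, and the same characters are not even valid JSON text, because JSON requires double-quoted member names and strings. A minimal JavaScript sketch (variable names are illustrative only):

```javascript
// Inside ECMAScript source, this is an object literal, not JSON text.
const measurement = { 'unit': 'mm', 'amount': 15 };

// Handing the same characters to a JSON parser fails:
// single quotes are not allowed in JSON.
let literalIsJson = true;
try {
  JSON.parse("{ 'unit': 'mm', 'amount': 15 }");
} catch (e) {
  literalIsJson = false; // JSON.parse throws a SyntaxError here
}

// Serializing the runtime object is what produces actual JSON text.
const wireText = JSON.stringify(measurement);

console.log(literalIsJson); // false
console.log(wireText);      // {"unit":"mm","amount":15}
```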