Re: [Json] BOMs

Bjoern Hoehrmann <derhoermi@gmx.net> Thu, 21 November 2013 17:41 UTC

From: Bjoern Hoehrmann <derhoermi@gmx.net>
To: Allen Wirfs-Brock <allen@wirfs-brock.com>
Date: Thu, 21 Nov 2013 18:41:08 +0100
Cc: Henri Sivonen <hsivonen@hsivonen.fi>, es-discuss <es-discuss@mozilla.org>, IETF Discussion <ietf@ietf.org>, www-tag <www-tag@w3.org>, JSON WG <json@ietf.org>
Subject: Re: [Json] BOMs

* Allen Wirfs-Brock wrote:
>On Nov 21, 2013, at 5:28 AM, Henri Sivonen wrote:
>> On Thu, Nov 21, 2013 at 7:53 AM, Allen Wirfs-Brock
>> <allen@wirfs-brock.com> wrote:
>>> Just to be clear about this.  My tests directly tested JavaScript built-in
>>> JSON parsers WRT BOM support in three major browsers.  The tests directly
>>> invoked the built-in JSON.parse functions and directly passed to them a
>>> source string that was explicitly constructed to contain a BOM code point.

>> It would be surprising if JSON.parse() accepted a BOM, since it
>> doesn't take bytes as input.
>
>ECMAScript's JSON.parse accepts an ECMAScript string value as its input.
>ECMAScript strings are sequences of 16-bit values.  JSON.parse (and most
>other ECMAScript functions) interpret those values as Unicode code
>units.  The value U+FEFF can appear at any position within a string.
>When defining a string as an ECMAScript literal, a sequence like \ufeff
>is an escape sequence that means place the code unit value 0xFEFF into
>the string at this position in the sequence. Also note that the actual
>strings passed below to JSON.parse contain the actual code point value
>U+FEFF not the escape sequence that was used to express it.  To include
>the actual escape sequence characters in the string it would have to be
>expressed as '\\ufeff'.
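
For readers following along, the distinction between the code unit and
its escape sequence can be sketched like this. The variable names are
only for illustration and are not taken from the tests under discussion.

```javascript
// '\ufeff' in source text is an escape sequence; the resulting string
// holds a single code unit with the value 0xFEFF.
const literal = '\ufeff';

// '\\ufeff' holds six code units: a backslash followed by "ufeff".
const escaped = '\\ufeff';

console.log(literal.length, literal.charCodeAt(0).toString(16));
console.log(escaped.length, escaped);

// Inside a JSON string, the six-character form is itself an escape
// sequence that JSON.parse turns back into the single code unit.
const roundTripped = JSON.parse('"' + escaped + '"');
console.log(roundTripped === literal);
```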

A byte order mark indicates the order of bytes in a sequence of bytes.
An ECMAScript string is not a sequence of bytes and therefore it cannot
have a byte order mark inside it. Your test is not for BOM support but
for an egregious semantic error in the implementation of JSON.parse.
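
To illustrate the layering I mean, here is a rough sketch. It assumes a
runtime that provides the WHATWG TextDecoder API (current browsers and
Node.js do); the byte values and the helper name `parses` are made up
for the example. The BOM is dealt with when bytes are decoded, and what
reaches JSON.parse is a string in which U+FEFF, if it survives, is just
another code unit.

```javascript
// UTF-8 BOM (EF BB BF) followed by the bytes of the JSON text "{}".
const bytes = new Uint8Array([0xEF, 0xBB, 0xBF, 0x7B, 0x7D]);

// By default TextDecoder consumes a leading BOM while decoding.
const stripped = new TextDecoder('utf-8').decode(bytes);

// With ignoreBOM: true the BOM is kept and shows up as U+FEFF.
const kept = new TextDecoder('utf-8', { ignoreBOM: true }).decode(bytes);

// Hypothetical helper: reports whether JSON.parse accepts the text.
function parses(text) {
  try {
    JSON.parse(text);
    return true;
  } catch (e) {
    return false;
  }
}

console.log(stripped.length, parses(stripped));
console.log(kept.length, kept.charCodeAt(0).toString(16), parses(kept));
```

U+FEFF is not among the whitespace characters the JSON grammar allows,
so a conforming JSON.parse should reject the second string.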

  http://shadowregistry.org/js/misc/#t2ea25a961255bb1202da9497a1942e09

That is a similar test. It makes Firefox see UTF-8 BOMs in ECMAScript
strings. Firefox is not supposed to look for UTF-8 BOMs in ECMAScript
strings because ECMAScript strings are not sequences of bytes at that
level of reasoning.

Is there any chance, by the way, to change `JSON.stringify` so it does
not output strings that cannot be encoded using UTF-8? Specifically,

  JSON.stringify(JSON.parse("\"\uD800\""))

would need to escape the surrogate instead of emitting it literally.
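
Roughly the behaviour I have in mind, sketched as a post-processing
step. The function name `escapeLoneSurrogates` is hypothetical and this
is not a proposal for how an engine should implement it; it assumes a
regular expression engine with lookbehind support. It rewrites every
unpaired surrogate in the output of JSON.stringify as a \uXXXX escape
and leaves proper surrogate pairs alone, so the result can always be
encoded as UTF-8.

```javascript
// Matches a high surrogate not followed by a low surrogate, or a low
// surrogate not preceded by a high surrogate. Without the "u" flag the
// pattern is matched against individual 16-bit code units.
const LONE_SURROGATE =
  /[\uD800-\uDBFF](?![\uDC00-\uDFFF])|(?<![\uD800-\uDBFF])[\uDC00-\uDFFF]/g;

// Hypothetical helper: jsonText is assumed to come from JSON.stringify,
// so any surrogate code unit in it sits inside a JSON string.
function escapeLoneSurrogates(jsonText) {
  return jsonText.replace(LONE_SURROGATE, function (unit) {
    return '\\u' + unit.charCodeAt(0).toString(16).padStart(4, '0');
  });
}

const value = JSON.parse('"\\uD800"'); // a string holding a lone surrogate
const safeText = escapeLoneSurrogates(JSON.stringify(value));
console.log(safeText);
console.log(JSON.parse(safeText) === value);
```

If the implementation already escapes such code units, the step leaves
its output unchanged.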
-- 
Björn Höhrmann · mailto:bjoern@hoehrmann.de · http://bjoern.hoehrmann.de
Am Badedeich 7 · Telefon: +49(0)160/4415681 · http://www.bjoernsworld.de
25899 Dagebüll · PGP Pub. KeyID: 0xA4357E78 · http://www.websitedev.de/