Re: [Json] BOMs

Bjoern Hoehrmann <derhoermi@gmx.net> Mon, 18 November 2013 12:35 UTC

From: Bjoern Hoehrmann <derhoermi@gmx.net>
To: "\"Martin J. Dürst\"" <duerst@it.aoyama.ac.jp>
Date: Mon, 18 Nov 2013 13:35:00 +0100
Message-ID: <2tuj89hcus182t4f4rqqgi1dpabt11qak7@hive.bjoern.hoehrmann.de>
References: <AA45B3C6-1DC5-4B1E-8045-C9FE76022584@vpnc.org> <CEA92854.2CC53%jhildebr@cisco.com> <20131113224737.GI31823@mercury.ccil.org> <f5bob5n71y7.fsf@troutbeck.inf.ed.ac.uk> <5284B095.4070004@it.aoyama.ac.jp> <C37B2FE59C164DBCA982AC81A56A09AA@codalogic> <f5bk3g6ufqy.fsf@troutbeck.inf.ed.ac.uk> <5289F974.9020709@it.aoyama.ac.jp>
In-Reply-To: <5289F974.9020709@it.aoyama.ac.jp>
Cc: Anne van Kesteren <annevk@annevk.nl>, es-discuss <es-discuss@mozilla.org>, IETF Discussion <ietf@ietf.org>, www-tag@w3.org, JSON WG <json@ietf.org>
Subject: Re: [Json] BOMs

* Martin J. Dürst wrote:
>As for what to say about whether to accept BOMs or not, I'd really want 
>to know what the various existing parsers do. If they accept BOMs, then 
>we can say they should accept BOMs. If they don't accept BOMs, then we 
>should say that they don't.

Unicode signatures are not useful for application/json resources and are
likely to break existing and future code. It is not at all uncommon to
construct JSON text by concatenating, say, string literals with some web
service response without passing the data through a JSON parser. And as
RFC 4627 makes no mention of them, there is little reason to think that
implementations tolerate them.
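
To illustrate the concatenation problem, here is a minimal sketch in
Python. The names `service_response`, `wrap_in_array` and `is_valid_json`
are hypothetical; they stand in for any code that splices a response
into a larger JSON text without parsing it first.

```python
import json

BOM = "\ufeff"  # U+FEFF, what a UTF-8 signature decodes to


def wrap_in_array(fragment: str) -> str:
    """Splice a JSON fragment into a larger JSON text without parsing it."""
    return '{"status": "ok", "items": [' + fragment + "]}"


def is_valid_json(text: str) -> bool:
    """Return True if Python's json module can parse the text."""
    try:
        json.loads(text)
        return True
    except ValueError:
        return False


service_response = '{"id": 1}'  # hypothetical response body

clean = wrap_in_array(service_response)
signed = wrap_in_array(BOM + service_response)  # same body, with a signature

print(is_valid_json(clean))   # True
print(is_valid_json(signed))  # False: U+FEFF ends up in the middle of the text
```

Once the signature is no longer at the start of the text, even a parser
that skips a leading signature has no reason to accept the result.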

Perl's JSON module gives me

  malformed JSON string, neither array, object, number, string
  or atom, at character offset 0 (before "\x{ef}\x{bb}\x{bf}[]")

Python's json module gives me

  ValueError: No JSON object could be decoded

Go's "encoding/json" package gives me

  invalid character 'ï' looking for beginning of value
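
For anyone who wants to repeat the Python check, a sketch along these
lines should work on a current Python 3. The exact error message differs
from the one quoted above, and passing bytes directly to json.loads may
behave differently depending on the Python version, so the sketch decodes
explicitly. The variable names are illustrative only.

```python
import codecs
import json

raw = codecs.BOM_UTF8 + b"[]"  # bytes EF BB BF 5B 5D

# Decoding as plain UTF-8 keeps the signature as U+FEFF in the string.
text = raw.decode("utf-8")

try:
    json.loads(text)
    rejected = False
except ValueError as err:  # json.JSONDecodeError is a subclass of ValueError
    rejected = True
    print("rejected:", err)

# The "utf-8-sig" codec strips a leading signature before parsing.
parsed = json.loads(raw.decode("utf-8-sig"))
print(rejected, parsed)
```

So the signature has to be removed by the caller at the decoding step;
the parser itself does not treat it as part of the JSON grammar.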

http://shadowregistry.org/js/misc/#t2ea25a961255bb1202da9497a1942e09 is
another example of what kinds of bugs await us if we were to specify the
use of Unicode signatures for JSON, essentially

  new DOMParser().parseFromString("\uBBEF\u3CBF\u7979\u3E2F","text/xml")

Now U+BBEF U+3CBF U+7979 U+3E2F is not an XML document but Firefox and
Internet Explorer treat it as if it were equivalent to "<yy/>".
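
One way to see where "<yy/>" comes from is to look at the bytes. The
sketch below assumes the string is serialized as UTF-16 little-endian
without a signature before the parser's encoding detection looks at it;
that is an assumption made for illustration, not something confirmed
for either browser.

```python
import codecs

s = "\uBBEF\u3CBF\u7979\u3E2F"

# Assumed internal serialization: UTF-16LE, no signature of its own.
as_bytes = s.encode("utf-16-le")
print(as_bytes.hex())  # efbbbf3c79792f3e

# The first three bytes happen to be the UTF-8 signature ...
print(as_bytes.startswith(codecs.BOM_UTF8))  # True

# ... so a decoder that trusts the signature reads the rest as UTF-8.
reinterpreted = as_bytes.decode("utf-8-sig")
print(reinterpreted)  # <yy/>
```

Under that assumption the four code units are misread as a UTF-8
signature followed by a five-character XML document.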
-- 
Björn Höhrmann · mailto:bjoern@hoehrmann.de · http://bjoern.hoehrmann.de
Am Badedeich 7 · Telefon: +49(0)160/4415681 · http://www.bjoernsworld.de
25899 Dagebüll · PGP Pub. KeyID: 0xA4357E78 · http://www.websitedev.de/