Re: [Json] BOMs

Bjoern Hoehrmann <derhoermi@gmx.net> Mon, 18 November 2013 13:48 UTC

From: Bjoern Hoehrmann <derhoermi@gmx.net>
To: ht@inf.ed.ac.uk
Date: Mon, 18 Nov 2013 14:48:19 +0100
Cc: IETF Discussion <ietf@ietf.org>, JSON WG <json@ietf.org>, Anne van Kesteren <annevk@annevk.nl>, "Martin J. Dürst" <duerst@it.aoyama.ac.jp>, www-tag@w3.org, es-discuss <es-discuss@mozilla.org>
Subject: Re: [Json] BOMs

* Henry S. Thompson wrote:
>I'm curious to know what level you're invoking the parser at.  As
>implied by my previous post about the Python 'requests' package, it
>handles application/json resources by stripping any initial BOM it
>finds -- you can try this with
>
>>>> import requests
>>>> r=requests.get("http://www.ltg.ed.ac.uk/ov-test/b16le.json")
>>>> r.json()

The Perl code was

  perl -MJSON -MEncode -e
    "my $s = encode_utf8(chr 0xFEFF) . '[]'; JSON->new->decode($s)"

The Python code was

  import json
  json.loads(u"\uFEFF[]".encode('utf-8'))

The Go code was

  package main

  import (
    "encoding/json"
    "fmt"
  )

  func main() {
    // U+FEFF followed by an empty array; the string literal is UTF-8.
    r := "\uFEFF[]"

    var f interface{}
    err := json.Unmarshal([]byte(r), &f)

    fmt.Println(err)
  }

In other words, each test passed a UTF-8 encoded byte string directly to
the byte string parsing part of the JSON implementation. RFC 4627 is the
only specification for the application/json on-the-wire format, and it
does not mention Unicode signatures at all. Looking for certain byte
sequences at the beginning and treating them as a Unicode signature is
the same as looking for `/* ... */` and treating it as a comment:
neither is part of the grammar.
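
To make the distinction concrete, here is a minimal sketch in Python. It
is not taken from any of the parsers tested above; the helper names are
made up for illustration, and it assumes UTF-8 input only. It checks the
first significant byte against the RFC 4627 rule that a JSON text is an
object or an array, and separately checks for the UTF-8 signature, so
that any stripping is an explicit extra step rather than part of parsing.

```python
import codecs

# RFC 4627 section 2: insignificant whitespace is space, tab, LF, CR.
JSON_WS = b" \t\n\r"


def starts_like_rfc4627_json(data: bytes) -> bool:
    """True if the first significant byte could begin an object or array.

    Hypothetical helper: this only inspects the first byte after
    whitespace, it does not parse the rest of the text.
    """
    stripped = data.lstrip(JSON_WS)
    return stripped[:1] in (b"[", b"{")


def has_utf8_signature(data: bytes) -> bool:
    """True if the data begins with the bytes EF BB BF (UTF-8 encoded U+FEFF)."""
    return data.startswith(codecs.BOM_UTF8)


# Same input as in the tests above: U+FEFF followed by an empty array.
payload = "\ufeff[]".encode("utf-8")

print(has_utf8_signature(payload))
print(starts_like_rfc4627_json(payload))

# Dropping the signature is a decision made outside the grammar.
without_signature = payload[len(codecs.BOM_UTF8):]
print(starts_like_rfc4627_json(without_signature))
```

Whether a given library performs the stripping step on the caller's
behalf is exactly what differs between implementations.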
-- 
Björn Höhrmann · mailto:bjoern@hoehrmann.de · http://bjoern.hoehrmann.de
Am Badedeich 7 · Telefon: +49(0)160/4415681 · http://www.bjoernsworld.de
25899 Dagebüll · PGP Pub. KeyID: 0xA4357E78 · http://www.websitedev.de/