Re: [Json] BOMs

"Martin J. Dürst" <duerst@it.aoyama.ac.jp> Tue, 19 November 2013 04:33 UTC

To: "Henry S. Thompson" <ht@inf.ed.ac.uk>
Cc: IETF Discussion <ietf@ietf.org>, Bjoern Hoehrmann <derhoermi@gmx.net>, JSON WG <json@ietf.org>, Anne van Kesteren <annevk@annevk.nl>, www-tag@w3.org, es-discuss <es-discuss@mozilla.org>

Okay, here are some more tests.

http://www.sw.it.aoyama.ac.jp/2013/pub/json_tests/test1_utf8_nobom.json
http://www.sw.it.aoyama.ac.jp/2013/pub/json_tests/test2_utf8_bom.json

Both are self-describing JSON files served as application/json; the
first has no BOM, and the second starts with a UTF-8 BOM.

They contain some Japanese, and a tiny bit of Spanish.
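
For anyone who wants to reproduce the difference without fetching the
files, here is a minimal sketch, assuming Python 3 and only its standard
library. The payload is a hypothetical stand-in (not the actual content
of the two test files), and the helper names parse_strict and
parse_tolerant are made up for illustration. It shows that a text-level
parser rejects a leading U+FEFF, while decoding the bytes with the
'utf-8-sig' codec drops the signature before the parser ever sees it.

```python
import codecs
import json

# Hypothetical payload standing in for the test files (not their real content).
payload = {"ja": "日本語", "es": "español"}

# The same document as bytes, once without and once with a UTF-8 BOM.
no_bom = json.dumps(payload, ensure_ascii=False).encode("utf-8")
with_bom = codecs.BOM_UTF8 + no_bom


def parse_strict(data: bytes):
    """Decode as plain UTF-8; a leading BOM survives as U+FEFF in the text."""
    return json.loads(data.decode("utf-8"))


def parse_tolerant(data: bytes):
    """Decode with 'utf-8-sig', which drops a leading BOM if one is present."""
    return json.loads(data.decode("utf-8-sig"))


# Without a BOM, both paths agree.
assert parse_strict(no_bom) == payload
assert parse_tolerant(no_bom) == payload

# With a BOM, only the signature-aware decoding succeeds.
assert parse_tolerant(with_bom) == payload
try:
    parse_strict(with_bom)
except ValueError as err:  # json.JSONDecodeError is a subclass of ValueError
    print("strict parse rejected the BOM:", err)
```

This only covers UTF-8; UTF-16 input would need its own encoding
detection before the text reaches the parser.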

[see more below]

On 2013/11/18 21:59, Henry S. Thompson wrote:
> Bjoern Hoehrmann writes:
>
>> Perl's JSON module gives me
>>
>>    malformed JSON string, neither array, object, number, string
>>    or atom, at character offset 0 (before "\x{ef}\x{bb}\x{bf}[]")
>>
>> Python's json module gives me
>>
>>    ValueError: No JSON object could be decoded
>>
>> Go's "encoding/json" module gives me
>>
>>    invalid character 'ï' looking for beginning of value
>
> I'm curious to know what level you're invoking the parser at.  As
> implied by my previous post about the Python 'requests' package, it
> handles application/json resources by stripping any initial BOM it
> finds -- you can try this with
>
>>>> import requests
>>>> r=requests.get("http://www.ltg.ed.ac.uk/ov-test/b16le.json")
>>>> r.json()

I get a 404 on this example. I can put up UTF-16 examples, too.

Regards,   Martin.

> Signatures are not part of the text of a document, as the UNICODE spec
> makes clear, so asking what happens when you pass a string beginning
> with a BOM to a parser is not really the right question in this
> context, is it?
>
> As I tried to say in an earlier post, there's a distinction which
> needs to be carefully insisted on between, on the one hand, languages
> and their parsers, where I agree signatures/BOMs have no place, and,
> on the other hand, (media-typed) resources/entities/payloads and _their_
> processing, where a discussion of BOMs/signatures _is_ appropriate
> and, often, necessary.
>
> BTW I agree that the status of the UTF-8 BOM as signature is slightly
> hazy, but again the UNICODE spec itself [1] says
>
>    "this sequence can serve as signature for UTF-8 encoded text where
>     the character set is unmarked"
>
> ht
>
> [1] http://www.unicode.org/versions/Unicode6.2.0/ch16.pdf