[Json] BOMs (Was: Re: JSON: remove gap between Ecma-404 and IETF draft)

"Pete Cordell" <petejson@codalogic.com> Mon, 18 November 2013 10:05 UTC

Given the history below, would it be sensible to accept BOMs for UTF-8
encoding, but not for UTF-16 and UTF-32?  In other words, are BOMs needed
and/or used in the wild for UTF-16 and UTF-32?

Maybe the text can say something like "SHOULD accept BOMs for UTF-8, and MAY
accept BOMs for UTF-16 and/or UTF-32"?
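
To make the suggested wording concrete, here is a rough sketch of how a
parser front end might apply it. This is only an illustration, not text for
the draft: the function name loads_with_bom_policy and the accept_wide_boms
flag are made up for the example, and treating input without a BOM as UTF-8
is an assumption of the sketch.

```python
import codecs
import json

# BOM signatures, checked longest first so that the UTF-32-LE BOM
# (FF FE 00 00) is not mistaken for the UTF-16-LE BOM (FF FE).
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def loads_with_bom_policy(data: bytes, accept_wide_boms: bool = False):
    """Parse JSON bytes, treating a leading BOM as an encoding signature.

    Hypothetical policy: a UTF-8 BOM is always accepted (the "SHOULD"),
    while UTF-16/UTF-32 BOMs are accepted only if the caller opts in
    (the "MAY").
    """
    for bom, encoding in _BOMS:
        if data.startswith(bom):
            if encoding != "utf-8" and not accept_wide_boms:
                raise ValueError(f"BOM for {encoding} not accepted")
            # The BOM is stripped before decoding, so it never reaches
            # the JSON grammar as a character.
            return json.loads(data[len(bom):].decode(encoding))
    # Assumption for this sketch: input without a BOM is UTF-8.
    return json.loads(data.decode("utf-8"))
```

With this, loads_with_bom_policy(b'\xef\xbb\xbf{"a": 1}') parses normally,
whereas a UTF-16 document with a BOM is rejected unless
accept_wide_boms=True is passed.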

Thanks,

Pete Cordell
Codalogic Ltd
C++ tools for C++ programmers, http://codalogic.com
Read & write XML in C++, http://www.xml2cpp.com
----- Original Message -----
From: "Martin J. Dürst" <duerst@it.aoyama.ac.jp>
To: "Henry S. Thompson" <ht@inf.ed.ac.uk>
Cc: "John Cowan" <cowan@mercury.ccil.org>; "IETF Discussion"
<ietf@ietf.org>; "Paul Hoffman" <paul.hoffman@vpnc.org>; "JSON WG"
<json@ietf.org>; "Joe Hildebrand (jhildebr)" <jhildebr@cisco.com>; "Anne van
Kesteren" <annevk@annevk.nl>; <www-tag@w3.org>; "es-discuss"
<es-discuss@mozilla.org>
Sent: Thursday, November 14, 2013 11:14 AM
Subject: Re: [Json] JSON: remove gap between Ecma-404 and IETF draft


> Hello Henry, others,
>
> On 2013/11/14 18:44, Henry S. Thompson wrote:
>> John Cowan writes:
>>
>>> Joe Hildebrand (jhildebr) scripsit:
>>>
>>>> If 404 doesn't allow [a BOM], I don't see a strong need to add it.
>>>> Parsers can always be more forgiving of what they will parse than what
>>>> the spec says, particularly since section 9 says "A JSON parser MAY
>>>> accept non-JSON forms or extensions".
>>>
>>> It's not clear that 404 disallows it, since 404 is defined in terms of
>>> characters, and a BOM is not a character but an out-of-band signal.
>>
>> I think this is a crucial observation.
>
> Yes, and I think it's based on the experience with XML. But while this
> experience may be applicable to JSON, Anne's original comment about the
> BOM and XMLHttpRequest suggests that 404 actually currently does not
> tolerate a BOM, and that implementations (except for XMLHttpRequest) also
> don't.
>
> To give some historic background, the BOM for UTF-8 wasn't in the first
> edition of XML (http://www.w3.org/TR/1998/REC-xml-19980210#sec-guessing).
> It only later came in because Microsoft used it in Notepad to be able to
> quickly distinguish between UTF-8 and the legacy system encoding. Because
> many people were writing some XML by hand, and some of them were using
> Notepad, the pressure on XML to accept a BOM at the start of a UTF-8 file
> mounted, and it was included in the second edition of the XML
> Recommendation (http://www.w3.org/TR/2000/REC-xml-20001006#sec-guessing).
>
> Compared to XML, JSON may be much less edited by hand, or much less edited
> in Notepad, or otherwise just have a different history from XML, but we
> have to make sure.
>
> Regards,   Martin.
>
>
>> I note that XML approaches
>> this problem in what might be a useful way.  The XML ABNF makes no
>> mention of BOM, it's not part of any XML document as such.  But it
>> _is_ allowed.  The relevant wording [1] is:
>>
>>    Entities ... may begin with the Byte Order Mark described by Annex H
>>    of [ISO/IEC 10646:2000], section 16.8 of [Unicode] (the ZERO WIDTH
>>    NO-BREAK SPACE character, #xFEFF). _This is an encoding signature,_
>>    _not part of either the markup or the character data of the XML_
>>    _document._ XML processors must be able to use this character to
>>    differentiate between UTF-8 and UTF-16 encoded documents. [emphasis
>>    added]
>>
>> ht
>>
>> [1] http://www.w3.org/TR/REC-xml/#charencoding