Re: [Json] BOMs

"Pete Cordell" <petejson@codalogic.com> Tue, 19 November 2013 11:26 UTC


Hi Phillip,

Could you explain further how these spurious BOMs are being added by 
Microsoft's tools in .NET?  A small piece of C# code that demonstrates the 
problem would be a great help to my understanding.

Thanks,

Pete Cordell
Codalogic Ltd
C++ tools for C++ programmers, http://codalogic.com
Read & write XML in C++, http://www.xml2cpp.com
----- Original Message ----- 
From: "Phillip Hallam-Baker" <hallam@gmail.com>
To: "Pete Cordell" <petejson@codalogic.com>
Cc: "Martin J. Dürst" <duerst@it.aoyama.ac.jp>; "Henry S. Thompson" 
<ht@inf.ed.ac.uk>; "John Cowan" <cowan@mercury.ccil.org>; "IETF Discussion" 
<ietf@ietf.org>; "JSON WG" <json@ietf.org>; "Anne van Kesteren" 
<annevk@annevk.nl>; <www-tag@w3.org>; "es-discuss" <es-discuss@mozilla.org>
Sent: Monday, November 18, 2013 8:56 PM
Subject: Re: BOMs


On Mon, Nov 18, 2013 at 8:36 AM, Pete Cordell <petejson@codalogic.com> wrote:

> ----- Original Message ----- From: "Martin J. Dürst" <duerst@it.aoyama.ac.jp>
>
>  On 2013/11/18 20:11, Henry S. Thompson wrote:
>>
>>> Pete Cordell writes:
>>>
>>>> Given the history below, would it be sensible to accept BOMs for UTF-8
>>>> encoding, but not for UTF-16 and UTF-32?  In other words, are BOMs needed
>>>> and/or used in the wild for UTF-16 and UTF-32?
>>>>
>>>> Maybe the text can say something like "SHOULD accept BOMs for UTF-8,
>>>> and MAY accept BOMs for UTF-16 and/or UTF-32"?
>>>>
>>>
>>> My sense is that you'll see more UTF-16 BOMs than anything else.
>>>
>>
>> Yes indeed. BOM means Byte Order Mark. It's crucial for over-the-wire
>> UTF-16. (It's irrelevant for in-memory UTF-16, but that's not what we are
>> discussing.)
>>
>
> The in-memory case is not entirely irrelevant because a number of JSON
> messages will be constructed in memory and then squirted to line.
>
> I did a little experiment with Visual Studio.  It will allow me to save in
> UTF-8 with or without a BOM (or BOM-like signature).  Saving in UTF-16 (or
> was it UCS-2?) is always with a BOM.  There didn't seem to be a UTF-32 option.
>
> JSON doesn't need BOMs.  However, there are cases where people might hand
> edit messages, and if they choose to save in UTF-16 they will likely have a
> BOM.
>
> Is it acceptable to tell people not to save hand-edited files in UTF-16,
> suggesting UTF-8 (possibly with an encoded BOM) as an alternative?
>
> I would imagine that if someone did have a hand-edited UTF-8 file on
> Windows then the allowance of a BOM would help their sanity immeasurably,
> but it's not something I have firsthand knowledge of.
>


I believe the opposite is true.

The failure of Windows to correctly process documents without BOMs is a
constant pain when trying to use .NET to parse XML.

The ability to compose a JSON message by wrapping another JSON message is
essential. That is, it has to be possible to write something like

printf ("{\"Object\": %s}", Text);
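
The wrapping case can be sketched in C as follows. This is only an
illustration under stated assumptions: the inner text is UTF-8, the only BOM
of concern is a leading one (bytes EF BB BF), and the helper names
`skip_leading_bom` and `wrap_json` are hypothetical, not taken from any
library discussed in this thread.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* UTF-8 encoding of U+FEFF, the byte order mark. */
static const char UTF8_BOM[] = "\xEF\xBB\xBF";

/* Return a pointer just past a leading UTF-8 BOM, or text itself if the
 * text does not start with one. */
static const char *skip_leading_bom(const char *text)
{
    assert(text != NULL);
    return strncmp(text, UTF8_BOM, 3) == 0 ? text + 3 : text;
}

/* Wrap inner JSON text as {"Object": <inner>}. Returns what snprintf
 * returns: the length the full result needs, excluding the NUL. */
static int wrap_json(char *out, size_t out_size, const char *inner)
{
    assert(out != NULL && out_size > 0);
    return snprintf(out, out_size, "{\"Object\": %s}",
                    skip_leading_bom(inner));
}
```

If the inner text is a BOM-prefixed `{"a":1}`, `wrap_json` writes
`{"Object": {"a":1}}`. Without the skip, the three BOM bytes would land in
the middle of the outer document, which is where a parser would reject them.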


I use the .NET platform heavily. Please do not let Microsoft off the hook
here. The cost of doing so is having to write code to kick out spurious BOM
sequences occurring at any random point in the text, which becomes really
painful when dealing with strings where there might actually be a reason to
put the BOM in.
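
To give a sense of the kind of clean-up code meant here, the following is a
rough sketch in C, not a recommended implementation. It assumes UTF-8 input
in a NUL-terminated buffer, treats a BOM (bytes EF BB BF) inside a JSON
string literal as intentional, and drops it everywhere else. The function
name `strip_stray_boms` is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Remove UTF-8 BOM bytes (EF BB BF) that occur outside JSON string
 * literals, editing the NUL-terminated buffer in place.
 * Returns the new length of the text. */
static size_t strip_stray_boms(char *json)
{
    assert(json != NULL);

    const char *src = json;
    char *dst = json;
    int in_string = 0;  /* currently inside a "..." literal? */
    int escaped = 0;    /* previous byte was a backslash inside a literal? */

    while (*src != '\0') {
        if (!in_string && strncmp(src, "\xEF\xBB\xBF", 3) == 0) {
            src += 3;   /* stray BOM between tokens: drop it */
            continue;
        }
        if (in_string) {
            if (escaped)
                escaped = 0;        /* this byte is escaped; keep as-is */
            else if (*src == '\\')
                escaped = 1;        /* next byte is escaped */
            else if (*src == '"')
                in_string = 0;      /* closing quote */
        } else if (*src == '"') {
            in_string = 1;          /* opening quote */
        }
        *dst++ = *src++;
    }
    *dst = '\0';
    return (size_t)(dst - json);
}
```

Even this small sketch has to track quoting and escapes just to avoid
damaging string contents, which is the extra burden being described.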

The benefit of holding the line is that it might encourage Microsoft to fix
their tools so that they don't insert spurious BOM sequences in documents
where doing so breaks them.


-- 
Website: http://hallambaker.com/