Re: [Json] BOMs

"Pete Cordell" <petejson@codalogic.com> Mon, 18 November 2013 13:36 UTC

Message-ID: <F8C2334E1B3B4A63875ECFCD151726CC@codalogic>
From: Pete Cordell <petejson@codalogic.com>
To: "\"Martin J. Dürst\"" <duerst@it.aoyama.ac.jp>, "Henry S. Thompson" <ht@inf.ed.ac.uk>
References: <AA45B3C6-1DC5-4B1E-8045-C9FE76022584@vpnc.org> <CEA92854.2CC53%jhildebr@cisco.com> <20131113224737.GI31823@mercury.ccil.org> <f5bob5n71y7.fsf@troutbeck.inf.ed.ac.uk> <5284B095.4070004@it.aoyama.ac.jp> <C37B2FE59C164DBCA982AC81A56A09AA@codalogic> <f5bk3g6ufqy.fsf@troutbeck.inf.ed.ac.uk> <5289F974.9020709@it.aoyama.ac.jp>
X-Unsent: 1
x-vipre-scanned: 00EE5E27005BBA00EE5F74
Date: Mon, 18 Nov 2013 13:36:13 -0000
MIME-Version: 1.0
Content-Type: text/plain; format="flowed"; charset="UTF-8"; reply-type="response"
Content-Transfer-Encoding: 8bit
X-Priority: 3
X-MSMail-Priority: Normal
X-Mailer: Microsoft Outlook Express 6.00.2900.5931
X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2900.6157
Cc: John Cowan <cowan@mercury.ccil.org>, IETF Discussion <ietf@ietf.org>, JSON WG <json@ietf.org>, Anne van Kesteren <annevk@annevk.nl>, www-tag@w3.org, es-discuss <es-discuss@mozilla.org>
Subject: Re: [Json] BOMs

----- Original Message ----- 
From: "Martin J. Dürst" <duerst@it.aoyama.ac.jp>
> On 2013/11/18 20:11, Henry S. Thompson wrote:
>> Pete Cordell writes:
>>
>>> Given the history below, would it be sensible to accept BOMs for UTF-8
>>> encoding, but not for UTF-16 and UTF-32?  In other words, are BOMs
>>> needed and/or used in the wild for UTF-16 and UTF-32?
>>>
>>> Maybe the text can say something like "SHOULD accept BOMs for UTF-8,
>>> and MAY accept BOMs for UTF-16 and / or UTF-32"?
>>
>> My sense is that you'll see more UTF-16 BOMs than anything else.
>
> Yes indeed. BOM means Byte Order Mark. It's crucial for over-the-wire 
> UTF-16. (It's irrelevant for in-memory UTF-16, but that's not what we are 
> discussing.)

The in-memory case is not entirely irrelevant, because many JSON messages 
will be constructed in memory and then written straight to the wire.

I did a little experiment with Visual Studio.  It lets me save in UTF-8 
with or without a BOM (or BOM-like signature).  Saving in UTF-16 (or was it 
UCS-2?) always adds a BOM.  There didn't seem to be a UTF-32 option.
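
For anyone who wants to look at the bytes without Visual Studio, here is a 
rough Python sketch.  Python and its codec names ("utf-8-sig", "utf-16" and 
so on) are my choice for illustration and are not what Visual Studio does 
internally; the helper name encoded_variants is made up.

```python
import codecs


def encoded_variants(text):
    """Return the bytes that several Python codecs produce for the same text."""
    return {
        "utf-8": text.encode("utf-8"),          # no BOM
        "utf-8-sig": text.encode("utf-8-sig"),  # prefixed with EF BB BF
        "utf-16": text.encode("utf-16"),        # BOM in the platform's byte order
        "utf-16-le": text.encode("utf-16-le"),  # explicit byte order, no BOM
    }


variants = encoded_variants("{}")
for name, data in variants.items():
    # Print each encoding as space-separated hex bytes.
    print(f"{name:10} {data.hex(' ')}")
```

The "utf-16" codec writes the BOM in the machine's native byte order, so 
the first two bytes differ between little-endian and big-endian hosts.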

JSON doesn't need BOMs.  However, there are cases where people might hand 
edit messages, and if they choose to save in UTF-16 they will likely have a 
BOM.

Is it acceptable to tell people not to save hand-edited files in UTF-16, 
suggesting UTF-8 (possibly with an encoded BOM) as an alternative?

I would imagine that if someone did have a hand-edited UTF-8 file on 
Windows then the allowance of a BOM would help their sanity immeasurably, 
but it's not something I have firsthand knowledge of.
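
To make "accepting a BOM" concrete, here is a minimal sketch of a tolerant 
reader in Python.  It is only an illustration under my own assumptions: the 
function name loads_tolerating_bom and the UTF-8 default are hypothetical, 
and nothing here is proposed text for the draft.

```python
import codecs
import json

# UTF-32 is checked before UTF-16 because the UTF-32-LE BOM (FF FE 00 00)
# begins with the UTF-16-LE BOM (FF FE). A JSON text cannot start with
# U+0000, so this ordering does not misread a valid UTF-16-LE document.
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def loads_tolerating_bom(data, default_encoding="utf-8"):
    """Parse JSON from bytes, skipping a leading BOM if one is present."""
    for bom, encoding in _BOMS:
        if data.startswith(bom):
            # Drop the BOM and decode with the encoding it signals.
            return json.loads(data[len(bom):].decode(encoding))
    # No BOM: fall back to the assumed default encoding.
    return json.loads(data.decode(default_encoding))
```

A receiver along these lines would take a file saved from a Windows editor 
with a UTF-8 or UTF-16 BOM as well as BOM-less UTF-8 from elsewhere.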

I believe Unix/Linux works with UTF-8 without BOMs.  Is this the case?

Pete Cordell
Codalogic Ltd
C++ tools for C++ programmers, http://codalogic.com
Read & write XML in C++, http://www.xml2cpp.com