Re: [Json] BOMs

Phillip Hallam-Baker <hallam@gmail.com> Mon, 18 November 2013 20:56 UTC

In-Reply-To: <F8C2334E1B3B4A63875ECFCD151726CC@codalogic>
References: <AA45B3C6-1DC5-4B1E-8045-C9FE76022584@vpnc.org> <CEA92854.2CC53%jhildebr@cisco.com> <20131113224737.GI31823@mercury.ccil.org> <f5bob5n71y7.fsf@troutbeck.inf.ed.ac.uk> <5284B095.4070004@it.aoyama.ac.jp> <C37B2FE59C164DBCA982AC81A56A09AA@codalogic> <f5bk3g6ufqy.fsf@troutbeck.inf.ed.ac.uk> <5289F974.9020709@it.aoyama.ac.jp> <F8C2334E1B3B4A63875ECFCD151726CC@codalogic>
Date: Mon, 18 Nov 2013 15:56:11 -0500
Message-ID: <CAMm+LwiHVc0mDrUr8yCMKt9wChV1tvybTtxSQej7eDSVq3SOnA@mail.gmail.com>
From: Phillip Hallam-Baker <hallam@gmail.com>
To: Pete Cordell <petejson@codalogic.com>
Content-Type: multipart/alternative; boundary="047d7b3a83d42a41ff04eb79c74d"
X-Mailman-Approved-At: Mon, 18 Nov 2013 13:02:16 -0800
Cc: John Cowan <cowan@mercury.ccil.org>, "Henry S. Thompson" <ht@inf.ed.ac.uk>, JSON WG <json@ietf.org>, Anne van Kesteren <annevk@annevk.nl>, es-discuss <es-discuss@mozilla.org>, "Martin J. Dürst" <duerst@it.aoyama.ac.jp>, "www-tag@w3.org" <www-tag@w3.org>, IETF Discussion <ietf@ietf.org>
Subject: Re: [Json] BOMs

On Mon, Nov 18, 2013 at 8:36 AM, Pete Cordell <petejson@codalogic.com> wrote:

> ----- Original Message ----- From: "Martin J. Dürst" <
> duerst@it.aoyama.ac.jp>
>
>  On 2013/11/18 20:11, Henry S. Thompson wrote:
>>
>>> Pete Cordell writes:
>>>
>>>  Given the history below, would it be sensible to accept BOMs for UTF-8
>>>> encoding, but not for UTF-16 and UTF-32?  In other words, are BOMs
>>>> needed
>>>> and/or used in the wild for UTF-16 and UTF-32?
>>>>
>>>> Maybe the text can say something like "SHOULD accept BOMs for UTF-8,
>>>> and MAY accept BOMs for UTF-16 and / or UTF-32"?
>>>>
>>>
>>> My sense is that you'll see more UTF-16 BOMs than anything else.
>>>
>>
>> Yes indeed. BOM means Byte Order Mark. It's crucial for over-the-wire
>> UTF-16. (It's irrelevant for in-memory UTF-16, but that's not what we are
>> discussing.)
>>
>
> The in-memory case is not entirely irrelevant because a number of JSON
> messages will be constructed in memory and then squirted to line.
>
> I did a little experiment with Visual Studio.  It will allow me to save in
> UTF-8 with or without a BOM (like thing).  Saving in UTF-16 (Or was it
> UCS2?) is always with a BOM.  There didn't seem to be a UTF-32 option.
>
> JSON doesn't need BOMs.  However, there are cases where people might hand
> edit messages, and if they choose to save in UTF-16 they will likely have a
> BOM.
>
> Is it acceptable to tell people not to save hand-edited files in UTF-16,
> suggesting UTF-8 (possibly with an encoded BOM) as an alternative?
>
> I would imagine that if someone did have a hand-edited UTF-8 file on
> Windows then the allowance of a BOM would help their sanity immeasurably,
> but it's not something I have firsthand knowledge of.
>


I believe the opposite is true.

The failure of Windows to correctly process documents without a BOM
is a constant pain when trying to use .NET to parse XML.

The ability to compose a JSON message by wrapping another JSON message is
essential. That is, it has to be possible to write something like

printf ("{\"Object\": %s}", Text);
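
To make the problem concrete, here is a minimal sketch of that kind of wrapping, assuming the intended output is the object {"Object": <inner text>}. The names wrap_json and has_embedded_bom are hypothetical helpers made up for illustration, not part of any library. If the inner text was saved with a leading UTF-8 BOM, the BOM ends up in the middle of the wrapped message, where a JSON parser has no reason to expect it.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* UTF-8 encoding of U+FEFF, the byte order mark. */
static const char UTF8_BOM[] = "\xEF\xBB\xBF";

/* Hypothetical helper: wrap an already-serialized JSON text as the value
   of the member "Object". Returns the number of bytes written (excluding
   the terminating NUL), or -1 if the output buffer is too small. */
int wrap_json(char *out, size_t out_size, const char *text)
{
    assert(out != NULL && text != NULL);
    int n = snprintf(out, out_size, "{\"Object\": %s}", text);
    return (n < 0 || (size_t)n >= out_size) ? -1 : n;
}

/* Hypothetical helper: return 1 if a UTF-8 BOM appears anywhere after
   the first byte of s, i.e. not at the very start of the text. */
int has_embedded_bom(const char *s)
{
    return s[0] != '\0' && strstr(s + 1, UTF8_BOM) != NULL;
}
```

Wrapping a clean text gives a clean message. Wrapping a text that starts with a BOM gives a message for which has_embedded_bom returns 1, even though nothing in the wrapping code did anything wrong.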


I use the .NET platform heavily. Please do not let Microsoft off the hook
here. The cost of doing so is having to write code that kicks out spurious BOM
sequences occurring at any point in the text, which becomes really
painful when dealing with strings where there might actually be a
reason to put the BOM in.
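
The clean-up code in question looks roughly like the following sketch. It assumes UTF-8 input held in a NUL-terminated buffer, and strip_utf8_boms is a hypothetical name chosen for illustration. Note that it removes every encoded U+FEFF it finds, so it cannot tell a spurious BOM from one that was deliberately placed inside a string value, which is exactly the painful case.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: remove every UTF-8 encoded U+FEFF (bytes EF BB BF)
   from the NUL-terminated buffer s, in place. Returns the number of
   sequences removed. It makes no attempt to preserve a U+FEFF that was
   intentionally written inside a JSON string. */
size_t strip_utf8_boms(char *s)
{
    assert(s != NULL);
    char *dst = s;
    size_t removed = 0;
    while (*s != '\0') {
        if (strncmp(s, "\xEF\xBB\xBF", 3) == 0) {
            s += 3;          /* skip the three BOM bytes */
            removed++;
        } else {
            *dst++ = *s++;   /* keep everything else unchanged */
        }
    }
    *dst = '\0';
    return removed;
}
```

Distinguishing the two cases would need a real tokenizer that knows whether it is inside a string, which is a lot more work than a byte filter like this one.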

The benefit of not doing so is that it might encourage Microsoft to fix
their tools so that they don't insert spurious BOM sequences in documents
where doing so breaks them.


-- 
Website: http://hallambaker.com/