Re: [Json] BOMs

ht@inf.ed.ac.uk (Henry S. Thompson) Wed, 20 November 2013 10:30 UTC

To: "Martin J. Dürst" <duerst@it.aoyama.ac.jp>
From: ht@inf.ed.ac.uk
Date: Wed, 20 Nov 2013 10:29:35 +0000
In-Reply-To: <528C5445.3050600@it.aoyama.ac.jp> ("Martin J. Dürst"'s message of "Wed\, 20 Nov 2013 15\:18\:45 +0900")
Cc: Allen Wirfs-Brock <allen@wirfs-brock.com>, John Cowan <cowan@mercury.ccil.org>, IETF Discussion <ietf@ietf.org>, Pete Cordell <petejson@codalogic.com>, JSON WG <json@ietf.org>, www-tag@w3.org, Anne van Kesteren <annevk@annevk.nl>, "t.p." <daedulus@btconnect.com>, es-discuss <es-discuss@mozilla.org>
Subject: Re: [Json] BOMs

Martin J. Dürst writes:

> Hello Henry, others,
>
> The fact that some libraries or Web sites accept a BOM for JSON isn't
> a proof that all (well, let's say the majority) accept a BOM.

I wasn't suggesting that it did, but rather that the testing that
needs to be done is testing that includes the interface between
transport and parser, not the parser alone, since, as we all agree,
the BOM is not allowed by the language itself.
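
To make the distinction concrete, here is a minimal sketch in Python
of the two things one might test.  The function names and the sample
payload are hypothetical, chosen only for illustration, and Python's
json module stands in for "a parser"; it says nothing about what
Firefox or any other implementation does.

```python
import json

def parse_text(text: str):
    """Grammar-level parser only: it sees characters, not bytes."""
    return json.loads(text)

def receive_and_parse(raw: bytes):
    """Hypothetical transport layer: decode the bytes, dropping a
    leading UTF-8 BOM if present, then hand the text to the parser."""
    text = raw.decode("utf-8-sig")  # 'utf-8-sig' strips one leading BOM
    return parse_text(text)

# Illustrative payload: UTF-8 BOM followed by a JSON object.
payload = b"\xef\xbb\xbf" + b'{"a": 1}'

# Test 1: the parser alone, fed text that still contains U+FEFF.
try:
    parse_text(payload.decode("utf-8"))
    parser_accepts_bom = True
except json.JSONDecodeError:
    parser_accepts_bom = False

# Test 2: the transport-to-parser path as a whole.
end_to_end = receive_and_parse(payload)
```

In this sketch the first test shows the parser rejecting the BOM
while the second shows the same bytes being accepted end to end, so
the two tests answer different questions.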

>> As previously discussed, _no-one_ is arguing that BOMs are in the JSON
>> language as such.  JSON parsers shouldn't accept BOMs.
>>
>> BOMs are, to quote the UNICODE spec, "not part of the text".  It is
>> appropriate that specs concerned with JSON-on-the-wire, for example
>> the media type registration for 'application/json', _should_ discuss
>> the BOM, and it's open to them, _without changing the language at
>> all_, to say that BOMs are acceptable but, again, are not part of the
>> text which the parser has to accept.
>
> I agree that *from a theoretical viewpoint*, this is correct. But
> theory isn't everything. As I have written before (and you have cited
> in another thread, for another spec):
>
>   What's most important now is to know what receivers actually
>   accept. We are not in a design phase, we are just updating the
>   definition ... and making sure we fix problems if there are
>   problems, but we have to use the installed base for the main
>   guidance
>
> For our update from RFC 4627, the null hypothesis is that there are no
> BOMs (neither for UTF-8 nor for UTF-16). The patterns given in
> http://tools.ietf.org/html/rfc4627#section-3 cannot apply to
> characters, they can only apply to bytes. If we want to allow a spec
> in application/json, then we have to have strong evidence that almost
> all parsers can deal with BOMs, not just fragmentary evidence that
> some parsers don't choke on a BOM.

I'm not suggesting parsers should allow BOMs, if by parser you mean
what implements the grammar of the language.  And I'm certainly not
suggesting that two examples pulled at random from a Google search
constitute strong evidence about practice at large.  All I was trying
to suggest was that showing that the language parser as such in
Firefox etc. didn't allow BOMs, as the OP had done, wasn't the right
kind of evidence _either_.

> Please note that there's some parallel to XML, in that neither Unicode
> (for the encoding form) nor the IETF (for the 'charset') require a BOM
> for "UTF-16", but XML nevertheless strictly requires it.

I agree that XML is a useful point of comparison, in particular
because it too does not allow a BOM as part of an XML document, but
rather treats it as an aspect of packaging/transport external to the
XML document, which seems to me to be the kind of approach to BOMs the
JSON WG might consider.
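
As a rough illustration of that kind of approach, and no more than
that, a packaging layer could look something like the following
Python sketch.  The names unwrap and load, and the UTF-8 default, are
my assumptions for the example, not anything taken from RFC 4627 or
from a WG draft.

```python
import codecs
import json

# Longest BOMs first: the UTF-32LE BOM starts with the same two bytes
# as the UTF-16LE BOM.  (A JSON text cannot begin with U+0000, so
# FF FE 00 00 is read here as UTF-32LE.)
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]

def unwrap(raw: bytes, default: str = "utf-8") -> str:
    """Packaging layer: use a BOM, if any, to pick the encoding, then
    remove it so that it never reaches the JSON text."""
    for bom, encoding in _BOMS:
        if raw.startswith(bom):
            return raw[len(bom):].decode(encoding)
    # No BOM: fall back to an assumed default encoding.
    return raw.decode(default)

def load(raw: bytes):
    """The parser proper only ever sees BOM-free text."""
    return json.loads(unwrap(raw))
```

The point of the split is that the grammar is left untouched: whether
a BOM is tolerated is decided entirely in unwrap, outside the
language.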

ht
-- 
       Henry S. Thompson, School of Informatics, University of Edinburgh
      10 Crichton Street, Edinburgh EH8 9AB, SCOTLAND -- (44) 131 650-4440
                Fax: (44) 131 650-4587, e-mail: ht@inf.ed.ac.uk
                       URL: http://www.ltg.ed.ac.uk/~ht/
 [mail from me _always_ has a .sig like this -- mail without it is forged spam]