Re: [Json] BOMs

t.p. <daedulus@btconnect.com> Tue, 19 November 2013 10:13 UTC

Message-ID: <020401cee50f$a2cdf5c0$4001a8c0@gateway.2wire.net>
From: "t.p." <daedulus@btconnect.com>
To: "Martin J. Dürst" <duerst@it.aoyama.ac.jp>
References: <AA45B3C6-1DC5-4B1E-8045-C9FE76022584@vpnc.org> <CEA92854.2CC53%jhildebr@cisco.com> <20131113224737.GI31823@mercury.ccil.org> <f5bob5n71y7.fsf@troutbeck.inf.ed.ac.uk> <5284B095.4070004@it.aoyama.ac.jp> <C37B2FE59C164DBCA982AC81A56A09AA@codalogic><f5bk3g6ufqy.fsf@troutbeck.inf.ed.ac.uk> <5289F974.9020709@it.aoyama.ac.jp>
Date: Tue, 19 Nov 2013 10:10:47 +0000
Cc: John Cowan <cowan@mercury.ccil.org>, IETF Discussion <ietf@ietf.org>, Pete Cordell <petejson@codalogic.com>, JSON WG <json@ietf.org>, Anne van Kesteren <annevk@annevk.nl>, www-tag@w3.org, es-discuss <es-discuss@mozilla.org>
Subject: Re: [Json] BOMs

----- Original Message -----
From: "Martin J. Dürst" <duerst@it.aoyama.ac.jp>
To: "Henry S. Thompson" <ht@inf.ed.ac.uk>
Cc: "John Cowan" <cowan@mercury.ccil.org>; "IETF Discussion"
<ietf@ietf.org>; "Pete Cordell" <petejson@codalogic.com>; "JSON WG"
<json@ietf.org>; "Anne van Kesteren" <annevk@annevk.nl>;
<www-tag@w3.org>; "es-discuss" <es-discuss@mozilla.org>
Sent: Monday, November 18, 2013 11:26 AM

> On 2013/11/18 20:11, Henry S. Thompson wrote:
> > Pete Cordell writes:
> >
> >> Given the history below, would it be sensible to accept BOMs for
> >> UTF-8 encoding, but not for UTF-16 and UTF-32?  In other words, are
> >> BOMs needed and/or used in the wild for UTF-16 and UTF-32?
> >>
> >> Maybe the text can say something like "SHOULD accept BOMs for
> >> UTF-8, and MAY accept BOMs for UTF-16 and / or UTF-32"?
> >
> > My sense is that you'll see more UTF-16 BOMs than anything else.
>
> Yes indeed. BOM means Byte Order Mark. It's crucial for over-the-wire
> UTF-16. (It's irrelevant for in-memory UTF-16, but that's not what we
> are discussing.) To bring up the XML example again, XML actually
> strictly requires a BOM for UTF-16. The IETF definition of UTF-16 does
> not require a BOM for UTF-16. See http://tools.ietf.org/html/rfc2781,
> in particular http://tools.ietf.org/html/rfc2781#section-3.2,
> http://tools.ietf.org/html/rfc2781#section-3.3, and
> http://tools.ietf.org/html/rfc2781#section-4.
>
> For UTF-8, the BOM is not a Byte Order Mark, because such a mark isn't
> necessary at all. It may serve as a signature, but is not necessary,
> and in some circumstances counterproductive.

Martin

We had a similar discussion with syslog back in 2005. UTF-8 was new and
different, and the issue was how to tell whether it was being used or
not. What made it into RFC 5424 was
"  If a syslog application encodes MSG in UTF-8, the string MUST start
   with the Unicode byte order mask (BOM), which for UTF-8 is ABNF
   %xEF.BB.BF.  "
which remains a MUST to this day.  There are no relevant Errata.

Tom Petch
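
As an illustration of the RFC 5424 rule quoted above, here is a minimal
Python sketch of a sender that prefixes a UTF-8 MSG with the BOM and a
receiver that checks for it. The function names are hypothetical, and
decoding unmarked bytes as UTF-8 with replacement characters is an
assumption of this sketch, not something the quoted text prescribes.

```python
import codecs
from typing import Tuple


def encode_syslog_msg(text: str) -> bytes:
    """Encode MSG as UTF-8, prefixed with the UTF-8 BOM (EF BB BF)."""
    return codecs.BOM_UTF8 + text.encode("utf-8")


def decode_syslog_msg(msg: bytes) -> Tuple[str, bool]:
    """Return (text, had_bom).

    With a BOM, the payload is declared to be UTF-8 and is decoded strictly.
    Without one, the encoding is not declared; this sketch (an assumption)
    falls back to a lenient UTF-8 decode.
    """
    if msg.startswith(codecs.BOM_UTF8):
        return msg[len(codecs.BOM_UTF8):].decode("utf-8"), True
    return msg.decode("utf-8", errors="replace"), False
```

Here the BOM acts purely as an encoding signature: UTF-8 has no byte
order to mark.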

> As for what to say about whether to accept BOMs or not, I'd really
> want to know what the various existing parsers do. If they accept
> BOMs, then we can say they should accept BOMs. If they don't accept
> BOMs, then we should say that they don't.
>
> Regards,   Martin.
>
> > UTF-32 support seems to be waning (at least in the browsers), but
> > UTF-16 is in pretty widespread use.  John, do you think you can fool
> > google into counting BOMs for us?
>
>
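
To make the "accept BOMs" option discussed in this thread concrete, the
following is a rough Python sketch of a reader that sniffs a leading BOM,
strips it, and parses the rest as JSON. The helper names are
hypothetical, and defaulting to UTF-8 when no BOM is present is an
assumption of the sketch rather than anything the working group decided.

```python
import codecs
import json

# UTF-32 is checked before UTF-16 because the UTF-32-LE BOM (FF FE 00 00)
# begins with the same two bytes as the UTF-16-LE BOM (FF FE).
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def sniff_bom(data: bytes):
    """Return (encoding, bom_length), or (None, 0) if there is no BOM."""
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name, len(bom)
    return None, 0


def loads_tolerating_bom(data: bytes):
    """Strip any leading BOM, decode accordingly, then parse as JSON."""
    encoding, skip = sniff_bom(data)
    # Assumption: input without a BOM is treated as UTF-8.
    text = data[skip:].decode(encoding or "utf-8")
    return json.loads(text)
```

For comparison, Python's own json.loads rejects a str that still begins
with U+FEFF, which is one data point on what existing parsers do.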