Re: [Json] BOMs

John Cowan <cowan@mercury.ccil.org> Mon, 18 November 2013 16:22 UTC

Date: Mon, 18 Nov 2013 11:19:54 -0500
From: John Cowan <cowan@mercury.ccil.org>
To: "Henry S. Thompson" <ht@inf.ed.ac.uk>
Message-ID: <20131118161954.GB23458@mercury.ccil.org>
References: <AA45B3C6-1DC5-4B1E-8045-C9FE76022584@vpnc.org> <CEA92854.2CC53%jhildebr@cisco.com> <20131113224737.GI31823@mercury.ccil.org> <f5bob5n71y7.fsf@troutbeck.inf.ed.ac.uk> <5284B095.4070004@it.aoyama.ac.jp> <C37B2FE59C164DBCA982AC81A56A09AA@codalogic> <f5bk3g6ufqy.fsf@troutbeck.inf.ed.ac.uk>
MIME-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Disposition: inline
In-Reply-To: <f5bk3g6ufqy.fsf@troutbeck.inf.ed.ac.uk>
User-Agent: Mutt/1.5.20 (2009-06-14)
Sender: John Cowan <cowan@ccil.org>
Cc: IETF Discussion <ietf@ietf.org>, Pete Cordell <petejson@codalogic.com>, JSON WG <json@ietf.org>, Anne van Kesteren <annevk@annevk.nl>, "Martin J. Dürst" <duerst@it.aoyama.ac.jp>, www-tag@w3.org, es-discuss <es-discuss@mozilla.org>

Henry S. Thompson scripsit:

> My sense is that you'll see more UTF-16 BOMs than anything else.

I agree.

> UTF-32 support seems to be waning (at least in the browsers), but
> UTF-16 is in pretty widespread use.  John, do you think you can fool
> google into counting BOMs for us?

No, because Google transcodes everything into UTF-8 as soon as it starts
to process it.  What I can say (auct. Mark Davis) is that UTF-16 documents
in all formats represent much less than 0.1% of the searchable Web.
By contrast, UTF-8 (including ASCII) amounts to 80% of it.  This reflects
actual rather than declared encodings, and is as of January 2012.
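
To illustrate why counting has to happen before transcoding, here is a
minimal sketch in Python. It is not anything Google is known to run; the
names sniff_bom and transcode_to_utf8 are hypothetical, and it assumes the
raw bytes of each document are still available. It checks the leading
bytes against the standard BOM constants, then shows that the BOM is no
longer detectable once the text has been re-encoded as UTF-8.

```python
import codecs

# Longer BOMs come first: the UTF-32 LE BOM (FF FE 00 00) starts with the
# UTF-16 LE BOM (FF FE), so checking UTF-16 first would misclassify it.
# Caveat: UTF-16 LE text whose first character is U+0000 would be taken
# for UTF-32 LE here; a JSON text cannot start that way.
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def sniff_bom(raw):
    """Return the encoding implied by a leading BOM, or None if there is none."""
    for bom, name in _BOMS:
        if raw.startswith(bom):
            return name
    return None


def transcode_to_utf8(raw):
    """Decode by BOM (assuming UTF-8 when absent), drop the BOM, emit UTF-8."""
    encoding = sniff_bom(raw) or "utf-8"
    text = raw.decode(encoding)
    if text.startswith("\ufeff"):
        # The endian-specific codecs keep the BOM as U+FEFF; remove it.
        text = text[1:]
    return text.encode("utf-8")


if __name__ == "__main__":
    original = codecs.BOM_UTF16_LE + '{"a": 1}'.encode("utf-16-le")
    print(sniff_bom(original))
    print(sniff_bom(transcode_to_utf8(original)))
```

Run on the raw bytes, sniff_bom reports the UTF-16 BOM; run on the
transcoded output, it finds nothing, so any count would have to be taken
at fetch time.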

-- 
So they play that [tune] on                     John Cowan
their fascist banjos, eh?                       cowan@ccil.org
        --Great-Souled Sam                      http://www.ccil.org/~cowan