Re: [Json] BOMs (Was: Re: JSON: remove gap between Ecma-404 and IETF draft)

John Cowan <cowan@mercury.ccil.org> Mon, 18 November 2013 18:42 UTC

Date: Mon, 18 Nov 2013 13:41:53 -0500
From: John Cowan <cowan@mercury.ccil.org>
To: Pete Cordell <petejson@codalogic.com>
Message-ID: <20131118184153.GH23458@mercury.ccil.org>
References: <AA45B3C6-1DC5-4B1E-8045-C9FE76022584@vpnc.org> <CEA92854.2CC53%jhildebr@cisco.com> <20131113224737.GI31823@mercury.ccil.org> <f5bob5n71y7.fsf@troutbeck.inf.ed.ac.uk> <5284B095.4070004@it.aoyama.ac.jp> <C37B2FE59C164DBCA982AC81A56A09AA@codalogic> <CAHBU6ivieGAmNF=ZyMNoBCLO3q17E-J_g=pMN1jkfd1J_PW9iA@mail.gmail.com> <79BD90E325154FD981F41A6CDF790C45@codalogic> <20131118170708.GE23458@mercury.ccil.org> <B40B293D5050437792908F50A5E24367@codalogic>
MIME-Version: 1.0
Content-Type: text/plain; charset="iso-8859-1"
Content-Disposition: inline
Content-Transfer-Encoding: 8bit
In-Reply-To: <B40B293D5050437792908F50A5E24367@codalogic>
User-Agent: Mutt/1.5.20 (2009-06-14)
Sender: John Cowan <cowan@ccil.org>
Cc: "Henry S. Thompson" <ht@inf.ed.ac.uk>, JSON WG <json@ietf.org>, Tim Bray <tbray@textuality.com>, Anne van Kesteren <annevk@annevk.nl>, "Martin J. Dürst" <duerst@it.aoyama.ac.jp>, www-tag@w3.org, IETF Discussion <ietf@ietf.org>
Subject: Re: [Json] BOMs (Was: Re: JSON: remove gap between Ecma-404 and IETF draft)

Pete Cordell scripsit:

> Do you mean that the presence of a UTF-8 BOM sequence doesn't prove
> that it's not Windows cp-1252 or do you mean you can tell apart a
> UTF-8 and cp-1252 file without BOMs?

I meant the latter, but the former is true, too.  A plain text document
beginning "ï»¿" in Windows-1252 will appear to begin with a UTF-8 BOM
in the absence of out-of-band information.
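
To make the ambiguity concrete, here is a minimal Python sketch (an
illustration only, not taken from any tool discussed in this thread).
It decodes the same bytes, EF BB BF followed by ASCII text, under both
interpretations.  The variable names are hypothetical.

```python
import codecs

# The three bytes EF BB BF, followed by some ASCII payload.
data = codecs.BOM_UTF8 + b"hello"

# Read as Windows-1252, each of the three bytes is a printable character.
as_cp1252 = data.decode("cp1252")

# Read as UTF-8, the same three bytes are the single code point U+FEFF.
as_utf8 = data.decode("utf-8")

# The "utf-8-sig" codec decodes UTF-8 and drops a leading BOM if present.
as_utf8_sig = data.decode("utf-8-sig")

print(repr(as_cp1252))
print(repr(as_utf8))
print(repr(as_utf8_sig))
```

Both decodings succeed, so the bytes alone cannot say which one the
author intended.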

> If the latter, do the relevant tools take the time to distinguish
> the 2 without BOMs?

Some tools do, some don't.  The IRC client I use, XChat, attempts to
convert input as UTF-8, and if that fails, converts it as Latin-1.
I have not yet seen it produce mojibake.
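
The try-UTF-8-then-fall-back approach can be sketched in a few lines of
Python.  This is an assumption about the general technique, not XChat's
actual code, and the function name decode_with_fallback is hypothetical.

```python
from typing import Tuple


def decode_with_fallback(data: bytes) -> Tuple[str, str]:
    """Decode as strict UTF-8; if that fails, decode as Latin-1.

    Returns the decoded text and the name of the encoding that was used.
    """
    try:
        return data.decode("utf-8"), "utf-8"
    except UnicodeDecodeError:
        # Latin-1 maps every byte value to a code point, so this
        # second attempt cannot raise.
        return data.decode("latin-1"), "latin-1"


if __name__ == "__main__":
    print(decode_with_fallback("café".encode("utf-8")))
    print(decode_with_fallback("café".encode("latin-1")))
```

The heuristic tends to work because non-ASCII Latin-1 text rarely happens
to form valid UTF-8 multi-byte sequences, so a successful strict UTF-8
decode is strong evidence that the input really was UTF-8.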

-- 
John Cowan   cowan@ccil.org  http://www.ccil.org/~cowan
Most languages are dramatically underdescribed, and at least one is
dramatically overdescribed.  Still other languages are simultaneously
overdescribed and underdescribed.  Welsh pertains to the third category.
        --Alan King