Re: [Json] Unpaired surrogates in JSON strings

John Cowan <cowan@mercury.ccil.org> Thu, 06 June 2013 15:08 UTC

Date: Thu, 06 Jun 2013 11:08:29 -0400
From: John Cowan <cowan@mercury.ccil.org>
To: Tim Bray <tbray@textuality.com>
Message-ID: <20130606150829.GC3090@mercury.ccil.org>
References: <A723FC6ECC552A4D8C8249D9E07425A70FC2E7E1@xmb-rcd-x10.cisco.com> <51B06F38.8050707@crockford.com> <CAHBU6iuFBuW-RfgBLQF5q4BnUOzs088QXW3uOQG1OjBFjZttkw@mail.gmail.com>
In-Reply-To: <CAHBU6iuFBuW-RfgBLQF5q4BnUOzs088QXW3uOQG1OjBFjZttkw@mail.gmail.com>
User-Agent: Mutt/1.5.20 (2009-06-14)
Sender: John Cowan <cowan@ccil.org>
Cc: "json@ietf.org" <json@ietf.org>, Paul Hoffman <paul.hoffman@vpnc.org>, Douglas Crockford <douglas@crockford.com>, "Joe Hildebrand (jhildebr)" <jhildebr@cisco.com>
Subject: Re: [Json] Unpaired surrogates in JSON strings

Tim Bray scripsit:

> I’m fairly shocked.  I have always believed that JSON encodes what its
> introduction (and section 2.5 "Strings") say it encodes, Unicode
> characters.
> 
> If it is a requirement to accommodate the class of bug where languages that
> use UTF-16 (Java, JavaScript, C#) can emit unpaired UTF-16 surrogates, the
> spec needs to be clear that the INTENT is actually to support Unicode
> characters, and that unpaired surrogates are always evidence of a bug, and
> there can be no expectation that any software receiving such buggy data
> will be able to do anything useful with it, or even avoid crashing in a
> hard-to-debug way down in the bowels of a library routine.  -T

+1

"If Parliament does not mean what it says, it must say so."

-- 
[W]hen I wrote it I was more than a little              John Cowan
febrile with foodpoisoning from an antique carrot       cowan@ccil.org
that I foolishly ate out of an illjudged faith          http://ccil.org/~cowan
in the benignancy of vegetables.  --And Rosta