Re: [Json] Unpaired surrogates in JSON strings

John Cowan <cowan@mercury.ccil.org> Thu, 06 June 2013 01:09 UTC

To: Tim Bray <tbray@textuality.com>
Cc: Paul Hoffman <paul.hoffman@vpnc.org>, Douglas Crockford <douglas@crockford.com>, "json@ietf.org" <json@ietf.org>

Tim Bray scripsit:

> In section 2.5 of 4627, a reasonable reading of the text clearly
> disallows unpaired surrogates, because the discussion is exclusively
> of characters, which surrogates aren’t; they are code points,
> but there are no characters that have those code points. From the
> introduction: “A string is a sequence of zero or more Unicode
> characters”. Case closed.

Unfortunately it isn't.

> A loose reading of the BNF probably allows naked surrogates if you
> ignore what the text says.

It's not about a loose reading.  Section 4 says "A JSON parser MUST
accept all texts that conform to the JSON grammar", and the grammar
permits \u followed by any four hex digits, lone surrogates included.
That flatly contradicts the reading above.
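
A minimal sketch of the conflict, assuming Python's standard json module
as a stand-in parser.  STRING_RE is an illustrative transcription of the
RFC 4627 string production, and the helper names are hypothetical, not
taken from the RFC:

```python
import json
import re

# Illustrative transcription of the RFC 4627 string production:
# an unescaped character, a two-character escape, or \u plus ANY four
# hex digits. Nothing here requires surrogates to come in pairs.
STRING_RE = re.compile(
    r'"(?:[^"\\\x00-\x1f]|\\["\\/bfnrt]|\\u[0-9a-fA-F]{4})*"'
)


def matches_string_grammar(text: str) -> bool:
    """Return True if text is a complete string token under STRING_RE."""
    return STRING_RE.fullmatch(text) is not None


def has_unpaired_surrogate(value: str) -> bool:
    """Return True if a decoded string still holds a surrogate code point.

    Python's json module merges a valid high/low escape pair into one
    code point, so any U+D800..U+DFFF left over was unpaired.
    """
    return any(0xD800 <= ord(ch) <= 0xDFFF for ch in value)


lone = '"\\ud800"'          # JSON text containing a lone high surrogate
pair = '"\\ud83d\\ude00"'   # JSON text containing a proper surrogate pair

lone_value = json.loads(lone)   # accepted, as Section 4 demands
pair_value = json.loads(pair)   # decoded to a single code point

print(matches_string_grammar(lone), has_unpaired_surrogate(lone_value))
print(matches_string_grammar(pair), has_unpaired_surrogate(pair_value))
```

Both texts conform to the grammar, so a parser bound by Section 4 has to
accept both, yet only the second decodes to a sequence of Unicode
characters in the sense of the introduction.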

> I think anyone who’s delivering those codepoints is already in
> violation of 4627, and I don’t think we should retroactively forgive
> those sins.

It's already been stated that ECMA can't swallow this change.

-- 
John Cowan    cowan@ccil.org    http://ccil.org/~cowan
Objective consideration of contemporary phenomena compel the conclusion
that optimum or inadequate performance in the trend of competitive
activities exhibits no tendency to be commensurate with innate capacity,
but that a considerable element of the unpredictable must invariably be
taken into account. --Ecclesiastes 9:11, Orwell/Brown version