Re: [Json] Unpaired surrogates in JSON strings

Douglas Crockford <douglas@crockford.com> Thu, 06 June 2013 01:08 UTC

Message-ID: <51AFE107.7020301@crockford.com>
Date: Wed, 05 Jun 2013 18:08:23 -0700
From: Douglas Crockford <douglas@crockford.com>
To: Tim Bray <tbray@textuality.com>
Cc: Paul Hoffman <paul.hoffman@vpnc.org>, "json@ietf.org" <json@ietf.org>

On 6/5/2013 5:18 PM, Tim Bray wrote:
> In section 2.5 of 4627, a reasonable reading of the text clearly 
> disallows unpaired surrogates, because the discussion is exclusively 
> of characters, which surrogates aren’t; they are code points, but 
> there are no characters that have those code points. From the 
> introduction: “A string is a sequence of zero or more Unicode 
> characters”. Case closed.
>
> A loose reading of the BNF probably allows naked surrogates if you 
> ignore what the text says.
>
> I think anyone who’s delivering those codepoints is already in 
> violation of 4627, and I don’t think we should retroactively forgive 
> those sins.
>
> -T
>
>
> On Wed, Jun 5, 2013 at 4:55 PM, Paul Hoffman <paul.hoffman@vpnc.org> wrote:
>
>     On Jun 5, 2013, at 11:57 AM, Douglas Crockford
>     <douglas@crockford.com> wrote:
>
>     > On 6/5/2013 11:47 AM, John Cowan wrote:
>     >> Douglas Crockford scripsit:
>     >>
>     >>> Such a requirement will be breaking. Breaking changes are out
>     >>> of scope.
>     >> How is it a breaking change to limit what documents are allowed
>     >> to be *generated*?
>     >>
>     > Because JSON is currently being used in applications that
>     > deliver those codepoints.
>
>     Can you say why an application would do that, given the JSON
>     specification?
>
The application that does that is JavaScript. Any 16-bit value can be
put next to any other 16-bit value and then JSON encoded. The meaning of
'character' throughout the RFC is ECMAScript's, which is roughly the
same as Unicode's code point. This can be seen in the BNF. When the RFC
talks about Unicode characters, it means Unicode characters as opposed
to EBCDIC characters.

JSON is just the pipe. It doesn't need to be enforcing Unicode over 
JavaScript. The sender and receiver can argue about what it means to be 
a character. JSON has always been agnostic about this.
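
A minimal sketch of the behavior described above, assuming a standard ECMAScript engine. The variable names are illustrative only. Engines that implement ES2019 or later write a lone surrogate as a \uXXXX escape, while older engines emit the raw code unit. Either way the round trip should return the same 16-bit values.

```javascript
// A lone high surrogate (0xD83D) placed next to an ordinary letter.
// This is a legal ECMAScript string, but not well-formed Unicode text.
const lone = String.fromCharCode(0xD83D) + "x";

// JSON-encode it, then decode it again.
const encoded = JSON.stringify(lone);
const decoded = JSON.parse(encoded);

// The round trip preserves the 16-bit code units.
console.log(decoded.length);                   // 2
console.log(decoded.charCodeAt(0) === 0xD83D); // true
console.log(decoded === lone);                 // true

// An escaped unpaired surrogate in JSON text is also accepted by the parser.
const fromEscape = JSON.parse('"\\ud83d"');
console.log(fromEscape.length);                // 1
```

Whether the receiver treats that value as a character is left to the receiver, which is the sense in which JSON acts as the pipe here.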