Re: [Json] On characters and code points

Stefan Drees <stefan@drees.name> Fri, 07 June 2013 16:16 UTC

To: Tim Bray <tbray@textuality.com>
Cc: Paul Hoffman <paul.hoffman@vpnc.org>, "json@ietf.org" <json@ietf.org>

On 2013-06-07 18:01, Tim Bray wrote:
> On Fri, Jun 7, 2013 at 8:56 AM, Paul Hoffman ... wrote:
>
>     This may be a part of the spec where some people have to hold their
>     noses. The Unicode definition of "character" does not include
>     non-characters, and the code points for some of those non-characters
>     make sense in JSON strings when those strings. Bjoern has pointed
>     out a good one: strings used for test cases of other code. The issue
>     is not just unpaired surrogates. Do we *really* want to prohibit:
>         { "End of data marker": "\uFFFF" }
>
>
> Yes, I *really* want to prohibit that. The one corner case it buys you
> is outweighed by a factor of a thousand or so in not being able to use
> general-purpose string processing software to deal with JSON payloads.
> BTW, a huge amount of deployed software out there ALREADY processes JSON
> text fields using general-purpose string processing libraries, and will
> explode unpredictably and in hard-to-debug ways if this starts happening.

And what about { "Decorate my slash": "\/" } and "general-purpose string
processing software"? Isn't this also a case where you need a
"pre-conditioner" that replaces the JSON-specific escape sequence "\/"
with "/" before feeding the string into "general-purpose string
processing software"? :-)
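
As a minimal sketch of that point (assuming Python and its standard `json` module, which nobody in the thread mentions): the parser itself acts as the pre-conditioner, so code downstream of it sees "/" rather than "\/". The same parser passes the noncharacter U+FFFF from the earlier example straight through.

```python
import json

# Python's json.loads decodes every JSON escape sequence, including "\/",
# before the application ever sees the string.
raw = '{ "Decorate my slash": "\\/", "End of data marker": "\\uFFFF" }'
doc = json.loads(raw)

slash = doc["Decorate my slash"]
marker = doc["End of data marker"]

print(slash)             # prints /
print(len(slash))        # prints 1: the backslash is gone after decoding
print(hex(ord(marker)))  # prints 0xffff: the noncharacter survives decoding
print(json.dumps(slash)) # prints "/": Python does not re-escape the slash
```

Because the escaped and unescaped slash decode to the same one-character string, nothing after the parser can tell which form the sender used.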


> Also, consider the lovely consequences when unpaired surrogates start
> showing up in key fields and are fed to hash functions in every
> programming language in the world, which expect to receive Unicode
> characters.
>   -T
>

For today I had better not imagine all these languages and
implementations blindly stuffing some JSON text, transformed into their
own in-memory structures ... maybe later, during the weekend.
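
The failure mode Tim describes can also be sketched in Python (again an assumption; the helper names `is_noncharacter` and `check_json_string` are hypothetical, made up for this illustration). Python's `json` module accepts an escaped lone surrogate as an object key; the trouble only appears later, when other code tries to encode that key as UTF-8. A receiver wanting the stricter rule Tim argues for could scan decoded strings itself:

```python
import json


def is_noncharacter(cp: int) -> bool:
    """True for the 66 Unicode noncharacters: U+FDD0..U+FDEF plus the
    last two code points (xxFFFE, xxFFFF) of each of the 17 planes."""
    return 0xFDD0 <= cp <= 0xFDEF or (cp & 0xFFFE) == 0xFFFE


def check_json_string(s: str) -> list[str]:
    """Hypothetical strict-receiver check on an already-decoded JSON string.

    Python's json.loads combines valid surrogate pairs into one code point,
    so any surrogate still present afterwards is unpaired.
    """
    problems = []
    for i, ch in enumerate(s):
        cp = ord(ch)
        if 0xD800 <= cp <= 0xDFFF:
            problems.append(f"unpaired surrogate U+{cp:04X} at index {i}")
        elif is_noncharacter(cp):
            problems.append(f"noncharacter U+{cp:04X} at index {i}")
    return problems


# A lone high surrogate, written as a JSON escape and used as an object key.
doc = json.loads('{ "\\ud800": 1 }')  # accepted without complaint
key = next(iter(doc))

try:
    key.encode("utf-8")  # a strict UTF-8 encoder refuses lone surrogates
except UnicodeEncodeError as err:
    print("cannot encode key:", err)

print(check_json_string(key))  # ['unpaired surrogate U+D800 at index 0']
```

Whether a receiver should reject, replace, or pass such strings through is exactly what the thread is arguing about; the sketch only shows where the problem surfaces.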


>     ...

Stefan.