Re: [Json] On characters and code points

Stefan Drees <stefan@drees.name> Sat, 08 June 2013 09:56 UTC

On 2013-06-07 18:19, Tim Bray wrote:
> On Fri, Jun 7, 2013 at 9:15 AM, Stefan Drees ... wrote:
>
>     and what about { "Decorate my slash": "\/" } and "general-purpose
>     string processing software". Isn't this also a case, where you need
>     a "pre-conditioner" that replaces the JSON specific escape sequence
>     "\" with "/" before feeding it into "general-purpose string
>     processing software" :-?)
>
>
> Red herring.  JSON, just like XML, has ways to encode characters that
> are hard to type.  What we're arguing about is the actual content of the
> payload after the parser/pre-conditioner.  -T ...

I beg to differ. I was probing the claim that "general-purpose string
processing software" would have problems with entity so-and-so but not
with JSON text as is.

Also, when it comes to "typing" forward slashes (as a means of
human-to-machine communication), dot-ini files or YAML text are in my
experience far easier to type than JSON text ;-)

As John Cowan rightfully stated: "That [the general-purpose string
processing software] would be the JSON parser." **but** its task may be
considered anything from easy to hard.
We are discussing exactly these matters here, so it is not a red
herring, is it?
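
To make the "\/" case concrete, here is a minimal sketch, assuming
Python's standard json module stands in for "the JSON parser" (the
variable names are mine, for illustration only). It shows that the
escape exists only in the JSON text and is gone after parsing:

```python
import json

# The JSON text contains the two-character escape sequence \/ .
# A raw string keeps the backslash exactly as it appears on the wire.
json_text = r'{ "Decorate my slash": "\/" }'

# The parser is the component that turns the escape into a plain slash.
value = json.loads(json_text)["Decorate my slash"]
print(value)

# Serializing again does not restore the escape: this serializer emits
# a bare slash, so the round trip is not byte-for-byte identical.
print(json.dumps(value))
```

So the payload after the parser is a single "/", while a consumer that
skips the parser sees two characters.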



Personal motivation (if time and interest allow):

Any differences in expressing common "atomic" values or "core
structures" (such as the kind of bindings between atoms) should be
verified, as each may introduce otherwise unnecessary hurdles or even
traps in translation. It is data interchange we are specifying.

For a transformation source and target that look quite similar (e.g.
JSON values and the corresponding language L "things") for some language
L in (ECMAScript, PHP, Python, ...), one would expect not to need any
"heavy" lifting when transforming from one representation into another.

I observe that "we programmers" more and more look at our code (esp.
atomic variables and structures in some language L), compare these to
the serialized (pretty-printed) JSON equivalent and then expect them to
be essentially identical (some people even start to think e.g. "JSON
**is** Python syntax", as recently observed on the python-ideas mailing
list when arguing in favour of a JSON serializer).

At the very least, one should expect of a successful and proven data
interchange format that the atomic values simply "slip back and forth"
and do not need special per-character handling.

So the more we extend basic escaping "C"ulture (as we further superset 
the C escape sequences), the wider this gap will become.

Some interesting "structural" rule differences (not candidates for
change, I know), namely

* the trailing comma, allowed (or even needed) in many programming
   languages for list-like structures, being disallowed in both JSON
   objects and arrays (I am sorry that this will not change)

* the JSON need for quoting the names (I am neutral here)

do bite us "machine instructing humans" once in a while, don't they?
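
The two differences listed above can be checked with a small sketch,
again assuming Python as the language L and its standard json module as
the parser; is_valid_json is a helper I made up for illustration:

```python
import ast
import json


def is_valid_json(text):
    """Return True if the standard json module accepts the text."""
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False


# A trailing comma is fine in a Python list literal ...
python_list = ast.literal_eval('["a", "b",]')

# ... but the same characters are rejected as JSON, in arrays and objects.
array_with_trailing_comma = is_valid_json('["a", "b",]')
object_with_trailing_comma = is_valid_json('{"a": 1,}')

# Names in JSON objects must be quoted strings.
unquoted_name = is_valid_json('{a: 1}')
quoted_name = is_valid_json('{"a": 1}')

print(python_list)
print(array_with_trailing_comma, object_with_trailing_comma)
print(unquoted_name, quoted_name)
```

Text that looks identical to a valid literal in L can thus still fail
as JSON, which is the kind of trap I mean.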

The "might-look-like-a-babylonian" system of programming languages that 
evolved as solution attempts to the

* N humans instructing
* M machines with
* L languages utilizing
* F formats for data interchange or storage to handle
* P problems

is hopefully offering NP-complete solutions :-) but IMO some of the 
couplings should be reconsidered with passionate caution from time to time.

Maybe now is such a "time" to reconsider some regions of the space 
spanned by the cartesian product L x (F=JSON).

Thanks for reading this long opinionated message (which I also did 
before sending. Promised ;-)

Stefan.