Re: [Json] Nudging the English-language vs. formalisms discussion forward

"Pete Cordell" <petejson@codalogic.com> Thu, 20 February 2014 15:23 UTC

From: "Pete Cordell" <petejson@codalogic.com>
To: "Phillip Hallam-Baker" <hallam@gmail.com>
Archived-At: http://mailarchive.ietf.org/arch/msg/json/E28bm9hRRd0_o4PfTsPT7W1VcfU
Cc: Cyrus Daboo <cyrus@daboo.name>, Carsten Bormann <cabo@tzi.org>, JSON WG <json@ietf.org>

----- Original Message -----
From: "Phillip Hallam-Baker"
>> Thanks Cyrus.  It certainly looks like another option.  I would suggest
>> that the authors could look at the following to improve it:
>>
>> - It doesn't seem to have much in the way of supporting modularity.  It
>> would be attractive if a schema could pull in and use definitions defined
>> in other documents.  Many IETF protocols consist of suites of components
>> and support for that would be handy.
>>
>
> That is probably the right decision at this point.

I'm not sure what "That" refers to here.

> IETF has not decided how to make JSON protocols composable.

Isn't that why we're here?

> We have JOSE
> but that may be a poor example because signature and encryption are
> wrappers rather than composition. JOSE does not actually encrypt JSON, it
> encrypts an octet stream that may contain JSON which is a different thing
> entirely.
>
> Now that might in part be due to lack of having a schema so it is not
> currently possible to say 'structure X from RFC9666 goes here'.
>
> But what does modularity look like? Is it importing structures defined in
> one spec into another?
>
> There are bad precedents here. QNames in XML exist in the ABNF but using
> them in schemas has some very odd and unexpected effects that we only
> discovered after trying to use them. It seems obvious that if you have
> <prefix:tag prefix2:attr="foo"> then you can have attribute values <tag
> attr="prefix:foo"> but this requires the application to be aware of the
> namespace fiasco that is usually hidden from view in platforms like .NET
> etc.

Yes, QNames are broken, so don't do that.  Think more of packages in Java
or namespaces in C#.  It can be as simple as a schema like:

    int port;
    com.ietf.sip.contact contact;

which yields JSON of:

    "port": 25, "contact": "...whatever..."

We don't have to make things complicated!
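
To make that concrete, here is a rough Python sketch.  It is only an
illustration: the TYPES registry, SCHEMA list and validate() function are
hypothetical names made up for this example, not part of any proposed schema
language.  It shows a package-qualified type name being used to look up a
definition while never appearing in the JSON on the wire.

```python
import json

# Hypothetical registry mapping package-qualified type names to validators.
# In a real tool these definitions would be pulled in from other documents.
TYPES = {
    "int": lambda v: isinstance(v, int) and not isinstance(v, bool),
    "com.ietf.sip.contact": lambda v: isinstance(v, str),
}

# The schema from the text above: (qualified type name, member name) pairs.
SCHEMA = [
    ("int", "port"),
    ("com.ietf.sip.contact", "contact"),
]


def validate(schema, obj):
    """Return True if every schema member is present with a valid value."""
    return all(
        name in obj and TYPES[type_name](obj[name])
        for type_name, name in schema
    )


message = {"port": 25, "contact": "...whatever..."}
encoded = json.dumps(message)

# The qualified name is purely a schema-side concept.
print(encoded)
print(validate(SCHEMA, json.loads(encoded)))
```

The point of the sketch is that, unlike QNames, nothing in the instance data
needs to know about the package name.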

> So just pointing to a syntactic production in another spec and making use
> of it may not be acceptable because there may be a subtle change in 
> context.
>
> I certainly don't want anything as broken as XML Namespaces in JSON. Nor
> can I see a need for them.

Be assured that XML Namespaces are irrelevant to the discussion.

>> - It doesn't seem to support third-party extensions to a core protocol.
>>  For example, HTTP and SIP are defined in a core RFC but there are then a
>> number of additional RFCs that define extensions.  Support for expressing
>> how a new extension can extend an existing external extension would be
>> good.  (Just about every schema language I've seen fails miserably at
>> this!)
>>
>
> Also true and again this may be the right choice in this case.
>
> But remember that in JSON, every structure can be extended just by adding a
> tag.

Yes.  The question is, how do we exploit that in the schema language?
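
As a minimal sketch of what "extended just by adding a tag" means for a
reader, assuming a hypothetical core definition with only "port" and
"contact" members (KNOWN_MEMBERS and split_known() are invented for this
example):

```python
import json

# Hypothetical set of members defined by the core specification.
KNOWN_MEMBERS = {"port", "contact"}


def split_known(text):
    """Separate core members from members added by extensions."""
    obj = json.loads(text)
    core = {k: v for k, v in obj.items() if k in KNOWN_MEMBERS}
    extensions = {k: v for k, v in obj.items() if k not in KNOWN_MEMBERS}
    return core, extensions


# "priority" stands in for a member added by some later extension.
core, extensions = split_known(
    '{"port": 25, "contact": "...whatever...", "priority": 3}'
)
print(core)
print(extensions)
```

A core-only reader can simply ignore the second dictionary.  What the sketch
does not answer is how a schema would describe the extension members, which
is the open question.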

>> - The plethora of data types are a problem for schema languages.  I've
>> come to the conclusion that the best option is to declare a "microformat"
>> pseudo-type that basically says this is a string encoded in a particular
>> way.  The 'particular way' can then be defined in narrative form as
>> required (80-20 etc).  This avoids having to define the kitchen sink of
>> data types up front.  e.g.:
>>
>
> I disagree, I don't think the schema should address sub-syntax at all
> except for a small number of encodings that are RFCs such as RFC3339
> timestamps, URIs and DNS labels.
>
>
>
>
>> struct Asset
>> {
>>    GPSLocation location;
>> };
>>
>> microformat GPSLocation;
>>    // A GPSLocation is a pair of comma separated floating-point
>>    // numbers representing longitude and latitude.
>>    // e.g. "location": "0.0,51.5"
>>
>
> These can and should be encoded in JSON:
>
> Structure GPSLocation
>    Float X
>    Float Y
>
> Message Foo
>    GPSLocation location
>
> "location" : {"X": 0.0, "Y": 51.5}

I disagree with this.  Dates alone illustrate that there are 'microformats' 
that can result in a much more compact, meaningful and useful format than 
say:

    { "date": 25, "month": 12, "year": 2015, "hour": 12, "min": 0 }

Another location format might be:

    56°14'23.45"N,18°5'16.65"W

Forcing such formats to be JSONified may cause as much error as forcing IEEE 
754 numbers to be decimalised.

So let's learn from the past, recognise that we can't build them all in up
front, and design our format accordingly.
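
For illustration, here is roughly what an application-side handler for the
GPSLocation microformat from my earlier example might look like in Python.
This is a sketch under assumptions: parse_gps_location() is a hypothetical
helper, and the range checks are my guess at what the narrative definition
would require.

```python
import json


def parse_gps_location(value):
    """Parse a "longitude,latitude" string into a pair of floats."""
    parts = value.split(",")
    if len(parts) != 2:
        raise ValueError("expected 'longitude,latitude'")
    longitude, latitude = (float(p) for p in parts)
    # Assumed constraints; the narrative definition would state the real ones.
    if not (-180.0 <= longitude <= 180.0 and -90.0 <= latitude <= 90.0):
        raise ValueError("coordinate out of range")
    return longitude, latitude


# The schema only needs to know that "location" is a string; the
# microformat handling lives in the application.
asset = json.loads('{"location": "0.0,51.5"}')
location = parse_gps_location(asset["location"])
print(location)
```

A different microformat, such as the degrees/minutes/seconds form above,
would need its own handler, which is exactly why I'd rather not try to
enumerate them all in the schema language.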

> Parsing floating point numbers is complicated. I want that complexity to be
> in the JSON parser and nowhere else. Though this example is possibly a sign
> that decimal fractions should be an intrinsic type since we use decimal
> fractions in protocols but floats very rarely.
>
> Perhaps I will add a Decimal type to my schema that would map to a INT64
> structure with a multiplier of 1,000,000,000. That would allow for numbers
> up to 17,000,000,000 with nine digits of decimal precision. They would map
> to JSON decimal fractions.
>
> When people reach for regular expressions it is almost always a sign that
> they are doing something that they should not.
>
>
> This is also a problem for my binary version of JSON (and I suspect CBOR).
> I added the binary float values to JSON-B because changing from binary to
> decimal fractions results in a loss of precision. Here we have a decimal
> fraction being forced to binary. This is a big problem for currency
> transactions and spatial coordinates.
>
> Right now the encoding scores are
>
> Microformat - 10 bytes
> JSON   - 18 bytes
> JSON-C  - 21 bytes
>
> The compressed JSON encoding actually requires more bytes because the only
> encoding for floats is Real64 (8bytes). And despite the number of bits we
> have a precision loss because of a decimal->binary fraction conversion.
>
> Adding a decimal fraction tag allows the number of bytes for each number to
> be reduced to 3 and the locations would only require 8 bytes each. Though
> this is very much a corner case, the text tags are short so we don't get
> any leverage from compression and real GPS coordinates would have far more
> decimal places.
>
> In most real world examples you are going to get far more leverage by
> sticking to only encoding in JSON data model and then finding an efficient
> way to encode JSON rather than inventing ad-hoc microformats that are
> neither JSON nor JSON data model, are going to require a custom parser and
> are not going to compress.

I'm not interested in compression beyond zip and co.  It may sound harsh of
me, but my feeling is that if this, or the errors in floating point numbers,
is critical to you, then use something else.  We don't need something that is
all things to all people.
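
For anyone who wants to check the numbers in the quoted text, the following
Python sketch may help.  It is illustrative only: to_scaled() is a
hypothetical helper that mimics the quoted idea of an integer with a
multiplier of 1,000,000,000, and the size comparison covers only the
microformat and plain JSON cases, since I have no encoder for the binary
variants.

```python
import json
from decimal import Decimal

SCALE = 10**9  # nine decimal digits, as in the quoted proposal


def to_scaled(text):
    """Convert a decimal string to an integer count of 10^-9 units."""
    scaled = Decimal(text) * SCALE
    if scaled != scaled.to_integral_value():
        raise ValueError("more than nine decimal places")
    return int(scaled)


# Binary floats cannot represent most decimal fractions exactly,
# whereas scaled integers add up without rounding.
binary_sum_exact = (0.1 + 0.2 == 0.3)
scaled_sum_exact = (to_scaled("0.1") + to_scaled("0.2") == to_scaled("0.3"))

# Encoded sizes of the two text forms of the location value.
microformat = '"0.0,51.5"'
structured = json.dumps({"X": 0.0, "Y": 51.5}, separators=(",", ":"))
sizes = (len(microformat), len(structured))

print(binary_sum_exact, scaled_sum_exact, sizes)
```

The two text sizes come out as 10 and 18 bytes, which agrees with the
Microformat and JSON figures quoted above.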

Pete Cordell
Codalogic Ltd
C++ tools for C++ programmers, http://codalogic.com
Read & write XML in C++, http://www.xml2cpp.com