Re: [Json] Nudging the English-language vs. formalisms discussion forward

Phillip Hallam-Baker <hallam@gmail.com> Thu, 20 February 2014 14:36 UTC

To: Pete Cordell <petejson@codalogic.com>
Archived-At: http://mailarchive.ietf.org/arch/msg/json/FiGcMBQSDM6U4oE0xGAthau2Opg
Cc: Cyrus Daboo <cyrus@daboo.name>, Carsten Bormann <cabo@tzi.org>, JSON WG <json@ietf.org>

On Thu, Feb 20, 2014 at 8:01 AM, Pete Cordell <petejson@codalogic.com> wrote:

> ----- Original Message From: "Cyrus Daboo"
>
>> Please take a look at Andrew Newton's spec
>> (<https://datatracker.ietf.org/doc/draft-newton-json-content-rules/>).
>>
>
> Thanks Cyrus.  It certainly looks like another option.  I would suggest
> that the authors could look at the following to improve it:
>
> - It doesn't seem to have much in the way of supporting modularity.  It
> would be attractive if a schema could pull in and use definitions defined
> in other documents.  Many IETF protocols consist of suites of components
> and support for that would be handy.
>

That is probably the right decision at this point.

The IETF has not decided how to make JSON protocols composable. We have JOSE,
but that may be a poor example because signature and encryption are
wrappers rather than composition. JOSE does not actually encrypt JSON; it
encrypts an octet stream that may contain JSON, which is a different thing
entirely.

That might be due in part to the lack of a schema, so it is not
currently possible to say 'structure X from RFC9666 goes here'.

But what does modularity look like? Is it importing structures defined in
one spec into another?

There are bad precedents here. QNames in XML exist in the grammar, but using
them in schemas has some very odd and unexpected effects that we only
discovered after trying to use them. It seems obvious that if you have
<prefix:tag prefix2:attr="foo"> then you can also have attribute values such
as <tag attr="prefix:foo">, but this requires the application to be aware of
the namespace fiasco that is usually hidden from view on platforms like .NET.
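
A minimal sketch of that problem, using Python's standard
xml.etree.ElementTree (the namespace URIs, the prefixes and the helper names
are made up for illustration). The parser resolves prefixes on element and
attribute names, but a prefix inside an attribute value comes back as plain
text, so the application has to recover the prefix bindings itself:

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical document: the namespace URIs are invented for this example.
DOC = (
    '<p:tag xmlns:p="urn:example:one" xmlns:q="urn:example:two" '
    'q:attr="p:foo"/>'
)

elem = ET.fromstring(DOC)
# Element and attribute names are expanded to {uri}local form by the parser,
# but the attribute value is still the literal string "p:foo".
print(elem.tag)
print(elem.attrib)


def prefix_map(xml_text):
    """Collect prefix -> URI bindings, which fromstring() does not keep.

    This simple version ignores scoping and treats bindings as document-wide.
    """
    bindings = {}
    source = io.BytesIO(xml_text.encode("utf-8"))
    for _event, (prefix, uri) in ET.iterparse(source, events=("start-ns",)):
        bindings[prefix] = uri
    return bindings


def resolve_qname_value(value, bindings):
    """Resolve a 'prefix:local' attribute value by hand."""
    prefix, local = value.split(":", 1)
    return "{%s}%s" % (bindings[prefix], local)


bindings = prefix_map(DOC)
resolved = resolve_qname_value(elem.attrib["{urn:example:two}attr"], bindings)
print(resolved)
```

The second pass over the document exists only to recover information the
parser already had and threw away, which is the kind of hidden context the
application ends up depending on.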

So just pointing to a syntactic production in another spec and making use
of it may not be acceptable because there may be a subtle change in context.

I certainly don't want anything as broken as XML Namespaces in JSON. Nor
can I see a need for them.



> - It doesn't seem to support third-party extensions to a core protocol.
>  For example, HTTP and SIP are defined in a core RFC but there are then a
> number of additional RFC that define extensions.  Support for expressing
> how a new extension can extend an existing external extension would be
> good.  (Just about every schema language I've seen fails miserably at this!)
>

Also true and again this may be the right choice in this case.

But remember that in JSON, every structure can be extended just by adding a
tag.
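
A small sketch of what that looks like in practice. It assumes the receiver
ignores members it does not recognise, which is a convention a protocol has
to adopt rather than something JSON itself requires; the message contents and
the "owner" member are hypothetical:

```python
import json

# Hypothetical core message, and an extended one carrying one extra member.
CORE = '{"name": "asset-1", "location": {"X": 0.0, "Y": 51.5}}'
EXTENDED = (
    '{"name": "asset-1", "location": {"X": 0.0, "Y": 51.5}, '
    '"owner": "alice"}'
)

# Members defined by the (hypothetical) core protocol.
KNOWN_MEMBERS = ("name", "location")


def read_core(text):
    """Return only the members the core protocol knows about."""
    message = json.loads(text)
    return {key: message[key] for key in KNOWN_MEMBERS if key in message}


core_view = read_core(CORE)
extended_view = read_core(EXTENDED)
# A core-only receiver sees the same data in both messages.
print(core_view == extended_view)
```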



> - The plethora of data types are a problem for schema languages.  I've
> come to the conclusion that the best option is to declare a "microformat"
> pseudo-type that basically says this is a string encoded in a particular
> way.  The 'particular way' can then be defined in narrative form as
> required (80-20 etc).  This avoids having to define the kitchen sink of
> data types up front.  e.g.:
>

I disagree. I don't think the schema should address sub-syntax at all,
except for a small number of encodings that are defined in RFCs, such as
RFC 3339 timestamps, URIs and DNS labels.




> struct Asset
> {
>    GPSLocation location;
> };
>
> microformat GPSLocation;
>    // A GPSLocation is a pair of comma separated floating-point
>    // numbers representing longitude and latitude.
>    // e.g. "location": "0.0,51.5"
>

These can and should be encoded in JSON:

Structure GPSLocation
    Float X
    Float Y

Message Foo
    GPSLocation location

"location" : {"X": 0.0, "Y": 51.5}

Parsing floating point numbers is complicated. I want that complexity to be
in the JSON parser and nowhere else. Though this example is possibly a sign
that decimal fractions should be an intrinsic type since we use decimal
fractions in protocols but floats very rarely.
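
To make the difference concrete, here is a rough sketch in Python using only
the standard json and decimal modules. The helper parse_gps_microformat is
hypothetical; it stands in for the custom parser every consumer of the
microformat would have to write:

```python
import json
from decimal import Decimal

STRUCTURED = '{"location": {"X": 0.0, "Y": 51.5}}'
MICROFORMAT = '{"location": "0.0,51.5"}'

# Structured form: the JSON parser does all of the number parsing.
location = json.loads(STRUCTURED)["location"]
x, y = location["X"], location["Y"]


def parse_gps_microformat(value):
    """Hypothetical hand-written parser for the "X,Y" string form."""
    first, second = value.split(",")
    return float(first), float(second)


# Microformat form: the JSON parser only hands back a string.
mx, my = parse_gps_microformat(json.loads(MICROFORMAT)["location"])

# With the structured form, switching to decimal fractions is a parser
# option rather than a change to application code.
decimal_location = json.loads(STRUCTURED, parse_float=Decimal)["location"]
print(decimal_location["Y"])
```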

Perhaps I will add a Decimal type to my schema that would map to an INT64
structure with a multiplier of 1,000,000,000. That would allow for magnitudes
up to about 9,200,000,000 (about 18,400,000,000 if the integer is unsigned)
with nine digits of decimal precision. They would map to JSON decimal
fractions.
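
A sketch of how such a scaled-integer Decimal type might behave, assuming a
signed 64-bit integer and a fixed scale of 10^9. The function names are
illustrative and not part of any schema:

```python
from decimal import Decimal

SCALE = 10**9          # nine decimal digits
INT64_MAX = 2**63 - 1
INT64_MIN = -(2**63)
UINT64_MAX = 2**64 - 1


def to_fixed(text):
    """Convert a decimal string to a scaled integer that fits in INT64."""
    scaled = Decimal(text) * SCALE
    if scaled != scaled.to_integral_value():
        raise ValueError("more than nine decimal places")
    value = int(scaled)
    if not INT64_MIN <= value <= INT64_MAX:
        raise ValueError("out of INT64 range")
    return value


def from_fixed(value):
    """Convert a scaled integer back to an exact decimal fraction."""
    return Decimal(value).scaleb(-9)


fixed = to_fixed("51.5")
round_trip = from_fixed(fixed)

# Largest whole-number part that fits with nine decimal places.
print(INT64_MAX // SCALE)   # 9223372036
print(UINT64_MAX // SCALE)  # 18446744073
```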

When people reach for regular expressions it is almost always a sign that
they are doing something that they should not.


This is also a problem for my binary version of JSON (and I suspect CBOR).
I added the binary float values to JSON-B because changing from binary to
decimal fractions results in a loss of precision. Here we have a decimal
fraction being forced to binary. This is a big problem for currency
transactions and spatial coordinates.
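
The precision loss is easy to demonstrate with the Python standard library.
This is only an illustration of the decimal-to-binary conversion, not of any
particular binary JSON encoding:

```python
import struct
from decimal import Decimal

# 0.1 has no exact binary representation. Decimal(float) exposes the value
# that is actually stored, which differs from the decimal fraction 0.1.
stored = Decimal(0.1)
exact = Decimal("0.1")
print(stored == exact)

# An IEEE 754 binary64 value always takes 8 bytes, however short the
# decimal text was.
packed = struct.pack(">d", 0.1)
print(len(packed))

# The classic currency-style symptom: sums of binary floats drift,
# while sums of decimal fractions stay exact.
float_sum_ok = (0.1 + 0.2 == 0.3)
decimal_sum_ok = (Decimal("0.1") + Decimal("0.2") == Decimal("0.3"))
print(float_sum_ok, decimal_sum_ok)
```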

Right now the encoding scores are

Microformat - 10 bytes
JSON        - 18 bytes
JSON-C      - 21 bytes
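
The first two figures can be reproduced with a few lines of Python, assuming
the microformat count includes the surrounding quotes and the JSON count is
for the object with no whitespace. The JSON-C figure depends on that
encoding's details and is not reproduced here:

```python
import json

# The microformat value as it appears on the wire, quotes included.
microformat = '"0.0,51.5"'

# The structured value, serialised without any optional whitespace.
structured = json.dumps({"X": 0.0, "Y": 51.5}, separators=(",", ":"))

microformat_bytes = len(microformat.encode("utf-8"))
structured_bytes = len(structured.encode("utf-8"))

print(microformat_bytes)  # 10
print(structured_bytes)   # 18
```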

The compressed JSON encoding actually requires more bytes because the only
encoding for floats is Real64 (8 bytes). And despite the number of bits, we
have a precision loss because of the decimal-to-binary fraction conversion.

Adding a decimal fraction tag allows the number of bytes for each number to
be reduced to 3, and the locations would only require 8 bytes each. This is
very much a corner case, though: the text tags are short, so we don't get
any leverage from compression, and real GPS coordinates would have far more
decimal places.

In most real-world examples you are going to get far more leverage by
encoding only in the JSON data model and then finding an efficient way to
encode JSON, rather than inventing ad-hoc microformats that are neither JSON
nor the JSON data model, are going to require a custom parser and are not
going to compress.


-- 
Website: http://hallambaker.com/