Re: [Json] Nudging the English-language vs. formalisms discussion forward

Phillip Hallam-Baker <hallam@gmail.com> Thu, 20 February 2014 14:36 UTC

Date: Thu, 20 Feb 2014 09:36:17 -0500
Message-ID: <CAMm+LwgmHjoLu2=zTOERN8LO74hWpp45yy2epd2JzqDRM9oFfg@mail.gmail.com>
From: Phillip Hallam-Baker <hallam@gmail.com>
To: Pete Cordell <petejson@codalogic.com>
Archived-At: http://mailarchive.ietf.org/arch/msg/json/FiGcMBQSDM6U4oE0xGAthau2Opg
Cc: Cyrus Daboo <cyrus@daboo.name>, Carsten Bormann <cabo@tzi.org>, JSON WG <json@ietf.org>
Subject: Re: [Json] Nudging the English-language vs. formalisms discussion forward

On Thu, Feb 20, 2014 at 8:01 AM, Pete Cordell <petejson@codalogic.com> wrote:

> ----- Original Message From: "Cyrus Daboo"
>
>> Please take a look at Andrew Newton's spec
>> (<https://datatracker.ietf.org/doc/draft-newton-json-content-rules/>).
>>
>
> Thanks Cyrus.  It certainly looks like another option.  I would suggest
> that the authors could look at the following to improve it:
>
> - It doesn't seem to have much in the way of supporting modularity.  It
> would be attractive if a schema could pull in and use definitions defined
> in other documents.  Many IETF protocols consist of suites of components
> and support for that would be handy.
>

Leaving modularity out is probably the right decision at this point.

The IETF has not decided how to make JSON protocols composable. We have
JOSE, but that may be a poor example because signature and encryption are
wrappers rather than composition. JOSE does not actually encrypt JSON; it
encrypts an octet stream that may contain JSON, which is a different thing
entirely.

That might be due in part to the lack of a schema: it is not currently
possible to say 'structure X from RFC9666 goes here'.

But what does modularity look like? Is it importing structures defined in
one spec into another?

There are bad precedents here. QNames in XML are defined in the grammar,
but using them in schemas has some very odd and unexpected effects that we
only discovered after trying to use them. It seems obvious that if you have
<prefix:tag prefix2:attr="foo"> then you can also have attribute values
such as <tag attr="prefix:foo">, but this requires the application to be
aware of the namespace fiasco that is usually hidden from view in platforms
like .NET.
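
To make the QName problem concrete, here is a minimal sketch using Python's
standard xml.etree.ElementTree (my choice for illustration; the namespace
URI urn:example:a is made up). The parser resolves the prefix on element
and attribute names, but a prefix inside an attribute value comes back as
an opaque string, so the application has to know about the prefix bindings
itself.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: the namespace URI is invented for illustration.
DOC = '<p:tag xmlns:p="urn:example:a" p:attr="p:foo"/>'

root = ET.fromstring(DOC)

# Element and attribute *names* are expanded to {uri}local form by the parser.
element_name = root.tag
attribute_names = list(root.attrib)

# The attribute *value* is untouched: it still carries the raw prefix,
# and nothing in the parsed tree says which namespace "p" referred to.
attribute_value = root.get("{urn:example:a}attr")

print(element_name, attribute_names, attribute_value)
```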

So just pointing to a syntactic production in another spec and making use
of it may not be acceptable because there may be a subtle change in context.

I certainly don't want anything as broken as XML Namespaces in JSON. Nor
can I see a need for them.

> - It doesn't seem to support third-party extensions to a core protocol.
> For example, HTTP and SIP are defined in a core RFC but there are then a
> number of additional RFCs that define extensions.  Support for expressing
> how a new extension can extend an existing external extension would be
> good.  (Just about every schema language I've seen fails miserably at this!)
>

Also true and again this may be the right choice in this case.

But remember that in JSON, every structure can be extended just by adding a
tag.
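
As a rough sketch of what that looks like in practice (the field names and
the "vendor-ext" tag are hypothetical, not taken from any spec): a reader
written against the core structure simply ignores tags it does not know, so
an extended message still parses to the same core content.

```python
import json

# Hypothetical fields of a core structure; names are for illustration only.
KNOWN_FIELDS = ("name", "location")

def read_core(text):
    """Return only the fields the core reader knows; ignore anything else."""
    obj = json.loads(text)
    return {key: obj[key] for key in KNOWN_FIELDS if key in obj}

core_msg = '{"name": "pump-7", "location": {"X": 0.0, "Y": 51.5}}'

# The same message with an extra tag added by a third-party extension.
extended_msg = (
    '{"name": "pump-7", "location": {"X": 0.0, "Y": 51.5}, "vendor-ext": 42}'
)

print(read_core(core_msg) == read_core(extended_msg))
```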

> - The plethora of data types are a problem for schema languages.  I've
> come to the conclusion that the best option is to declare a "microformat"
> pseudo-type that basically says this is a string encoded in a particular
> way.  The 'particular way' can then be defined in narrative form as
> required (80-20 etc).  This avoids having to define the kitchen sink of
> data types up front.  e.g.:
>

I disagree. I don't think the schema should address sub-syntax at all,
except for a small number of encodings that are defined in RFCs, such as
RFC 3339 timestamps, URIs and DNS labels.

> struct Asset
> {
>    GPSLocation location;
> };
>
> microformat GPSLocation;
>    // A GPSLocation is a pair of comma separated floating-point
>    // numbers representing longitude and latitude.
>    // e.g. "location": "0.0,51.5"
>

These can and should be encoded in JSON:

Structure GPSLocation
    Float X
    Float Y

Message Foo
    GPSLocation location

"location" : {"X": 0.0, "Y": 51.5}

Parsing floating point numbers is complicated. I want that complexity to be
in the JSON parser and nowhere else. Though this example is possibly a sign
that decimal fractions should be an intrinsic type since we use decimal
fractions in protocols but floats very rarely.
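
A small sketch of the difference, using Python's json module purely for
illustration (the function names are mine). With the microformat, the
application has to split the string and parse the numbers itself; with the
structured form, the JSON parser has already done the number parsing.

```python
import json

def parse_microformat(value):
    """Hand-written parser for "X,Y"; every microformat needs its own."""
    x_text, y_text = value.split(",")
    return float(x_text), float(y_text)

def parse_structured(obj):
    """The JSON parser already produced numbers; just pick them out."""
    return obj["X"], obj["Y"]

micro = json.loads('{"location": "0.0,51.5"}')
structured = json.loads('{"location": {"X": 0.0, "Y": 51.5}}')

print(parse_microformat(micro["location"]))
print(parse_structured(structured["location"]))
```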

Perhaps I will add a Decimal type to my schema that would map to an INT64
structure with a multiplier of 1,000,000,000. That would allow for numbers
up to about 9,200,000,000 if the integer is signed (about 18,400,000,000
if unsigned) with nine digits of decimal precision. They would map to JSON
decimal fractions.
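
A minimal sketch of such a Decimal type, under my own assumptions: the
integer is a signed 64-bit value (the exact upper bound depends on that
choice), and the names to_scaled and from_scaled are hypothetical. The
conversion goes through Python's decimal module so that no binary float is
involved at any point.

```python
from decimal import Decimal

SCALE = 10**9            # nine decimal places, as proposed above
INT64_MAX = 2**63 - 1    # assumption: signed 64-bit integer

def to_scaled(text):
    """Convert a decimal string to a scaled integer without rounding."""
    scaled = Decimal(text) * SCALE
    if scaled != scaled.to_integral_value():
        raise ValueError("more than nine decimal places")
    value = int(scaled)
    if abs(value) > INT64_MAX:
        raise OverflowError("does not fit in a signed INT64")
    return value

def from_scaled(value):
    """Convert a scaled integer back to an exact decimal fraction."""
    return Decimal(value) / SCALE

latitude = to_scaled("51.5")
print(latitude, from_scaled(latitude))
```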

When people reach for regular expressions it is almost always a sign that
they are doing something that they should not.

This is also a problem for my binary version of JSON (and I suspect CBOR).
I added the binary float values to JSON-B because changing from binary to
decimal fractions results in a loss of precision. Here we have a decimal
fraction being forced to binary. This is a big problem for currency
transactions and spatial coordinates.
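
The effect is easy to show in any language with IEEE 754 doubles; here is
a short Python sketch. The decimal fractions 0.1 and 0.2 have no exact
binary representation, so arithmetic on the binary values drifts, while
the same arithmetic on decimal fractions stays exact.

```python
from decimal import Decimal

# Binary floats: 0.1 and 0.2 are stored as the nearest binary fractions.
float_sum = 0.1 + 0.2

# Decimal fractions: the same values are held exactly.
decimal_sum = Decimal("0.1") + Decimal("0.2")

# Decimal(0.1) exposes the value the binary float actually stores.
stored = Decimal(0.1)

print(float_sum == 0.3)
print(decimal_sum == Decimal("0.3"))
print(stored)
```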

Right now the encoding sizes for the example location are:

Microformat - 10 bytes
JSON        - 18 bytes
JSON-C      - 21 bytes

The compressed JSON encoding actually requires more bytes because the only
encoding for floats is Real64 (8 bytes). And despite the number of bits, we
still have a precision loss because of the decimal-to-binary fraction
conversion.
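
The first two figures can be checked with a few lines of Python. This is
only a sketch: it counts the quoted microformat string and the compact
JSON object {"X":0.0,"Y":51.5}, and it shows that one IEEE 754 double is 8
bytes, so two of them already take 16 bytes before any tags or framing. I
have not tried to reproduce the 21-byte total, since that depends on
JSON-C framing details not given here.

```python
import json
import struct

# The microformat value as it appears on the wire, quotes included.
microformat = '"0.0,51.5"'

# The structured value, serialized without optional whitespace.
json_text = json.dumps({"X": 0.0, "Y": 51.5}, separators=(",", ":"))

# One coordinate as a big-endian IEEE 754 double (Real64).
real64 = struct.pack(">d", 51.5)

sizes = {
    "microformat": len(microformat.encode("utf-8")),
    "json": len(json_text.encode("utf-8")),
    "real64": len(real64),
}
print(sizes)
```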

Adding a decimal fraction tag allows the number of bytes for each number to
be reduced to 3, and the locations would only require 8 bytes each. This is
very much a corner case, though: the text tags are short, so we don't get
any leverage from compression, and real GPS coordinates would have far more
decimal places.

In most real-world examples you are going to get far more leverage by
sticking to the JSON data model and then finding an efficient way to encode
JSON, rather than inventing ad-hoc microformats that are neither JSON nor
the JSON data model, require a custom parser and are not going to compress.

-- 
Website: http://hallambaker.com/
