Re: [apps-discuss] Concise Binary Object Representation (CBOR)

Phillip Hallam-Baker <hallam@gmail.com> Sun, 26 May 2013 00:40 UTC

To: Nico Williams <nico@cryptonector.com>
Cc: Paul Hoffman <paul.hoffman@vpnc.org>, General discussion of application-layer protocols <apps-discuss@ietf.org>

Just a thought.

JSON-C would meet most if not all the requirements for HTTP/2.0 encoding.

It is easy to parse and emit, and it is easy to adapt an existing JSON stack
to emit JSON-C. It is efficient in coding and in space, and a Web service can
use the same stack for HTTP framing and for the payload.


On Sat, May 25, 2013 at 12:13 PM, Phillip Hallam-Baker <hallam@gmail.com> wrote:

> On Fri, May 24, 2013 at 11:28 PM, Nico Williams <nico@cryptonector.com> wrote:
>
>> Thinking about what *I* want in a binary JSON encoding, and taking
>> into account PHB's points about online encoding and decoding, all I
>> really want, the one thing I badly want, is counted-bytes and chunked
>> Unicode and octet strings.  That's it.  So picture JSON, complete with
>> square and curly brackets, but no commas nor colons, just strings
>> (byte-counted or chunked), some encoding for numbers, booleans, and
>> null.  It's the handling of scalars that sucks about JSON: string
>> escaping, number printing and parsing.
>>
>> Something like this:
>>
>> {<Unicode string of length 3>foo<Unicode string of length 3>bar<join
>> to preceding string, length 3>baz<Unicode string of length
>> 3>num<integer value 5>}
>>
>> as an encoding of { "foo": "barbaz", "num": 5 }, with "barbaz" chunked.
>>
>> One of the nice things about such an encoding is that it should be
>> possible to implement as a fairly small variation on existing code:
>> it's almost only a different way of encoding scalar types -- the only
>> other difference being that commas and colons are not needed.
>>
>> With a variable-length encoding of integers and IEEE 754 64-bit
>> doubles for reals... that's compact enough.  Not nearly as compact as
>> we could get with schemas and PER-like encodings, but good enough for
>> a schema-less encoding.
>>
>
>
> I like this proposal, more as a sort of JSON-B, as in JSON Encoding B.
>
> The only con I can think of is that it will still be backwards
> incompatible, so why not be more compact? And people might worry about
> JSON-B getting confused with JSON.
>
>
> For my requirements, I think I would still like to be able to binary
> encode floating point numbers. The issues there are the precision loss from
> round-tripping and the need to encode NaN and +/- infinity.
>
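
To make the floating point concern concrete, here is a minimal Python
sketch. The helper names and the big-endian byte order are assumptions made
for illustration; nothing in the proposal fixes either. It shows a
digit-limited text printer losing bits, the 8-byte binary64 image
round-tripping exactly, and strict JSON text having no spelling for NaN.

```python
import json
import math
import struct


def pack_binary64(x: float) -> bytes:
    """Return the 8-byte IEEE 754 binary64 image of x (big-endian assumed)."""
    return struct.pack(">d", x)


def unpack_binary64(raw: bytes) -> float:
    """Inverse of pack_binary64."""
    return struct.unpack(">d", raw)[0]


x = 0.1 + 0.2

# A text encoder that prints a fixed number of digits loses bits.
assert float(format(x, ".6g")) != x

# The binary image round-trips exactly, including the special values.
assert unpack_binary64(pack_binary64(x)) == x
assert math.isnan(unpack_binary64(pack_binary64(float("nan"))))
assert unpack_binary64(pack_binary64(float("-inf"))) == float("-inf")

# Strict JSON text cannot carry NaN at all.
try:
    json.dumps(float("nan"), allow_nan=False)
except ValueError:
    strict_json_rejects_nan = True
else:
    strict_json_rejects_nan = False
```

How much precision a text round trip loses depends on the printer in use;
the fixed-digit format above is only one common case.
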
> But we do have all the byte values from 128 up, which ASCII does not use,
> to play with (actually we have more, but 128 and up is plenty).
>
>
> The result would be slightly less efficient than a true binary JSON but
> not by very much. The tags will still be there.
>
> JSON tagged items have an overhead of 4 bytes per entry:  "<tag>":<data>,
>
> (You can add in spaces but they aren't necessary.)
>
> It would not be difficult to reduce those 4 bytes to one. But it isn't a
> big win either. The win comes from not having to Base64 the binary chunks.
>
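
The size argument above can be checked with a little arithmetic. The sketch
below is illustrative only: the function names are hypothetical, and the
binary form assumes a 1-byte type code plus a 1-byte length, so it only
covers payloads shorter than 256 bytes.

```python
import base64
import json


def json_text_size(tag: str, payload: bytes) -> int:
    """Bytes used by one '"<tag>":"<base64>",' entry in plain JSON."""
    b64 = base64.b64encode(payload).decode("ascii")
    # json.dumps supplies the quotes; the colon and comma are counted here.
    return len(json.dumps(tag)) + 1 + len(json.dumps(b64)) + 1


def binary_chunk_size(tag: str, payload: bytes) -> int:
    """Same entry with the payload carried as a counted binary chunk."""
    if len(payload) > 255:
        raise ValueError("sketch only covers 8-bit lengths")
    # type code + length byte + raw payload, JSON punctuation kept as-is
    return len(json.dumps(tag)) + 1 + (1 + 1 + len(payload)) + 1


payload = bytes(96)
text_size = json_text_size("key", payload)
binary_size = binary_chunk_size("key", payload)
```

For a 96-byte payload the Base64 text is 128 characters, so the saving
comes almost entirely from the payload rather than from the punctuation.
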
>
> For purposes of planning tags and making sure there is enough space, it is
> probably best to start off expansively and consider all the possible
> types we might recognize:
>
> x80    Terminal String 8 bit length
> x81    Terminal String 16 bit length
> x82    Terminal String 32 bit length
> x83    Terminal String 64 bit length
>
> x84    Non Terminal String Chunk 8 bit length
> x85    Non Terminal String Chunk 16 bit length
> x86    Non Terminal String Chunk 32 bit length
> x87    Non Terminal String Chunk 64 bit length
>
> x88    Terminal Binary 8 bit length
> x89    Terminal Binary 16 bit length
> x8A    Terminal Binary 32 bit length
> x8B    Terminal Binary 64 bit length
>
> x8C    Non Terminal Binary Chunk 8 bit length
> x8D    Non Terminal Binary Chunk 16 bit length
> x8E    Non Terminal Binary Chunk 32 bit length
> x8F    Non Terminal Binary Chunk 64 bit length
>
> x90    IEEE 754 Floating Point binary16  (1)
> x91    IEEE 754 Floating Point binary32
> x92    IEEE 754 Floating Point binary64
> x94    IEEE 754 Floating Point binary128  (1)
>
> x96    IEEE 754 Floating Point decimal32 (1)
> x97    IEEE 754 Floating Point decimal64 (1)
> x98    IEEE 754 Floating Point decimal128  (1)
>
> xA0    Unsigned Integer 8 (1)
> xA1    Unsigned Integer 16 (1)
> xA2    Unsigned Integer 32 (1)
> xA3    Unsigned Integer 64 (1)
> xA4    Unsigned Integer 128 (1)
>
> xA5    Signed Integer 8 (1)
> xA6    Signed Integer 16 (1)
> xA7    Signed Integer 32 (1)
> xA8    Signed Integer 64 (1)
> xA9    Signed Integer 128 (1)
>
> xAA    True
> xAB    False
> xAC    Null
>
>
> (1) The need to implement these codes is debatable. But the tag assignment
> scheme should not foreclose the possibility.
>
> That still leaves a block of 72 codes unused. So if people wanted to go
> back and do binary tagging at a later date (JSON-C) that would be possible.
>
> I did try to work out some sort of clever bit mask trick so that the lower
> bits of the code would specify the number of additional bytes to follow, but
> that does not work so well as there are 128-bit values to consider. And at
> the end of the day there are only going to be 60-odd codes at most, so an
> array lookup works fine.
>
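
A table-driven reader and writer for the string and binary codes might look
like the following Python sketch. Several things here are assumptions
rather than part of the proposal: the length field is taken to be
big-endian, x89 is taken to be the 16-bit terminal binary code, and the
names CODES, encode_counted and decode_counted are hypothetical.

```python
import struct

# struct formats for the 8, 16, 32 and 64 bit length fields (big-endian assumed)
LENGTH_FORMATS = (">B", ">H", ">I", ">Q")

# Lookup table: code -> (kind, is_terminal, length format)
CODES = {}
for base, kind, terminal in ((0x80, "string", True), (0x84, "string", False),
                             (0x88, "binary", True), (0x8C, "binary", False)):
    for i, fmt in enumerate(LENGTH_FORMATS):
        CODES[base + i] = (kind, terminal, fmt)


def encode_counted(base: int, data: bytes) -> bytes:
    """Emit data under the narrowest length field of the family at base."""
    for i, fmt in enumerate(LENGTH_FORMATS):
        if len(data) < 1 << (8 * struct.calcsize(fmt)):
            return bytes([base + i]) + struct.pack(fmt, len(data)) + data
    raise ValueError("payload too long")


def decode_counted(buf: bytes, pos: int = 0):
    """Return (kind, is_terminal, payload, next_pos) for the item at pos."""
    kind, terminal, fmt = CODES[buf[pos]]
    width = struct.calcsize(fmt)
    (length,) = struct.unpack_from(fmt, buf, pos + 1)
    start = pos + 1 + width
    return kind, terminal, buf[start:start + length], start + length
```

The reader never inspects individual bits of the code; it only indexes the
table, which is the approach described above.
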
>
> I considered Nico's proposal of a 'continuation block', but that would
> require a reader to read the start of the next block before it can know
> what to do with the previous data. The writer should know, when it starts
> to write out a chunk, whether more chunks might follow. If it turns out
> that no more data follows, the writer just puts out x80 x00 or x88 x00 (an
> empty terminal string or binary) to close the stream.
>
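
As a rough sketch of that streaming behaviour in Python: the writer below
emits every piece as a non-terminal chunk (x84) and closes with x80 x00,
and the reader concatenates until it sees the terminal code. Only 8-bit
lengths are handled, and the function names are hypothetical.

```python
import io

STRING_TERMINAL_8 = 0x80  # Terminal String, 8 bit length
STRING_CHUNK_8 = 0x84     # Non Terminal String Chunk, 8 bit length


def write_chunked(out, pieces):
    """Write each piece as a non-terminal chunk, then an empty terminal."""
    for piece in pieces:
        if len(piece) > 255:
            raise ValueError("sketch only handles 8-bit lengths")
        out.write(bytes([STRING_CHUNK_8, len(piece)]) + piece)
    out.write(bytes([STRING_TERMINAL_8, 0]))


def read_chunked(inp):
    """Concatenate chunk payloads until a terminal code is seen."""
    parts = []
    while True:
        code, length = inp.read(2)
        if code not in (STRING_CHUNK_8, STRING_TERMINAL_8):
            raise ValueError("unexpected code 0x%02X" % code)
        parts.append(inp.read(length))
        if code == STRING_TERMINAL_8:
            return b"".join(parts)


buf = io.BytesIO()
write_chunked(buf, [b"bar", b"baz"])
wire = buf.getvalue()
```

The reader knows from each code alone whether more data follows, so it
never has to look ahead. Decoding the joined bytes as text is left until
the end, since a chunk boundary could fall inside a multi-byte character.
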
>
>
> [To follow the rest it might help to look at the diagrams in
> http://www.json.org/]
>
> There are a few tricks that could be used here to further reduce space.
> Consider the production "<tag>":<data>,
>
> The initial " is not really needed since the only valid productions
> following an object open brace are the close brace or an element entry. So
> " is only needed if you desperately want to have the possibility of a tag
> '}drop tables'. OK now I get it, leave the initial " in for Randall Munroe
> http://xkcd.com/327/
>
> If the reader sees a code above 7F it can only be binary data, so the ":
> separator between the tag and the data is superfluous and could be elided.
>
> The binary data descriptions are all defined length and so the terminal ,
> is not needed.
>
>
> So if people wanted to, we could adjust the FSR to allow these
> abbreviations in JSON-B and save the four bytes per object entry overhead.
> But that would still leave the tags there, and those can't be eliminated
> without some sort of dictionary, either passed on the wire (which is
> what compressors are really doing) or out of band (which is what
> schema-aware encoding is really about).
>
>
> So for JSON-C we would need ways to define a binding of a tag to a code
> and ways of specifying those codes in object productions.
>
> Keeping the initial " helps us here because it means that the only
> currently valid characters after the initial { are ", } and whitespace.
>
>
> We might just conceivably need more than 64K codes, so the code value is
> potentially a 32 bit space.
>
>
> So the codes I would define are:
>
> xC0    8 bit tag code follows
> xC1    16 bit tag code follows
> xC2    32 bit tag code follows
>
> xC4    8 bit definition follows
> xC5    16 bit definition follows
> xC6    32 bit definition follows
>
> xC7    8 bit tag with definition follows
> xC8    16 bit tag with definition follows
> xC9    32 bit tag with definition follows
>
>
> The codes C4-C9 would be followed by a string definition. So the first
> occurrence of { "foo" : "data" } would become:
>
> x7B               {
> xC7 x01 x80 x03 x66 x6f x6f    "foo":     [Code 1]
> x80 x04 x64 x61 x74 x61          "data"
> x7D               }
>
>
> On the second occurrence the definition is already there:
>
> x7B               {
> xC0 x01  [Code 1]
> x80 x04 x64 x61 x74 x61          "data"
> x7D               }
>
> An implementation could simply dump the dictionary out at the start of the
> message using the C4-C6 codes.
>
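
The two byte sequences above can be reproduced with a small Python sketch.
It is only an illustration: the class and function names are hypothetical,
codes are handed out from 1 upward to match the example, and only the 8-bit
forms (C0, C7 and x80) are covered.

```python
class TagWriter:
    """Assigns 8-bit codes to tags on first use (sketch; at most 255 tags)."""

    def __init__(self):
        self.codes = {}

    def tag(self, name: str) -> bytes:
        if name in self.codes:
            # C0: 8 bit tag code follows
            return bytes([0xC0, self.codes[name]])
        code = len(self.codes) + 1
        raw = name.encode("utf-8")
        if code > 0xFF or len(raw) > 0xFF:
            raise ValueError("sketch only covers the 8-bit forms")
        self.codes[name] = code
        # C7: 8 bit tag with definition, then the name as a terminal string
        return bytes([0xC7, code, 0x80, len(raw)]) + raw


def string_8(text: str) -> bytes:
    """Terminal string with an 8 bit length (x80)."""
    raw = text.encode("utf-8")
    if len(raw) > 0xFF:
        raise ValueError("sketch only covers the 8-bit forms")
    return bytes([0x80, len(raw)]) + raw


def encode_object(writer: TagWriter, obj: dict) -> bytes:
    """Encode a flat object whose values are all strings."""
    out = bytearray(b"{")
    for name, value in obj.items():
        out += writer.tag(name) + string_8(value)
    out += b"}"
    return bytes(out)


writer = TagWriter()
first = encode_object(writer, {"foo": "data"})
second = encode_object(writer, {"foo": "data"})
```

The first call produces the 15-byte form with the definition inline; the
second produces the 10-byte form that refers to Code 1.
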
>
> --
> Website: http://hallambaker.com/
>



-- 
Website: http://hallambaker.com/
