Re: [apps-discuss] Concise Binary Object Representation (CBOR)
Phillip Hallam-Baker <hallam@gmail.com> Sat, 25 May 2013 16:13 UTC
Return-Path: <hallam@gmail.com>
X-Original-To: apps-discuss@ietfa.amsl.com
Delivered-To: apps-discuss@ietfa.amsl.com
Received: from localhost (localhost [127.0.0.1]) by ietfa.amsl.com (Postfix) with ESMTP id 6EE7A21F8B60 for <apps-discuss@ietfa.amsl.com>; Sat, 25 May 2013 09:13:17 -0700 (PDT)
X-Virus-Scanned: amavisd-new at amsl.com
X-Spam-Flag: NO
X-Spam-Score: -1.016
X-Spam-Level:
X-Spam-Status: No, score=-1.016 tagged_above=-999 required=5 tests=[AWL=-1.583, BAYES_00=-2.599, FF_IHOPE_YOU_SINK=2.166, HTML_MESSAGE=0.001, J_BACKHAIR_33=1, NO_RELAYS=-0.001]
Received: from mail.ietf.org ([12.22.58.30]) by localhost (ietfa.amsl.com [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id 2DSHxuY+idvq for <apps-discuss@ietfa.amsl.com>; Sat, 25 May 2013 09:13:16 -0700 (PDT)
Received: from mail-we0-x236.google.com (mail-we0-x236.google.com [IPv6:2a00:1450:400c:c03::236]) by ietfa.amsl.com (Postfix) with ESMTP id 6300A21F8A6B for <apps-discuss@ietf.org>; Sat, 25 May 2013 09:13:15 -0700 (PDT)
Received: by mail-we0-f182.google.com with SMTP id q57so3123796wes.13 for <apps-discuss@ietf.org>; Sat, 25 May 2013 09:13:14 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:in-reply-to:references:date:message-id:subject:from:to :cc:content-type; bh=5DtW2hArUQDqHB7rMmTxqYnFcjJyGA9DFbhRtT1l08c=; b=Mx4O1RGgk2+c1myI9j5w0UxXOsxmwHRjWO0OupyRlnJOLWhtNSPGSzoJkgxoeXSjq3 9aIQsN66o6MdKcijPZyD9eLwqPhAIVmL2a4xp22nF2qXRM/uklxV9Dfdx8xtomZ9QWKV un29feAyoIFjD0ett17gVj58kx0CZLQlgUqL6u4kJ+KzaujnCO6hlFLS/mz3zvUY9xPx YNvZxiKwtpqzC9OdGJK8W2mb20Pi9edL73byUSMPqCmUhzq1bQz4cGu2XRn4HiydId4A rMb79mWkZ2GhqUhK+XxZtB4g69WFkf7h/3G3zmmXL3PuGRe1Tj6656nOn/RTmtLm1/l0 U1dQ==
MIME-Version: 1.0
X-Received: by 10.180.205.200 with SMTP id li8mr2745939wic.15.1369498394453; Sat, 25 May 2013 09:13:14 -0700 (PDT)
Received: by 10.194.123.225 with HTTP; Sat, 25 May 2013 09:13:14 -0700 (PDT)
In-Reply-To: <CAK3OfOiwE0W=AYCtXh7W1RtrvMC4a1KhNDut=tD1ma+ipRrvHw@mail.gmail.com>
References: <61CB1D18-BABC-4C77-93E6-A9E8CDA8326B@vpnc.org> <CAK3OfOiwE0W=AYCtXh7W1RtrvMC4a1KhNDut=tD1ma+ipRrvHw@mail.gmail.com>
Date: Sat, 25 May 2013 12:13:14 -0400
Message-ID: <CAMm+LwjBWNLPoU+ity+uwY-fNztLspOtfk3HY22OUsXmH+EjJw@mail.gmail.com>
From: Phillip Hallam-Baker <hallam@gmail.com>
To: Nico Williams <nico@cryptonector.com>
Content-Type: multipart/alternative; boundary="001a11c382c6526d9904dd8d31cc"
Cc: Paul Hoffman <paul.hoffman@vpnc.org>, General discussion of application-layer protocols <apps-discuss@ietf.org>
Subject: Re: [apps-discuss] Concise Binary Object Representation (CBOR)
X-BeenThere: apps-discuss@ietf.org
X-Mailman-Version: 2.1.12
Precedence: list
List-Id: General discussion of application-layer protocols <apps-discuss.ietf.org>
List-Unsubscribe: <https://www.ietf.org/mailman/options/apps-discuss>, <mailto:apps-discuss-request@ietf.org?subject=unsubscribe>
List-Archive: <http://www.ietf.org/mail-archive/web/apps-discuss>
List-Post: <mailto:apps-discuss@ietf.org>
List-Help: <mailto:apps-discuss-request@ietf.org?subject=help>
List-Subscribe: <https://www.ietf.org/mailman/listinfo/apps-discuss>, <mailto:apps-discuss-request@ietf.org?subject=subscribe>
X-List-Received-Date: Sat, 25 May 2013 16:13:17 -0000
On Fri, May 24, 2013 at 11:28 PM, Nico Williams <nico@cryptonector.com>wrote: > Thinking about what *I* want in a binary JSON encoding, and taking > into account PHB's points about online encoding and decoding, all I > really want, the one thing I badly want, is counted-bytes and chunked > Unicode and octet strings. That's it. So picture JSON, complete with > square and curly brackets, but no commas nor colons, just strings > (byte-counted or chunked), some encoding for numbers, booleans, and > null. It's the handing of scalars that sucks about JSON: string > escaping, number printing and parsing. > > Something like this: > > {<Unicode string of length 3>foo<Unicode string of length 3>bar<join > to preceding string, length 3>baz<Unicode string of length > 3>num<integer value 5>} > > as an encoding of { "foo": "barbaz", "num": 5 }, with "barbaz" chunked. > > One of the nice things about such an encoding is that it should be > possible to implement as a fairly small variation on existing code: > it's almost only a different way of encoding scalar types -- the only > other difference being that commas and colons are not needed. > > With a variable-length encoding of integers and IEEE 754 64-bit > doubles for reals... that's compact enough. Not nearly as compact as > we could get with schemas and PER-like encodings, but good enough for > a schema-less encoding. > I like this proposal, more a sort of JSON-B as in JSON Encoding B The only Con I can think of is that it will still be backwards incompatible so why not be more compact? And people might worry about JSON-B getting confused with JSON. For my requirements, I think I would still like to be able to binary encode floating point numbers. The issue there is the precision loss from round tripping and the ability to encode NaN and +/- infinity But we do have the entire ASCII code set above 128 to play with (actually we have more but above 128 is plenty) The result would be slightly less efficient than a true binary JSON but not by very much. The tags will still be there. JSON taged items have an overhead of 4 bytes per entry: "<tag>":<data>, (You can add in spaces but they aren't necessary.) It would not be difficult to reduce those 4 bytes to one. But it isn't a big win either. The win comes from not having to Base64 the binary chunks. For purposes of planning tags and making sure there is enough space it is probably best to start off expansively and considering all the possible types we might recognize x80 Terminal String 8 bit length x81 Terminal String 16 bit length x82 Terminal String 32 bit length x83 Terminal String 64 bit length x84 Non Terminal String Chunk 8 bit length x85 Non Terminal String Chunk 16 bit length x86 Non Terminal String Chunk 32 bit length x87 Non Terminal String Chunk 64 bit length x88 Terminal Binary 8 bit length x87 Terminal Binary 16 bit length x8A Terminal Binary 32 bit length x8B Terminal Binary 64 bit length x8C Non Terminal Binary Chunk 8 bit length x8D Non Terminal Binary Chunk 16 bit length x8E Non Terminal Binary Chunk 32 bit length x8F Non Terminal Binary Chunk 64 bit length x90 IEEE 754 Floating Point binary16 (1) x91 IEEE 754 Floating Point binary32 x92 IEEE 754 Floating Point binary64 x94 IEEE 754 Floating Point binary128 (1) x96 IEEE 754 Floating Point decimal32 (1) x97 IEEE 754 Floating Point decimal64 (1) x98 IEEE 754 Floating Point decimal128 (1) xA0 Unsigned Integer 8 (1) xA1 Unsigned Integer 16 (1) xA2 Unsigned Integer 32 (1) xA3 Unsigned Integer 64 (1) xA4 Unsigned Integer 128 (1) xA5 Signed Integer 8 (1) xA6 Signed Integer 16 (1) xA7 Signed Integer 32 (1) xA8 Signed Integer 64 (1) xA9 Signed Integer 128 (1) xAA True xAB False xAC Null (1) The need to implement these codes is debatable. But the tag asignment scheme should not foreclose the possibility. That still leaves a block of 72 codes unused. So if people wanted to go back and do binary tagging at a later date (JSON-C) that would be possible. I did try to work out some sort of clever bit mask trick so that the lower bits of the code would specify the number of additional bytes to follow but that does not work so well as there are 128 bit values to consider. And at the end of the day there are only going to be 60 odd code maximum so an array works fine. I considered Nico's proposal of a 'continuation block' but that would require a reader to read in the start of the next block before it can know what to do with the previous data. The writer should know when it is starting to write out a chunk that more chunks might follow or not. If no more data follows the writer just puts out x80 x00 or x88 x00 to close the stream. {To follow the rest it might help to look at the diagrams in http://www.json.org/] There are a few tricks that could be used here to further reduce space. consider the production "<tag>":<data>, The initial " is not really needed since the only valid productions following an object open brace are the close brace or an element entry. So " is only needed if you desperately want to have the possibility of a tag '}drop tables'. OK now I get it, leave the initial " in for Randall Munroe http://xkcd.com/327/ If the reader sees a code above 7F it can only be binary data so the ": separator between the tag and the data are superfluous and could be elided The binary data descriptions are all defined length and so the terminal , is not needed. So if people wanted to, we could adjust the FSR to allow these abbreviations in JSON-B and save the four bytes per object entry overhead. But that would still leave the tags there and those can't be eliminated without some sort of dictionary, either being passed on the wire (which is what compressors are really doing) or out of band (which is what schema aware is really about). So for JSON-C we would need ways to define a binding of a tag to a code and ways of specifying those codes in object productions. Keeping the initial " helps us here because it means that the only currently valid characters after the initial { are ", } and whitespace. We might just conceivably need more than 64K codes, so the code value is potentially a 32 bit space. So the codes I would define are: C0 8 bit tag code follows C1 16 bit tag code follows C2 32 bit tag code follows C4 8 bit definition follows C5 16 bit definition follows C6 32 bit definition follows C7 8 bit tag with definition follows C8 16 bit tag with definition follows C9 32 bit tag with definition follows the codes c4-c9 would be followed with a string definition. So the first occurrence of { "foo" : "data" } would become: x7B { xC7 x01 x80 x03 x66 x6f x6f "foo": [Code 1] x80 x04 x64 x61 x74 x61 "data" x7D } On the second occurrence the definition is already there: x7B { xC0 x01 [Code 1] x80 x04 x64 x61 x74 x61 "data" x7D } An implementation could simply dump the dictionary out at the start of the message using the C4-C6 codes. -- Website: http://hallambaker.com/
- [apps-discuss] Concise Binary Object Representati… Paul Hoffman
- Re: [apps-discuss] Concise Binary Object Represen… James M Snell
- Re: [apps-discuss] Concise Binary Object Represen… Paul Hoffman
- Re: [apps-discuss] Concise Binary Object Represen… Dave Crocker
- Re: [apps-discuss] Concise Binary Object Represen… Phillip Hallam-Baker
- Re: [apps-discuss] Concise Binary Object Represen… Paul Hoffman
- Re: [apps-discuss] Concise Binary Object Represen… Phillip Hallam-Baker
- Re: [apps-discuss] Concise Binary Object Represen… James M Snell
- Re: [apps-discuss] Concise Binary Object Represen… Carsten Bormann
- Re: [apps-discuss] Concise Binary Object Represen… Carsten Bormann
- Re: [apps-discuss] Concise Binary Object Represen… Dave Cridland
- Re: [apps-discuss] Concise Binary Object Represen… Tony Finch
- Re: [apps-discuss] Concise Binary Object Represen… Dave Cridland
- Re: [apps-discuss] Concise Binary Object Represen… Phillip Hallam-Baker
- Re: [apps-discuss] Concise Binary Object Represen… Paul Hoffman
- Re: [apps-discuss] Concise Binary Object Represen… Paul Hoffman
- Re: [apps-discuss] Concise Binary Object Represen… Dave Crocker
- Re: [apps-discuss] Concise Binary Object Represen… Tony Finch
- Re: [apps-discuss] Concise Binary Object Represen… Paul Hoffman
- Re: [apps-discuss] Concise Binary Object Represen… Phillip Hallam-Baker
- Re: [apps-discuss] Concise Binary Object Represen… Carsten Bormann
- Re: [apps-discuss] Concise Binary Object Represen… James M Snell
- Re: [apps-discuss] Concise Binary Object Represen… Carsten Bormann
- Re: [apps-discuss] Concise Binary Object Represen… Carsten Bormann
- Re: [apps-discuss] Concise Binary Object Represen… Dave Crocker
- Re: [apps-discuss] Concise Binary Object Represen… Paul Hoffman
- Re: [apps-discuss] Concise Binary Object Represen… Paul Hoffman
- Re: [apps-discuss] Concise Binary Object Represen… Carsten Bormann
- Re: [apps-discuss] Concise Binary Object Represen… Manger, James H
- Re: [apps-discuss] Concise Binary Object Represen… Phillip Hallam-Baker
- Re: [apps-discuss] Concise Binary Object Represen… Nico Williams
- Re: [apps-discuss] Concise Binary Object Represen… Paul E. Jones
- Re: [apps-discuss] Concise Binary Object Represen… Carsten Bormann
- Re: [apps-discuss] Concise Binary Object Represen… Carsten Bormann
- Re: [apps-discuss] Concise Binary Object Represen… Carsten Bormann
- Re: [apps-discuss] Concise Binary Object Represen… Tony Finch
- Re: [apps-discuss] Concise Binary Object Represen… Manger, James H
- Re: [apps-discuss] Concise Binary Object Represen… Dave Crocker
- Re: [apps-discuss] Concise Binary Object Represen… Dave Cridland
- Re: [apps-discuss] Concise Binary Object Represen… Dave Crocker
- Re: [apps-discuss] Concise Binary Object Represen… Dave Cridland
- Re: [apps-discuss] Concise Binary Object Represen… Dave Crocker
- Re: [apps-discuss] Concise Binary Object Represen… Nico Williams
- Re: [apps-discuss] Concise Binary Object Represen… Phillip Hallam-Baker
- Re: [apps-discuss] Concise Binary Object Represen… Nico Williams
- Re: [apps-discuss] Concise Binary Object Represen… Carsten Bormann
- Re: [apps-discuss] Concise Binary Object Represen… Paul Hoffman
- Re: [apps-discuss] Concise Binary Object Represen… Carsten Bormann
- Re: [apps-discuss] Concise Binary Object Represen… Carsten Bormann
- Re: [apps-discuss] Concise Binary Object Represen… Nico Williams
- Re: [apps-discuss] Concise Binary Object Represen… Tony Finch
- Re: [apps-discuss] Concise Binary Object Represen… Nico Williams
- Re: [apps-discuss] Concise Binary Object Represen… Paul Hoffman
- Re: [apps-discuss] Concise Binary Object Represen… James M Snell
- Re: [apps-discuss] Concise Binary Object Represen… Paul Hoffman
- Re: [apps-discuss] Concise Binary Object Represen… Phillip Hallam-Baker
- Re: [apps-discuss] Concise Binary Object Represen… Phillip Hallam-Baker
- Re: [apps-discuss] Concise Binary Object Represen… Carsten Bormann
- Re: [apps-discuss] Concise Binary Object Represen… Phillip Hallam-Baker
- Re: [apps-discuss] Concise Binary Object Represen… Nico Williams
- Re: [apps-discuss] Concise Binary Object Represen… Phillip Hallam-Baker
- Re: [apps-discuss] Concise Binary Object Represen… Nico Williams
- Re: [apps-discuss] Concise Binary Object Represen… Phillip Hallam-Baker
- Re: [apps-discuss] Concise Binary Object Represen… Phillip Hallam-Baker
- Re: [apps-discuss] Concise Binary Object Represen… Phillip Hallam-Baker
- Re: [apps-discuss] Concise Binary Object Represen… Paul Hoffman
- Re: [apps-discuss] Concise Binary Object Represen… Joe Hildebrand (jhildebr)
- Re: [apps-discuss] Concise Binary Object Represen… Nico Williams
- Re: [apps-discuss] Concise Binary Object Represen… Phillip Hallam-Baker
- Re: [apps-discuss] Concise Binary Object Represen… Nico Williams
- Re: [apps-discuss] Concise Binary Object Represen… Cullen Jennings (fluffy)
- Re: [apps-discuss] Concise Binary Object Represen… Phillip Hallam-Baker
- Re: [apps-discuss] Concise Binary Object Represen… Nico Williams
- Re: [apps-discuss] Concise Binary Object Represen… Cullen Jennings (fluffy)
- Re: [apps-discuss] Concise Binary Object Represen… Phillip Hallam-Baker
- Re: [apps-discuss] Concise Binary Object Represen… Carsten Bormann
- Re: [apps-discuss] Concise Binary Object Represen… Phillip Hallam-Baker