Re: [apps-discuss] Concise Binary Object Representation (CBOR) -- support streaming

"Manger, James H" <James.H.Manger@team.telstra.com> Thu, 23 May 2013 13:15 UTC

From: "Manger, James H" <James.H.Manger@team.telstra.com>
To: Carsten Bormann <cabo@tzi.org>
Date: Thu, 23 May 2013 23:15:21 +1000
Message-ID: <255B9BB34FB7D647A506DC292726F6E1151A8AD4C1@WSMSG3153V.srv.dir.telstra.com>
References: <61CB1D18-BABC-4C77-93E6-A9E8CDA8326B@vpnc.org> <255B9BB34FB7D647A506DC292726F6E1151A8ACB34@WSMSG3153V.srv.dir.telstra.com> <964FB690-4904-4BC4-99A8-AAB57B21FB52@tzi.org>
In-Reply-To: <964FB690-4904-4BC4-99A8-AAB57B21FB52@tzi.org>
Cc: "apps-discuss@ietf.org" <apps-discuss@ietf.org>
Subject: Re: [apps-discuss] Concise Binary Object Representation (CBOR) -- support streaming

> > sizes have to be known upfront
> 
> Indeed.  The old "counted vs. delimited" debate.
> 
> JSON is delimited because it HAS TO BE -- counted is a non-starter for
> text formats (as amply demonstrated by FORTRAN Hollerith data).
> 
> CBOR is counted because that is the best fit for a binary format.
> Fast deserializers really benefit from knowing counts upfront (and that
> also helps with constrained deserializers).

Is this because the deserializer can allocate exactly the right memory for an array (of pointers to elements), instead of having to guess then resize as necessary?

> (The number of things they will have to create, e.g., the number of
> data items; not the byte size of their representation -- counting bytes
> here was a mistake RFC 731 and ASN.1 BER shared and that creates much
> pain in BER.)

Ok. I can see that an item count can avoid *some* of the difficulties with a count of encoded bytes.

> With a byte-oriented encoding, counting also happens to require fewer
> bytes than delimiting for small counts.

That saves 1 byte per map or array that has fewer than 28 items. Sounds nice, but not that compelling.
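
To make the arithmetic concrete, here is a rough sketch rather than anything taken from the draft. It assumes a count below 28 fits in the initial byte, that larger counts need 1, 2, 4 or 8 extra bytes, and that a delimited form would cost one opening byte plus one closing byte. The function names and the closing byte are hypothetical.

```python
def counted_header_len(count: int, inline_limit: int = 28) -> int:
    """Bytes needed for a counted array/map header (assumed layout)."""
    if count < 0:
        raise ValueError("count must be non-negative")
    if count < inline_limit:
        return 1  # the count rides along in the initial byte
    for extra in (1, 2, 4, 8):  # assumed widths of a trailing count field
        if count < 1 << (8 * extra):
            return 1 + extra
    raise ValueError("count too large for an 8-byte field")


def delimited_overhead() -> int:
    """Bytes needed by a hypothetical open-byte/close-byte framing."""
    return 2


for n in (5, 27, 28, 300):
    print(n, counted_header_len(n), delimited_overhead())
```

Under these assumptions, counting is 1 byte cheaper below 28 items, breaks even from 28 to 255 items, and costs more than delimiting from 256 items upward.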

> So if the serializer is in a position to provide count information in
> time, we should use it.
> 
> CBOR supports streaming by having self-delimited data items -- you can
> send many of these in sequence for a stream (as in, say, an XMPP-like
> protocol).
> 
> So streaming *of* data items is not a problem with CBOR.

Supporting streaming -- but only at the top layer of a protocol -- doesn't sound good. If it is useful at a "top-layer" surely it is useful at other layers as well. Protocols are often embedded in other protocols. Once you have something that relies on a stream *of* data items, and you want to embed it in another layer you have to stop using CBOR. That's bad.

> Now let's look at "streaming" *within* a data item.
> 
> The actual need for this is a fringe case, but one that is worth
> looking at.
> The only significant use case I know is a state-limited (on-the-fly)
> converter from a delimited format (such as JSON) to CBOR.
> (Unfortunately, that is not just a use case in gateways; using CBOR as
> a plug-in replacement for JSON may saddle one with an API that makes
> the CBOR serializer such a converter.)
> Putting together an aquarium when being handed fish sticks.

This doesn't feel like a "fringe case". Converting from JSON is one of the listed objectives (#6). The draft has a section on how to do it.

You generally only need to bother with something like CBOR for efficiency, instead of staying with JSON, when the things being sent are potentially large. It is exactly these large things that are going to have unknown sizes at the start.

Plenty of languages pass around iterators, instead of arrays or maps, because they are a useful abstraction that covers more situations -- but they don't give you a size up front.
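
As a minimal illustration of the iterator problem (a toy format, not CBOR's actual encoding: the one-byte count and the "[" / "]" markers are made up for this sketch), a counted serializer has to drain the whole iterator before it can write its first byte, while a delimited one can emit each element as it arrives:

```python
from typing import Iterable, Iterator


def encode_counted(items: Iterable[bytes]) -> bytes:
    """Toy counted encoding: 1-byte item count, then the items."""
    buffered = list(items)  # must consume everything just to learn the count
    # Toy limit: bytes([n]) only works for counts below 256.
    return bytes([len(buffered)]) + b"".join(buffered)


def encode_delimited(items: Iterable[bytes]) -> Iterator[bytes]:
    """Toy delimited encoding: start marker, items as they arrive, end marker."""
    yield b"["  # hypothetical start marker
    for item in items:
        yield item  # can go on the wire immediately, nothing is buffered
    yield b"]"  # hypothetical end marker


def source() -> Iterator[bytes]:
    """Stands in for an iterator whose length is not known up front."""
    yield b"a"
    yield b"b"


print(encode_counted(source()))
print(b"".join(encode_delimited(source())))
```

The counted version holds every element in memory until the iterator is exhausted; the delimited version needs only constant state.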

> Anything that addresses this specific use case needs to be a hybrid.
> (Purely delimited is possible for binary, but tends to be more taxing
> on the deserializer; it also often requires additional work at the
> serializer to enable data transparency.)

What does "enable data transparency" mean?

> I tried to avoid designing a hybrid.
> Counted is the way things have been moving to in much of CS for the
> last 10, 15 years.
> 
> As you say, delimiters (push/pop, or maybe some form of continuation
> scheme) can be easily hacked into CBOR, the only victim being its
> architecture.
> Should we?

Yes.

I'm not convinced that situations where you don't know the size of a byte array, text, array, or map up front are so rare that CBOR can ignore them.

I am not familiar enough with squeezing performance from a deserializer to judge whether a hybrid is worth the complexity cost over just delimiters for maps & arrays.

--
James Manger