Re: [apps-discuss] Concise Binary Object Representation (CBOR) -- support streaming

Carsten Bormann <> Thu, 23 May 2013 07:50 UTC

On May 23, 2013, at 03:49, "Manger, James H" <> wrote:

> sizes have to be known upfront

Indeed.  The old "counted vs. delimited" debate.

JSON is delimited because it HAS TO BE -- counted is a non-starter for text formats (as amply demonstrated by FORTRAN Hollerith data).

CBOR is counted because that is the best fit for a binary format.
Fast deserializers really benefit from knowing counts upfront (and that also helps with constrained deserializers).
(That means counts of the things the deserializer will have to create, e.g., the number of data items, not the byte size of their representation -- counting bytes here was a mistake that RFC 713 and ASN.1 BER shared, and one that creates much pain in BER.)
With a byte-oriented encoding, counting also happens to require fewer bytes than delimiting for small counts.
So if the serializer is in a position to provide count information in time, we should use it.
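
To make the byte-count argument concrete, here is a minimal Python sketch. It assumes the head layout as later published in the CBOR RFCs (major type in the top three bits, counts below 24 carried in the initial byte), which may differ from the draft under discussion; `encode_head` and `encode_uint_array` are illustrative names, not part of any library.

```python
def encode_head(major_type: int, n: int) -> bytes:
    """Encode a major type plus an unsigned count/value (assumed RFC-style layout)."""
    if n < 0:
        raise ValueError("count must be non-negative")
    if n < 24:
        # Small counts fit into the initial byte itself.
        return bytes([(major_type << 5) | n])
    # Larger counts follow the initial byte in 1, 2, 4, or 8 bytes (big-endian).
    for additional_info, size in ((24, 1), (25, 2), (26, 4), (27, 8)):
        if n < (1 << (8 * size)):
            return bytes([(major_type << 5) | additional_info]) + n.to_bytes(size, "big")
    raise ValueError("count too large")


def encode_uint_array(values) -> bytes:
    """Counted array: the head carries the number of items, not a byte length."""
    out = encode_head(4, len(values))      # major type 4 = array
    for v in values:
        out += encode_head(0, v)           # major type 0 = unsigned integer
    return out


counted = encode_uint_array([1, 2, 3])
print(counted.hex())
```

A three-element array of small integers comes out as four bytes: one head byte plus three item bytes. A purely delimited scheme would need a start marker and an end marker around the same three item bytes.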

CBOR supports streaming by having self-delimited data items -- you can send many of these in sequence for a stream (as in, say, an XMPP-like protocol).
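
A rough Python sketch of what "many data items in sequence" means for a receiver: because each item says where it ends, the reader can peel items off a buffer one after another with no framing layer. The decoder below is deliberately partial (integers, strings, counted arrays and maps only), assumes the head layout as later published in the CBOR RFCs, and uses made-up function names.

```python
def decode_head(buf, pos):
    """Return (major type, argument, next position) for the item starting at pos."""
    initial = buf[pos]
    major, info = initial >> 5, initial & 0x1F
    pos += 1
    if info < 24:
        return major, info, pos
    if info > 27:
        raise ValueError("additional information not handled in this sketch")
    size = 1 << (info - 24)                # 1, 2, 4, or 8 following bytes
    if pos + size > len(buf):
        raise ValueError("truncated head")
    return major, int.from_bytes(buf[pos:pos + size], "big"), pos + size


def decode_item(buf, pos=0):
    """Decode one self-delimited data item; return (value, next position)."""
    major, arg, pos = decode_head(buf, pos)
    if major == 0:                         # unsigned integer
        return arg, pos
    if major == 1:                         # negative integer
        return -1 - arg, pos
    if major in (2, 3):                    # byte string / text string
        if pos + arg > len(buf):
            raise ValueError("truncated string")
        raw = bytes(buf[pos:pos + arg])
        return (raw if major == 2 else raw.decode("utf-8")), pos + arg
    if major == 4:                         # array: arg is the item count
        items = []
        for _ in range(arg):
            value, pos = decode_item(buf, pos)
            items.append(value)
        return items, pos
    if major == 5:                         # map: arg is the number of pairs
        result = {}
        for _ in range(arg):
            key, pos = decode_item(buf, pos)
            value, pos = decode_item(buf, pos)
            result[key] = value
        return result, pos
    raise ValueError("major type not handled in this sketch")


def decode_sequence(buf):
    """Yield top-level items from a buffer of concatenated data items."""
    pos = 0
    while pos < len(buf):
        item, pos = decode_item(buf, pos)
        yield item


# Three items back to back: [1, 2, 3], "a", {"a": 1}
stream = bytes.fromhex("83010203" "6161" "a1616101")
items = list(decode_sequence(stream))
print(items)
```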

So streaming *of* data items is not a problem with CBOR.
Now let's look at "streaming" *within* a data item.

The actual need for this is a fringe case, but one that is worth looking at.
The only significant use case I know is a state-limited (on-the-fly) converter from a delimited format (such as JSON) to CBOR.
(Unfortunately, that is not just a use case in gateways; using CBOR as a plug-in replacement for JSON may saddle one with an API that makes the CBOR serializer such a converter.)
Putting together an aquarium when being handed fish sticks.
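
The cost for such a converter can be sketched in a few lines of Python. This assumes a hypothetical push-style API (`CountedArrayWriter`, `add_uint`, `end` are invented for illustration) and the head layout as later published in the CBOR RFCs. The point is only that nothing can be emitted until the count is known, so state grows with the array.

```python
def encode_head(major_type: int, n: int) -> bytes:
    """Encode a major type plus an unsigned count/value (assumed RFC-style layout)."""
    if n < 24:
        return bytes([(major_type << 5) | n])
    for additional_info, size in ((24, 1), (25, 2), (26, 4), (27, 8)):
        if n < (1 << (8 * size)):
            return bytes([(major_type << 5) | additional_info]) + n.to_bytes(size, "big")
    raise ValueError("count too large")


class CountedArrayWriter:
    """Hypothetical writer fed one item at a time, as a JSON-style API would do."""

    def __init__(self, sink):
        self.sink = sink        # callable that receives encoded bytes
        self.pending = []       # encoded items held back until the count is known

    def add_uint(self, value: int) -> None:
        # The array head cannot be written yet, so the item has to be buffered.
        self.pending.append(encode_head(0, value))

    def end(self) -> None:
        # Only now is the count available; emit the head, then the buffered items.
        self.sink(encode_head(4, len(self.pending)))
        for encoded in self.pending:
            self.sink(encoded)
        self.pending = []


chunks = []
writer = CountedArrayWriter(chunks.append)
for v in (1, 2, 3):
    writer.add_uint(v)
emitted_before_end = len(chunks)    # nothing has reached the sink yet
writer.end()
print(emitted_before_end, b"".join(chunks).hex())
```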

Anything that addresses this specific use case needs to be a hybrid.
(Purely delimited is possible for binary, but tends to be more taxing on the deserializer; it also often requires additional work at the serializer to enable data transparency.)
I tried to avoid designing a hybrid.
Counted is the direction much of CS has been moving in for the last 10 to 15 years.

As you say, delimiters (push/pop, or maybe some form of continuation scheme) can be easily hacked into CBOR, the only victim being its architecture.
Should we?

Grüße, Carsten