[core] SenML JSON syntax and collection+senml+json

Christian Amsüss <c.amsuess@energyharvesting.at> Tue, 17 November 2015 22:45 UTC

Date: Tue, 17 Nov 2015 23:44:51 +0100
From: Christian Amsüss <c.amsuess@energyharvesting.at>
To: Michael Koster <michaeljohnkoster@gmail.com>
Message-ID: <20151117224451.GA22217@hephaistos.amsuess.com>
Archived-At: <http://mailarchive.ietf.org/arch/msg/core/qGZkMGlHA6PgFvIsK07tkewXwos>
Cc: "draft-jennings-core-senml@tools.ietf.org" <draft-jennings-core-senml@tools.ietf.org>, core <core@ietf.org>
Subject: [core] SenML JSON syntax and collection+senml+json

Hello Michael,
hello SenML and core-interfaces people,

I'd like to pick up the topic of streamable SenML from the context of
the `SenML JSON syntax` thread from before IETF 94.

To summarize what I know of the state of things:

* JSON SenML can't enforce that the base {name, time} entries precede
  the entries list, because JSON object members carry no ordering. To
  parse a generic SenML message, a parser is thus required to keep the
  whole message in memory.

  An alternative syntax, [{base dict}, [entries]], has been proposed
  (illustrated below, after this list); it can be extended to allow
  repetitions thereof (with incrementally updated base values), or the
  distinction between base and entry data could be lifted further.

  This assumes that the "e" record list takes a special role in SenML by
  being the workhorse list of data, which conflicts with:

* CoRE Interfaces serves collections as both data and metadata in a
  unified SenML structure, where resource states are given in the
  classical "e" array and the metadata next to it in an "l" array, as
  in application/link-format+json.
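
For concreteness, a rough illustration of the two layouts (values made
up, field names as in senml-02). The classic form, where everything
hangs off one object and a generic parser has to buffer the whole
message:

    {"bn": "/collection1/",
     "e": [{"n": "item1", "sv": "value1"},
           {"n": "item2", "v": 23}]}

versus the streamable alternative, where the base dict comes first and
the entries follow as a separate array that can be processed record by
record:

    [ {"bn": "/collection1/"},
      [ {"n": "item1", "sv": "value1"},
        {"n": "item2", "v": 23} ] ]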

A notation for treating the "l" array as an "e" element was proposed,
but did not resonate well with Michael (from the CoRE interface side);
I'd like to take up the line of discussion from there:

On Tue, Oct 20, 2015 at 12:52:19PM -0700, Michael Koster wrote:
> It’s more than a simple visual relationship. I’m used to JSON tools
> that create an in-memory data structure that conforms to the JSON
> serialization. With the “old” SenML model, the elements of the object
> identified by “bn” are rendered as an array within the element
> identified by “bn” and tagged by “e”. 
>
> The new construct more than just enables streaming, it forces serial
> interpretation, i.e. it *requires* streaming.

Yes, and that's the very point. If I'm to parse SenML on a constrained
device, especially given that the sender can use SenML's extensibility
to send along data that the receiver does not expect, I need to be
prepared to buffer the complete message, whatever its length.

For an example of a situation where this can be an issue, take an
update to a DMX controller (RGB spots or other light installations). A
PUT that atomically updates the complete scene of the connected devices
can easily take up 10k of network buffer space in JSON serialization,
plus network overhead, even without any additional metadata from SenML
extensions; but if the message can be read in a streaming fashion,
implementations could get away with a single-MTU network buffer plus
about 1k for double-buffered state.
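
To make "single-MTU buffer" concrete, here is a rough sketch (nothing
normative; the chunk source and the 1k record limit are made up) of how
the proposed [{base dict}, [entries]] layout could be consumed record
by record using only the Python standard library:

    import json

    def stream_records(chunks, bufsize=1024):
        """Yield (base, record) pairs from an iterable of byte chunks
        (think single-MTU reads from the network), assuming the
        [{base dict}, [entries]] layout in which every value is a JSON
        object, so a truncated record never decodes by accident."""
        decoder = json.JSONDecoder()
        chunks = iter(chunks)
        buf = ""

        def fill():
            # Pull one more network chunk into the small text buffer.
            nonlocal buf
            chunk = next(chunks, None)
            if chunk is None:
                return False
            buf += chunk.decode("utf-8")
            return True

        def skip(extra):
            # Drop whitespace and expected punctuation between values.
            nonlocal buf
            while True:
                buf = buf.lstrip(" \t\r\n" + extra)
                if buf or not fill():
                    return

        def decode_one():
            # Decode one JSON object, refilling until it is complete;
            # any single record has to fit into the buffer.
            nonlocal buf
            while True:
                try:
                    value, end = decoder.raw_decode(buf)
                except json.JSONDecodeError:
                    if len(buf) > bufsize or not fill():
                        raise
                    continue
                buf = buf[end:]
                return value

        skip("[")               # opening bracket of the outer array
        base = decode_one()     # the {base dict}
        skip(",[")              # punctuation up to the first entry
        while True:
            skip(",")
            if buf.startswith("]"):
                return          # end of the entries array
            yield base, decode_one()

(Used as e.g. `for base, record in stream_records(read_blocks())`,
where read_blocks() stands for whatever yields the blockwise or
MTU-sized chunks; the point is that memory use is bounded by the
largest single record, not by the message.)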

Another example (from my everyday CoAP communication, though not
involving embedded parsing) is the history readout of sensor values,
which can exceed 100 kB for devices with intermittent network
connectivity.

> Would it make sense to create a new content-format that optimizes for
> streaming processing?

This is not about streaming Big Data around to the point where big
devices need to go into a "streaming mode" (though it's useful there
too); this is about relatively normal (if not the most common)
situations, and about small devices not having to return 4.13 (Request
Entity Too Large) whenever someone doesn't chunk up their request into
small multiples of the MTU.

I don't like to exaggerate, so please take this with a grain of salt
and be aware that it is written in the heat of the argument: if we
don't find an agreeable serialization that can be processed in a
streaming fashion, we might just as well put a hard limit on the
maximum size of a SenML representation that implementors are required
to support as a minimum. What would that be, 4k? 16k?

> > In my opinion, it raises the question of how generic SenML should
> > attempt to be. My personal view of it is that SenML is a way of
> > encapsulating several resource representations (be they of different
> > points in time or different resource) in a single message. With that in
> > mind, maybe the following would work for you (rephrasing your example
> > into senml-02 syntax, with comments):
>
> SenML is already being used to represent simple collections in CoRE
> Interfaces, OMA LWM2M, and OIC. Whether to have it be extensible and
> evolvable or not is certainly a tradeoff against complexity and stream
> processing ability. I would lean toward evolvability. 

Concerning evolvability:

That shouldn't be a show-stopper: extensions can still go both in the
base dictionary and in the entries; it's just that they wouldn't
profit from the guaranteed sequence.

An approach I don't like in its current form, but which could point
the way towards something more elegant, is to indicate the "key" of
subsequent lists in the base dictionary; with your "l" example, that
could be:

    [ {"bn": "/collection1/", "next-object": "e"},
      [{"n": "item1", "sv": "value1"}, ...],
      {"next-object": "l"},
      [{"href": "item1", ...}, ...}
    ]

As I said, it's not pretty, nor is it what I'd endorse as-is; but
extensibility and an easy-to-parse sequence don't necessarily conflict.
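
Just to make the intended processing model concrete (and explicitly
not to endorse this exact shape), a consumer could look roughly like
the following, assuming the message has already been split into the
alternating base dicts and record lists in document order (a streaming
reader like the one sketched above would feed the records one at a
time rather than as whole lists, but the dispatch stays the same):

    def dispatch(objects):
        """Yield (kind, base, record) triples, where `kind` comes from
        the hypothetical "next-object" hint and defaults to "e"."""
        base = {}
        kind = "e"                     # "e" stays the workhorse list
        for obj in objects:
            if isinstance(obj, dict):  # a base dict: merge it, note the hint
                base.update(obj)
                kind = base.pop("next-object", "e")
            else:                      # a list: its records belong to `kind`
                for record in obj:
                    yield kind, dict(base), record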

Concerning focus of SenML:

"Simple collections" seems to be a good outline; would you also agree
to "simple collections of resource representations and their
metadata"?

> > What do you think of the above arrangement?
>
> I think it’s a substantial compromise in the ability to represent data
> structure to get streaming processing ability. But I do like the idea
> of a “ov” element for object values.

Does that refer to the new serialization format in general, or to
packing the link list into an entity response in particular? If the
latter, please elaborate -- that is something that "happened" with the
infrastructure I've been using (under certain conditions, my batch
resources contain their application/link-format as "s": entries), I've
found it practical, and it would come in even handier with
"ov":link-format+json.
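
(For illustration only, with field names as in the senml-02 examples
above and made-up resource names: today such a batch entry looks
roughly like

    {"n": "devices/", "sv": "</devices/lamp1>,</devices/lamp2>"}

and with an "ov" carrying link-format+json content it could become

    {"n": "devices/", "ov": [{"href": "lamp1"}, {"href": "lamp2"}]}

which is what I mean by packing the link list into the entity.)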

Best regards
Christian

-- 
Christian Amsüss                      | Energy Harvesting Solutions GmbH
founder, system architect             | headquarter:
mailto:c.amsuess@energyharvesting.at  | Arbeitergasse 15, A-4400 Steyr
tel:+43-664-97-90-6-39                | http://www.energyharvesting.at/
                                      | ATU68476614