Re: [core] SenML JSON syntax and collection+senml+json

Cullen Jennings <fluffy@iii.ca> Wed, 18 November 2015 02:15 UTC

Return-Path: <fluffy@iii.ca>
X-Original-To: core@ietfa.amsl.com
Delivered-To: core@ietfa.amsl.com
Received: from localhost (ietfa.amsl.com [127.0.0.1]) by ietfa.amsl.com (Postfix) with ESMTP id B80941B37F3 for <core@ietfa.amsl.com>; Tue, 17 Nov 2015 18:15:25 -0800 (PST)
X-Virus-Scanned: amavisd-new at amsl.com
X-Spam-Flag: NO
X-Spam-Score: 1.099
X-Spam-Level: *
X-Spam-Status: No, score=1.099 tagged_above=-999 required=5 tests=[BAYES_50=0.8, MIME_8BIT_HEADER=0.3, RCVD_IN_DNSWL_NONE=-0.0001, SPF_PASS=-0.001] autolearn=no
Received: from mail.ietf.org ([4.31.198.44]) by localhost (ietfa.amsl.com [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id oGueBXCEEllS for <core@ietfa.amsl.com>; Tue, 17 Nov 2015 18:15:23 -0800 (PST)
Received: from smtp69.ord1c.emailsrvr.com (smtp69.ord1c.emailsrvr.com [108.166.43.69]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by ietfa.amsl.com (Postfix) with ESMTPS id 944471B37F2 for <core@ietf.org>; Tue, 17 Nov 2015 18:15:23 -0800 (PST)
Received: from smtp17.relay.ord1c.emailsrvr.com (localhost.localdomain [127.0.0.1]) by smtp17.relay.ord1c.emailsrvr.com (SMTP Server) with ESMTP id D08CD180184; Tue, 17 Nov 2015 21:15:22 -0500 (EST)
X-Auth-ID: fluffy@iii.ca
Received: by smtp17.relay.ord1c.emailsrvr.com (Authenticated sender: fluffy-AT-iii.ca) with ESMTPSA id C6B4818017A; Tue, 17 Nov 2015 21:15:21 -0500 (EST)
X-Sender-Id: fluffy@iii.ca
Received: from [192.168.4.100] ([UNAVAILABLE]. [128.107.241.180]) (using TLSv1 with cipher DHE-RSA-AES256-SHA) by 0.0.0.0:465 (trex/5.5.4); Tue, 17 Nov 2015 21:15:22 -0500
Content-Type: text/plain; charset="utf-8"
Mime-Version: 1.0 (Mac OS X Mail 9.1 \(3096.5\))
From: Cullen Jennings <fluffy@iii.ca>
In-Reply-To: <20151117224451.GA22217@hephaistos.amsuess.com>
Date: Tue, 17 Nov 2015 19:15:20 -0700
Content-Transfer-Encoding: quoted-printable
Message-Id: <B3813C91-DB84-4A37-8655-64CF5C16BB6D@iii.ca>
References: <20151117224451.GA22217@hephaistos.amsuess.com>
To: Christian Amsüss <c.amsuess@energyharvesting.at>
X-Mailer: Apple Mail (2.3096.5)
Archived-At: <http://mailarchive.ietf.org/arch/msg/core/s41mJxbChSkPjWnu0ycLI5ulfJQ>
Cc: "draft-jennings-core-senml@tools.ietf.org" <draft-jennings-core-senml@tools.ietf.org>, core <core@ietf.org>
Subject: Re: [core] SenML JSON syntax and collection+senml+json
X-BeenThere: core@ietf.org
X-Mailman-Version: 2.1.15
Precedence: list
List-Id: "Constrained RESTful Environments \(CoRE\) Working Group list" <core.ietf.org>
List-Unsubscribe: <https://www.ietf.org/mailman/options/core>, <mailto:core-request@ietf.org?subject=unsubscribe>
List-Archive: <https://mailarchive.ietf.org/arch/browse/core/>
List-Post: <mailto:core@ietf.org>
List-Help: <mailto:core-request@ietf.org?subject=help>
List-Subscribe: <https://www.ietf.org/mailman/listinfo/core>, <mailto:core-request@ietf.org?subject=subscribe>
X-List-Received-Date: Wed, 18 Nov 2015 02:15:25 -0000

Random thoughts on a few subjects:

I feel like SenML is getting too complex and we should ask if we can put it on a diet. Perhaps this streaming is just too much to put into it. An alternative is to not have SenML do streaming itself, but to allow a protocol using it to support streaming by sending many SenML objects, with the convention that if any given object did not have a base value, then the base values from the previous SenML object apply. I'm not sure if this is a good idea or not, but I'm just saying that if things start to get too complicated to do streaming inside SenML, we can punt it up a layer. 
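To make the idea concrete, here is a minimal sketch (not in any draft, names are illustrative) of a receiver that carries base values forward from one SenML object to the next when the current object omits them:

```python
# Sketch of the "punt streaming up a layer" convention: each SenML object
# in the stream inherits base fields from its predecessor when absent.
# The field names and "e" record layout follow the -02 style examples;
# this is a hypothetical receiver, not a draft-mandated algorithm.

BASE_FIELDS = ("bn", "bt", "bu")

def resolve_stream(objects):
    """Yield records with base values carried over between objects."""
    bases = {}
    for obj in objects:
        # remember any base values present in this object
        for field in BASE_FIELDS:
            if field in obj:
                bases[field] = obj[field]
        for record in obj.get("e", []):
            resolved = dict(record)
            if "bn" in bases:
                resolved["n"] = bases["bn"] + record.get("n", "")
            yield resolved

stream = [
    {"bn": "urn:dev:mac:0024befffe804ff1/",
     "e": [{"n": "voltage", "v": 120.1}]},
    # second object has no base name: it inherits the previous one
    {"e": [{"n": "voltage", "v": 119.8}]},
]
records = list(resolve_stream(stream))
```

The receiver only ever holds the current object plus a small dictionary of remembered base values, which is the whole point of pushing the convention up a layer.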


Complexity:

I'm sure someone will think I am nuts for suggesting that SenML is looking too complicated, but as another example, take InfluxDB, which is pretty good for stuff like this. I've been using it as a cloud DB for streaming real-time measurements. It deprecated JSON and replaced it with "Line Protocol", which is effectively the sensor name, followed by space-separated fields, followed by the value, followed by CRLF. That produced noticeable improvements in real deployments over general JSON. A big part of SenML was to *not* be general JSON but to be a very restricted subset of JSON, such that it could achieve the performance of something like Line Protocol or protobufs and still have some extensibility story. 

So Line Protocol would send the example from later in this email as a single line:

urn:dev:mac:0024befffe804ff1/voltage u=V 120.1
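For illustration, rendering a SenML record into that line style is only a few lines of code (this is a sketch of the format as written above, not InfluxDB's actual Line Protocol grammar, which also has tags and timestamps):

```python
def to_line(base_name, record):
    """Render one SenML record in the line style sketched above:
    <name> <unit> <value>, with the base name prepended."""
    name = base_name + record.get("n", "")
    unit = "u=" + record["u"] if "u" in record else ""
    return " ".join(part for part in (name, unit, str(record["v"])) if part)

line = to_line("urn:dev:mac:0024befffe804ff1/",
               {"n": "voltage", "u": "V", "v": 120.1})
```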



MetaData:

The more I think about metadata and data the less I know which is what. Consider 

 [ {"bn": "urn:dev:mac:0024befffe804ff1/"},
   [ { "n": "voltage", "t": 0, "u": "V", "v": 120.1 } ]
 ]

You could argue the only thing that is not metadata is 120.1. 

I think the goal of SenML is to have a record with the minimal set of info that is often needed to interpret the data in that record. The base names were added merely as a compression scheme, to avoid duplicating the same bits several times. I'm not really wound up about whether some of it is metadata or not. 
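The "compression scheme" point is easy to demonstrate: expanding the base name into every record makes the serialization strictly larger. A small sketch, using the [{base dict}, [entries]] shape discussed in this thread:

```python
import json

# Pack in the alternative [{base dict}, [entries]] syntax from the thread.
pack = [{"bn": "urn:dev:mac:0024befffe804ff1/"},
        [{"n": "voltage", "t": 0, "u": "V", "v": 120.1},
         {"n": "current", "t": 0, "u": "A", "v": 1.2}]]

# Expand by prepending the base name to every record's name.
base = pack[0]["bn"]
expanded = [dict(record, n=base + record["n"]) for record in pack[1]]

# The expanded form repeats the long URN prefix once per record,
# so the serialized pack is shorter than the expanded record list.
savings = len(json.dumps(expanded)) - len(json.dumps(pack))
```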



Streaming:

When I first read the line that said the latest SenML draft "requires support of streaming", I thought that was wrong, but the more I thought about it, yes, I think this is a very serious problem with the current proposal. I was thinking about sensor data being sent from a small device to a big cloud device, and that might work OK, but in the case of data going to another small device, this is a problem. It does highlight the problem of the maximum size of a SenML message. 

Perhaps we need two different formats: a SenML object and a SenML stream. That would allow protocols that use this to be clear about whether they use one or the other or both, and with HTTP or CoAP, the normal approaches could be used to negotiate them. 



> On Nov 17, 2015, at 3:44 PM, Christian Amsüss <c.amsuess@energyharvesting.at> wrote:
> 
> Hello Michael,
> hello SenML and core-interfaces people,
> 
> I'd like to pick up the topic of streamable SenML from the context of
> the `SenML JSON syntax` syntax thread from before IETF94.
> 
> To summarize what I know of the state of things:
> 
> * JSON SenML can't enforce that the base {name, time} entries precede
>  the entries list while still being JSON. To parse a generic SenML
>  message, it is thus required to keep the whole message in memory.
> 
>  An alternative syntax is proposed [{base dict}, [entries]]; that can
>  be extended to allow repetitions thereof (with incremental base
>  values), or the distinction between base and entry data could be
>  lifted further.
> 
>  This assumes that the "e" record list takes a special role in SenML by
>  being the workhorse list of data, which conflicts with:
> 
> * CoRE interfaces serves collections as both data and metadata in a
>  unified SenML structure, where resource states are given in the
>  classical "e" array, and the metadata next to it in an "l" array as in
>  application/link-format+json.
> 
> A notation for treating the "l" array as an "e" element was proposed,
> but did not resonate well with Michael (from the CoRE interface side);
> I'd like to take up the line of discussion from there:
> 
> On Tue, Oct 20, 2015 at 12:52:19PM -0700, Michael Koster wrote:
>> It’s more than a simple visual relationship. I’m used to JSON tools
>> that create an in-memory data structure that conforms to the JSON
>> serialization. With the “old” SenML model, the elements of the object
>> identified by “bn” are rendered as an array within the element
>> identified by “bn” and tagged by “e”. 
>> 
>> The new construct more than just enables streaming, it forces serial
>> interpretation, i.e. it *requires* streaming.
> 
> Yes, and that's the very point. If I'm to parse SenML on a constrained
> device, especially given that the sender can use its extensibility to
> send along data that is not expected by the receiver, that means that I
> need to be prepared to store whichever length the complete message has.
> 
> For an example of a situation when this can be an issue, take an update
> to a DMX (RGB spots or other light installations) controller. A PUT to
> atomically update the complete scene of connected devices in JSON
> serialization can easily take up 10k plus network overhead in network
> buffer space even without any additional metadata from SenML extensions,
> but (if read in a serializable way) implementations could get away with
> a single-MTU-buffer network implementation plus 1k for double-buffered
> state.
> 
> Another example (from my everyday CoAP communication, but not involving
> embedded parsing) is history readouts of sensor values, which can
> exceed 100kB for devices with intermittent network connectivity.
> 
>> Would it make sense to create a new content-format that optimizes for
>> streaming processing?
> 
> This is not about streaming Big Data around to the point where big
> devices need to go into "streaming mode" (though it's useful there too),
> this is about (not the most common, but still relatively) normal
> situations and not returning 4.13 from small devices any time someone
> doesn't chunk up his request to small multiples of the MTU.
> 
> I don't like to exaggerate, so please take this with a grain of salt and
> be aware that this is written in the heat of the argument: If we don't
> find an agreeable serialization that can be processed in a streaming
> fashion, we might just as well put a hard limit on the maximum size of a
> SenML representation that implementors are required to support. What
> would that be, 4k? 16k?
> 
>>> In my opinion, it raises the question of how generic SenML should
>>> attempt to be. My personal view of it is that SenML is a way of
>>> encapsulating several resource representations (be they of different
>>> points in time or different resource) in a single message. With that in
>>> mind, maybe the following would work for you (rephrasing your example
>>> into senml-02 syntax, with comments):
>> 
>> SenML is already being used to represent simple collections in CoRE
>> Interfaces, OMA LWM2M, and OIC. Whether to have it be extensible and
>> evolvable or not is certainly a tradeoff against complexity and stream
>> processing ability. I would lean toward evolvability. 
> 
> Concerning evolvability:
> 
> That shouldn't be a show stopper: extensions can still go both in the
> base dictionary and in the events; it's just they wouldn't profit from
> the guaranteed sequence.
> 
> An approach I don't like in its current form but that could point the
> direction for something more elegant is to indicate the "key" of
> subsequent lists in the base dictionary; with your "l" example, that
> could be
> 
>    [ {"bn": "/collection1/", "next-object": "e"},
>      [{"n": "item1", "sv": "value1"}, ...],
>      {"next-object": "l"},
>      [{"href": "item1", ...}, ...]
>    ]
> 
> As said, it's not pretty, nor what I'd endorse as-is, but extensibility
> and easy-to-parse sequence don't necessarily conflict.
> 
> Concerning focus of SenML:
> 
> Simple collections seems to be a good outline; would you also agree to
> "simple collections of resource representations and their metadata"?
> 
>>> What do you think of the above arrangement?
>> 
>> I think it’s a substantial compromise in the ability to represent data
>> structure to get streaming processing ability. But I do like the idea
>> of a “ov” element for object values.
> 
> Does that refer to the new serialization format in general or to packing
> the link list into an entity response in particular? In the latter case,
> please elaborate -- the latter "happened" with the infrastructure I've
> been using (under certain conditions, my batch resources contain their
> application/link-format as "s": entries), I've found it practical, and
> it would come in much more handy with "ov":link-format+json.
> 
> Best regards
> Christian
> 
> -- 
> Christian Amsüss                      | Energy Harvesting Solutions GmbH
> founder, system architect             | headquarter:
> mailto:c.amsuess@energyharvesting.at  | Arbeitergasse 15, A-4400 Steyr
> tel:+43-664-97-90-6-39                | http://www.energyharvesting.at/
>                                      | ATU68476614