Re: [core] SemML time series data representation?

Ari Keränen <ari.keranen@ericsson.com> Thu, 19 November 2015 19:43 UTC

From: Ari Keränen <ari.keranen@ericsson.com>
To: "Isomaki Markus (Nokia-TECH/Espoo)" <markus.isomaki@nokia.com>
Date: Thu, 19 Nov 2015 19:43:05 +0000
Message-ID: <8309DD6A-FED6-4E3D-86E8-FDF842BC9458@ericsson.com>
References: <1d3a2378c7df499e84f3edae6f5d1f96@NOKWDCFIEXCH02P.nnok.nokia.com>
In-Reply-To: <1d3a2378c7df499e84f3edae6f5d1f96@NOKWDCFIEXCH02P.nnok.nokia.com>
Archived-At: <http://mailarchive.ietf.org/arch/msg/core/7w_068FrDyioiX4-sMc9kgWUIR8>
Cc: core <core@ietf.org>, "draft-jennings-core-senml@tools.ietf.org" <draft-jennings-core-senml@tools.ietf.org>, Christian Amsüss <c.amsuess@energyharvesting.at>
Subject: Re: [core] SemML time series data representation?
List-Id: "Constrained RESTful Environments \(CoRE\) Working Group list" <core.ietf.org>

Hi Markus,

The current syntax allows you to have multiple base objects, so you could drop the "n" fields if needed. Something like:

  [{"bn": "urn:dev:mac:0024befffe804ff1/voltage",
    "bt": 1276020076,
    "bu": "A",
    "ver": 1},
   [ { "u": "V", "v": 120.1 } ],
   {"bn": "urn:dev:mac:0024befffe804ff1/current"},
   [
     { "t": -5, "v": 1.2 },
     { "t": -4, "v": 1.30 },
     { "t": -3, "v": 0.14e1 },
     { "t": -2, "v": 1.5 },
     { "t": -1, "v": 1.6 },
     { "t": 0,  "v": 1.7 } ]
  ]

However, there is no mechanism for the further compression you suggested (sending just the values).
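To illustrate how a receiver might resolve the example above, here is a minimal Python sketch. It assumes the alternating [{base}, [entries], ...] layout, that a later base object overrides only the base fields it restates (so "bt" and "bu" carry over to the "current" records), that names concatenate as bn + n and times add as bt + t, and that only numeric "v" values appear. The function name resolve_records is hypothetical.

```python
from typing import Any, Dict, Iterator, List, Union

# One pack: alternating base objects (dicts) and entry lists.
Pack = List[Union[Dict[str, Any], List[Dict[str, Any]]]]


def resolve_records(pack: Pack) -> Iterator[Dict[str, Any]]:
    """Yield one fully resolved record per entry.

    Assumptions: names concatenate (bn + n), times add (bt + t),
    "u" falls back to "bu", and a later base object only overrides
    the base fields it restates.
    """
    base: Dict[str, Any] = {}
    for item in pack:
        if isinstance(item, dict):
            base.update(item)  # merge the new base fields into the running base
            continue
        for entry in item:
            yield {
                "n": base.get("bn", "") + entry.get("n", ""),
                "t": base.get("bt", 0) + entry.get("t", 0),
                "u": entry.get("u", base.get("bu")),
                "v": entry["v"],  # only numeric values handled in this sketch
            }
```

Applied to the example above, the first record resolves to the ".../voltage" name with unit "V", and the six "current" records resolve to times 1276020071 through 1276020076 with unit "A" inherited from the first base object.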


Cheers,
Ari

> On 18 Nov 2015, at 18:37, Isomaki Markus (Nokia-TECH/Espoo) <markus.isomaki@nokia.com> wrote:
> 
> Hi,
> 
> I've not followed CoRE or SenML discussions for a while, so apologies if this is a FAQ. I noticed there is a discussion about SenML streaming, and that triggered a question related to a project I'm working on. Basically, I would like to send a large number of sensor readings that have been measured at a constant sample rate. This could be tens of thousands of samples at a time. In SenML, is there any reasonable way to represent this type of time series compactly? In the draft I see this kind of example:
> 
>   [{"bn": "urn:dev:mac:0024befffe804ff1/",
>     "bt": 1276020076,
>     "bu": "A",
>     "ver": 1},
>    [ { "n": "voltage", "u": "V", "v": 120.1 },
>      { "n": "current", "t": -5, "v": 1.2 },
>      { "n": "current", "t": -4, "v": 1.30 },
>      { "n": "current", "t": -3, "v": 0.14e1 },
>      { "n": "current", "t": -2, "v": 1.5 },
>      { "n": "current", "t": -1, "v": 1.6 },
>      { "n": "current", "t": 0,  "v": 1.7 } ]
>   ]
> 
> This kind of works, but it would be quite redundant to literally send "n":"current","t":N,"v": ten thousand times. The format we currently use has additional metadata such as the sample rate (or sample interval), and the measurement type and unit need to be given only once. This means we can send just an array of the actual measurement results of the same type and sample interval, e.g. [1.2, 1.30, 0.14e1, 1.5, 1.6, 1.7], in a compact manner.
> 
> Is this possible in SenML? Would seem like a useful feature for many purposes where sensors report data in batches. 
> 
> Markus 
> 
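The compact form Markus describes could be mapped back onto SenML-style records along these lines. SenML has no sample-interval field, so the function expand_compact and its parameters (base_time, interval) are hypothetical; the sketch only shows that each record's time is recoverable as start time plus index times interval, so only the values need to be sent per sample.

```python
from typing import Any, Dict, List


def expand_compact(name: str, base_time: float, interval: float,
                   unit: str, values: List[float]) -> List[Dict[str, Any]]:
    """Turn a hypothetical compact series (name, unit, start time and
    sample interval given once, then bare values) into one SenML-style
    record per sample."""
    return [
        # Sample i was taken at base_time + i * interval.
        {"n": name, "t": base_time + i * interval, "u": unit, "v": v}
        for i, v in enumerate(values)
    ]
```

With base_time 1276020071 and interval 1, the six values [1.2, 1.30, 0.14e1, 1.5, 1.6, 1.7] map to the same absolute times as the t = -5 .. 0 records (relative to bt 1276020076) in the draft example.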
>> -----Original Message-----
>> From: core [mailto:core-bounces@ietf.org] On Behalf Of EXT Cullen Jennings
>> Sent: Wednesday, November 18, 2015 4:15 AM
>> To: Christian Amsüss <c.amsuess@energyharvesting.at>
>> Cc: draft-jennings-core-senml@tools.ietf.org; core <core@ietf.org>
>> Subject: Re: [core] SenML JSON syntax and collection+senml+json
>> 
>> 
>> Random thoughts on a  few subjects:
>> 
>> I feel like SenML is getting too complex and we should ask whether we can put it on a
>> diet. Perhaps streaming is just too much to put into it. An alternative is to
>> not have SenML do streaming but allow a protocol using it to support
>> streaming by sending many SenML objects, with the convention that if any
>> given object did not have a base value, the base values from the
>> previous SenML object applied. I'm not sure whether this is a good idea, but
>> if things start to get too complicated to do streaming
>> inside SenML, we can punt it up a layer.
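On the receiving side, the "punt it up a layer" convention could look like the small piece of state below, which remembers base values across separate SenML objects. It assumes the object form with "bn"/"bt"/"bu" fields and an "e" entries array, and it interprets the convention per field (a base field missing from one object keeps its previous value). BaseCarryOver is a hypothetical name.

```python
from typing import Any, Dict, List


class BaseCarryOver:
    """Receiver-side state for a hypothetical multi-object convention:
    a base field absent from one SenML object keeps the value it had
    in the previous object."""

    BASE_KEYS = ("bn", "bt", "bu")

    def __init__(self) -> None:
        self.base: Dict[str, Any] = {}

    def process(self, obj: Dict[str, Any]) -> List[Dict[str, Any]]:
        # Update only the base fields this object actually carries.
        for key in self.BASE_KEYS:
            if key in obj:
                self.base[key] = obj[key]
        # Resolve each entry of the "e" array against the current base.
        return [
            {
                "n": self.base.get("bn", "") + entry.get("n", ""),
                "t": self.base.get("bt", 0) + entry.get("t", 0),
                "u": entry.get("u", self.base.get("bu")),
                "v": entry.get("v"),
            }
            for entry in obj.get("e", [])
        ]
```

The per-field reading is only one interpretation; the convention could equally mean that an object with no base values at all inherits every base value from its predecessor.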
>> 
>> 
>> Complexity :
>> 
>> I'm sure someone will think I am nuts for suggesting that SenML is looking
>> too complicated, but as another example, take InfluxDB, which is pretty
>> good for stuff like this. I've been using it as a cloud DB for streaming real-time
>> measurements. It deprecated JSON and replaced it with "Line Protocol",
>> which is effectively the sensor name, followed by space-separated
>> key=value attributes, followed by the value, followed by CRLF. That produced
>> noticeable improvements over general JSON in real deployments. A big part of
>> SenML was to *not* be general JSON, but a very restricted subset of JSON, so
>> that it could achieve the performance of something like "Line Protocol" or
>> protobufs and still have some extensibility story.
>> 
>> So Line Protocol would send the example from later in this email as a single
>> line with
>> 
>> urn:dev:mac:0024befffe804ff1/voltage u=V 120.1
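For comparison, a single reading in the one-line form of the example above could be produced as follows. to_line is a hypothetical helper that only mimics that example (name, optional u=unit, value, CRLF); it is not InfluxDB's actual Line Protocol, whose grammar differs in its details.

```python
from typing import Optional


def to_line(name: str, value: float, unit: Optional[str] = None) -> str:
    """Format one reading as 'name [u=unit] value' terminated by CRLF.

    Mimics the single-line example only; not a Line Protocol encoder.
    """
    parts = [name]
    if unit is not None:
        parts.append(f"u={unit}")
    parts.append(str(value))
    return " ".join(parts) + "\r\n"
```

Called as to_line("urn:dev:mac:0024befffe804ff1/voltage", 120.1, "V"), it returns the example line followed by CRLF.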
>> 
>> 
>> 
>> MetaData:
>> 
>> The more I think about metadata and data the less I know which is what.
>> Consider
>> 
>> [ {"bn": "urn:dev:mac:0024befffe804ff1/"},
>>    [ { "n": "voltage", "t": 0, "u": "V", "v": 120.1 } ]  ]
>> 
>> You could argue the only thing that is not metadata is 120.1
>> 
>> I think the goal of SenML is to have a record that has the minimal set of info
>> often needed to interpret the data in that record. The base names
>> were added merely as a compression scheme to avoid repeating the same
>> bits several times. I'm not really wound up about whether some of it is
>> metadata or not.
>> 
>> 
>> 
>> Streaming:
>> 
>> When I first read the line saying the latest SenML draft "requires support
>> of streaming", I thought that was wrong, but the more I thought about it, yes,
>> I think this is a very serious problem with the current proposal. I was thinking
>> about sensor data being sent from a small device to a big cloud device, where
>> this might work OK, but in the case of data going to another small device, it
>> is a problem. It does highlight the problem of the maximum size of SenML data.
>> 
>> Perhaps we need two different formats - a SenML object and a SenML
>> stream. That would allow protocols that used this to be clear about if they
>> used one or the other or both and with HTTP or CoAP, the normal
>> approaches could be used to negotiate them.
>> 
>> 
>> 
>>> On Nov 17, 2015, at 3:44 PM, Christian Amsüss <c.amsuess@energyharvesting.at> wrote:
>>> 
>>> Hello Michael,
>>> hello SenML and core-interfaces people,
>>> 
>>> I'd like to pick up the topic of streamable SenML from the context of
>>> the `SenML JSON syntax` thread from before IETF94.
>>> 
>>> To summarize what I know of the state of things:
>>> 
>>> * JSON SenML can't enforce that the base {name, time} entries precede
>>> the entries list while still being JSON. To parse a generic SenML
>>> message, it is thus required to keep the whole message in memory.
>>> 
>>> An alternative syntax is proposed [{base dict}, [entries]]; that can
>>> be extended to allow repetitions thereof (with incremental base
>>> values), or the distinction between base and entry data could be
>>> lifted further.
>>> 
>>> This assumes that the "e" record list takes a special role in SenML
>>> by being the workhorse list of data, which conflicts with:
>>> 
>>> * CoRE interfaces serves collections as both data and metadata in a
>>> unified SenML structure, where resource states are given in the
>>> classical "e" array, and the metadata next to it in an "l" array as in
>>> application/link-format+json.
>>> 
>>> A notation for treating the "l" array as an "e" element was proposed,
>>> but did not resonate well with Michael (from the CoRE interface side);
>>> I'd like to take up the line of discussion from there:
>>> 
>>> On Tue, Oct 20, 2015 at 12:52:19PM -0700, Michael Koster wrote:
>>>> It’s more than a simple visual relationship. I’m used to JSON tools
>>>> that create an in-memory data structure that conforms to the JSON
>>>> serialization. With the “old” SenML model, the elements of the object
>>>> identified by “bn” are rendered as an array within the element
>>>> identified by “bn” and tagged by “e”.
>>>> 
>>>> The new construct more than just enables streaming, it forces serial
>>>> interpretation, i.e. it *requires* streaming.
>>> 
>>> Yes, and that's the very point. If I'm to parse SenML on a constrained
>>> device, especially given that the sender can use its extensibility to
>>> send along data that the receiver does not expect, I need to be
>>> prepared to store the complete message, whatever its length.
>>> 
>>> For an example of a situation when this can be an issue, take an
>>> update to a DMX (RGB spots or other light installations) controller. A
>>> PUT to atomically update the complete scene of connected devices in
>>> JSON serialization can easily take up 10k plus network overhead in
>>> network buffer space even without any additional metadata from SenML
>>> extensions, but (if read in a serializable way) implementations could
>>> get away with a single-MTU-buffer network implementation plus 1k for
>>> double-buffered state.
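The point that a reader of the [{base}, [entries], ...] layout need not hold the whole message can be sketched in Python: the generator below yields each top-level array element as soon as its text has arrived, so buffer use is bounded by the largest single element plus one chunk rather than by the full message. It relies only on the standard json module's JSONDecoder.raw_decode; iter_top_level is a hypothetical name, and retrying the decode on every chunk trades CPU for simplicity. A single very large entries array is still one element and would be buffered whole.

```python
import json
from typing import Any, Iterable, Iterator


def iter_top_level(chunks: Iterable[str]) -> Iterator[Any]:
    """Yield each element of a top-level JSON array as soon as it is complete.

    Assumes every top-level element is an object or an array (true for the
    [{base}, [entries], ...] layout); a bare number split across chunks could
    otherwise be decoded too early.
    """
    decoder = json.JSONDecoder()
    buf = ""
    started = False
    for chunk in chunks:
        buf += chunk
        while True:
            buf = buf.lstrip()
            if not buf:
                break  # need more input
            if not started:
                if buf[0] != "[":
                    raise ValueError("expected a top-level JSON array")
                buf, started = buf[1:], True
                continue
            if buf[0] == ",":
                buf = buf[1:]
                continue
            if buf[0] == "]":
                return  # end of the top-level array
            try:
                value, end = decoder.raw_decode(buf)
            except json.JSONDecodeError:
                break  # element still incomplete: wait for the next chunk
            yield value
            buf = buf[end:]  # keep only the not-yet-consumed remainder
```

Fed with MTU-sized pieces of a SenML message, each base object or entries list can be applied and discarded before the rest of the message arrives.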
>>> 
>>> Another example (from my everyday CoAP communication, but not
>>> involving embedded parsing) is history readouts of sensor values,
>>> which can exceed 100kB for devices with intermittent network
>>> connectivity.
>>> 
>>>> Would it make sense to create a new content-format that optimizes for
>>>> streaming processing?
>>> 
>>> This is not about streaming Big Data around to the point where big
>>> devices need to go into "streaming mode" (though it's useful there
>>> too), this is about (not the most common, but still relatively) normal
>>> situations and not returning 4.13 from small devices any time someone
>>> doesn't chunk up his request to small multiples of the MTU.
>>> 
>>> I don't like to exaggerate, so please take this with a grain of salt,
>>> bearing in mind that it is written in the heat of the argument: if we
>>> don't find an agreeable serialization that can be processed in a
>>> streaming fashion, we might as well put a hard limit on the
>>> maximum size of a SenML representation, which all SenML implementors
>>> would be required to support at a minimum. What would that be, 4k? 16k?
>>> 
>>>>> In my opinion, it raises the question of how generic SenML should
>>>>> attempt to be. My personal view of it is that SenML is a way of
>>>>> encapsulating several resource representations (be they of different
>>>>> points in time or different resource) in a single message. With that
>>>>> in mind, maybe the following would work for you (rephrasing your
>>>>> example into senml-02 syntax, with comments):
>>>> 
>>>> SenML is already being used to represent simple collections in CoRE
>>>> Interfaces, OMA LWM2M, and OIC. Whether to have it be extensible and
>>>> evolvable or not is certainly a tradeoff against complexity and
>>>> stream processing ability. I would lean toward evolvability.
>>> 
>>> Concerning evolvability:
>>> 
>>> That shouldn't be a show stopper: extensions can still go both in the
>>> base dictionary and in the events; it's just they wouldn't profit from
>>> the guaranteed sequence.
>>> 
>>> An approach I don't like in its current form but that could point the
>>> direction for something more elegant is to indicate the "key" of
>>> subsequent lists in the base dictionary; with your "l" example, that
>>> could be
>>> 
>>>   [ {"bn": "/collection1/", "next-object": "e"},
>>>     [{"n": "item1", "sv": "value1"}, ...],
>>>     {"next-object": "l"},
>>>     [{"href": "item1", ...}, ...]
>>>   ]
>>> 
>>> As said, it's not pretty, nor what I'd endorse as-is, but
>>> extensibility and easy-to-parse sequence don't necessarily conflict.
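A receiver for the "next-object" idea could dispatch each list by the key most recently announced in a preceding dictionary, as in this minimal sketch. Treating "e" as the default key and letting an announcement stay in effect until the next one are assumptions, the concrete entries in the usage stand in for the elided "..." of the example, and split_by_next_object is a hypothetical name; base fields such as "bn" are not applied here, only the dispatch is shown.

```python
from typing import Any, Dict, List


def split_by_next_object(pack: List[Any]) -> Dict[str, List[Any]]:
    """Group the lists of a pack under the key announced by the most
    recent "next-object" field in a preceding dictionary."""
    groups: Dict[str, List[Any]] = {}
    key = "e"  # assumed default when no "next-object" has been seen yet
    for item in pack:
        if isinstance(item, dict):
            key = item.get("next-object", key)  # announcement stays in effect
        else:
            groups.setdefault(key, []).extend(item)
    return groups
```

For the collection example, the entries list lands under "e" and the link list under "l", while a parser still reads the pack strictly in order.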
>>> 
>>> Concerning focus of SenML:
>>> 
>>> Simple collections seems to be a good outline; would you also agree to
>>> "simple collections of resource representations and their metadata"?
>>> 
>>>>> What do you think of the above arrangement?
>>>> 
>>>> I think it’s a substantial compromise in the ability to represent
>>>> data structure to get streaming processing ability. But I do like the
>>>> idea of a “ov” element for object values.
>>> 
>>> Does that refer to the new serialization format in general or to
>>> packing the link list into an entity response in particular? In the
>>> latter case, please elaborate -- the latter "happened" with the
>>> infrastructure I've been using (under certain conditions, my batch
>>> resources contain their application/link-format as "s": entries), I've
>>> found it practical, and it would come in much more handy with "ov":link-format+json.
>>> 
>>> Best regards
>>> Christian
>>> 
>>> --
>>> Christian Amsüss                      | Energy Harvesting Solutions GmbH
>>> founder, system architect             | headquarter:
>>> mailto:c.amsuess@energyharvesting.at  | Arbeitergasse 15, A-4400 Steyr
>>> tel:+43-664-97-90-6-39                | http://www.energyharvesting.at/
>>>                                     | ATU68476614
>> 
>> _______________________________________________
>> core mailing list
>> core@ietf.org
>> https://www.ietf.org/mailman/listinfo/core