Re: [core] Designs to resolve streaming issues in SenML

"Cullen Jennings (fluffy)" <fluffy@cisco.com> Mon, 25 January 2016 22:16 UTC

To: Christian Amsüss <c.amsuess@energyharvesting.at>, Michael Koster <michaeljohnkoster@gmail.com>
Archived-At: <http://mailarchive.ietf.org/arch/msg/core/PrjEKWUW4zo1HZf7qzcpbIZWZIE>
Cc: core <core@ietf.org>

This message really helps clarify things. Thanks. More inline.


On Jan 18, 2016, at 4:05 AM, Christian Amsüss <c.amsuess@energyharvesting.at> wrote:

Hello Michael,

it appears that three concepts, all called "streaming" or related to it,
are getting mixed up in this thread. I'll try to flesh them out first,
describing their purpose, their status in the drafts, and how they
interact with the client and server sides.

A. The draft paragraph about "store or transmit SenML in a stream-like
 fashion" in the "Multiple Datapoints" section:

 This is mostly about transport, and implies that there is some kind
 of boundary between elements at which transmission can pause.

 Such boundaries are present in all drafts so far.

 It appears to me that this would primarily work with HTTP in a fashion
 similar to long polling, which is something both communicating parties
 need to be aware of (all drafts: "MUST specify that they are doing
 this"), lest they time out the connection.

Agreed, and I think a different MIME type can be used so that both sides understand they are doing this. In some cases the stream may never end, and the JSON that closes the data structures would never be sent, so at some level it’s not even valid JSON.
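To make concept A concrete: a receiver can hand out each record of an unterminated `[ {...}, {...}, ...` stream as soon as that record is complete, without ever seeing the closing bracket. The following is a minimal sketch using only Python's standard `json.JSONDecoder.raw_decode`; the function name `iter_stream_records` and the assumption that every top-level item is a complete JSON object are illustrative, not taken from any draft.

```python
import json
from typing import Iterable, Iterator

_decoder = json.JSONDecoder()
_SEPARATORS = " \t\r\n,"


def iter_stream_records(chunks: Iterable[str]) -> Iterator[dict]:
    """Yield each object of a '[ {...}, {...}, ...' stream as soon as it is complete.

    The closing ']' is never required, so the stream may go on forever.
    Hypothetical helper; assumes every top-level item is a JSON object.
    """
    buf = ""
    started = False  # have we consumed the opening '['?
    for chunk in chunks:
        buf += chunk
        pos = 0
        while True:
            # Skip whitespace and the commas between records (leniently).
            while pos < len(buf) and buf[pos] in _SEPARATORS:
                pos += 1
            if pos >= len(buf):
                break  # need more data
            if not started:
                if buf[pos] != "[":
                    raise ValueError("stream must start with '['")
                started = True
                pos += 1
                continue
            if buf[pos] == "]":
                break  # the sender did close the array after all
            try:
                record, end = _decoder.raw_decode(buf, pos)
            except json.JSONDecodeError:
                break  # record is incomplete; wait for the next chunk
            yield record
            pos = end
        buf = buf[pos:]  # drop everything already consumed
```

Records split across chunk boundaries are simply held back until the next chunk completes them, so the final `{"n": "c"` in a stream like `['[{"n": "a", ', '"v": 1}, {"n"', ': "b", "v": 2}', ', {"n": "c"']` is never yielded.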


B. Buffer-less operation:

 For very small applications (e.g. using uIP and a few kB of RAM), it is
 desirable to use data formats that never require back-seeking, that
 limit back-seeks to a fixed length, or that can make do with a fixed-length
 buffer to hold that information. The specification does not need
 to describe those operations explicitly; requiring some basic structure
 is sufficient.

 It has been suggested earlier on this list that -01 with an additional
 requirement that "bn" and "bt" come first in the dictionaries would
 allow this mode of operation too. That is true, but it is hard to achieve
 with generic JSON implementations and semantics, since JSON object
 members are unordered and generic serializers may emit them in any order.

 Thus, this goal is only achieved by drafts -02 and later. (This is
 also what my demo is about, which shows that -03 is particularly
 suitable).

 This is only an issue for the receiving side (the sender is free
 to choose whatever element sequence is most practical for it anyway, as
 long as it follows the specification). Making this negotiable would
 defeat its purpose: if receivers are to rely on it, we can't allow
 senders not to support it.

I think supporting this is important. It’s good for small devices, but it also helps large server implementations that need to process a lot of data very quickly.
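As an illustration of why B. helps receivers of any size: once the base dictionary is guaranteed to arrive first, each element can be resolved and handed on immediately, and the only state kept between elements is the base values. A minimal Python sketch, assuming some lower layer (for example an incremental JSON parser) delivers the base dictionary and then each element as separate dicts; `resolve_stream` is a hypothetical name, and only the "bn" (prefix), "bt" (added) and "bu" (default unit) rules are handled.

```python
from typing import Iterable, Iterator


def resolve_stream(records: Iterable[dict]) -> Iterator[dict]:
    """Resolve SenML-style elements one at a time, keeping only base values in memory.

    Assumes -02-style ordering: the first record is the base dictionary and
    every later record is an element.
    """
    it = iter(records)
    base = next(it, {})
    bn = base.get("bn", "")
    bt = base.get("bt", 0)
    bu = base.get("bu")
    for element in it:
        out = dict(element)
        out["n"] = bn + element.get("n", "")  # base name is prepended
        out["t"] = bt + element.get("t", 0)   # base time is added
        if "u" not in element and bu is not None:
            out["u"] = bu                     # base unit applies unless overridden
        # The caller can act on this record and discard it before the next one is read.
        yield out
```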


C. Multiple base elements:

 The idea to have multiple base elements came up independently of the
 others, and gives smaller data streams for cases in which different
 sensors' time series are produced, or when there are larger gaps in a
 timeline. The proposal worked well with the array-as-root scheme
 envisioned with exactly two elements for B., and the two were
 implemented together.


I’m starting to think that the use case for multiple base elements is very weak and that perhaps we should only allow a single set of base units which are before any of the measurements. If we get this for free or cheap, great, but not something I care a lot about.


 It is present in -03 and -04.

 Senders are free not to use it. In my impression, it is easy to
 implement in receivers, even if they want to access in-memory
 representations of the JSON directly: the key to an entry then becomes
 (list-number, list-position) instead of just list-position.
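A sketch of that (list-number, list-position) keying for a receiver that works on the parsed in-memory JSON. It assumes a top-level array that alternates base dictionaries and element lists, `[base, elements, base, elements, ...]`, with each base dictionary applying to the list right after it; that layout and the name `index_elements` are assumptions for illustration.

```python
from typing import Any, Dict, List, Tuple


def index_elements(doc: List[Any]) -> Dict[Tuple[int, int], Tuple[dict, dict]]:
    """Map (list-number, list-position) to (base, element) for [base, elems, base, elems, ...]."""
    if len(doc) % 2:
        raise ValueError("expected alternating base dictionaries and element lists")
    index = {}
    for list_no in range(len(doc) // 2):
        base, elements = doc[2 * list_no], doc[2 * list_no + 1]
        if not isinstance(base, dict) or not isinstance(elements, list):
            raise ValueError("unexpected top-level item in pair %d" % list_no)
        for pos, element in enumerate(elements):
            # With a single base dictionary, list_no is always 0 and this
            # degenerates to plain list-position keys.
            index[(list_no, pos)] = (base, element)
    return index
```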


PS Can we succinctly describe the use case for ordered serial
processing of senml+json elements being important? From reading the
draft, it seems like the important bit is saving the buffer memory for
large responses.

The one I'm arguing for here is B., buffer-less operation. The use case
is devices with limited heap memory (say 4 kB for incoming and outgoing
packets). I'm using SenML in core-interfaces batches [1] to get and set
device state. When those devices announce supporting batch operation,
they need to accept SenML PUTs to as many resources as are batched
together, so far without arbitrary constraints. Due to the proposed
extensibility of SenML, even if I limited the number of resources in a
batch, I couldn't predict the maximum size of an incoming update.

Now when an update arrives as a SenML document that does not satisfy B.,
all I can do is store the whole message body and parse it once the last
block has arrived. If the sender updated too many resources or used
extensions that are too large, I'd have to bail out with 4.13 Request
Entity Too Large. Until then (in particular, until the overflowing packet
arrives), memory is clogged with data that cannot be parsed yet.
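The condition that forces this buffering in -01 can be stated precisely: if any base attribute appears after "e" in the root object, no element can be interpreted until the whole object has been read. The check below is a Python sketch of that condition using the standard `object_pairs_hook` of `json.loads` to observe key order; note that it parses the full text, so it illustrates the rule rather than a constrained receiver. Treating "ver" as a base key and the name `must_buffer` are assumptions.

```python
import json

# Root keys that change how elements are interpreted (assumed set).
BASE_KEYS = {"bn", "bt", "bu", "ver"}


def must_buffer(text: str) -> bool:
    """Return True if a -01-style document lists "e" before some base attribute."""
    seen = []

    def hook(pairs):
        seen.append([key for key, _ in pairs])  # record key order of every object
        return dict(pairs)

    json.loads(text, object_pairs_hook=hook)
    root_keys = seen[-1]  # the root object is the last one to be completed
    if "e" not in root_keys:
        return False
    after_e = root_keys[root_keys.index("e") + 1:]
    return any(key in BASE_KEYS for key in after_e)
```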

Is it a bigger deal with CBOR encoding, or less important?

Whether CBOR or JSON is being parsed is almost irrelevant here. The only
difference that comes to my mind is that with CBOR, we might be able to
prescribe a serialization where "bn" and "bt" always precede "e" even in
-01-style, but even that would make it harder to use for whoever
implements this with a native-object-style library.

Is the most important consideration being able to know the base values
ahead of the data?

This is about all information that may be in the dictionary and modifies
how to interpret the elements' data. In core SenML, that information is
the base attributes, and the version -- that is, all the possible root
variables. (In a scenario .)

For extensions, this completely depends on them unless the receiver
chooses to ignore them anyway. As far as I understand, your link format
extension could add items to a core-interfaces linked batch, and then
set a value in one go; for that, the extension data would need to
precede the elements in parsing. Other extensions (say, metadata about
when to expect changes) may be useful no matter where in the data stream
they appear.

[1] https://tools.ietf.org/html/draft-ietf-core-interfaces-04#section-6.2

Also, where is -03? There doesn’t seem to be a version -03, the one you are all talking about, on the IETF website:
https://tools.ietf.org/html/draft-jennings-core-senml-04

The datatracker (or rather tools.ietf.org) does not seem to handle drafts
submitted on the same day well. You can view -03 at [2].

[2] https://datatracker.ietf.org/doc/draft-jennings-core-senml/03/?include_text=1

On Sat, Jan 16, 2016 at 07:24:41AM -0800, Michael Koster wrote:
The optimizations you recommend require a new string input parser,
whereas I am using the JSON parser that comes with the library. My point
was that I will need to either make my own string parser for SenML or
make a re-parser with the state machine I described in order to use
-02 +[...]

I don't see yet where the state machine comes in. Surely, a lot of
state-machining is going on in the optimized example, but when it comes
to the core difference from -01 to -02, that is, that

   {any-key-but-e: any-data, "e": elements-list}

becomes

   [{any-key-but-e: any-data}, elements-list]

, I don't see how this amounts to more than changing access from
_senml[key] to _senml[0][key] and from _senml["e"] to _senml[1].

(If it's about the C. multiple-base-elements, I'd ask to keep these
issues separate -- if C. breaks too much, although I like it, we could
have a -02 with only two elements and still satisfy B., but I don't
think we need to resort to that.)
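The mechanical nature of that -01 to -02 change can be shown with a pair of converters for already-parsed documents. This is a Python sketch assuming the two-element -02 layout (no multiple base elements); `to_02` and `to_01` are hypothetical helper names.

```python
from typing import Any, Dict, List


def to_02(doc01: Dict[str, Any]) -> List[Any]:
    """{any-key-but-e: ..., "e": elements} -> [{any-key-but-e: ...}, elements]"""
    base = {key: value for key, value in doc01.items() if key != "e"}
    return [base, doc01.get("e", [])]


def to_01(doc02: List[Any]) -> Dict[str, Any]:
    """[{any-key-but-e: ...}, elements] -> {any-key-but-e: ..., "e": elements}"""
    base, elements = doc02
    return {**base, "e": elements}
```

After conversion, `doc01["bn"]` is reachable as `to_02(doc01)[0]["bn"]` and `doc01["e"]` as `to_02(doc01)[1]`, which is the whole access change.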

[...] but there is another issue anyway:

One of the big advantages of using JSON is connecting the embedded
world to the web world. To do this in a low-friction way, it’s good to
be able to use well known tools and patterns.

I fully agree that easy use of established web tools is essential here,
but these very threads are where we should come up with mechanisms that
work both with those tools and with constrained devices.


There is one point where I think that state-machining over incoming JSON
objects does make sense, that is when it comes to names -- I like to see
them as URIs (that's consistent with the examples given, but not fully
required in the drafts), and when the "n" elements are relative URIs,
they are not useful until joined with the base name. But that is not a
change introduced in -01, but a preference of mine that is reflected in
the demo to show that relative URIs are not that complicated even for
embedded systems. (Existing web tools usually ship with a way of joining
URIs.)

It might be better if specifications that use SenML said the combination of bn+n MUST be a URI, and perhaps even constrained which types of URI are OK.
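The difference between treating "n" as a relative URI reference and plain string concatenation shows up as soon as "bn" does not end in a slash. A Python sketch using the standard `urllib.parse.urljoin`; the helper names are hypothetical. One caveat: `urljoin` only resolves relative references for schemes listed in `urllib.parse.uses_relative`, which does not include `coap`, so a `coap://` base name would need its own RFC 3986 resolver.

```python
from urllib.parse import urljoin


def joined_name(bn: str, n: str) -> str:
    """Resolve n as a URI reference against bn (RFC 3986 style)."""
    return urljoin(bn, n)


def concatenated_name(bn: str, n: str) -> str:
    """Plain string concatenation of bn and n."""
    return bn + n


# With a trailing slash, both approaches agree.
print(joined_name("/light/", "brightness"))           # /light/brightness
# Without one, URI resolution replaces the last path segment ...
print(joined_name("/light/brightness", "tbr"))        # /light/tbr
# ... while concatenation just glues the strings together.
print(concatenated_name("/light/brightness", "tbr"))  # /light/brightnesstbr
```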


Thanks also for looking into my code. I could easily use the -02 or
-03 in the senml processing as you show. What is missing still from
everything after -01 is the element tag that I am re-using to indicate
links and forms in the document:

I did see that you had extensions in your code, but as Carsten
suggested, the "l" could stay in the first dictionary alongside the "bn"
and "ver". Maybe the name "base dictionary" is suboptimal here -- the
dictionary itself is not something that gets somehow prefixed to the
following entries, it's just the place where the base attributes reside
along with other things, like "ver" and extensions.

If any of the post -01 versions broke the ability to add extensions, that is just a mistake on my part and we can fix it. Allowing JSON-LD style links in the bn makes sense to me.
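Keeping "l" in the first dictionary alongside "bn" and "ver" means a receiver sees extensions before any element, just as it sees the base attributes. A Python sketch for a two-element `[root-dictionary, elements]` document; the set of core keys, the rejection of "e" inside the root dictionary, and the name `split_document` are assumptions, and the empty "l" list in the usage below stands in for link data that is elided in the thread.

```python
from typing import Any, Dict, List, Tuple

# Root keys this sketch treats as core SenML; everything else is an extension.
KNOWN_ROOT_KEYS = {"bn", "bt", "bu", "ver"}


def split_document(doc: List[Any]) -> Tuple[Dict[str, Any], Dict[str, Any], List[dict]]:
    """Split [root-dictionary, elements] into core attributes, extensions and elements."""
    root, elements = doc
    if "e" in root:
        raise ValueError('"e" is not expected inside the root dictionary')
    core = {key: value for key, value in root.items() if key in KNOWN_ROOT_KEYS}
    extensions = {key: value for key, value in root.items() if key not in KNOWN_ROOT_KEYS}
    return core, extensions, elements


core, ext, elems = split_document([{"bn": "/light/", "ver": 1, "l": []}, [{"n": "tbr", "v": 44.3}]])
```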



On Sat, Jan 16, 2016 at 09:43:55AM -0800, Michael Koster wrote:
It would be useful if the order of top level element types could be arbitrary e.g.: [ {},[],[],[],{},[],{},{} ... ]

That's something I could well see in a -05. It would also make minimal
SenML files that don't use any base attributes a little smaller, and
embedded parsing wouldn't suffer from it AFAICT.
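A walker for such an arbitrarily ordered top-level array stays simple and single-pass, which is why embedded parsing would not suffer. This Python sketch assumes hypothetical semantics: a dictionary updates the base values currently in effect, and a list holds elements interpreted against those values. `walk_top_level` is an illustrative name.

```python
from typing import Any, Iterator, List, Tuple


def walk_top_level(doc: List[Any]) -> Iterator[Tuple[dict, dict]]:
    """Yield (base, element) for a top-level array of objects and arrays in any order."""
    base: dict = {}
    for item in doc:
        if isinstance(item, dict):
            # Build a new dict so bases already yielded are not mutated.
            base = {**base, **item}
        elif isinstance(item, list):
            for element in item:
                yield base, element
        else:
            raise ValueError("top-level items must be objects or arrays")
```

A minimal document without any base attributes is then just `[[{"n": "a", "v": 1}]]`, with no empty leading dictionary.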

I need to send more justification about this to the list, but looking at common JSON parsers for languages that do not allow variant arrays, it seems that many of these libraries don’t easily support JSON arrays whose elements have different types.

Take, for example, the Go JSON libraries. You can see how this works at

https://github.com/cisco/senmlCat/blob/master/senmlCat.go

On a side note, senmlCat can convert SenML to and from various formats and can also act as an HTTP server that receives POSTs of SenML and then writes them to files, InfluxDB, or Kafka. I would call it more alpha than done, but hey, send a pull request. I can't talk about some of the SenML projects, but having some open source code to point at that we can talk about is useful.

If you poke through the libraries at http://www.json.org/ for a language like C, you can see that they either malloc every element in the list, which gets very slow compared to something like protobuf, or rely on a schema that tells the system the type of the elements in the array.

Now, clearly JSON, XML, CBOR, etc. can all support variant arrays, and in any language you can write a parser and data structure that allows them (even Fortran). We are just talking about the performance and convenience of using variant arrays in a language like C. I think it would be very nice to avoid variant arrays if we don’t need them.



On Sat, Jan 16, 2016 at 10:33:00AM -0800, Michael Koster wrote:
PS for correctness my example should look more like this:
[
 {
   "bn": "/light/brightness",
   "l": [ ... ],
   "f": [ ...],
   "e": [
     { "n": "tbr", "v": 44.3 },
     { "n": "tt", "v": 1.0 }
   ]
 }
]

Let's please not allow the "e" element anymore. Anything with a
top-level array is incompatible with users of older drafts anyway.


Best regards
Christian

--
Christian Amsüss                      | Energy Harvesting Solutions GmbH
founder, system architect             | headquarter:
mailto:c.amsuess@energyharvesting.at  | Arbeitergasse 15, A-4400 Steyr
tel:+43-664-97-90-6-39                | http://www.energyharvesting.at/
                                     | ATU68476614