Re: [core] Designs to resolve streaming issues in SenML

Michael Koster <michaeljohnkoster@gmail.com> Mon, 18 January 2016 15:42 UTC

From: Michael Koster <michaeljohnkoster@gmail.com>
In-Reply-To: <20160118110500.GA7789@hephaistos.amsuess.com>
Date: Mon, 18 Jan 2016 07:42:10 -0800
Message-Id: <7BEAC3D7-C2B8-42A2-9496-DDED5837ACF2@gmail.com>
References: <20160118110500.GA7789@hephaistos.amsuess.com>
To: Christian Amsüss <c.amsuess@energyharvesting.at>
Archived-At: <http://mailarchive.ietf.org/arch/msg/core/mZPRRuCbTXrwZjtIB_ER_JGOp-w>
Cc: "Cullen Jennings (fluffy)" <fluffy@cisco.com>, core <core@ietf.org>
Subject: Re: [core] Designs to resolve streaming issues in SenML

Hi Christian,

Thanks for the comprehensive reply. I now understand the bigger picture better.

I have added a few comments below. 

> On Jan 18, 2016, at 3:05 AM, Christian Amsüss <c.amsuess@energyharvesting.at> wrote:
> 
> Hello Michael,
> 
> it appears that there are three concepts, all called "streaming" or related
> to it, that get mixed up in this thread. I'll try to flesh them out
> first, describing their purpose, status in the drafts, and interaction
> with the client and server side.
> 
> A. The draft paragraph about "store or transmit SenML in a stream-like
>  fashion" in the "Multiple Datapoints" section:
> 
>  This is more about transportation, and implies that there is some kind
>  of boundary between elements at which transmission can pause.
> 
>  Such boundaries are present in all drafts so far.
> 
>  It appears to me that this would primarily work with HTTP in a fashion
>  similar to long polling, which is something both communicating parties
>  need to be aware of (all drafts: "MUST specify that they are doing
>  this"), lest they time out the connection.
> 
Yes, I understand. I’m assuming that any array structure with small enough elements
will satisfy this goal. In this case the base needs to be remembered at the receiver
anyway, so having the base in a separate element makes sense. We are receiving data
with a state machine which stores each base value as a new state and as the context
from which to interpret new data elements.
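A minimal Python sketch of such a receiver-side state machine, working on an already-parsed top-level array. Assumptions that are ours, not draft text: a base dictionary replaces the current context rather than merging into it, names resolve by prefixing "bn" onto "n", and times by adding "t" to "bt". Because it dispatches on item type, it does not require base dictionaries and element lists to strictly alternate. The function name resolve_stream is hypothetical.

```python
from typing import Any, Iterable, Iterator


def resolve_stream(items: Iterable[Any]) -> Iterator[dict]:
    """Yield resolved records from a top-level SenML-style array.

    Hypothetical sketch: a dict is a base dictionary that becomes the new
    context; a list is a batch of data elements read against that context.
    """
    base: dict = {}
    for item in items:
        if isinstance(item, dict):
            base = item  # new state: later elements are read against this base
        elif isinstance(item, list):
            for element in item:
                record = dict(element)
                # Assumption: names are resolved by plain string prefixing.
                record["n"] = base.get("bn", "") + element.get("n", "")
                if "bt" in base or "t" in element:
                    record["t"] = base.get("bt", 0) + element.get("t", 0)
                yield record
        else:
            raise ValueError(f"unexpected top-level item: {item!r}")
```

Since only the current base is kept between items, the receiver's state stays constant in size no matter how many element lists follow.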

> B. Buffer-less operation:
> 
>  For very small applications (eg. using uIP and few kB of RAM), it is
>  desirable to use data formats that never require back-seeking, that
>  limit back-seeks to a fixed length or that can do with a fixed-length
>  buffer to hold information for that. The specification does not need
>  to actively describe those operations, requiring some basic structure
>  is sufficient.
> 
>  It has been suggested earlier on this list that -01 with an additional
>  requirement for "bn" and "bt" to come first in the dictionaries would
>  allow this mode of operation too -- true, but hard to achieve with
>  generic JSON implementations and semantics.
> 
Limiting the size of each element in the array will limit the back-seeking needed,
and thus the buffer size will need to be related to the maximum size of an array
element. Having the outermost structure be an array enables that in the drafts
after -01.
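A minimal Python sketch of what bounded buffering can look like when the outermost structure is an array: it splits a byte stream (for instance, successive block-wise payloads) into top-level items and hands each one to a JSON parser as soon as its closing bracket arrives, so the buffer only ever holds one item. The limit max_item is a hypothetical, application-chosen value, and the splitter assumes every top-level item is an object or an array, as in the array-rooted drafts.

```python
import json
from typing import Any, Iterable, Iterator

WHITESPACE = b" \t\r\n"


def iter_array_items(chunks: Iterable[bytes], max_item: int) -> Iterator[Any]:
    """Yield each object/array inside a top-level JSON array once complete,
    buffering at most max_item bytes of the item currently being received."""
    buf = bytearray()  # raw bytes of the item in progress
    depth = 0          # 0: before the outer array, 1: between items, >1: inside one
    in_string = False
    escaped = False
    for chunk in chunks:
        for byte in chunk:
            if depth == 0:
                if byte == ord("["):
                    depth = 1
                elif byte not in WHITESPACE:
                    raise ValueError("expected a top-level array")
                continue
            if depth == 1 and not buf:
                if byte in WHITESPACE or byte == ord(","):
                    continue
                if byte == ord("]"):
                    return  # the outer array is closed
                if byte not in b"{[":
                    raise ValueError("only objects and arrays are handled here")
            buf.append(byte)
            if len(buf) > max_item:
                raise OverflowError("array item exceeds the receive buffer")
            # UTF-8 continuation bytes are >= 0x80, so they never match the
            # ASCII structural characters tested below.
            if in_string:
                if escaped:
                    escaped = False
                elif byte == ord("\\"):
                    escaped = True
                elif byte == ord('"'):
                    in_string = False
            elif byte == ord('"'):
                in_string = True
            elif byte in b"{[":
                depth += 1
            elif byte in b"}]":
                depth -= 1
                if depth == 1:  # item complete: parse it and free the buffer
                    yield json.loads(buf)
                    buf.clear()
    raise ValueError("stream ended before the outer array was closed")
```

The receiver never needs to see the whole message: an oversized item fails as soon as it crosses the limit instead of after the last block has been stored.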

>  Thus, this goal is only achieved by drafts -02 and later. (This is
>  also what my demo is about, which shows that -03 is particularly
>  suitable).
> 
>  This is only an issue for the receiving side (the sender is free
>  to choose element sequences as is most practical for it anyway, as
>  long as it follows the specification). Making this negotiable would
>  defeat its purpose: if receivers may rely on it, we can't allow
>  senders not to support it.
> 
This to me is key. For reliable operation the system designer needs to ensure
bounds on the maximum message and buffer sizes. The limit on the size of any
array element, base or data, can’t be specified in this RFC, right?

> C. Multiple base elements:
> 
>  The idea to have multiple base elements came up independently of the
>  others, and gives smaller data streams for cases in which different
>  sensors' time series are produced, or when there are larger gaps in a
>  timeline. The proposal worked well with the array-as-root scheme
>  envisioned with exactly 2 elements for B., and was implemented
>  together.
> 
>  It is present in -03 and -04.
> 
>  Senders are free not to use it. In my impression, this is easy to
>  implement in receivers (even if they desire to directly access
>  in-memory representations of JSON; the key to an entry might then be
>  (list-number, list-position) instead of just list-position).
> 
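One possible reading of the (list-number, list-position) key, sketched in Python: element lists are numbered in order of appearance, skipping base dictionaries, so adding or omitting a base dictionary does not shift the keys. The helper name index_elements is hypothetical.

```python
from typing import Any, Dict, List, Tuple


def index_elements(doc: List[Any]) -> Dict[Tuple[int, int], dict]:
    """Map (list-number, list-position) to each data element.

    list-number counts only element lists; base dictionaries are skipped.
    """
    index: Dict[Tuple[int, int], dict] = {}
    list_number = 0
    for item in doc:
        if isinstance(item, list):
            for position, element in enumerate(item):
                index[(list_number, position)] = element
            list_number += 1
    return index
```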
I am using a multiple base format in my reference implementation.
Some of my examples are for just one element in the outermost array.
> 
>> PS Can we succinctly describe the use case for ordered serial
>> processing of senml+json elements being important? From reading the
>> draft, it seems like the important bit is saving the buffer memory for
>> large responses. 
> 
> The one I'm arguing for here is B., the buffer-less operation. The use case
> is devices with limited heap memory (say 4k for incoming and outgoing
> packets). I'm using SenML in core-interfaces batches[1] to get and set
> device state. When those devices announce supporting batch operation,
> they need to accept SenML PUTs to as many resources as are batched
> together, so far without arbitrary constraints. Due to the proposed
> extensibility of SenML, even if I limited the number of resources in a
> batch, I couldn't predict the maximum size of an incoming update.
> 
> Now when the update arrives and a SenML that does not satisfy B. comes
> in, all I can do is to store the whole message body and parse it when
> the last block has arrived. If the sender updated too many resources or used
> too large extensions, I'd need to bail out with 4.13 Request Entity Too Large.
> Still, until then (particularly until the overflowing packet arrives),
> memory is clogged with unparsable data.
> 
My use case is the hypermedia collection in CoRE Interfaces-04, which has batch 
and linked batch (and group). I like the idea of being able to do these multi-resource 
updates in chunked operations on small devices. It seems the array format is suitable
for this use case. It does require a state machine at the receiver to map all of the elements
to the most recently sent base, but that is the nature of buffer-less processing.

>> Is it a bigger deal with CBOR encoding, or less important?
> 
> Whether CBOR or JSON is being parsed is almost irrelevant here. The only
> difference that comes to my mind is that with CBOR, we might be able to
> prescribe a serialization where "bn" and "bt" always precede "e" even in
> -01-style, but even that would make it harder to utilize for whomever
> implement this with a native-object-style library.
> 
>> Is the most important consideration being able to know the base values
>> ahead of the data?
> 
> This is about all information that may be in the dictionary and modifies
> how to interpret the elements' data. In core SenML, that information is
> the base attributes, and the version -- that is, all the possible root
> variables. (In a scenario .)
> 
> For extensions, this completely depends on them unless the receiver
> chooses to ignore them anyway. As far as I understand, your link format
> extension could add items to a core-interfaces linked batch, and then
> set a value in one go; for that, the extension data would need to
> precede the elements in parsing. Other extensions (say, metadata about
> when to expect changes) may be useful no matter where in the data stream
> they appear.
> 
> [1] https://tools.ietf.org/html/draft-ietf-core-interfaces-04#section-6.2

The base and elements both need to be known to start the action. There is no
requirement to know the base any time in advance of the elements. This is
why mapping is OK; it just requires a buffer.
> 
>> Also, where is -03? There doesn’t seem to be a version -03 that you all are talking about on the IETF website:
>> https://tools.ietf.org/html/draft-jennings-core-senml-04
> 
> The data tracker (or rather tools.ietf.org) does not seem to handle drafts
> submitted on the same day well. You can view it at [2].
> 
> [2] https://datatracker.ietf.org/doc/draft-jennings-core-senml/03/?include_text=1
> 
> On Sat, Jan 16, 2016 at 07:24:41AM -0800, Michael Koster wrote:
>> The optimizations you recommend require a new string input parser,
>> where I am using the JSON parser that comes with the library. My point
>> was that I will need to either make my own string parser for SenML or
>> make a re-parser with the state machine I described in order to use
>> -02 +[...]
> 
> I don't see yet where the state machine comes in. Surely, much
> state-machining is going on in the optimized example, but when it comes
> to the core difference from -01 to -02, that is, that
> 
>    {any-key-but-e: any-data, "e": elements-list}
> 
> becomes
> 
>    [{any-key-but-e: any-data}, elements-list]
> 
> , I don't see how this is not changing access from _senml[key] to
> _senml[0][key] or _senml["e"] to _senml[1].
> 
> (If it's about the C. multiple-base-elements, I'd ask to keep these
> issues separate -- if C. breaks too much, although I like it, we could
> have a -02 with only two elements and still satisfy B., but I don't
> think we need to resort to that.)
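The -01 to -02 change quoted here is a mechanical rewrite. A minimal Python sketch, with the hypothetical helper name v01_to_v02:

```python
from typing import Any, Dict, List


def v01_to_v02(doc: Dict[str, Any]) -> List[Any]:
    """Rewrite {any-key-but-e: any-data, "e": elements} as [{...}, elements]."""
    base = {key: value for key, value in doc.items() if key != "e"}
    return [base, doc.get("e", [])]
```

After the rewrite, `old[key]` corresponds to `new[0][key]` and `old["e"]` to `new[1]`, which is the whole access-path difference for single-base documents.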

The state machine comes in when assuming the multiple-base format. If there
is exactly one base and zero or more data elements, the array semantics
can be used as you show.

If there is an array of maps, each with base and elements, then it’s natural 
to sequentially process them as separate base items.

If it’s an array with a base map followed by data elements, I will probably 
transform that to a map anyway and coalesce the data elements rather 
than refer to elements by [0] and [1:] in my higher level code.
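A minimal Python sketch of that coalescing step, assuming exactly one base dictionary in first position followed by one or more element lists. The helper name coalesce is hypothetical, and the "e" key is only an in-memory shape mirroring the -01 layout, not a wire format.

```python
from typing import Any, Dict, List


def coalesce(doc: List[Any]) -> Dict[str, Any]:
    """Turn [base, elements, elements, ...] into {**base, "e": [...]}."""
    if not doc or not isinstance(doc[0], dict):
        raise ValueError("expected a base dictionary first")
    elements: List[dict] = []
    for item in doc[1:]:
        if not isinstance(item, list):
            # A second dict would be another base; this helper does not merge bases.
            raise ValueError("only one base dictionary is supported here")
        elements.extend(item)
    return {**doc[0], "e": elements}
```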

senml[“e”] has semantic meaning, but I wrap it in an elements class anyway.
I guess I was using the format itself as a data model, and thus referred to objects as
“senml” objects. That is what feels broken by the new formats. I should not
tie the data model to the format.

> 
>> [...] but there is another issue anyway:
>> 
>> One of the big advantages of using JSON is connecting the embedded
>> world to the web world. To do this in a low-friction way, it’s good to
>> be able to use well known tools and patterns.
> 
> I fully agree that easy use of established web tools is essential here:
> but these very threads are where we should come up with mechanisms that
> work both with them and constrained devices.
> 
> 
> There is one point where I think that state-machining over incoming JSON
> objects does make sense, that is when it comes to names -- I like to see
> them as URIs (that's consistent with the examples given, but not fully
> required in the drafts), and when the "n" elements are relative URIs,
> they are not useful until joined with the base name. But that is not a
> change introduced in -01, but a preference of mine that is reflected in
> the demo to show that relative URIs are not that complicated even for
> embedded systems. (Existing web tools usually have their ways of joining
> URIs shipped).
> 
It is very important to the proper operation of collections and link embedding
that the base name (URI of the collection) and resource names in the collection
(relative references from that base) are consistently and easily handled
(by the tools for the developer). 
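For comparison, a Python sketch of resolving names as relative URI references with the standard library's urllib.parse.urljoin, next to plain string prefixing of "bn" onto "n" (our reading of how the drafts describe the base name). The two agree when the base name ends in "/" and diverge otherwise. Both helper names are hypothetical, and the examples stick to scheme-less paths and http.

```python
from urllib.parse import urljoin


def resolve_name(base_name: str, name: str) -> str:
    """Resolve an "n" value against "bn" as a relative URI reference."""
    return urljoin(base_name, name)


def prefix_name(base_name: str, name: str) -> str:
    """Plain string prefixing of "bn" onto "n", for comparison."""
    return base_name + name
```

For example, with "bn" = "/light/brightness" and "n" = "tbr", URI resolution yields "/light/tbr" while prefixing yields "/light/brightnesstbr"; relative references such as "../fan/speed" only make sense under URI resolution.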

Collection templates need to be reusable, therefore relative references to 
late binding base URIs and hypermedia controls are important.

That is our motivation to use the SenML data model in the CoRE Interfaces
collection models. It is also why it makes sense to me to have a format where
links and items are combined, for hypermedia operation.
> 
>> Thanks also for looking into my code. I could easily use the -02 or
>> -03 in the senml processing as you show. What is missing still from
>> everything after -01 is the element tag that I am re-using to indicate
>> links and forms in the document:
> 
> I did see that you had extensions in your code, but as Carsten
> suggested, the "l" could stay in the first dictionary alongside the "bn"
> and "ver". Maybe the name "base dictionary" is suboptimal here -- the
> dictionary itself is not something that gets somehow prefixed to the
> following entries, it's just the place where the base attributes reside
> along with other things, like "ver" and extensions.
> 
Yes, putting the links in with the base makes perfect sense. This would
allow hypermedia-based assembly of the following data elements. The
links would have “href” items that can be used to match and select
items by name (“n”) in the data stream.
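A minimal Python sketch of that matching. Assumptions (hypothetical, not defined by any draft): links sit in an "l" list of the base dictionary, each link's "href" is compared verbatim with an element's "n" relative to the same base, and a new base dictionary replaces the active set of links. The helper name select_by_links is hypothetical.

```python
from typing import Any, Dict, List, Set


def select_by_links(doc: List[Any]) -> Dict[str, List[dict]]:
    """Group data elements under the link whose "href" equals their "n"."""
    selected: Dict[str, List[dict]] = {}
    hrefs: Set[str] = set()
    for item in doc:
        if isinstance(item, dict):
            # A new base dictionary replaces the links currently in force.
            hrefs = {link["href"] for link in item.get("l", []) if "href" in link}
        elif isinstance(item, list):
            for element in item:
                name = element.get("n", "")
                if name in hrefs:
                    selected.setdefault(name, []).append(element)
    return selected
```

Because the links arrive before the elements they describe, selection happens while streaming, without keeping earlier elements around.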
> 
> On Sat, Jan 16, 2016 at 09:43:55AM -0800, Michael Koster wrote:
>> It would be useful if the order of top level element types could be arbitrary e.g.: [ {},[],[],[],{},[],{},{} ... ]
> 
> That's something I could well see in a -05. It would also make minimal
> SenML files that don't use any base attributes a little smaller, and
> embedded parsing wouldn't suffer from it AFAICT.
> 
It would give the system designer more options to optimize the semantics 
while keeping the required buffer size small.

> On Sat, Jan 16, 2016 at 10:33:00AM -0800, Michael Koster wrote:
>> PS for correctness my example should look more like this:
>> [ 
>>  { 
>>    "bn": "/light/brightness",
>>    "l": [ ... ],
>>    "f": [ ...],
>>    "e": [
>>      { "n": "tbr", "v": 44.3 },
>>      { "n": "tt", "v": 1.0 }
>>    ]
>>  }
>> ]
> 
> Just let's please not allow the "e" element any more. Anything with a
> top-level array is incompatible with older draft users anyway.
> 
We seem to be in agreement mostly. If it is important to get rid of the “e”
tag and its remnant from the flattened <e/> in the XML, that makes sense.
I am not attached at all to using the “e” tag.

Best regards,

Michael

> 
> Best regards
> Christian
> 
> -- 
> Christian Amsüss                      | Energy Harvesting Solutions GmbH
> founder, system architect             | headquarter:
> mailto:c.amsuess@energyharvesting.at  | Arbeitergasse 15, A-4400 Steyr
> tel:+43-664-97-90-6-39                | http://www.energyharvesting.at/
>                                      | ATU68476614