Re: [core] Designs to resolve streaming issues in SenML

Christian Amsüss <c.amsuess@energyharvesting.at> Fri, 15 January 2016 08:41 UTC

Date: Fri, 15 Jan 2016 09:41:38 +0100
From: Christian Amsüss <c.amsuess@energyharvesting.at>
To: Michael Koster <michaeljohnkoster@gmail.com>
Message-ID: <20160115084138.GA18563@hephaistos.amsuess.com>
References: <175A1806-ACB0-4FC7-A318-2A58FF66CDD2@cisco.com> <85DB69FC-A20E-4314-AAB0-5E8AA8B5E5E6@gmail.com>
MIME-Version: 1.0
Content-Type: multipart/signed; micalg="pgp-sha256"; protocol="application/pgp-signature"; boundary="k+w/mQv8wyuph6w0"
Content-Disposition: inline
In-Reply-To: <85DB69FC-A20E-4314-AAB0-5E8AA8B5E5E6@gmail.com>
User-Agent: Mutt/1.5.24 (2015-08-30)
Archived-At: <http://mailarchive.ietf.org/arch/msg/core/MC4n0BErwss8lMj9OIBVowJ_3VQ>
Cc: "Cullen Jennings (fluffy)" <fluffy@cisco.com>, core <core@ietf.org>
Subject: Re: [core] Designs to resolve streaming issues in SenML

Hello Michael,

On Thu, Jan 14, 2016 at 06:43:38PM -0800, Michael Koster wrote:
> As pointed out, the streaming format can only be processed in
> streaming form. Storing this in memory using a JSON parser does not
> result in a structure that can be easily sorted or indexed by base
> element. If I were to design a parser for this, it would have a state
> machine that would consume the top level array elements in sequence
> and create an in-memory structure that looks like the structures
> described in draft-02:

thank you for clarifying what's meant by "forced to do stream
processing" (after all, I thought, you can still buffer things).

As I understand it, your concern is keeping your language's native
representation of the JSON structures as the internal state of your
SenML objects -- I'll argue that this is possible with -03 as well as
with -02.

> With these earlier structures I could use standard JSON parse tools
> and then selection algorithms that match the “bn” value and select the
> contents within that item’s scope (for patching, etc.)

I'll be assuming a Python implementation derived from
resource/SenmlHandler.py in your MachineHypermediaToolkit; other query
schemes should be similarly easy to adapt. In the interest of brevity,
I'll also assume an older variant of -03 in which only two top-level
elements are allowed (the even-odd scheme came from another thread),
but I would be happy to demonstrate a repeating-elements version if
this is unsatisfying.

Where a -01 implementation would be initialized as

    class Senml():
        def __init__(self, items=None, baseName=None):
            self._items = SenmlItems(items)
            self._senml = {}
            self._senml["e"] = self._items._items

replacing the last two lines with

            self._senml = [{}, self._items._items]

deals with the item locations, and base name assignment goes from

    self._senml["bn"] = baseName

to

    self._senml[0]["bn"] = baseName

One or two more changes along similar lines should have the code base
covered, with no changes to SenmlItems being required.
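
To make the comparison concrete, here is a minimal, self-contained
sketch of both layouts side by side. SenmlItems below is only a
hypothetical stand-in for the toolkit's class (reduced to its `_items`
list), and the class names SenmlV01 and SenmlV03 are made up for
illustration; this is not the actual MachineHypermediaToolkit code.

```python
import json


class SenmlItems:
    """Hypothetical stand-in for the toolkit class: wraps the entry list."""

    def __init__(self, items=None):
        self._items = list(items) if items is not None else []

    def add(self, item):
        self._items.append(item)


class SenmlV01:
    """-01 style: a single dictionary with the entries under "e"."""

    def __init__(self, items=None, baseName=None):
        self._items = SenmlItems(items)
        self._senml = {"e": self._items._items}
        if baseName is not None:
            self._senml["bn"] = baseName


class SenmlV03:
    """Older -03 style (assumed): [base dictionary, list of entries]."""

    def __init__(self, items=None, baseName=None):
        self._items = SenmlItems(items)
        self._senml = [{}, self._items._items]
        if baseName is not None:
            self._senml[0]["bn"] = baseName


old = SenmlV01([{"n": "temp", "v": 21.5}], baseName="urn:dev:1/")
new = SenmlV03([{"n": "temp", "v": 21.5}], baseName="urn:dev:1/")

# SenmlItems is unchanged; entries added through it show up in both
# layouts because the list object is shared with the JSON structure.
new._items.add({"n": "hum", "v": 40})

print(json.dumps(old._senml))
print(json.dumps(new._senml))
```

In both cases the native JSON structure stays the internal state; only
the place where "bn" and the entry list live has moved.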


That being said, I think that even with -01, such a data structure might
not be an ideal candidate for many applications, because it necessitates
(even though this is unimplemented in MachineHypermediaToolkit) that on
every access to event names and values, they be combined with the
respective base values for meaningful comparison. When elements are
accessed repeatedly, going over the whole structure once at read time
and building decompressed values in memory would make things more
straightforward as a whole. I
don't want to get into a discussion about implementation of memory
structures, though -- my point is that the mechanisms that incite such
discussion are present both in -01 and later.
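
For illustration only, a read-time pass of the kind described above
could look roughly like this. The function name resolve is made up, the
two-element layout is the simplified -03 variant assumed earlier, and I
am assuming that "n" is appended to "bn" and that "t" is an offset
added to "bt"; check those semantics against the draft you target.

```python
def resolve(senml):
    """Return copies of all entries with the base values applied.

    Assumes the layout [base_dict, entries]; "bn"/"bt" handling is an
    assumption made for this sketch.
    """
    base, entries = senml
    bn = base.get("bn", "")
    bt = base.get("bt", 0)
    resolved = []
    for entry in entries:
        full = dict(entry)  # copy, so the parsed JSON stays untouched
        full["n"] = bn + entry.get("n", "")
        full["t"] = bt + entry.get("t", 0)
        resolved.append(full)
    return resolved


doc = [
    {"bn": "urn:dev:1/", "bt": 1452847298},
    [{"n": "temp", "v": 21.5}, {"n": "hum", "v": 40, "t": -5}],
]

# Index once by full name; later lookups need no base handling.
by_name = {entry["n"]: entry for entry in resolve(doc)}
```

The same loop works on the "e" list of a -01 dictionary when base and
entries are taken from there instead.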

> The other issue is that in draft-01 the element tag “e” maps to the
> XML serialization <e/> tags nicely. I have tools that process both XML
> and JSON that understand this.

Both -03 and -04 update their XML serializations to match the JSON
pretty closely. I see shortcomings in -03 (how would extensions with
non-scalar data store their elements in the head?), but they could
probably be mitigated by moving the head elements into a dedicated <b>
element and all the children into an <es> element.

> Also, this is easily extensible to add more element classes to the
> senml data model, which I have done of links and forms.

How is this no longer possible, when they can be added to the first
dictionary?

Extensions whose interpretation does not depend on "bn" or other entries
in the base dictionary can even be processed easily by constrained
devices.
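
As a rough sketch of what I mean: with the two-element layout, an
extension placed in the first dictionary can be picked out (or skipped)
without looking at the entries at all. The "l" key and its content are
placeholders for a links extension, the set of known base keys is my
assumption, and split_base is a made-up helper name.

```python
import json

# Assumed set of base keys; adjust to the draft in use.
KNOWN_BASE_KEYS = {"bn", "bt", "bu", "ver"}


def split_base(senml):
    """Separate known base keys from extension keys in the head."""
    base, entries = senml
    known = {k: v for k, v in base.items() if k in KNOWN_BASE_KEYS}
    extensions = {k: v for k, v in base.items() if k not in KNOWN_BASE_KEYS}
    return known, extensions, entries


text = (
    '[{"bn": "urn:dev:1/", "l": [{"href": "temp", "rel": "item"}]},'
    ' [{"n": "temp", "v": 21.5}]]'
)
known, extensions, entries = split_base(json.loads(text))
```

A consumer that does not understand "l" simply ignores the extensions
dictionary and processes the entries as before.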

(I'm still unconvinced that a root "l":{links} is better than a
"v":{links} in elements, but would prefer to stick to the topic of "is
-03/-04 a regression" here).


I'm confident we can work out a single SenML that works for all relevant
applications.

Christian

-- 
You don't become great by trying to be great. You become great by
wanting to do something, and then doing it so hard that you become great
in the process.
  -- Marie Curie (as quoted by Randall Munroe)