Re: [icnrg] Some thoughts on architectural choices for Manifests

christian.tschudin@unibas.ch Thu, 06 August 2020 02:33 UTC

Date: Wed, 5 Aug 2020 19:33:03 -0700 (PDT)
From: christian.tschudin@unibas.ch
X-X-Sender: tschudin@uusi
To: "David R. Oran" <daveoran@orandom.net>
cc: ICNRG <icnrg@irtf.org>
In-Reply-To: <C56B63E0-444A-409B-A68C-D3B0FF491E42@orandom.net>
Message-ID: <alpine.OSX.2.21.2008051903040.98272@uusi>
References: <C56B63E0-444A-409B-A68C-D3B0FF491E42@orandom.net>
Archived-At: <https://mailarchive.ietf.org/arch/msg/icnrg/HDtHi9VdNPkm62gW4lAanwUAHGA>
Subject: Re: [icnrg] Some thoughts on architectural choices for Manifests

Hi Dave and Marc,

great to have your "unpacking"! After the ICNRG interim meeting I had
started to write down my viewpoint; yesterday I revised it, and I was
about to revise it again. Maybe I should just blast it out, with the
same TL;DR disclaimer.

Summary: I distinguish between
- packet (data object)
- real blob (a packet's content)
- virtual blob (content as defined by a manifest packet)
Manifests are the mechanism to define virtual blobs - full stop. The
"full stop" excludes object composition.

Relating to Dave's enumeration of options, my writeup would belong to #2
("Keep the current FLIC design, but build in some extensibility
features"). It would be a very lean FLIC, though.

Regarding "direct embedding" - I also think that this is essential to
have (I had called it inlining).

Regarding Marc's comments on annotation (thanks, too!): my writeup
belongs to your group 3 ("Separate Data/Content objects within the
FLIC world") in a strict way, the reason being that the encoding size
of a manifest's annotations cannot in general be bounded, so annotations
would be the first candidate to benefit from virtual blobs as offered by
the manifest construct. If the encoding is small enough, the "separate
data structure" can be "directly embedded". I guess from your text
that such embedding is not equal to your "option 2" (parallel groups
in one manifest), but it could be close as you wrote yourself.

So here it goes, perhaps not a full manifest manifesto, but laying out
an argument for keeping FLIC minimal and version-resistant.

best, c

---

First, I re-tell the FLIC story in new terms (no reference to inodes)
for positioning FLIC and what it achieves. Then I will look into the
"parallel datastructure" concern and how directory tree organization
should be handled. I also include a grammar for such a lean FLIC.

As a preparation I introduce "derivation functions" which take some
packets (signed data objects) and produce new packets.  The first
derivation function of interest is "make_virtual_blob(h_1,..h_n)"
whose result is exactly one packet of type manifest that contains the
given sequence of packet hash values, each referenced packet being
either a normal data object ("real blob") or a packet carrying a
manifest spec.

A packet with a manifest spec defines a "virtual blob" which is data
of any size. The hash name of the manifest packet becomes the name of
the virtual blob; the virtual blob inherits the signature of the
manifest packet.

   real_blob_pkt.content() = just tap into the named data object
   virtual_blob_pkt.content() = gather all content by traversing the manifest tree

   note that for real blobs, real_blob_pkt.content() equals real_blob_pkt.payload
   (introducing the notation of field access vs a procedure)

A manifest is a claim, namely that there exists a virtual blob which
is the concatenation of all referenced content: it's a claim
(attribution to a given manifest) that can only be verified by
materializing the virtual blob. The claim made by a manifest can fail
for several reasons, one being that some of the included hash values
never had a corresponding data packet or that some of the referenced
packets have been forgotten, or that the manifest spec is broken.
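
To make the "manifest is a claim" point concrete, here is a minimal
Python sketch. Everything in it is an assumption made for illustration:
the in-memory STORE, the put() helper, the use of repr() in place of a
real wire encoding, and the name materialize(). make_virtual_blob()
returns the hash name of the manifest packet it creates. The only way
to check the claim is to walk the tree; a dangling hash or a broken
spec makes the claim fail.

```python
import hashlib
from typing import Dict, Optional, Tuple

# Hypothetical in-memory packet store: hash name -> (packet type, payload).
STORE: Dict[str, Tuple[str, object]] = {}


def put(kind: str, payload: object) -> str:
    """Store a packet and return its hash name.

    repr() stands in for a real wire encoding (assumption of this sketch).
    """
    h = hashlib.sha256(repr((kind, payload)).encode()).hexdigest()
    STORE[h] = (kind, payload)
    return h


def make_virtual_blob(*hashes: str) -> str:
    """Derive exactly one manifest packet from a sequence of hash names."""
    return put("manifest", tuple(hashes))


def materialize(h: str) -> Optional[bytes]:
    """Return the blob content, or None if the manifest's claim fails."""
    entry = STORE.get(h)
    if entry is None:
        return None              # packet never existed or was forgotten
    kind, payload = entry
    if kind == "real":
        return payload
    if kind != "manifest":
        return None              # broken manifest spec
    parts = []
    for j in payload:
        part = materialize(j)
        if part is None:
            return None          # one failed reference voids the whole claim
        parts.append(part)
    return b"".join(parts)


a = put("real", b"abc")
b = put("real", b"def")
good = make_virtual_blob(a, b)
dangling = hashlib.sha256(b"never published").hexdigest()
bad = make_virtual_blob(a, dangling)
```

Here materialize(good) yields the concatenated content, while
materialize(bad) yields None because one referenced packet cannot be
fetched.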

A second derivation function is "get_virtual_blob_content_size(h)".
Again, it is a claim about a manifest that can only be verified by
materializing the virtual blob. But unlike the first function, this
function is not essential for defining the virtual blob, it is mere
decoration (meta data). There are many more decorations:

vblob_size_in_records
vblob_size_in_UTF8_chars
vblob_tree_depth
vblob_number_of_leaf_blobs
creation_time_in_sec

The last example stands for a decoration that cannot be verified.

My suggestion is to leave decorations outside the FLIC draft, except
for the possibility to have decorations. That is, it's ok to use a
manifest for defining resource forks. I will come back to size
limitations in just a moment.

Resource forks are a well-known pattern that also maps well to the
networking world, for example layered video encodings, labeling
information for a video etc. These additional resources are optional
and not needed to build the virtual blob abstraction - they enrich
it. And these additional (parallel) resources all have the problem of
potentially being desynchronized - after all they are all claims that
the consuming end has to handle with caution.

Size is one among many of such decorations. Yes, one could request
that all manifests MUST include the size resource fork. But then the
FLIC draft must explain how a virtual blob consumer should handle
manifests that lack that resource fork, or what to do when traversal
reveals that a size value was wrong. The answer will probably include
a sentence like "if one of the correctness checks fails you must
revert to deriving that property yourself", which is another way of
saying that handling decoration data is decoration-specific. So better
have a separate "FLIC decoration draft" explaining the various ways of
adding decorations, including size: do it per-manifest (= store the
sum of all referenced subtrees' sizes), or as a single table at the
root manifest that keeps the (size) labels for all nodes of the tree
etc.
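
A small Python sketch of the per-manifest variant, under loud
assumptions: the Manifest class, its claimed_size field and the status
strings are all made up here, and real blobs are modelled as plain
bytes. It shows why handling a size decoration ends up being
decoration-specific: a missing or wrong claim forces the consumer to
derive the size itself.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple, Union


@dataclass
class Manifest:
    children: List[Union[bytes, "Manifest"]]
    claimed_size: Optional[int] = None   # hypothetical size decoration


Node = Union[bytes, Manifest]


def derive_size(node: Node) -> int:
    """Derive the size by traversal, ignoring every claim."""
    if isinstance(node, bytes):
        return len(node)
    return sum(derive_size(c) for c in node.children)


def size_of(node: Node) -> Tuple[int, str]:
    """Return (size, status), status being 'verified', 'missing' or 'wrong'."""
    actual = derive_size(node)
    if isinstance(node, bytes) or node.claimed_size is None:
        return actual, "missing"     # no decoration: derive it yourself
    if node.claimed_size != actual:
        return actual, "wrong"       # claim failed: fall back to derived value
    return actual, "verified"


leaf = Manifest([b"abcd", b"ef"], claimed_size=6)
root = Manifest([leaf, b"gh"], claimed_size=9)    # deliberately wrong claim
bare = Manifest([b"xyz"])                         # no size decoration at all
```

In this sketch the consumer never trusts claimed_size on its own; the
claim only becomes "verified" after the traversal it was supposed to
save.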

My other argument (besides downgrading decorations to mere claims) for
not being concerned about potentially non-synchronous data is that the
code for producing and consuming manifests is itself constantly
evolving and is not synchronized, due to bugs or partial
implementations. Even if resource forks themselves were synchronized
with a manifest's content, the software on the producing and consuming
sides will not be: the next version of a producer may emit wrong
values for decorations that all existing consumer software expects.
Said differently: synchronization loops pass through software, so
making some decorations mandatory will not prevent desynchronization.

Note that decorations can be useful for applications _and_ for the
manifest traversal algorithms - I did not keep these two usages apart,
there is only one decorations field. As a side effect, manifests thus
provide a "decorated virtual blob" abstraction.

When it comes to the question of using manifests to express
directories or similar trees of components, we should not confound
decoration with object composition. It looks as if one could use the
(hash) table as a table for sub-component data. One argument against
this is that we will bump into the size limitation of (manifest)
packets, which is why we should compose objects out of virtual blobs
_and_ also store the composition spec in a separate virtual blob. For
example, what if the file name is larger than fits in one packet?
It is much cleaner to store the mapping table (that maps from file
attributes like names to virtual blobs) in a virtual blob.

Potential confusion can arise when we build a directory tree where
files and directories are all stored in virtual blobs. Isn't this
a manifest tree of manifest trees? No, it is not, because they operate
at different levels. When traversing a manifest tree, we extract hash
references from the manifest's fields.  When traversing directories,
we extract hash references from inside the virtual blob (defined by
that manifest). These are very different memory zones.

Here is a writeup in form of a grammar/definitions:

packet := length-limited data object (NDN, CCN)

          I assume that a data object has type information such that
          "real blob" can be distinguished from "manifest packet". Having
          a hash name and retrieving the data object will tell us what
          kind of packet it is. Anything else (e.g., tagging hash
          pointers as pointing to real blobs or to a manifest pkt)
          is a claim and thus unreliable. It's fine to keep such
          tags as a decoration but these are not needed to implement
          the virtual blob abstraction.

r_pkt := packet with TLV for "real blob"
m_pkt := packet with TLV for "manifest"

r_pkt.payload = any                              # the "real blob"
m_pkt.payload = manifest_spec

manifest_spec := deco + cont_seq                 # two fields, must fit in one packet
deco          := None | hashval | inlined
cont_seq      := sequence of (hashval | inlined)

vblob_content(h) is defined as follows:
           if type(fetch(h)) == r_pkt:
                   fetch(h).payload
           elif type(fetch(h)) == m_pkt:
                   concat( vblob_content(j) for all j in fetch(h).payload.cont_seq )
                   # if j is inlined content, this is used instead of vblob_content(j)
           else:
                   fail
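
The definition of vblob_content() above can be turned into a runnable
Python sketch. The classes RealPacket, ManifestPacket and Inlined, the
dict-based STORE and the put() helper are assumptions of this sketch
(repr() stands in for a real packet encoding); only fetch() and
vblob_content() mirror names used above. The deco field is left out
here because it plays no role in content retrieval.

```python
import hashlib
from dataclasses import dataclass
from typing import Dict, Tuple, Union


@dataclass(frozen=True)
class Inlined:
    data: bytes                      # content embedded directly in a manifest


@dataclass(frozen=True)
class RealPacket:                    # r_pkt: payload is the "real blob"
    payload: bytes


@dataclass(frozen=True)
class ManifestPacket:                # m_pkt: payload is the manifest spec
    cont_seq: Tuple[Union[str, Inlined], ...]


Packet = Union[RealPacket, ManifestPacket]
STORE: Dict[str, Packet] = {}        # hypothetical stand-in for the network


def put(pkt: Packet) -> str:
    """Publish a packet; its hash name is derived from a toy encoding."""
    h = hashlib.sha256(repr(pkt).encode()).hexdigest()
    STORE[h] = pkt
    return h


def fetch(h: str) -> Packet:
    return STORE[h]                  # raises KeyError if the packet is missing


def vblob_content(h: str) -> bytes:
    pkt = fetch(h)
    if isinstance(pkt, RealPacket):
        return pkt.payload
    if isinstance(pkt, ManifestPacket):
        # Inlined entries are used as-is; hash entries are resolved recursively.
        return b"".join(
            j.data if isinstance(j, Inlined) else vblob_content(j)
            for j in pkt.cont_seq
        )
    raise TypeError("neither a real blob nor a manifest packet")


h1 = put(RealPacket(b"Hello, "))
h2 = put(RealPacket(b"virtual "))
inner = put(ManifestPacket((h2, Inlined(b"blob"))))
root = put(ManifestPacket((h1, inner, Inlined(b"!"))))
print(vblob_content(root))           # b'Hello, virtual blob!'
```

Note that the packet type is only learned after fetching, in line with
the assumption stated for "packet" above.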


# The following would be part of a separate "FLIC decor draft":

vblob_deco(h, fork_name) is defined as follows:
           if type(fetch(h)) == r_pkt:
                   None
           elif type(fetch(h)) == m_pkt:
                   d = fetch(h).payload.deco
                   if d == None:
                        None
                   elif d is inlined:
                        d
                   else:
                        vblob_content(d).access(fork_name)
           # the mapping for fork_name to the resource, labeled as "access" above,
           # would also be part of that separate "FLIC decor draft"
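
A possible Python rendering of vblob_deco(), as a sketch only. The
encoding of the decoration as a JSON object keyed by fork name, the
access() helper, the packet classes and the STORE are all assumptions
made for this example; the mapping from fork_name to a resource is
explicitly left open above. One deliberate deviation: the definition
above returns inlined decoration as-is, whereas this sketch applies the
fork lookup to inlined and referenced decoration alike.

```python
import hashlib
import json
from dataclasses import dataclass
from typing import Any, Dict, Optional, Tuple, Union


@dataclass(frozen=True)
class Inlined:
    data: bytes


@dataclass(frozen=True)
class RealPacket:
    payload: bytes


@dataclass(frozen=True)
class ManifestPacket:
    deco: Optional[Union[str, Inlined]]          # None | hashval | inlined
    cont_seq: Tuple[Union[str, Inlined], ...]


STORE: Dict[str, Union[RealPacket, ManifestPacket]] = {}


def put(pkt) -> str:
    h = hashlib.sha256(repr(pkt).encode()).hexdigest()
    STORE[h] = pkt
    return h


def vblob_content(h: str) -> bytes:
    pkt = STORE[h]
    if isinstance(pkt, RealPacket):
        return pkt.payload
    return b"".join(
        j.data if isinstance(j, Inlined) else vblob_content(j)
        for j in pkt.cont_seq
    )


def access(deco_bytes: bytes, fork_name: str) -> Any:
    """Hypothetical fork lookup: decoration is a JSON object of forks."""
    return json.loads(deco_bytes).get(fork_name)


def vblob_deco(h: str, fork_name: str) -> Any:
    pkt = STORE[h]
    if isinstance(pkt, RealPacket) or pkt.deco is None:
        return None
    d = pkt.deco
    raw = d.data if isinstance(d, Inlined) else vblob_content(d)
    return access(raw, fork_name)


chunk = put(RealPacket(b"0123456789"))
plain = put(ManifestPacket(None, (chunk,)))
inl = put(ManifestPacket(Inlined(json.dumps({"size": 10}).encode()), (chunk,)))
deco_blob = put(RealPacket(json.dumps({"size": 10, "depth": 1}).encode()))
ref = put(ManifestPacket(deco_blob, (chunk,)))
```

Because the referenced decoration is read through vblob_content(), it
can itself be a manifest tree and thus exceed the size of one packet.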


# How would composition e.g., directories look? We show content access
# via a file's name and assume that the directory data structure maps
# a file name to some hash value (which names a virtual blob
# containing the file's content). This content access has no
# similarities with above content retrievals, as it calls
# vblob_content() twice:

dir_get_content(dir_hash_val, file_name):
         vblob_content( vblob_content(dir_hash_val).access(file_name) )

# we can decorate files because they are stored as virtual blobs:

dir_get_deco(dir_hash_val, file_name, fork_name):
         vblob_deco( vblob_content(dir_hash_val).access(file_name), fork_name )

Composition does not require new NDN/CCN types: it's an application
level issue whether it wants to define a "type decoration", or put a
magic number (type) inside each directory virtual blob, and how it
encodes the composition spec.
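
As an illustration of composition layered on top of vblobs, here is a
hedged Python sketch. The directory encoding (a JSON object mapping
file names to hash names), the packet classes and the STORE are
assumptions of this example, not something the grammar above
prescribes. The directory table is deliberately split over two real
blobs to show that the composition spec can outgrow a single packet.

```python
import hashlib
import json
from dataclasses import dataclass
from typing import Dict, Tuple, Union


@dataclass(frozen=True)
class RealPacket:
    payload: bytes


@dataclass(frozen=True)
class ManifestPacket:
    cont_seq: Tuple[str, ...]        # hash names only, to keep the sketch short


STORE: Dict[str, Union[RealPacket, ManifestPacket]] = {}


def put(pkt) -> str:
    h = hashlib.sha256(repr(pkt).encode()).hexdigest()
    STORE[h] = pkt
    return h


def vblob_content(h: str) -> bytes:
    pkt = STORE[h]
    if isinstance(pkt, RealPacket):
        return pkt.payload
    return b"".join(vblob_content(j) for j in pkt.cont_seq)


def dir_get_content(dir_hash_val: str, file_name: str) -> bytes:
    # First traversal: manifest fields -> bytes of the directory table.
    table = json.loads(vblob_content(dir_hash_val))
    # Second traversal: a hash found *inside* that vblob -> file content.
    return vblob_content(table[file_name])


# A file stored as a virtual blob made of two chunks.
file_h = put(ManifestPacket((put(RealPacket(b"hello ")),
                             put(RealPacket(b"world")))))

# The directory table, itself stored as a virtual blob of two chunks.
table_bytes = json.dumps({"readme.txt": file_h}).encode()
mid = len(table_bytes) // 2
dir_h = put(ManifestPacket((put(RealPacket(table_bytes[:mid])),
                            put(RealPacket(table_bytes[mid:])))))

print(dir_get_content(dir_h, "readme.txt"))   # b'hello world'
```

The two calls to vblob_content() operate on different levels: the
first follows hashes stored in manifest fields, the second follows a
hash stored in the directory vblob's content.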


Conclusions: The sole role of manifests should be to abstract away
from blob size limitations, providing transparent access to virtual
blobs (vblobs) of arbitrary size. A manifest defined as a list of hash
values, together with manifest recursion, are all the tricks we need -
plus name constructors for forwarding reasons, not discussed in this
writeup. "Decorations" (meta data) are important for speedup reasons,
so there must be a hook for them. As decorations may require the
support of vblobs themselves, the suggestion is that the hook comes in
form of a single hash value (that references a vblob), instead of
hardwiring decoration information into the manifest packet. Inlining
is still possible as an optimization. It's also possible to use the
manifest construct for simply decorating a real or virtual blob. For
example, decorations could carry additional signatures, or type
information outside NDN/CCN's TLV system. Composing objects (stored in
vblobs) is different from glueing together blobs (to create
vblobs). Similar to storing decoration data, composition should be
layered on top of the vblob abstraction using one vblob for every
component.

---

On Wed, 5 Aug 2020, David R. Oran wrote:

> 
> These are a bit stream-of-consciousness, and definitely susceptible to TL;DR, so treat
> accordingly.
> 
> I’ve been mulling over the issues we discussed at the ICNRG Interim around FLIC and I’d
> like to unpack a few things along a slightly different axis than Marc did in his two
> excellent messages.
> 
> My first thought is that the genesis of FLIC along the lines of iNodes in Unix was
> probably a really good way to crystalize the problem of representing big single objects
> that need to be chopped up into pieces. This is what happens when layering a file
> system directly onto a disk with fixed-size blocks. If one carries this through
> directly and solves only that problem, it means FLIC’s design derives from the
> following:
>
>  1. It is designed to represent the chunking of a single object, not enumerating a
>     collection or other uses.
>  2. It has a hierarchical (tree, but possibly digraph) representation since the data
>     structure itself has to fit in the same chunking limits as the underlying data
>     objects.
>  3. It needs the ability to extend/append to cover cases where the object itself
>     supports append operations.
>  4. It may need some indexing goop and size information if seek operations to
>     particular bytes in the object are needed. If everything is fixed-size-per chunk and
>     that size is known a priori, pointer array indexing suffices.
> 
> These seem fundamental to the model, but there are some less obvious implications if
> one adheres closely to the iNode/filesystem analogy:
>
>  *  The data structure is meant to be interpreted by a single client piece of code, not
>     directly read/written by multiple pieces of code that don’t know about one another
>     or are unsure what the things its pointers point to are.
>  *  It’s a data structure meant to be used by some middle system software, not random
>     applications.
>  *  It is closely bound to the particular client wanting to use it - most different
>     filesystems in fact use a different format for iNodes - they are somewhat different
>     for EXTx, VFS, HFS+, etc. and the “on-disk structures” are incompatible. The
>     implication is that if we want different structures for ICN collections as the
>     technology matures, one would expect the FLIC data structure to change in
>     incompatible ways.
>  *  Modern file systems actually eschew the first-level iNode entries for small files
>     whose data fits in a single disk block and instead embed the data directly in the
>     iNode containing the directory entry for the file. This has now mixed things
>     together quite a bit in modern file systems, so iNodes aren’t the simple low-level
>     thing they once were and directories are not so cleanly layered on iNodes as they
>     once were. The performance advantages are dramatic, and one might expect pretty big
>     performance gains for doing the same for ICN. This means, at a minimum, that
>     supporting packaging a data object “inside” a manifest might in fact be pretty
>     important. There are a bunch of ways to do this, but not considering this now would
>     in my view be a mistake.
> 
> So, if we take this view for the architecture for FLIC:
>
>  *  FLIC is just a convention for a private data structure that can be used by an
>     application
>  *  FLIC is for breaking down single objects; not for describing any other kind of
>     collection.
>  *  There is no expectation that a FLIC Manifest is understandable by multiple
>     applications of the same data
>  *  It’s up to application “magic” using either namespace conventions or some separate
>     discovery machinery to figure out what Name to use in an Interest to fetch a FLIC
>     manifest (or piece thereof).
>  *  We don’t need extension mechanisms, since an application can just make up its own
>     manifest format by modifying FLIC as needed - the code will just “follow” the app
>     in the same way that iNode formats are closely bound to the particular filesystem
>     built on top.
> 
> If you find the above exposition at all convincing, I think there are a number of
> possible ways forward. Let me try to enumerate them:
>
>  1. Define FLIC as limited to the above, and punt anything else for “later”.
>  2. Keep the current FLIC design, but build in some extensibility features (but don’t
>     define any extensions now). This would include at a minimum a versioning scheme and
>     some TLVs so additional things other than pointer arrays of hashes can be
>     expressed.
>  3. Define FLIC as above, but instead make it an “interior” data structure of a more
>     general Manifest format that we work on in parallel. That more general data
>     structure would encompass the stuff we have currently put in the base FLIC
>     Manifest. A good design would allow the FLIC manifest to either be embedded inside
>     the general Manifest, or externally referenced via one of the pointers.
>  4. Continue down the current path, by having one general Manifest format that is
>     extensible and contains the features we considered important, like Name Constructors
>     and annotated pointers.
> 
> In addition to the above, I think any general Manifest format should allow direct
> embedding of content objects to handle the cases of simple small data (e.g. IoT sensors
> etc.) so fetch doesn’t require extra RTTs. This might be useful even in the case of
> FLIC, to allow the same primitive data object to either be fetched independently, or
> with Manifest, and the signature bound to the manifest rather than the data object,
> making re-signing independent of the original data producer code.
> 
> I have my own views on which of these directions we should pursue, but for the purposes
> of this email the question I’d like to ask is whether the taxonomy above is a good way
> to think about this and are there other options for how to architecturally represent
> things I haven’t thought of?
> 
> DaveO
> 
> 
>