[icnrg] Some thoughts on architectural choices for Manifests

"David R. Oran" <daveoran@orandom.net> Wed, 05 August 2020 17:16 UTC

From: "David R. Oran" <daveoran@orandom.net>
To: ICNRG <icnrg@irtf.org>
Date: Wed, 05 Aug 2020 13:16:27 -0400
Archived-At: <https://mailarchive.ietf.org/arch/msg/icnrg/IJ6_eMW34EPi6VxGv6P7Ojroifs>
Subject: [icnrg] Some thoughts on architectural choices for Manifests

These are a bit stream-of-consciousness, and definitely susceptible to 
TL;DR, so treat accordingly.

I’ve been mulling over the issues we discussed at the ICNRG Interim 
around FLIC and I’d like to unpack a few things along a slightly 
different axis than Marc did in his two excellent messages.

My first thought is that the genesis of FLIC along the lines of iNodes 
in Unix was probably a really good way to crystalize the problem of 
representing big single objects that need to be chopped up into pieces. 
This is what happens when layering a file system directly onto a disk 
with fixed-size blocks. If one carries this through directly and solves 
**only** that problem, it means FLIC’s design derives from the 
following:

1. It is designed to represent the chunking of a single object, not 
enumerating a collection or other uses.

2. It has a hierarchical (tree, but possibly digraph) representation 
since the data structure itself has to fit in the same chunking limits 
as the underlying data objects.

3. It needs the ability to extend/append to cover cases where the object 
itself supports append operations.

4. It may need some indexing goop and size information if seek 
operations to particular bytes in the object are needed. If everything is 
fixed-size-per-chunk and that size is known a priori, pointer array 
indexing suffices.
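
To make point 4 concrete, here is a minimal sketch of seeking by pointer array indexing when every chunk has the same, known size. The names (`FlatManifest`, `locate`, `tree_path`) and the assumption of a full, uniform-fanout tree are illustrative only and are not taken from the FLIC draft.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class FlatManifest:
    """Hypothetical single-level manifest: one hash pointer per fixed-size chunk."""
    chunk_size: int        # bytes per chunk, known a priori
    total_size: int        # size of the whole object in bytes
    pointers: List[bytes]  # pointers[i] is the hash of chunk i


def locate(manifest: FlatManifest, offset: int) -> Tuple[int, int]:
    """Map a byte offset to (chunk index, offset within that chunk)."""
    if not 0 <= offset < manifest.total_size:
        raise ValueError("offset outside object")
    return divmod(offset, manifest.chunk_size)


def tree_path(chunk_index: int, fanout: int, depth: int) -> List[int]:
    """Child slot to follow at each level, root first.

    Assumes a full tree in which every manifest node holds exactly
    `fanout` pointers and the leaves sit `depth` levels below the root.
    """
    slots = []
    for _ in range(depth):
        chunk_index, slot = divmod(chunk_index, fanout)
        slots.append(slot)
    if chunk_index != 0:
        raise ValueError("chunk index does not fit in a tree of this shape")
    return slots[::-1]
```

With 4096-byte chunks, `locate(m, 10000)` yields `(2, 1808)`, and `tree_path(2, fanout=2, depth=2)` yields `[1, 0]`. Once chunk sizes vary, this arithmetic no longer works and the manifest has to carry per-subtree size information instead.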

These seem fundamental to the model, but there are some less obvious 
implications if one adheres closely to the iNode/filesystem analogy:

- The data structure is meant to be interpreted by a **single** client 
piece of code, not directly read/written by multiple pieces of code that 
don’t know about one another or are unsure what its pointers point 
to.

- It’s a data structure meant to be used by some middle system 
software, not random applications.

- It is closely bound to the particular client wanting to use it - most 
different filesystems in fact use a different format for iNodes - they 
are somewhat different for EXTx, VFS, HFS+, etc. and the “on-disk 
structures” are incompatible. The implication is that if we want 
different structures for ICN collections as the technology matures, one 
would expect the FLIC data structure to change in incompatible ways.

- Modern file systems actually eschew the first-level iNode entries for 
small files whose data fits in a single disk block and instead embed the 
data directly in the iNode containing the directory entry for the file. 
This has now mixed things together quite a bit in modern file systems, 
so iNodes aren’t the simple low-level thing they once were and 
directories are not so cleanly layered on iNodes as they once were. The 
performance advantages are dramatic, and one might expect pretty big 
performance gains for doing the same for ICN. This means, at a minimum, 
that supporting packaging a data object “inside” a manifest might in 
fact be pretty important. There are a bunch of ways to do this, but not 
considering this now would in my view be a mistake.
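
As one illustration of packaging a data object "inside" a manifest, the sketch below inlines small payloads and falls back to hash pointers for larger ones. The `Manifest` layout, the `inline_limit` parameter and the `fetch` callback are all hypothetical; this is one of the "bunch of ways", not a proposed encoding.

```python
import hashlib
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple


@dataclass
class Manifest:
    """Hypothetical manifest holding either inline data or chunk hashes."""
    inline_data: Optional[bytes] = None
    pointers: Optional[List[bytes]] = None  # SHA-256 digests of the chunks


def build(payload: bytes, chunk_size: int,
          inline_limit: int) -> Tuple[Manifest, Dict[bytes, bytes]]:
    """Return the manifest plus the chunks that must be published separately."""
    if len(payload) <= inline_limit:
        # Small object: carry it in the manifest itself, nothing else to publish.
        return Manifest(inline_data=payload), {}
    chunks: Dict[bytes, bytes] = {}
    pointers: List[bytes] = []
    for start in range(0, len(payload), chunk_size):
        chunk = payload[start:start + chunk_size]
        digest = hashlib.sha256(chunk).digest()
        chunks[digest] = chunk
        pointers.append(digest)
    return Manifest(pointers=pointers), chunks


def read(manifest: Manifest, fetch: Callable[[bytes], bytes]) -> bytes:
    """Reassemble the object; `fetch` stands in for one Interest/Data exchange."""
    if manifest.inline_data is not None:
        return manifest.inline_data  # no further round trips needed
    return b"".join(fetch(digest) for digest in manifest.pointers)
```

In the inline case a reader gets the data in the same exchange that returned the manifest, which is where the saving comes from, much like small files stored directly in the iNode.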

So, if we take this view for the architecture for FLIC:

- FLIC is just a convention for a private data structure that can be 
used by an application
- FLIC is for breaking down single objects; not for describing any other 
kind of collection.
- There is no expectation that a FLIC Manifest is understandable by 
multiple applications using the same data
- It’s up to application “magic” using either namespace 
conventions or some separate discovery machinery to figure out what Name 
to use in an Interest to fetch a FLIC manifest (or piece thereof).
- We don’t need extension mechanisms, since an application can just 
make up its own manifest format by modifying FLIC as needed - the code 
will just “follow” the app in the same way that iNode formats are 
closely bound to the particular filesystem built on top.

If you find the above exposition at all convincing, I think there are a 
number of possible ways forward. Let me try to enumerate them:

1. Define FLIC as limited to the above, and punt anything else for 
“later”.
2. Keep the current FLIC design, but build in some extensibility 
features (but don’t define any extensions now). This would include at 
a minimum a versioning scheme and some TLVs so additional things other 
than pointer arrays of hashes can be expressed.
3. Define FLIC as above, but instead make it an “interior” data 
structure of a more general Manifest format that we work on in parallel. 
That more general data structure would encompass the stuff we have 
currently put in the base FLIC Manifest. A good design would allow the 
FLIC manifest to either be embedded inside the general Manifest, or 
externally referenced via one of the pointers.
4. Continue down the current path, by having one general Manifest format 
that is extensible and contains the features we considered important, 
like Name Constructors and annotated pointers.
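
For option 2, the sketch below shows roughly what "a versioning scheme and some TLVs" could look like: a reader that understands a version field and a hash array and skips any type it does not recognize. The type codes, the 2-byte type / 2-byte length framing and the 32-byte hash size are assumptions made for illustration, not values from the FLIC draft.

```python
import struct
from typing import List, Optional, Tuple

# Hypothetical type codes, chosen only for this example.
T_VERSION = 0x0001
T_HASH_ARRAY = 0x0002
HASH_LEN = 32  # assumes SHA-256 sized pointers


def encode_tlv(tlv_type: int, value: bytes) -> bytes:
    """Encode one TLV with a 2-byte type and 2-byte length, network byte order."""
    return struct.pack("!HH", tlv_type, len(value)) + value


def parse_tlvs(buf: bytes) -> List[Tuple[int, bytes]]:
    """Split a buffer into (type, value) pairs, rejecting truncated input."""
    out = []
    pos = 0
    while pos < len(buf):
        if pos + 4 > len(buf):
            raise ValueError("truncated TLV header")
        tlv_type, length = struct.unpack_from("!HH", buf, pos)
        pos += 4
        if pos + length > len(buf):
            raise ValueError("truncated TLV value")
        out.append((tlv_type, buf[pos:pos + length]))
        pos += length
    return out


def read_manifest(buf: bytes) -> Tuple[Optional[int], List[bytes], List[int]]:
    """Return (version, hash pointers, skipped type codes)."""
    version = None
    hashes: List[bytes] = []
    skipped: List[int] = []
    for tlv_type, value in parse_tlvs(buf):
        if tlv_type == T_VERSION:
            version = value[0]
        elif tlv_type == T_HASH_ARRAY:
            hashes = [value[i:i + HASH_LEN] for i in range(0, len(value), HASH_LEN)]
        else:
            skipped.append(tlv_type)  # unknown extension: ignore, keep going
    return version, hashes, skipped
```

Skipping unknown types is what lets an old reader survive a manifest carrying later extensions. Whether every extension is safe to ignore is a separate design question that this sketch does not address.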

In addition to the above, I think any general Manifest format should 
allow direct embedding of content objects to handle the cases of simple 
small data (e.g. IoT sensors etc.) so fetch doesn’t require extra 
RTTs. This might be useful even in the case of FLIC, to allow the same 
primitive data object to either be fetched independently, or with 
the Manifest, with the signature bound to the manifest rather than the data 
object, making re-signing independent of the original data producer 
code.

I have my own views on which of these directions we should pursue, but 
for the purposes of this email the question I’d like to ask is whether 
the taxonomy above is a good way to think about this, and whether there 
are other options for how to architecturally represent things that I 
haven’t thought of?


DaveO