Re: [icnrg] Some comments on the CCNx Selectors draft (https://datatracker.ietf.org/doc/draft-mosko-icnrg-selectors/)

christian.tschudin@unibas.ch Mon, 03 June 2019 18:17 UTC

Date: Mon, 03 Jun 2019 20:17:31 +0200
From: christian.tschudin@unibas.ch
To: "David R. Oran" <daveoran@orandom.net>
cc: "Mosko, Marc" <mmosko@parc.com>, ICNRG <icnrg@irtf.org>
In-Reply-To: <89236739-5137-496D-8A6A-699730D1CC82@orandom.net>
Message-ID: <alpine.OSX.2.21.1906031812100.51254@uusi>
References: <89236739-5137-496D-8A6A-699730D1CC82@orandom.net>
Archived-At: <https://mailarchive.ietf.org/arch/msg/icnrg/A8bN6QdexQXOWuWsMny17gvBEUw>
Subject: Re: [icnrg] Some comments on the CCNx Selectors draft (https://datatracker.ietf.org/doc/draft-mosko-icnrg-selectors/)

Thanks David for creating a discussion around this!

Looking at discovery from a function-invocation perspective certainly 
resonates with me, of course. However, NFN focuses on producing immutable 
results from immutable arguments, and it still makes the most sense in 
such an environment. You are targeting the discovery of potentially new 
content, such that the same query will return different results if asked 
again later, right? That leads me to this comment:

Your write-up did not mention time or provenance, which I think are 
important in this discussion. Another concept worth mentioning is "query 
expression".

By provenance I mean the state (context) in which new content was 
published. This state acts as logical time and lets us "immutify" 
results. The exchange would be:

- fetch a repo's state, or time horizon, S
   (this is the only call with a mutable result, and it lets the caller bind
   to that state instead of to a server or service)

followed by one or more

- discover(repo=/ndn/ucla, horizon=S, selector=/unix/ls(/some/repo/path, "*.txt"))

This discovery call will return the same result even if the repo is 
replicated: any repo replica can execute the call in the specific 
context S, there is no need to pin the discovery to a specific server, and 
the result is fully cacheable (if in clear).
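
To make the replica argument concrete, here is a minimal Python sketch. It assumes a hypothetical `Replica` class whose horizon S is simply the length of an append-only publication log (a real repo would more likely use a hash, not a counter), and an ls-like selector with glob matching. Because `discover` only looks at entries published up to S, a replica that has already seen newer content still returns the same answer for the same S:

```python
import fnmatch
from dataclasses import dataclass, field


@dataclass
class Replica:
    """Hypothetical repo replica holding an append-only publication log."""
    log: list = field(default_factory=list)  # published names, oldest first

    def state(self) -> int:
        # Simplifying assumption: the horizon S is the log length (a logical clock).
        return len(self.log)

    def discover(self, horizon: int, path: str, pattern: str) -> list:
        # Only entries published up to the horizon are considered, so the answer
        # depends on (horizon, selector) and not on which replica evaluates it.
        if horizon > len(self.log):
            raise ValueError("replica has not reached this horizon yet")
        prefix = path.rstrip("/") + "/"
        hits = []
        for name in self.log[:horizon]:
            rest = name[len(prefix):]
            # ls-like semantics: direct children of path whose leaf matches the glob.
            if name.startswith(prefix) and "/" not in rest and fnmatch.fnmatchcase(rest, pattern):
                hits.append(name)
        return sorted(hits)


log = ["/some/repo/path/a.txt", "/some/repo/path/b.gif", "/some/repo/path/c.txt"]
behind = Replica(list(log))
ahead = Replica(log + ["/some/repo/path/d.txt"])  # has seen one more publication
S = behind.state()
print(behind.discover(S, "/some/repo/path", "*.txt") == ahead.discover(S, "/some/repo/path", "*.txt"))  # prints True
```

A replica that is behind S refuses the call rather than returning a partial answer, which keeps every answer for a given S identical and therefore cacheable.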

A simple way to explain what a repo state S is would be to point to the 
commit hash of a Git repo. An NDN repo would have to keep track of new 
content through commits, i.e., keep a log of its actions. I think the 
word "provenance" is appropriate here, namely how/where/when some 
content was added to the repo, or renamed, aliased, etc.
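
Along the lines of the Git analogy, a minimal sketch, assuming a hypothetical provenance log of `(action, name, target)` records with the actions add, rename and alias. Each record is hashed together with the previous state, so S identifies the whole history the way a commit hash covers its parents. The JSON record encoding and the `GENESIS` value are arbitrary choices for illustration:

```python
import hashlib
import json

GENESIS = "0" * 64  # hypothetical state of an empty repo


def next_state(prev: str, action: str, name: str, target: str = "") -> str:
    # Chain the previous state with a canonical encoding of one provenance record.
    record = json.dumps([prev, action, name, target], separators=(",", ":"))
    return hashlib.sha256(record.encode("utf-8")).hexdigest()


def replay(log):
    """Return (state S, namespace) after applying a provenance log in order."""
    state, names = GENESIS, {}
    for action, name, target in log:
        if action == "add":
            names[name] = name                 # name -> underlying object id
        elif action == "rename":
            names[target] = names.pop(name)    # same object, new name
        elif action == "alias":
            names[target] = names[name]        # same object, extra name
        else:
            raise ValueError(f"unknown action {action!r}")
        state = next_state(state, action, name, target)
    return state, names


log = [("add", "/ndn/ucla/a.txt", ""),
       ("alias", "/ndn/ucla/a.txt", "/ndn/ucla/latest.txt")]
S, names = replay(log)
S2, names2 = replay(log + [("rename", "/ndn/ucla/a.txt", "/ndn/ucla/b.txt")])
print(S != S2)  # prints True: any new provenance record yields a new horizon
```

Any replica that replays the same records reaches the same S and the same namespace, which is what lets a discovery call name S instead of a server.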

Sure, manifests are good, but I think we can do better, and by better I 
mean repo-side filtering. Once results are immutable, we can have 
deterministic and stateless paging, cursors and other result mangling 
under the requester's control:

The last 10 entries:
- /nfn/txt/tail(/ndn5/discover(horizon=S, selector=/unix/ls(/ccnx/some/prefix, "*.txt")), -10)
Just counting hits:
- /nfn/txt/wc(/ndn5/discover(horizon=S, selector=/unix/ls(/ccnx/some/prefix, "*.txt")))
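
Because the result for a fixed horizon S never changes, these operations can be pure functions of that result. A minimal sketch, assuming the discovery result arrives as a sorted list of names; `tail`, `wc` and `page` are hypothetical helpers, and `page` implements a stateless cursor in the spirit of a lexicoAfter(marker) filter:

```python
import bisect


def tail(results: list, n: int) -> list:
    # In the message, tail(..., -10) means "the last 10 entries";
    # here the sign of n is ignored and the last abs(n) entries are returned.
    k = abs(n)
    return results[-k:] if k else []


def wc(results: list) -> int:
    # Counting hits needs no state beyond the immutable result itself.
    return len(results)


def page(results: list, after, size: int) -> list:
    # Stateless cursor: the requester keeps only the last name it has seen.
    # Sound because the result for a fixed horizon S is immutable and sorted.
    start = 0 if after is None else bisect.bisect_right(results, after)
    return results[start:start + size]


# Hypothetical sorted result of discover(horizon=S, selector=ls(..., "*.txt")).
results = [f"/ccnx/some/prefix/f{i:02d}.txt" for i in range(25)]
first = page(results, None, 10)
second = page(results, first[-1], 10)
print(wc(results), tail(results, -10)[0], second[0])
# prints: 25 /ccnx/some/prefix/f15.txt /ccnx/some/prefix/f10.txt
```

The server keeps no per-client cursor: the marker plus S fully determines the next page, so any replica (or cache) can answer it.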

I see your objection coming that general NFN is too heavy for minimal 
content discovery, and I would agree. But what about a modest query 
_language_ for repo-side execution, instead of exposing single RPCs that 
the caller has to chain over multiple round trips? We might pick a small 
set of result filters (selectors, really) that are easy to implement in 
all repos, and permit combining up to three filters in a query 
expression, as an example. So no loops, no lambdas... Filters could be:

   isDir(), newerThan(d), hasMagic(GIF), publishedBy(key), lexicoAfter(marker)

Then we add Python slicing as in [10:-5], or just classic head() and 
tail(), as well as AND, NOT, possibly OR.
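
A minimal sketch of such a restricted query expression, assuming hypothetical per-name metadata (`Entry`) and reusing the filter names from the list above as plain predicates (hence the camelCase). The operators `&`, `|` and `~` stand in for AND, OR and NOT, a Python `slice` plays the role of [10:-5], and the three-filter cap is enforced by counting leaf filters. None of this is an existing repo API:

```python
from dataclasses import dataclass
from typing import Callable

MAX_FILTERS = 3  # example cap from the message: at most three filters per query


@dataclass(frozen=True)
class Entry:
    """Hypothetical metadata a repo might keep for each name."""
    name: str
    is_dir: bool
    published: int    # logical publication time
    magic: bytes      # leading bytes of the content
    publisher: str    # identifier of the signing key


@dataclass(frozen=True)
class Filter:
    pred: Callable[[Entry], bool]
    leaves: int = 1  # number of primitive filters in this expression

    # AND, OR and NOT only combine filters; callers get no loops or lambdas.
    def __and__(self, other: "Filter") -> "Filter":
        return Filter(lambda e: self.pred(e) and other.pred(e), self.leaves + other.leaves)

    def __or__(self, other: "Filter") -> "Filter":
        return Filter(lambda e: self.pred(e) or other.pred(e), self.leaves + other.leaves)

    def __invert__(self) -> "Filter":
        return Filter(lambda e: not self.pred(e), self.leaves)


# Primitive filters, named after the examples in the message.
def isDir() -> Filter: return Filter(lambda e: e.is_dir)
def newerThan(d: int) -> Filter: return Filter(lambda e: e.published > d)
def hasMagic(m: bytes) -> Filter: return Filter(lambda e: e.magic.startswith(m))
def publishedBy(key: str) -> Filter: return Filter(lambda e: e.publisher == key)
def lexicoAfter(marker: str) -> Filter: return Filter(lambda e: e.name > marker)


def run_query(entries, expr: Filter, window: slice = slice(None)) -> list:
    # Repo-side execution: filter, sort by name for a deterministic order, then slice.
    if expr.leaves > MAX_FILTERS:
        raise ValueError(f"query uses {expr.leaves} filters, limit is {MAX_FILTERS}")
    hits = sorted((e for e in entries if expr.pred(e)), key=lambda e: e.name)
    return [e.name for e in hits][window]


entries = [
    Entry("/r/a.gif", False, 10, b"GIF89a", "k1"),
    Entry("/r/b.txt", False, 20, b"hello", "k1"),
    Entry("/r/c.gif", False, 30, b"GIF87a", "k2"),
    Entry("/r/d", True, 40, b"", "k1"),
]
q = hasMagic(b"GIF") & ~isDir() & newerThan(5)
print(run_query(entries, q, slice(-1, None)))  # prints ['/r/c.gif']
```

Keeping the language to a fixed set of predicates and combinators means every repo can evaluate any query in bounded work per entry, which is the property that separates this from general NFN.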

Datalog could serve as a high-end starting point for such a discussion.

Best, c


On Sun, 2 Jun 2019, David R. Oran wrote:

> 
> All comments with <chair hat off>
> 
> Content discovery is important, so I’d like to see something adopted by the
> RG for formal progression to experimental.
> 
> I’d like to step back a bit though and ask if evolving the approach in CCNx
> 0.x and NDN is appropriate knowing what we’ve learned in the last few years.
> The current selectors draft
> (https://datatracker.ietf.org/doc/draft-mosko-icnrg-selectors/) basically
> fixes the original design mistake of forcing all forwarders to do discovery
> through prefix match on CS data. It does this by leveraging typed name
> components and using an encapsulation technique so that whatever object that
> matches the selectors can be returned as “native” - with its original name
> and signature.
> 
> On the other hand, it still has the following properties:
>
>  * returns only one object when multiple might match
>  * does not have a completely deterministic answer in the presence of
>    cache/repo/producer instance desynchronization (not clear this is
>    avoidable since we have a distributed system with at best eventual
>    consistency semantics)
>  * requires exclusions, which can make for big interests that in turn
>    cause fragmentation to be needed and, as a consequence, complicate
>    congestion control.
>  * seems to still permit cache exploration (since the selector interests
>    are not authenticated), which might be viewed as a privacy problem.
> 
> I don’t have a concrete proposal to make though. Here are a few thoughts
> about a possible alternative approach:
>
>  * Since we depend a lot more on Manifests now than the original NDN & CCNx
>    0.x designs did, why not have the discovery return a Manifest instead of
>    a single encapsulated object?
>  * Since we have fast exact match and nameless objects, along with now
>    permitting objects to have multiple names and hence multiple ways
>    objects can be discovered, perhaps we can exploit that to allow
>    different discovery patterns rather than smooshing everything into a
>    static/brittle set of selectors.
>  * Some of the properties were driven by trying to make discovery cheap and
>    ubiquitous and hence forced into casting discovery as a direct single
>    Interest/Data exchange. Since we now know that pawing through a large CS
>    or Repo can be computationally expensive (and even require a fair amount
>    of I/O), maybe this tradeoff isn’t attractive. We now have experience
>    with distributed method invocation (e.g. NFN and RICE), so maybe we do
>    discovery as an explicit remote method. This may solve a few problems:
>     + We can do authentication and key exchange, so discovery operations
>       and results can be more private and cache exploration is no longer a
>       hazard.
>     + It might make it easier to build extensibility into the supported
>       discovery patterns, as all of them might not need to be supported by
>       all discovery services. (Aside: it seems that a similar direction
>       is being taken in the fast-repo work for NDN, where a given
>       application can design patterns for the Repo to fetch the right
>       stuff for storage of that application’s data.)
>     + We can more easily manage the resources for expensive discovery
>       operations and, following the idea of using Manifests, make those
>       the result of the discovery computation.
>     + By essentially “binding” to one of the discovery services/servers,
>       it may be possible to get a more internally-consistent set of
>       results.
> 
> Yes, this might make discovery a lot more “expensive” in terms of RTTs but
> in the end it might also make applications easier to write with patterns
> that look more like what the application considers “native” and the results
> as a Manifest easier to paw through and possibly iterate on.
> 
> Thoughts?
> 
> DaveO
> 
> 
>