Re: [icnrg] Some comments on the CCNx Selectors draft (https://datatracker.ietf.org/doc/draft-mosko-icnrg-selectors/)

"Mosko, Marc <mmosko@parc.com>" <mmosko@parc.com> Mon, 03 June 2019 19:40 UTC

From: "Mosko, Marc <mmosko@parc.com>" <mmosko@parc.com>
To: "christian.tschudin@unibas.ch" <christian.tschudin@unibas.ch>, "David R. Oran" <daveoran@orandom.net>
CC: ICNRG <icnrg@irtf.org>
Date: Mon, 03 Jun 2019 19:40:06 +0000
Archived-At: <https://mailarchive.ietf.org/arch/msg/icnrg/RzEMDkuV4-508rRk8NLPMgSMkIA>

Dave & Christian,

Thanks for starting to tackle this problem.

Regarding the Selectors draft, I do not want to spend too much time defending it. As I said before, it was a proof of concept showing that one can do opt-in discovery using mechanisms similar to those of CCNx 0.x and older NDN. Doing query/response exchanges one at a time and returning the whole content object each time is not, I think, the right approach.

I'm not going to dive into a language or NFN vs. keywords, etc., at this time. I think we should first describe what discovery is supposed to do.

I think it is important that each node be able to control the extent to which it participates in discovery, including the encryption of queries and responses. Possibly it should also support authenticated discovery with various forms of access control.

Here are some examples of the types of questions I think could be issued:
1) What is the latest version of a prefix?
2) What publishers have published under a name? (i.e., are there multiple keyids for the same name?)
3) Regex-style matching under a prefix.
4) Where can I get everything in a manifest?
5) Which pieces of a manifest do you have (branches, leaves, other ways of answering)?
6) POSIX-like 'ls' functions.
7) Discovering sources vs. discovering content (e.g., where can I get something versus what can I get). I don't mean to make this host-based networking, but I might want to get something from my cell provider rather than from a cache elsewhere, or from a privacy-based cache operator.

As in DNS, we have non-authoritative content stores (CS) and repositories, and authoritative repositories and publishers. There should, I think, be a way to distinguish between the two and maybe to select whom one queries.

I think it is important to be able to break a response into multiple transactions, for example for scale and performance. Those transactions should be consistent with one another, as with a cursor. This could be achieved by talking to only one entity using local context or, as Christian pointed out, by using some shared state and talking with an enclave.

One could return a Manifest-like thing in response, but I think one wants to include possibly more information:
- A link (name, keyid, hash) to objects (e.g., "files")
- A sub-namespace (e.g., "directories")
- Information on caching (e.g. the ExpiryTime or remaining RCT)
- Information about inter-linking (manifest relationships)
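
As a rough illustration of the list above, a response entry could be modeled along these lines. This is only a sketch: the class and field names (ObjectLink, SubNamespace, DiscoveryResponse, remaining_rct_ms, and so on) and the units are assumptions for illustration and do not come from any draft.

```python
from dataclasses import dataclass
from typing import Optional, Tuple, Union


@dataclass(frozen=True)
class ObjectLink:
    """A link (name, keyid, hash) to one object, e.g. a "file"."""
    name: str
    key_id: bytes
    content_hash: bytes
    # Caching hints; both are optional and the units are assumed here.
    expiry_time_ms: Optional[int] = None
    remaining_rct_ms: Optional[int] = None


@dataclass(frozen=True)
class SubNamespace:
    """A sub-namespace under the queried prefix, e.g. a "directory"."""
    prefix: str


Entry = Union[ObjectLink, SubNamespace]


@dataclass(frozen=True)
class DiscoveryResponse:
    """A Manifest-like answer to one discovery query."""
    entries: Tuple[Entry, ...]
    # Names of manifests that these entries relate to (inter-linking).
    related_manifests: Tuple[str, ...] = ()


response = DiscoveryResponse(
    entries=(
        ObjectLink("/example/doc/v2", b"\x01", b"\xaa", remaining_rct_ms=5000),
        SubNamespace("/example/doc/figures"),
    ),
    related_manifests=("/example/doc/manifest",),
)
files = [e for e in response.entries if isinstance(e, ObjectLink)]
dirs = [e for e in response.entries if isinstance(e, SubNamespace)]
print(len(files), "object link(s),", len(dirs), "sub-namespace(s)")
```

Keeping object links and sub-namespaces as separate types lets a requester walk a namespace the way 'ls' would, while the caching fields travel with each link rather than with the response as a whole.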

I think using a CCNxKE-like setup for private sessions for a query (and possibly subsequent data transfer) is a good thing. For a properly configured enclave, one CCNxKE setup could allow for enclave-wide queries and responses.

Marc

________________________________________
From: icnrg <icnrg-bounces@irtf.org> on behalf of christian.tschudin@unibas.ch <christian.tschudin@unibas.ch>
Sent: Monday, June 3, 2019 11:17 AM
To: David R. Oran
Cc: Mosko, Marc <mmosko@parc.com>; ICNRG
Subject: Re: [icnrg] Some comments on the CCNx Selectors draft (https://datatracker.ietf.org/doc/draft-mosko-icnrg-selectors/)

Thanks David for creating a discussion around this!

Looking at discovery from a function invocation perspective certainly
resonates well, of course. However, NFN focuses on producing immutable
results out of immutable arguments, and still makes the most sense in
such an environment. You target the discovery of potentially new
content, such that the same query will return different results if asked
again later, right? That leads me to this comment:

Your writeup did not mention time or provenance, which I think are
important in this discussion. Another concept to mention would be "query
expression".

By provenance I mean the state (context) in which new content was
published. This state acts as logical time and makes it possible to
"immutify" results. The exchange would be:

- fetch a repo's state, or time horizon, S
   (this is the only call with a mutable result, and it lets the
   requester bind to that state instead of to a server or service)

followed by one or more

- discover(repo=/ndn/ucla, horizon=S, selector=/unix/ls(/some/repo/path, "*.txt"))

This discovery call will return the same result even if the repo is
replicated: any repo replica can execute the call in the specific
context S, so there is no need to pin your discovery to a specific
server, and the result is fully cacheable (if in clear).
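
A minimal sketch of why binding to S makes the call replica-independent, assuming each replica keeps a snapshot of its names per state. The Replica class, record_state, cache_key, and the "S1"/"S2" labels are all hypothetical; fnmatch stands in for whatever pattern language a real selector would use.

```python
import fnmatch
import hashlib


class Replica:
    """Toy repo replica that remembers the set of names at each state."""

    def __init__(self):
        self._snapshots = {}  # horizon id -> frozenset of names

    def record_state(self, horizon, names):
        self._snapshots[horizon] = frozenset(names)

    def discover(self, horizon, prefix, pattern):
        """ls-like listing of direct children of `prefix` as of `horizon`."""
        hits = []
        for name in self._snapshots[horizon]:
            if not name.startswith(prefix):
                continue
            rest = name[len(prefix):]
            # Only direct children, so "*" does not reach into sub-paths.
            if "/" not in rest and fnmatch.fnmatchcase(rest, pattern):
                hits.append(name)
        return tuple(sorted(hits))


def cache_key(repo, horizon, selector):
    """The result depends only on these inputs, so they can name it."""
    material = "\x00".join((repo, horizon, selector)).encode("utf-8")
    return hashlib.sha256(material).hexdigest()


names_s1 = ["/some/repo/path/a.txt", "/some/repo/path/b.md",
            "/some/repo/path/sub/c.txt"]
names_s2 = names_s1 + ["/some/repo/path/d.txt"]

replica_a, replica_b = Replica(), Replica()
replica_a.record_state("S1", names_s1)
replica_b.record_state("S1", names_s1)
replica_b.record_state("S2", names_s2)  # replica_b has already moved on

answer_a = replica_a.discover("S1", "/some/repo/path/", "*.txt")
answer_b = replica_b.discover("S1", "/some/repo/path/", "*.txt")
print(answer_a == answer_b)
```

The replica that has already advanced to a later state still answers the S1 query exactly as the other one does, which is what allows any cache along the path to serve the result.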

A simple way to explain what a repo state S is would be to point to the
commit hash of a Git repo. An NDN repo would have to keep track of new
content through commits, i.e., keep a log of its actions. I think that
the word "provenance" is appropriate here, namely how/where/when some
content was added to the repo, or renamed, aliased etc.
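
To make the Git analogy concrete, here is a small sketch of a hash-chained action log whose head hash plays the role of S. The RepoLog class and its record fields are assumptions for illustration. Timestamps are left out so that two replicas applying the same actions arrive at the same hash; a real design would need to decide how "when" enters the record.

```python
import hashlib
import json


class RepoLog:
    """Append-only log of repo actions; each commit hash names a state."""

    GENESIS = "0" * 64

    def __init__(self):
        self._entries = []  # list of (state hash, record) in commit order

    def head(self):
        return self._entries[-1][0] if self._entries else self.GENESIS

    def commit(self, action, name, publisher):
        """Record one action ("add" or "remove") and return the new state."""
        record = {"parent": self.head(), "action": action,
                  "name": name, "publisher": publisher}
        encoded = json.dumps(record, sort_keys=True).encode("utf-8")
        state = hashlib.sha256(encoded).hexdigest()
        self._entries.append((state, record))
        return state

    def names_at(self, state):
        """Replay the log up to and including `state`."""
        if state == self.GENESIS:
            return frozenset()
        names = set()
        for digest, record in self._entries:
            if record["action"] == "add":
                names.add(record["name"])
            elif record["action"] == "remove":
                names.discard(record["name"])
            if digest == state:
                return frozenset(names)
        raise KeyError(state)


log = RepoLog()
s1 = log.commit("add", "/ccnx/some/prefix/a.txt", "key-1")
s2 = log.commit("add", "/ccnx/some/prefix/b.txt", "key-2")
s3 = log.commit("remove", "/ccnx/some/prefix/a.txt", "key-1")
print(sorted(log.names_at(s1)))
print(sorted(log.names_at(s3)))
```

Because each hash covers its parent, a state identifier commits to the whole history before it, so later commits never change what names_at returns for an earlier S.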

Sure, manifests are good, but I think we can do better, and by better I
mean repo-side filtering. Once results are immutable, we can have
deterministic and stateless paging, cursors and other result mangling
under the requestor's control:

The last 10 entries:
- /nfn/txt/tail(/ndn5/discover(horizon=S, selector=/unix/ls(/ccnx/some/prefix, "*.txt")), -10)
Just counting hits:
- /nfn/txt/wc(/ndn5/discover(horizon=S, selector=/unix/ls(/ccnx/some/prefix, "*.txt")))
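
A sketch of what stateless paging over an immutable result could look like. The helpers page, tail, and wc are hypothetical stand-ins for the repo-side functions named above, and the result tuple stands in for the output of a discover call pinned to S.

```python
def page(results, offset, limit):
    """Return one page; the repo needs no per-requestor cursor state."""
    return results[offset:offset + limit]


def tail(results, count):
    """Return the last `count` entries."""
    return results[-count:] if count > 0 else results[:0]


def wc(results):
    """Count the hits without returning them."""
    return len(results)


# Stand-in for discover(horizon=S, ...): fixed once S is fixed.
results = tuple("/ccnx/some/prefix/f%02d.txt" % i for i in range(25))

pages = [page(results, offset, 10) for offset in range(0, wc(results), 10)]
rejoined = tuple(name for chunk in pages for name in chunk)
print([len(chunk) for chunk in pages], rejoined == results)
print(tail(results, 10)[0], wc(results))
```

Since the result for a given S never changes, the offset alone serves as the cursor: any replica can answer any page, and the pages always reassemble into the full listing.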

I see your objection coming that general NFN is too heavy for minimal
content discovery, and I would agree. But what about a modest query
_language_ for repo-side execution instead of exposing single RPCs that
the caller has to chain over multiple round trips? We might pick a small
set of result filters (selectors, really) that are easy to implement in
all repos, and permit combining up to three filters in a query
expression, as an example. So no loops, no lambda... Filters could be:

   isDir(), newerThan(d), hasMagic(GIF), publishedBy(key), lexicoAfter(marker)

Then we add Python slicing as in [10:-5], or just classic head() and
tail(), as well as AND, NOT, possibly OR.
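
As a sketch only, such a loop-free query expression could be evaluated as below. The Entry fields, the snake_case filter names, the implicit AND across the filter list, and the limit of three are all illustrative assumptions; OR and hasMagic() are left out to keep the example short.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass(frozen=True)
class Entry:
    name: str
    is_dir: bool
    mtime: int        # publication time; units are an assumption
    publisher: str    # stand-in for a keyid


Filter = Callable[[Entry], bool]
MAX_FILTERS = 3


def is_dir() -> Filter:
    return lambda e: e.is_dir


def newer_than(d: int) -> Filter:
    return lambda e: e.mtime > d


def published_by(key: str) -> Filter:
    return lambda e: e.publisher == key


def lexico_after(marker: str) -> Filter:
    return lambda e: e.name > marker


def NOT(f: Filter) -> Filter:
    return lambda e: not f(e)


def run_query(entries: Sequence[Entry], filters: Sequence[Filter],
              window: slice = slice(None)):
    """AND the filters together, sort by name, then apply a slice."""
    if len(filters) > MAX_FILTERS:
        raise ValueError("at most %d filters per query" % MAX_FILTERS)
    hits = sorted((e for e in entries if all(f(e) for f in filters)),
                  key=lambda e: e.name)
    return hits[window]


entries = [
    Entry("/p/a.txt", False, 150, "k1"),
    Entry("/p/b.txt", False, 50, "k1"),
    Entry("/p/c", True, 200, "k1"),
    Entry("/p/d.txt", False, 300, "k2"),
    Entry("/p/e.txt", False, 400, "k1"),
]
hits = run_query(entries, [NOT(is_dir()), newer_than(100), published_by("k1")])
print([e.name for e in hits])
```

Sorting before slicing is what makes a window such as [10:-5] meaningful, and the fixed filter limit keeps the cost of a query bounded without needing a general interpreter in the repo.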

Datalog could serve as a high-end starting point for such a discussion.

Best, c


On Sun, 2 Jun 2019, David R. Oran wrote:

>
> All comments with <chair hat off>
>
> Content discovery is important, so I’d like to see something adopted by the
> RG for formal progression to experimental.
>
> I’d like to step back a bit though and ask if evolving the approach in CCNx
> 0.x and NDN is appropriate knowing what we’ve learned in the last few years.
> The current selectors draft
> (https://datatracker.ietf.org/doc/draft-mosko-icnrg-selectors/) basically
> fixes the original design mistake of forcing all forwarders to do discovery
> through prefix match on CS data. It does this by leveraging typed name
> components and using an encapsulation technique so that whatever object that
> matches the selectors can be returned as “native” - with its original name
> and signature.
>
> On the other hand, it still has the following properties:
>
>  *  returns only one object when multiple might match
>
>  *  does not have a completely deterministic answer in the presence of
>     cache/repo/producer instance desynchronization (not clear this is
>     avoidable since we have a distributed system with at best eventual
>     consistency semantics)
>
>  *  requires exclusions, which can make for big interests that in turn
>     cause fragmentation to be needed and complicate congestion control
>     as a consequence
>
>  *  seems to still permit cache exploration (since the selector interests
>     are not authenticated), which might be viewed as a privacy problem
>
> I don’t have a concrete proposal to make though. Here are a few thoughts
> about a possible alternative approach:
>
>  *  Since we depend a lot more on Manifests now than the original NDN & CCNx
>     0.x designs did, why not have the discovery return a Manifest instead
>     of a single encapsulated object?
>
>  *  Since we have fast exact match and nameless objects along with now
>     permitting objects to have multiple names and hence multiple ways
>     objects can be discovered, perhaps we can exploit that to allow
>     different discovery patterns rather than smooshing everything into a
>     static/brittle set of selectors.
>
>  *  Some of the properties were driven by trying to make discovery cheap and
>     ubiquitous and hence forced into casting discovery as a direct single
>     Interest/Data exchange. Since we now know that pawing through a large CS
>     or Repo can be computationally expensive (and even require a fair amount
>     of I/O), maybe this tradeoff isn’t attractive. We now have experience
>     with distributed method invocation (e.g. NFN and RICE) so maybe we do
>     discovery as an explicit remote method. This may solve a few problems:
>
>      +  We can do authentication and key exchange, so discovery operations
>         and results can be more private and cache exploration is no longer a
>         hazard
>      +  It might make it easier to build extensibility into the supported
>         discovery patterns as all of them might not need to be supported by
>         all discovery services. (Aside - it seems that a similar direction
>         is being taken in the fast-repo work for NDN, where a given
>         application can design patterns for the Repo to fetch the right
>         stuff for storage of that application’s data).
>      +  We can more easily manage the resources for expensive discovery
>         operations, and following the idea of using Manifests, make those
>         the result of the discovery computation.
>      +  By essentially “binding” to one of the discovery services/servers,
>         it may be possible to get a more internally-consistent set of
>         results.
>
> Yes, this might make discovery a lot more “expensive” in terms of RTTs but
> in the end it might also make applications easier to write with patterns
> that look more like what the application considers “native” and the results
> as a Manifest easier to paw through and possibly iterate on.
>
> Thoughts?
>
> DaveO
>
>
>