Re: [Sidrops] trying to limit RP processing variability

Rob Austein <> Tue, 28 April 2020 04:35 UTC


Apologies for coming into this very late; I had not been tracking the
discussion until Randy kicked me earlier today.

On Thu, 09 Apr 2020 08:06:54 -0400, Martin Hoffmann wrote:
> Robert Kisteleki wrote:
> > 
> > IMO an "RP has no obvious way to acquire missing objects" is not
> > entirely true.
> > 
> > If, at the previous run, the RP fetched the relevant (now missing)
> > object, then I see no reason not to use it again. Think of the
> > previous run as an object cache, if you will: if you're looking for
> > an object mentioned in the manifest and you already have it (hash /
> > name / etc. matches), then you can reuse it.
> That is theoretically possible, but in practice you treat synchronisation
> and validation of repository content as two separate steps. I.e., before
> you even start looking at a CA’s repository, you synchronize its
> content. This is enshrined in the way both rsync and RRDP work: They
> don’t update single files but entire directory trees all at once. This
> step includes deleting objects that have been deleted on the server.
> Since the complete RPKI repository has a hierarchical structure
> following the rsync URIs of objects, many RP implementations keep the
> objects in the file system only. This is in particular useful for
> rsync: Just let rsync update the directory in place. An additional
> bonus of this strategy is that you don’t need a fancy database.

The DRL validation engine maintains separate caches for:

* Unverified latest stuff fetched from the net;
* Stuff that passed RPKI validation last time; and
* Stuff that passed RPKI validation this time.

rsync's behavior is really only relevant to the first of these caches.
Even back when we kept all objects as separate disk files (DRL
"rcynic" versions 1 and 2) we kept separate caches; we just used a lot
of hard links (in part to avoid putting further strain on filesystems
which made bad assumptions about block/inode allocation ratios...).
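The hard-link arrangement described above might be sketched roughly as
follows. This is a minimal illustration, not rcynic's actual code: the
directory layout and the `promote` helper are hypothetical, and a real
RP would only call this after full cryptographic validation.

```python
import os

def promote(uri, unverified_dir, validated_dir):
    """Hard-link an object that just passed validation from the
    unverified fetch cache into this run's validated cache.

    A hard link gives both caches the same inode, so no object data
    is copied and no extra data blocks are consumed -- which is the
    point of the trick on filesystems with tight block/inode ratios."""
    src = os.path.join(unverified_dir, uri)
    dst = os.path.join(validated_dir, uri)
    os.makedirs(os.path.dirname(dst), exist_ok=True)
    if os.path.exists(dst):
        os.unlink(dst)  # replace any stale copy from an earlier pass
    os.link(src, dst)   # same inode; no data copied
```

With this shape, "stuff that passed validation this time" is just a
second directory tree of links into the unverified fetch cache.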

rcynic version 3 ended up stuffing everything but the unverified cache
into a database, because adding RRDP support changed the internal
search requirements enough that keeping everything as one disk file
per RPKI object no longer made sense.

In all cases, the basic algorithm remained the same: we walk the tree
from the trust anchors down, looking for URIs from which we need to
fetch. If we can fill everything the manifests tell us to expect from
current data, great; otherwise we check the objects that passed
validation last time before giving up and amputating portions of the
tree.  Yes, this means that fetch and validation are interleaved and
that validation sometimes has to pause while waiting for fetch.

Overall, this approach seems to work very well, in the sense of
pulling together what looks to the RP like a coherent view of the
world, and recovering automatically from minor synchronization glitches.

RRDP simplifies this a bit, since it tends to reduce the number of
discrete publication events on the CA side and gives the CA something
closer to a transactional publication mechanism.
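That transactional flavor can be illustrated with an all-or-nothing
delta application. The tuple encoding below is a simplified stand-in
for the publish/withdraw elements an RRDP delta file carries; it is a
sketch of the idea, not any particular implementation.

```python
def apply_delta(store, ops):
    """Apply a list of RRDP-style operations atomically: stage all
    changes on a copy and swap in the result only if every operation
    succeeds.  `ops` is a list of ("publish", uri, data) or
    ("withdraw", uri) tuples."""
    staged = dict(store)
    for op in ops:
        if op[0] == "publish":
            staged[op[1]] = op[2]
        elif op[0] == "withdraw":
            if op[1] not in staged:
                raise KeyError("withdraw of unknown object %s" % op[1])
            del staged[op[1]]
        else:
            raise ValueError("unknown op %s" % op[0])
    # Only reached if every operation applied cleanly.
    store.clear()
    store.update(staged)
```

A delta that fails partway leaves the store exactly as it was, so the
RP never observes a half-applied publication event.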

> You could, of course, concoct a mechanism that marks files for deletion
> and only deletes them if they aren’t actually used in the next
> validation run. But, considering that this thread is actually
> titled “trying to limit RP processing variability,” I am not sure
> this is a good idea. There is a strong likelihood that different
> strategies will behave slightly differently. If we really want to
> come to a point where every RP implementation produces the same output
> from a given input, we need to define simple rules that are easy to
> implement in a wide range of circumstances.

Your RP's view of the world is never going to be exactly the same as
my RP's view of the world.  This is just life with a distributed
database.  DNS and BGP aren't globally coherent in that sense either,
they're just (usually) close enough.

> Another consequence of doing this is that validation on a newly
> deployed RP software differs from one that has been running for a
> while. As a consequence, the datasets from two different caches
> configured in routers differ. So now you even have difference between
> caches running the same software.[0]

Correct.  Again, this is just life with caching and a distributed