Re: [sidr] Slides for "RPKI Over BitTorrent" presentation

Rob Austein <> Fri, 30 March 2012 18:10 UTC


While I will admit to some astonishment that the following explanation
could possibly be news to long-time participants in this WG (given how
much time I've spent whining about this issue over the last five years
or so both in public and in private), let me quote from the slides:

    * How efficient [fetching RPKI repositories using rsync] is
      depends heavily on how the publication repositories are
      organized.

    * In an efficiently organized repository, filesystem hierarchy
      follows X.509 certificate hierarchy, so that one can pick up
      significant subtrees with a single rsync connection.

    * To date, the RIRs have chosen to deploy flat hierarchies where
      there is no relationship at all between filesystem hierarchy
      within the repository and certificate hierarchy.

To make that more concrete, here's an example.  Let's assume we have
the following trivial hierarchy: Bob and Betty are issued by Alice,
Carol and Carl are issued by Bob, Dave and Dana are issued by Carol,
and Dara is issued by Carl, and all of these are hosted in a single
repository.

In an inefficient, "flat" repository, the publication points for
objects issued by these entities would look something like this:


In a hierarchical repository, the same publication points would look
more like this:


Assuming top-down tree walk (the normal case), retrieving objects
issued by this set of entities takes eight rsync connections with the
flat repository, as opposed to one rsync connection with the
hierarchical repository.
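
The connection counts above can be checked with a small sketch.  This
is only an illustration under stated assumptions: the host and module
name (rpki.example.org/rpki), the directory naming, and the function
names are all hypothetical, and the model simply assumes that one
recursive rsync fetch of a directory also retrieves everything beneath
it.

```python
# Issuer -> subjects, taken from the example hierarchy in the text.
ISSUED = {
    "Alice": ["Bob", "Betty"],
    "Bob": ["Carol", "Carl"],
    "Carol": ["Dave", "Dana"],
    "Carl": ["Dara"],
}

# Hypothetical host and module name, used for illustration only.
BASE = "rsync://rpki.example.org/rpki"


def flat_points(root):
    """Flat layout: one sibling directory per entity, whoever issued it."""
    points = {}
    stack = [root]
    while stack:
        name = stack.pop()
        points[name] = f"{BASE}/{name}/"
        stack.extend(ISSUED.get(name, []))
    return points


def hierarchical_points(root):
    """Hierarchical layout: a subject's directory nests inside its issuer's."""
    points = {}
    stack = [(root, f"{BASE}/{root}/")]
    while stack:
        name, uri = stack.pop()
        points[name] = uri
        for child in ISSUED.get(name, []):
            stack.append((child, f"{uri}{child}/"))
    return points


def count_connections(points, root):
    """Top-down walk.  A new rsync connection is needed only when a
    publication point is not already covered by an earlier recursive fetch."""
    fetched = []
    queue = [root]
    while queue:
        name = queue.pop(0)
        uri = points[name]
        if not any(uri.startswith(prefix) for prefix in fetched):
            fetched.append(uri)
        queue.extend(ISSUED.get(name, []))
    return len(fetched)


print(count_connections(flat_points("Alice"), "Alice"))
print(count_connections(hierarchical_points("Alice"), "Alice"))
```

With the eight entities in the example, the flat layout needs one
connection per entity, while the hierarchical layout is covered
entirely by the fetch of Alice's publication point.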

In practice one might want a slightly more complex structure to limit
the size of individual directories, but it doesn't matter so long as
the filesystem hierarchy is organized in such a way that picking up
an issuer's publication point picks up a non-trivial number of its
subjects' publication points automatically.  It doesn't have to be
perfect; it just has to do enough better than the flat model to
amortize the cost of setting up and tearing down the rsync connection
over a significantly larger number of files.
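
The amortization argument can be written down as a rough sketch.  The
linear model, the function name, and every number below are
assumptions made up for illustration, not measurements: each
connection is charged a fixed setup-and-teardown cost, and that cost
is spread over the files fetched.

```python
def overhead_per_file(connections, files, setup_cost):
    """Connection overhead attributed to each file, assuming every rsync
    connection costs the same fixed amount to set up and tear down."""
    return connections * setup_cost / files


# Hypothetical numbers: 80 files in total, setup cost of 100 (arbitrary units).
print(overhead_per_file(connections=8, files=80, setup_cost=100))  # flat layout
print(overhead_per_file(connections=1, files=80, setup_cost=100))  # hierarchical
```

Under this model the per-file overhead falls in direct proportion to
the number of connections saved, which is why even a partially
hierarchical layout helps.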

This is not about PKI, it's purely an rsync efficiency issue.

Presumably there are scaling limitations to the hierarchical approach,
but anecdotal evidence among the people I've asked ("I tried ... and
it worked") suggests that, if the underlying networks and filesystems
are in good shape, a single rsync connection ought to be able to
handle up to at least 10,000 small files, perhaps a lot more than
that.  Note that this is just talking about rsync itself: mileage
might vary significantly if the underlying networks or filesystems are
seriously broken.  Also note that these anecdotal estimates have not
been tested in any rigorous fashion as far as I know, so that's
another entry on my list of things we ought to be measuring.

Hope this helps to clarify the change I've been suggesting.