Re: [sidr] RPKI Repository Distribution Protocol - a proposal for an rsync replacement for the RPKI

Tim Bruijnzeels <tim@ripe.net> Thu, 22 November 2012 15:13 UTC

From: Tim Bruijnzeels <tim@ripe.net>
In-Reply-To: <480B1D2A-1FCE-4569-90F8-26CBF042FDB3@vigilsec.com>
Date: Thu, 22 Nov 2012 16:13:06 +0100
Message-Id: <ACCEE095-AFCA-4949-A253-167BA7C396C4@ripe.net>
References: <5ED5F195-497C-4043-B44B-A23395987F7E@cobenian.com> <480B1D2A-1FCE-4569-90F8-26CBF042FDB3@vigilsec.com>
To: Russ Housley <housley@vigilsec.com>, Bryan Weber <bryan@cobenian.com>
Cc: sidr wg list <sidr@ietf.org>
Subject: Re: [sidr] RPKI Repository Distribution Protocol - a proposal for an rsync replacement for the RPKI

Hi,
 
As some of you may know, I have also been analysing the problems with the current infrastructure, have some ideas for improvements, and talked about them at the Vancouver interim. It is far too much for an inline email, so I took the liberty of writing my ideas down in IETF draft format, which I hope helps us discuss them in a more structured way:

 http://www.ietf.org/internet-drafts/draft-tbruijnzeels-sidr-delta-protocol-00.txt

This too is of course a first draft, intended for discussion.


On Nov 20, 2012, at 9:18 AM, Russ Housley <housley@vigilsec.com> wrote:
> At the meeting in Atlanta, we heard some very impressive performance numbers for the restructured RIPE repository with rsync.  

Although the performance has improved dramatically for some RPs now that the structure is hierarchical, I don't think today's performance is a good indicator for the future.

We now see:
- Around 50-60 unique IP addresses for RPs connecting every day
- One of these (gw1.antd.nist.gov) is responsible for roughly 9k connections per day
- Ignoring that one for now, the remaining clients are responsible for around 1.8k connections per day:
   => about 36 fetches / day for each client, or once every 40 minutes
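
The per-client figure above can be reproduced with a quick back-of-the-envelope calculation. This is only a sketch: it assumes 50 clients (the low end of the 50-60 range) and excludes the one heavy client, and the variable names are illustrative.

```python
# Back-of-the-envelope check of the per-client fetch rate quoted above.
# Assumption: 50 clients (low end of the 50-60 range), heavy client excluded.
connections_per_day = 1800   # "around 1.8k connections per day"
clients = 50                 # assumed; the text gives 50-60

fetches_per_client = connections_per_day / clients
minutes_between_fetches = 24 * 60 / fetches_per_client

print(fetches_per_client)       # 36.0 fetches per client per day
print(minutes_between_fetches)  # 40.0 minutes between fetches
```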

We have around 4.5k objects in our repository.

In a full deployment scenario we can expect many more relying parties. One for each ASN seems likely, so that is about 40k clients. And while we don't yet agree on what the size of the total repository would be, indications are that it is on the order of 1-2M objects at least, so roughly 1/5th of that puts our repository at 200-400k objects. There are also indications that more frequent updates are actually desirable, say once every 5 minutes instead of every 40.

In short, that's roughly 10^3 times the clients, 10^2 times the objects, and 10 times the connection frequency.

With those numbers it's very hard to predict exactly what behaviour we will see. It is very likely non-linear, but even if it were linear, we would be talking about 10^6 times today's load.
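
For reference, the scaling estimate can be worked through using only the figures quoted in this message. This is a rough sketch under the linear-scaling assumption; it takes 50 current clients (the low end of 50-60) and the 200-400k object range, and the variable names are illustrative.

```python
# Rough linear scaling estimate using only the figures quoted above.
# Assumptions: 50 current clients and a future repository of 200-400k objects.
clients_now, clients_future = 50, 40_000
objects_now = 4_500
objects_future_low, objects_future_high = 200_000, 400_000
interval_now_min, interval_future_min = 40, 5

client_factor = clients_future / clients_now                # 800, order 10^3
object_factor_low = objects_future_low / objects_now        # about 44
object_factor_high = objects_future_high / objects_now      # about 89, order 10^2
frequency_factor = interval_now_min / interval_future_min   # 8, order 10

combined_low = client_factor * object_factor_low * frequency_factor
combined_high = client_factor * object_factor_high * frequency_factor

print(f"{combined_low:.1e} to {combined_high:.1e}")
```

Unrounded, the combined factor comes out at a few times 10^5. Rounding each factor up to the nearest power of ten, as done in the text, gives the 10^6 figure.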

> While your code is not feature complete, do you have any expectations of similar performance?

When measured 1:1, rsync performance is actually pretty good.

As described in my document, one of the major problems I see is that in this space we are not talking about 1:1, but something like 1 server to 40k clients.



> On Nov 19, 2012, at 11:02 PM, Bryan Weber wrote:
> 
>> SIDR working group,
>> 
>> I humbly submit a proposal for your consideration. It is for an rsync replacement that for now I am calling the 'RPKI Repository Distribution Protocol (RRDP)'. It is intended to replace the use of rsync by the RPKI validators or at least re-open some previous discussion on possible replacements.
>> 
>> The primary benefit is that RRDP is a combination of a very limited Distributed Version Control System (DVCS) with a transport agnostic communication protocol. This means that repository changes can be snapshotted and retrieved as an atomic change. This solves the problem of in progress transfers when a new repository generation is kicked off. It also means that different protocols (HTTP/SSL/TLS/etc.) can be used for the actual transport. Finally, it can work with the existing rsync URIs and it can work alongside rsync at the same time. Should the protocol be considered for real adoption this should help to ease in gradual adoption.
>> 

I like the idea of snapshotted atomic changes, but I have problems with a communication protocol where the server needs to chat with up to 40k RPs. The CPU and memory usage per client may not be the same as with rsync (see the slides from the interim), but the same fundamental problem remains: you are proposing a smart server that has to spend cycles talking to too many clients.

As described in my proposal, I prefer a protocol that lets RPs work out for themselves which changes to fetch. Although I too do not want to restrict this to one protocol, I want something that can leverage existing, proven HTTP CDN infrastructure to deliver data to clients.
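
To illustrate the general idea of client-side selection, here is a minimal sketch. It is not taken from either proposal: the serial numbers, the `snapshot/` and `delta/` file names, and the `plan_fetch` function are all hypothetical. The point is only that the server publishes static files that a CDN can cache, while the RP decides what to download.

```python
from typing import List, Optional, Set

def plan_fetch(local_serial: Optional[int],
               published_serial: int,
               available_deltas: Set[int]) -> List[str]:
    """Return the static files an RP would fetch (hypothetical layout)."""
    snapshot = f"snapshot/{published_serial}.xml"
    # No usable local state: start from a full snapshot.
    if local_serial is None or local_serial > published_serial:
        return [snapshot]
    needed = range(local_serial + 1, published_serial + 1)
    # Fetch deltas only if every intermediate one is still published.
    if all(serial in available_deltas for serial in needed):
        return [f"delta/{serial}.xml" for serial in needed]
    # A gap in the delta chain: fall back to the snapshot.
    return [snapshot]

print(plan_fetch(7, 10, {8, 9, 10}))   # three delta files
print(plan_fetch(2, 10, {8, 9, 10}))   # falls back to the snapshot
print(plan_fetch(10, 10, {8, 9, 10}))  # already up to date: nothing to fetch
```

In this sketch the server does no per-client computation at all; every decision is made by the RP from information it can read in a small, cacheable file.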

>> A brief paper that describes the protocol can be found at http://www.cobenian.com/documentation/rrdp.pdf. This is very much a first draft, but I would appreciate any feedback/comments you have and any interest you might have in trying to use this protocol with a real validator. I am all ears to suggestions for improvements. I would especially love to make it more aware of the contents of RPKI manifests.
>> 

First of all I want to thank you for thinking about solutions, even if our ideas may differ somewhat.

I had drafts of my proposal on my machine for a while, presented parts of it in Vancouver, and have discussed it with various people, but I had not sent it to sidr yet. Now that you have sent your version, I felt like sharing mine as well ;)


>> The beginning of a reference implementation can be found at http://www.github.com/cobenian/rrdp. This is not feature complete and certainly not compliant with the specification at the moment, but it should be in the very near future (a few days to a few weeks).



Tim