Re: [Sidrops] I-D Action: draft-ietf-sidrops-prefer-rrdp-01.txt

Claudio Jeker <cjeker@diehard.n-r-g.com> Tue, 07 December 2021 11:57 UTC

Date: Tue, 07 Dec 2021 12:57:08 +0100
From: Claudio Jeker <cjeker@diehard.n-r-g.com>
To: George Michaelson <ggm@algebras.org>
Cc: SIDR Operations WG <sidrops@ietf.org>
Message-ID: <Ya9MFHcGAnJFYdH9@diehard.n-r-g.com>
References: <001b01d7e638$0f34baf0$2d9e30d0$@student.utwente.nl> <D5E672A8-8701-4FC1-89F6-14C6DADF6D1D@massar.ch> <51448777-038D-4B0D-97D9-129053B5115F@ripe.net> <873EE1A3-B86A-46A3-9D96-AD4BEA21103F@ripe.net> <2C2970A5-33D6-4C49-94E1-F4B87CD8DD14@ripe.net> <B0FD01A5-8A28-4F73-8F2C-0580F2DA7A8E@ripe.net> <Ya8Y3Hj1S3w8xi+y@diehard.n-r-g.com> <CAKr6gn1dzGRMk+zfxr3iEnMSs=NqYq_jFAg8-N41W4ayQv2RcA@mail.gmail.com> <Ya85FU/dvbGwCiGc@diehard.n-r-g.com> <CAKr6gn1TcAF-FZBJrNVg5VR+wVHatBY+X9N-Uct1Nr_2oR6qZA@mail.gmail.com>
In-Reply-To: <CAKr6gn1TcAF-FZBJrNVg5VR+wVHatBY+X9N-Uct1Nr_2oR6qZA@mail.gmail.com>
Archived-At: <https://mailarchive.ietf.org/arch/msg/sidrops/EMz3gaADSivkd6pXFzIJHMIvxSk>
Subject: Re: [Sidrops] I-D Action: draft-ietf-sidrops-prefer-rrdp-01.txt

On Tue, Dec 07, 2021 at 08:58:03PM +1000, George Michaelson wrote:
> On Tue, 7 Dec 2021, 8:36 pm Claudio Jeker, <cjeker@diehard.n-r-g.com> wrote:
> 
> > On Tue, Dec 07, 2021 at 06:29:33PM +1000, George Michaelson wrote:
> > > On Tue, 7 Dec 2021, 6:18 pm Claudio Jeker, <cjeker@diehard.n-r-g.com> wrote:
> > >
> > > >
> > > > There are some much larger rsync repositories out there than the one
> > > > from RIPE NCC. I think the main issue with rsync is that there is only
> > > > one real implementation of the server and there is no official
> > > > specification of the protocol.
> > > >
> > >
> > > Aside from the 5 RIRs, who do not directly produce ROAs (although they do
> > > publish ROAs for hosted services in distinct paths), what are these much
> > > larger repositories? Do you mean APNIC or another RIR? Large in object
> > > size, or large in count, or ... ?
> >
> > rsync is still used by various software distributions to sync large
> > repositories. For example the OpenBSD CVS repository is made available via
> > multiple rsync servers and at 7.7GB and 500'000 files it is a fair amount
> > larger than any RIR repo.
> >
> 
> So, as Job pointed out, this is a generality about the applicability of rsync
> to massive filestore synchronisation, which is what Tridge designed it for.
> 
> > servers by changing file timestamps in their repositories. This breaks
> > rsync fast path.
> >
> 
> To be clear, you must mean when there's been no change to file state, because
> reissuance of files demands that they be re-signed and retimestamped.
> 
> Because of partial update problems, a lot of people on the server side are
> now doing republication by a near-atomic operation on the root dir of the
> repository so that it's an integral collection, because partially updated
> state leads to validation failures. It's possible the mechanistic, cheap
> ways of doing this accidentally change file and dir timestamps; worth
> noting. We could try harder to avoid this.

I just noticed that rsync refetches the whole repository more often than it
has to. This is because timestamps change but the file content is still
the same. So this adds extra load to the rsync servers.
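
To illustrate the timestamp point, here is a minimal Python sketch. It is
not taken from any rsync implementation: quick_check_same() is a hypothetical
stand-in for a size-and-mtime comparison, and the file names are invented.
It shows that a republication which only touches the timestamp defeats such
a check even though the bytes are identical.

```python
import hashlib
import os
import tempfile


def quick_check_same(path_a, path_b):
    """Size + mtime comparison, a rough stand-in for a timestamp-based fast path."""
    a, b = os.stat(path_a), os.stat(path_b)
    return a.st_size == b.st_size and int(a.st_mtime) == int(b.st_mtime)


def content_same(path_a, path_b):
    """Compare the actual bytes via SHA-256."""
    def digest(path):
        with open(path, "rb") as f:
            return hashlib.sha256(f.read()).hexdigest()
    return digest(path_a) == digest(path_b)


def demo():
    """Return (fast path before, fast path after republication, content equal)."""
    with tempfile.TemporaryDirectory() as tmp:
        server = os.path.join(tmp, "server.roa")  # hypothetical file names
        client = os.path.join(tmp, "client.roa")
        for path in (server, client):
            with open(path, "wb") as f:
                f.write(b"same object bytes")
            os.utime(path, (1_000_000, 1_000_000))
        before = quick_check_same(server, client)
        # The server "republishes": same bytes, new timestamp.
        os.utime(server, (2_000_000, 2_000_000))
        after = quick_check_same(server, client)
        return before, after, content_same(server, client)


print(demo())  # (True, False, True)
```

The last value shows the content never changed; only the metadata did, and
that alone is enough to push the file off the cheap path.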
 
> > > > rsync has a few benefits over RRDP so I would be very careful about
> > > > removing rsync support without fixing or adding an RRDP replacement.
> > >
> > >
> > > It pays to be explicit. What are these benefits? The only one I have seen
> > > discussed is the cost shifting to compute the object tree from deltas
> > > (shifted cost to client side in RRDP) vs the complete structure in rsync
> > > at a cost of per fetch difference calculation, server side.
> >
> > RRDP lacks integrity checks. If a file is for whatever reason missing on
> > the RP then it will not be refetched for a very long time. Rsync has the
> > benefit that missing files are refetched automatically. After an rsync run
> > your repository is the same as the one on the remote side and any local
> > modification is detected and fixed. RRDP has nothing like that. It only
> > carries a hash for delete and update operations but there is nothing that
> > ensures that other files are actually what they are supposed to be.
> > So every file served via RRDP needs to be kept on the RP system even if it
> > has nothing to do with the RPKI.
> >
> 
> I think you're conflating issues here. RRDP is a delta protocol but
> includes pointers, URIs to every object. All RPKI objects being signed, I
> don't understand the point about files not being what they are supposed to
> be. If you mean an adversarial attack, remember the crypto is there.

I can send you a delta that holds files that are not in the snapshot. You
will store those files in your cache until you fetch a new snapshot. This
way the disk on RP systems can be filled with very little effort from a
publication point.

RRDP is a weak delta protocol: there is no way to know if all files are
present because there is no index which can be compared. In this regard
the consistency of the database is just presumed.
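
As a rough illustration of what such an index comparison could look like,
here is a Python sketch. The index itself is hypothetical (the point above
is that RRDP does not provide one between snapshots), and the URIs and the
function name compare_with_index() are invented for the example.

```python
import hashlib


def sha256_hex(data):
    return hashlib.sha256(data).hexdigest()


def compare_with_index(index, cache):
    """index: uri -> expected SHA-256 hex digest (hypothetical listing).
    cache: uri -> bytes currently stored on the RP.
    Returns (missing, extra, corrupt) as sorted lists of URIs."""
    missing = sorted(uri for uri in index if uri not in cache)
    extra = sorted(uri for uri in cache if uri not in index)
    corrupt = sorted(
        uri for uri in index
        if uri in cache and sha256_hex(cache[uri]) != index[uri]
    )
    return missing, extra, corrupt


index = {
    "rsync://example.net/repo/a.roa": sha256_hex(b"A"),
    "rsync://example.net/repo/b.roa": sha256_hex(b"B"),
}
cache = {
    "rsync://example.net/repo/a.roa": b"A",
    "rsync://example.net/repo/junk.bin": b"filler pushed via a delta",
}
missing, extra, corrupt = compare_with_index(index, cache)
print(missing)  # ['rsync://example.net/repo/b.roa']
print(extra)    # ['rsync://example.net/repo/junk.bin']
print(corrupt)  # []
```

With a listing like this an RP could refetch b.roa and drop junk.bin right
away. Without it, both conditions go unnoticed until the next snapshot.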
 
> But it is probably fair that neither rsync nor rrdp were designed for bad
> actor risks. I think people are driving hard to attack RRDP but remember
> we're here because of the trivially low bar to attacking rsync.
 
The problem here is that RPKI is set up in a way that makes it easy to
DoS RP software. It does not matter if RRDP or rsync are used.
 
> > > There are also disadvantages. Rsync is inherently serialised. Http has at
> > > least implicit parallelism in its transport protocol (HTTPS). And is CDN
> > > friendly. Rsync CDN support is tcp anycast and that's about it. And the
> > > ones you stated (lack of implementations, lack of ietf standard)
> >
> > RRDP is inherently serialised. You can not parallel process it. You can
> > run many rsync servers on different IPs. There is no need for tcp anycast.
> >
> 
> I think we're talking at cross purposes. Per rsync connection there is no
> internal threading; it's a single linear sequence of fetches. If you run two
> rsyncs in parallel between some A and B you stomp on each other's fetches.

How is this different from an HTTPS connection? There is no internal
threading there either. If you run two RRDP processes in parallel you also
stomp on each other's fetches.
 
> But in principle (if not in practice) you can thread and multifetch the
> stream of an RRDP. In any case snapshot aside you fetch less and you do not
> do per file checking, the delta sync is a smaller total fetch size.
>  
> I don't think anyone has coded RRDP to exploit parallelism. So this is only
> in hypothesis. But, unless you move to zsync or other modern takes on rsync
> this is pretty hard to do.
 
You must apply all RRDP deltas in order. You can not apply them in
parallel or you may end up with a corrupted repository. In this regard I
think RRDP and rsync are behaving the same way.
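
A minimal Python sketch of why the order matters. It is a simplified model,
not a real RRDP client: deltas are plain tuples instead of parsed XML, and
apply_deltas() and the example URI are invented for illustration. Each
update or withdraw carries the hash of the object it replaces, so a delta
only makes sense on top of the state left by the previous serial.

```python
import hashlib


def sha256_hex(data):
    return hashlib.sha256(data).hexdigest()


def apply_deltas(cache, serial, deltas):
    """cache: uri -> bytes; serial: last serial applied locally.
    deltas: list of (serial, ops); each op is
      ("publish", uri, data, old_hash_or_None) or ("withdraw", uri, None, old_hash).
    Returns (new_cache, new_serial); raises ValueError on a gap or hash mismatch."""
    new_cache = dict(cache)  # work on a copy so a failure leaves the input intact
    for delta_serial, ops in sorted(deltas, key=lambda d: d[0]):
        if delta_serial != serial + 1:
            raise ValueError(f"expected serial {serial + 1}, got {delta_serial}")
        for kind, uri, data, old_hash in ops:
            if old_hash is not None:
                current = new_cache.get(uri)
                if current is None or sha256_hex(current) != old_hash:
                    raise ValueError(f"hash mismatch for {uri}")
            if kind == "withdraw":
                new_cache.pop(uri, None)
            else:
                new_cache[uri] = data
        serial = delta_serial
    return new_cache, serial


uri = "rsync://example.net/repo/a.roa"
d1 = (1, [("publish", uri, b"v1", None)])
d2 = (2, [("publish", uri, b"v2", sha256_hex(b"v1"))])
cache, serial = apply_deltas({}, 0, [d2, d1])  # sorted back into serial order
print(serial, cache[uri])  # 2 b'v2'
```

Applying d2 on its own fails in this model, both because serial 1 is missing
and because the hash of the object it claims to replace cannot be checked.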

From a security standpoint neither RRDP nor rsync is good. In RPKI all
risks are on the RP side and very little on the side of publication points.
This is a fundamental issue.

> > I could run rsync as a CDN but there is probably not that much interest in
> > doing that from the big CDN players.
> >
> 
> As an RIR employee I can only agree. There is no first class support for
> rsync as a clouded service. We asked. You have to build your own. You can
> now embed rsync into services which do closest TCP provider mapping, but
> there's no meaningful edge cache. APNIC's AS0 is in Cloudflare and there is
> a significant RTT advantage for the second fetcher because of the caching,
> which came along for free in HTTP cloud services.

-- 
:wq Claudio