Re: [Sidrops] Publication Point -> RP synchronization in bandwidth constrained environments (note for RRDP v2)
Tim Bruijnzeels <tim@nlnetlabs.nl> Thu, 08 June 2023 15:14 UTC
From: Tim Bruijnzeels <tim@nlnetlabs.nl>
In-Reply-To: <461BCBA1-19FD-4E6C-83C3-584D1597B2E9@ripe.net>
Date: Thu, 08 Jun 2023 17:13:55 +0200
Cc: Claudio Jeker <cjeker@diehard.n-r-g.com>, Mikhail Puzanov <mpuzanov@ripe.net>, Job Snijders <job=40fastly.com@dmarc.ietf.org>, sidrops@ietf.org
Content-Transfer-Encoding: quoted-printable
Message-Id: <DCB50ED8-9906-41DE-892E-F76987C4808C@nlnetlabs.nl>
References: <ZHYYt77xdtrkNV1a@snel> <955CFF67-8D19-4B38-8585-3754F3119EDF@ripe.net> <ZIHLBTZJ06/J3bCa@diehard.n-r-g.com> <461BCBA1-19FD-4E6C-83C3-584D1597B2E9@ripe.net>
To: Ties de Kock <tdekock@ripe.net>
Archived-At: <https://mailarchive.ietf.org/arch/msg/sidrops/OwgbZYd_tlkp1KTPPVRHwN_gDAA>
Subject: Re: [Sidrops] Publication Point -> RP synchronization in bandwidth constrained environments (note for RRDP v2)
> On 8 Jun 2023, at 14:55, Ties de Kock <tdekock@ripe.net> wrote:
>
>> On 8 Jun 2023, at 14:35, Claudio Jeker <cjeker@diehard.n-r-g.com> wrote:
>>
>> On Thu, Jun 08, 2023 at 01:45:46PM +0200, Mikhail Puzanov wrote:
>>> Hi Job, all,
>>>
>>> I think compression is probably the quickest way of mitigating the size problem:
>>>
>>> Repeating some of your experiments:
>>>
>>> tmp> ls -lh snapshot.xml*
>>> -rw-r--r--  1 mpuzanov  staff   202M Jun  8 12:44 snapshot.xml
>>> -rw-r--r--  1 mpuzanov  staff    89M Jun  8 12:58 snapshot.xml.bz2
>>> -rw-r--r--  1 mpuzanov  staff    90M Jun  8 13:01 snapshot.xml.gz
>>> -rw-r--r--  1 mpuzanov  staff   124M Jun  8 12:57 snapshot.xml.lz4
>>> -rw-r--r--  1 mpuzanov  staff    83M Jun  8 12:58 snapshot.xml.lzma
>>>
>>> Even the good ol’ gzip can shrink RIPE NCC’s snapshot more than twofold.
>>> Also compression should take care of the repetitive parts, i.e.
>>> XML tags, newlines, etc.
>>>
>>> Extra 2 cents as an RP implementer: rpki-prover fetches every repository
>>> in a separate OS process with constrained heap, constrained download size
>>> and a timeout, so it pretty much ignores all CVE-2021-43174-related
>>> considerations and compression was never disabled in it. If the fetching
>>> process crashes or runs out of some resource the impact is limited to
>>> the specific repository. So there is a way to have compression and
>>> tolerate crashes.
>>
>> The inherent problem of RRDP is that when a CA is forced to issue a new
>> RRDP session

Technically it's the Publication Server that may perform a session reset, but you probably meant that.

Publication Servers work hard to avoid session resets. It's really just the escape hatch in case a server can no longer provide deltas. E.g. they have had to do a disaster-recovery.

>> (or all RP lose their RRDP state for the CA) every RP system
>> will grab a snapshot.

RRDP state for the repository, again you probably meant that.

Yes, this puts a burden on the RP. See below.
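
[Illustration of Mikhail's point above about accepting compression while constraining resources. This is a minimal Python sketch, not rpki-prover's actual mechanism (which isolates each fetch in a separate OS process). It only shows one in-process way to cap decompressed size so that a gzip "bomb" fails early. The names bounded_gunzip and OutputTooLarge and the limits used are assumptions made for this example.]

```python
import gzip
import zlib


class OutputTooLarge(Exception):
    """Raised when decompressed data would exceed the caller's limit."""


def bounded_gunzip(data: bytes, limit: int) -> bytes:
    """Decompress gzip data, refusing to produce more than `limit` bytes."""
    # wbits=31 tells zlib to expect a gzip header and trailer.
    decomp = zlib.decompressobj(wbits=31)
    out = bytearray()
    buf = data
    while buf and not decomp.eof:
        # Ask for at most one byte more than we are still allowed to hold,
        # so that reaching the cap is always detected as an overflow.
        out += decomp.decompress(buf, limit + 1 - len(out))
        if len(out) > limit:
            raise OutputTooLarge(f"more than {limit} bytes after decompression")
        buf = decomp.unconsumed_tail
    if not decomp.eof:
        raise ValueError("truncated gzip stream")
    return bytes(out)


# A small payload passes; a highly compressible 5 MB payload is rejected
# as soon as the (hypothetical) 1 MB cap is crossed.
small = gzip.compress(b"<snapshot/>" * 10)
bomb = gzip.compress(b"\0" * 5_000_000)
print(len(bounded_gunzip(small, 1_000_000)))
try:
    bounded_gunzip(bomb, 1_000_000)
except OutputTooLarge as exc:
    print("rejected:", exc)
```

[The cap bounds memory for a single response only. It does not replace the timeout and per-repository isolation described above.]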
>> So you end up with a thundering herd and I doubt a
>> reduction by half is enough to make that effect go away.
>
> The 45-70% reduction in traffic and costs is significant.

Definitely helps I would think.

>> The worst case RRDP bandwidth requirements are significantly bigger than
>> what is needed in the steady state. This is an inherently bad design.
>> The only fix is to overprovision by a lot.
>
> The statement on over-provisioning holds for rsync.

Rsync minimises bandwidth, but it requires that the server keeps an open session with each client and invests CPU and memory for recursive fetches. This is not a big issue for small repos, but it requires significant resources for big repositories. RP to server numbers are highly asymmetrical, so the server can be DoS'ed trivially.

RRDP is designed to leverage CDNs for delivery of mostly static files. I.e. the state of the repository (snapshot) and the changes between given states (deltas) are files that will never change. It puts the CPU and download burden on the RP, where it can scale in parallel.

But yes, there are definitely lessons learned and things that can be improved in a future version.

> In practice a RP needs to be out of sync for a significant amount of time before
> a fallback to snapshot is needed (> 8h for RIPE NCC repo). In that period, >60%
> of objects in the repository has churned because of manifest+crl rotation.

First off: there is just an awful lot of churn in the RPKI caused by manifest and CRL size and renewal. We could think about other designs, but it's a much bigger discussion.

The bandwidth issues that I have seen are caused by a combination of very large notification files and a slow link to serve them from (no CDN or other distribution used).

It will help, a *lot*, if files are kept small.
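
[Illustration of the "fallback to snapshot" discussed above. This is a minimal Python sketch of how an RP might choose between deltas and a snapshot, following the general rule in RFC 8182: deltas are usable only when the session matches and the notification file lists an unbroken chain from the RP's serial to the current one. The Notification type and the plan_sync function are hypothetical names for this example, and treating a local serial ahead of the server as a reason to re-sync is an assumption.]

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Tuple


@dataclass(frozen=True)
class Notification:
    """The parts of a notification file that matter for this decision."""
    session_id: str
    serial: int
    delta_serials: FrozenSet[int]  # serials of the deltas still listed


def plan_sync(local_session: str, local_serial: int,
              note: Notification) -> Tuple[str, List[int]]:
    """Return ("up-to-date" | "deltas" | "snapshot", delta serials to fetch)."""
    if local_session != note.session_id:
        return ("snapshot", [])       # session reset: local state is useless
    if local_serial == note.serial:
        return ("up-to-date", [])
    if local_serial > note.serial:
        return ("snapshot", [])       # assumption: inconsistent state, re-sync
    needed = list(range(local_serial + 1, note.serial + 1))
    if all(serial in note.delta_serials for serial in needed):
        return ("deltas", needed)     # contiguous chain is available
    return ("snapshot", [])           # RP fell behind the retained deltas


note = Notification("session-a", 105, frozenset(range(101, 106)))
print(plan_sync("session-a", 103, note))  # recent RP: two deltas
print(plan_sync("session-a", 42, note))   # straggler: snapshot
print(plan_sync("session-b", 103, note))  # after a session reset: snapshot
```

[The last two cases are the ones that produce snapshot downloads: a session reset hits every RP at once, while a straggler only affects itself.]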

At the time that RRDP (RFC 8182) was written we lacked the operational experience to make better recommendations than the 'MUST' in there, which says that the total size of the deltas MUST NOT exceed the snapshot size. In practice this can lead to huge notification files. If you try to serve huge files to every RP over a slow network, you get the issues we have seen.

There are a number of things that can be done in the short term, within the current RRDP standards. We finally (yes, should have done this earlier) wrote down recommendations in draft-timbru-sidrops-publication-server-bcp-00.

As for lessons learned for a future RRDP, there are a number of things that I think could be done. To just name a few:

- No new snapshot on every update

  This is hard for the server. Writing a full snapshot of multiple MBs for every change adds up. I think it would be better to have snapshots every so often and use deltas to get to the current state.

- Combine deltas

  Let the server pre-calculate combined deltas from serial X to current. This saves the RPs downloading many files. It saves bandwidth in cases where files are updated multiple times (e.g. a CA published multiple ROAs as separate events).

- Binary format

  We can have DER-encoded structures (since we do DER anyway) that can contain all the data. This saves a lot of XML and base64 overhead. I have not done the analysis, but I think that the actual DER objects don't compress well, so we may not need compression if we do this.

- Pointers for stragglers?

  Notification files should be short. This means that RPs that have not synced for a long time will have to do a full re-sync. What may help is that there are pointers included to longer notification files for those that need it.

Tim

>
> Kind regards,
> Ties
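
[Illustration of the "combine deltas" idea above. This is a minimal Python sketch of how a server might pre-calculate one combined delta from a sequence of deltas. It is not part of RFC 8182 or any existing implementation. The Op type and the combine function are hypothetical, and the model assumes, as in current RRDP, that a publish or withdraw carries the hash of the object it replaces, while a publish without a hash introduces a new object.]

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Op:
    """One delta element: a publish (content set) or withdraw (content None)."""
    uri: str
    content: Optional[bytes]   # None means the object is withdrawn
    prev_hash: Optional[str]   # hash of the object replaced; None if new


def combine(deltas: List[List[Op]]) -> List[Op]:
    """Collapse consecutive deltas into the net change per URI."""
    first_prev: Dict[str, Optional[str]] = {}
    last_content: Dict[str, Optional[bytes]] = {}
    for delta in deltas:                      # oldest serial first
        for op in delta:
            # Remember what the object looked like before the range started.
            first_prev.setdefault(op.uri, op.prev_hash)
            # Only the final state within the range matters to the RP.
            last_content[op.uri] = op.content
    combined = []
    for uri, content in last_content.items():
        prev = first_prev[uri]
        if content is None and prev is None:
            continue  # created and withdrawn inside the range: no net change
        combined.append(Op(uri, content, prev))
    return combined


deltas = [
    [Op("rsync://example.net/repo/a.roa", b"v1", None)],
    [Op("rsync://example.net/repo/a.roa", b"v2", "hash-of-v1"),
     Op("rsync://example.net/repo/b.roa", b"tmp", None)],
    [Op("rsync://example.net/repo/b.roa", None, "hash-of-tmp"),
     Op("rsync://example.net/repo/c.roa", None, "hash-of-c")],
]
for op in combine(deltas):
    print(op.uri, "withdraw" if op.content is None else "publish", op.prev_hash)
```

[In this example five delta elements collapse into two. The intermediate version of a.roa and the short-lived b.roa never need to be transferred, which is the bandwidth saving mentioned for objects that are updated multiple times.]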