Re: [Sidrops] Publication Point -> RP synchronization in bandwidth constrained environments (note for RRDP v2)

Lukas Tribus <lukas@ltri.eu> Wed, 31 May 2023 11:56 UTC

To: Job Snijders <job=40fastly.com@dmarc.ietf.org>
Cc: sidrops@ietf.org
Archived-At: <https://mailarchive.ietf.org/arch/msg/sidrops/pegPOHDItJDj3nepEDqY17VkASM>

On Tue, 30 May 2023 at 17:39, Job Snijders
<job=40fastly.com@dmarc.ietf.org> wrote:
>
> Dear all,
>
> I wanted to share a note that if work ever starts on RRDP v2, that
> design team should consider conservation of bandwidth a priority.
>
> Recently one of the RIRs ran into a network congestion issue noticeable
> on intercontinental transfers: their 36 megabyte RRDP snapshot was being
> served at a rate of ~ 25 kilobytes per second, and worse the TCP
> connections often timed out.
>
> Some RP implementations expect a minimum delivery rate (for example
> 100 kB/sec), some implementations allocate a timebox, and all
> implementations will switch to RSYNC if RRDP is interrupted for whatever
> reason (causing an RRDP snapshot fetch to be queued up for the future).
> The more RPs try to fetch the RRDP snapshot for one reason or another,
> the more bandwidth is demanded, which in turn increases congestion.
>
> Some back of the napkin math: with 2500 RRDP clients interested in a 36
> megabyte RRDP snapshot, timeboxed into 15 minutes, the RRDP server would
> need to be able to sustain 1 Gb/sec for 15 minutes to send that data to
> all RPs. If such a server for whatever reason ends up moving behind a
> 100 Mb/sec link - in the steady state serving mostly just deltas the
> available bandwidth still might seem good enough. However, the moment an
> event like a RRDP Session restart happens, things go awry and perpetual
> congestion could be the result.
>
> In this instance a (temporary) solution was to disable RRDP [1] (by
> serving a HTTP 204 error for the /notification.xml file), steering all
> RPs to use RSYNC, which seems to perform far better in bandwidth
> constrained environments.
>
> Yes, while "make the pipes bigger!" or "Just use a CDN!" might seem a
> logical and conclusive answer, I think there is room for optimization in
> the RPKI synchronization protocols. Bigger pipes cost bigger money, CDNs
> usually aren't free either, plus an additional layer of caching in the
> CDN also introduces additional complexity in operations.

I agree that protocol optimizations are worthwhile.

I would expect PP operators (or at the very least RIR-based PPs) to do
a little more than just plan for 2500 well-behaved, RFC-compliant
RRDP clients though. An attacker shouldn't be able to knock out a PP
of an RIR in a few minutes, with a few dollars of budget, be it with
volumetric or application-level attacks.

Now if organization XYZ decides it needs to host its own PP on a
single 1 Gbit/s or 100 Mbit/s pipe, that is their choice of course.
But at RIR PP level this would be rather problematic.
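
For what it's worth, the napkin math quoted above can be checked with a
few lines of Python. This is only a sketch: the function names are made
up for illustration, it assumes decimal megabytes, ignores TCP/TLS/HTTP
overhead, and assumes the link is shared perfectly among clients, so
real-world numbers will be worse.

```python
# Back-of-the-napkin capacity check for an RRDP snapshot "stampede".
# Assumptions (not from any spec): 1 MB = 1,000,000 bytes, no protocol
# overhead, and perfectly fair sharing of the link among all clients.

def total_bits(clients: int, snapshot_mb: float) -> float:
    """Bits that must leave the server if every client fetches the snapshot."""
    return clients * snapshot_mb * 1_000_000 * 8

def required_gbps(clients: int, snapshot_mb: float, window_s: float) -> float:
    """Sustained rate (Gb/s) needed to serve all clients within the window."""
    return total_bits(clients, snapshot_mb) / window_s / 1e9

def drain_minutes(clients: int, snapshot_mb: float, link_mbps: float) -> float:
    """Minutes needed to serve all clients once over a link of the given size."""
    return total_bits(clients, snapshot_mb) / (link_mbps * 1e6) / 60

def single_fetch_minutes(snapshot_mb: float, rate_kbytes_per_s: float) -> float:
    """Minutes one client needs for the snapshot at a given delivery rate."""
    return snapshot_mb * 1_000_000 / (rate_kbytes_per_s * 1_000) / 60

if __name__ == "__main__":
    # Figures taken from the quoted message: 2500 clients, 36 MB, 15 minutes.
    print(f"required rate: {required_gbps(2500, 36, 15 * 60):.2f} Gb/s")
    print(f"drain on 100 Mbit/s: {drain_minutes(2500, 36, 100):.0f} min")
    print(f"one fetch at 25 kB/s: {single_fetch_minutes(36, 25):.0f} min")
```

With the quoted figures this gives 0.8 Gb/s for the 15-minute timebox,
about 120 minutes to drain the same load over a 100 Mbit/s link, and 24
minutes for a single fetch at 25 kB/s, which already exceeds a 15-minute
timebox.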

I'm sure AFRINIC is on top of this now; I'm just saying that we should
probably be careful about creating the perception that it's a good
idea to host a PP on infrastructure that is not up to the task.



Lukas