[Sidrops] Publication Point -> RP synchronization in bandwidth constrained environments (note for RRDP v2)

Job Snijders <job@fastly.com> Tue, 30 May 2023 15:39 UTC

Date: Tue, 30 May 2023 17:39:35 +0200
From: Job Snijders <job@fastly.com>
To: sidrops@ietf.org
Message-ID: <ZHYYt77xdtrkNV1a@snel>
Archived-At: <https://mailarchive.ietf.org/arch/msg/sidrops/s70Z3EOJX5TcRYKbNA6axbK_Tuo>

Dear all,

I wanted to share a note that if work ever starts on RRDP v2, that
design team should consider conservation of bandwidth a priority.

Recently one of the RIRs ran into a network congestion issue noticeable
on intercontinental transfers: their 36 megabyte RRDP snapshot was being
served at a rate of ~ 25 kilobytes per second and, worse, the TCP
connections often timed out.

Some RP implementations expect a minimum delivery rate (for example
100 kB/sec), some implementations allocate a timebox, and all
implementations will switch to RSYNC if RRDP is interrupted for whatever
reason (causing an RRDP snapshot fetch to be queued up for the future).
The more RPs try to fetch the RRDP snapshot for one reason or another,
the more bandwidth is demanded, which in turn increases congestion.

Some back of the napkin math: with 2500 RRDP clients interested in a 36
megabyte RRDP snapshot, timeboxed into 15 minutes, the RRDP server would
need to sustain roughly 1 Gb/sec (0.8 Gb/sec before protocol overhead)
for 15 minutes to send that data to all RPs. If such a server for
whatever reason ends up behind a 100 Mb/sec link, the available
bandwidth might still seem good enough in the steady state, when it
mostly serves just deltas. However, the moment an event like an RRDP
session restart happens, things go awry and perpetual congestion could
be the result.
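
For anyone who wants to redo the napkin math with their own numbers,
here is a rough Python sketch. It assumes decimal units (1 MB = 10^6
bytes) and ignores TCP/TLS/HTTP overhead; the function names are made up
for illustration and are not taken from any RP or PP software.

```python
# Napkin math only: decimal units, no protocol overhead, all clients
# fetching the full snapshot exactly once.

def required_gbps(clients: int, snapshot_mb: float, window_min: float) -> float:
    """Sustained rate (Gb/sec) needed to serve every client within the window."""
    total_bits = clients * snapshot_mb * 1e6 * 8
    return total_bits / (window_min * 60) / 1e9

def transfer_minutes(snapshot_mb: float, rate_kb_per_sec: float) -> float:
    """Minutes a single client needs to fetch the snapshot at a given rate."""
    return snapshot_mb * 1e3 / rate_kb_per_sec / 60

def drain_minutes(clients: int, snapshot_mb: float, link_mbps: float) -> float:
    """Minutes a fully used link needs to deliver one snapshot to every client."""
    total_bits = clients * snapshot_mb * 1e6 * 8
    return total_bits / (link_mbps * 1e6) / 60

if __name__ == "__main__":
    print(f"needed for 15 min timebox: {required_gbps(2500, 36, 15):.2f} Gb/sec")
    print(f"one client at 25 kB/sec:   {transfer_minutes(36, 25):.0f} minutes")
    print(f"2500 clients on 100 Mb/s:  {drain_minutes(2500, 36, 100):.0f} minutes")
```

With the figures from this message: a single client at 25 kB/sec needs
24 minutes for the snapshot, which is already longer than a 15 minute
timebox, and a saturated 100 Mb/sec link needs 2 hours to serve all 2500
clients once.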

In this instance a (temporary) solution was to disable RRDP [1] (by
serving an HTTP 204 (No Content) response for the /notification.xml
file), steering all
RPs to use RSYNC, which seems to perform far better in bandwidth
constrained environments.

Yes, while "make the pipes bigger!" or "Just use a CDN!" might seem a
logical and conclusive answer, I think there is room for optimization in
the RPKI synchronization protocols. Bigger pipes cost bigger money, CDNs
usually aren't free either, plus an additional layer of caching in the
CDN also introduces additional complexity in operations.

Making RPKI PP/RP synchronization more efficient is advantageous to all
stakeholders in this ecosystem.

Why does RSYNC perform better in this type of situation?
=========================================================

1) The Base64 encoding in RRDP imposes a non-negligible overhead of 33%
   which doesn't exist in RSYNC (where the files are transferred in
   their 'bare' form).
2) The XML formatting of the resulting document (elements, attributes &
   newlines) adds another single digit percentage of overhead compared
   to transferring the bare files.
3) RRDP snapshot downloads are hard to resume: if the session is
   interrupted and then the RRDP serial increases, RPs will try to fetch
   the new snapshot as per instructions from the RRDP server. RSYNC on
   the other hand provides a far more fine-grained and seamless
   resumption mechanism.
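
The 33% figure in point 1 follows directly from how Base64 works (every
3 input bytes become 4 output characters). A minimal Python check, using
a zero-filled buffer as a stand-in for a DER-encoded RPKI object; the
function name is only illustrative:

```python
import base64

def base64_overhead(raw_len: int) -> float:
    """Relative size increase when raw_len bytes are Base64-encoded."""
    encoded_len = len(base64.b64encode(bytes(raw_len)))
    return encoded_len / raw_len - 1

if __name__ == "__main__":
    # 3000 bytes of input become 4000 characters of output.
    print(f"Base64 overhead: {base64_overhead(3000):.0%}")
```

This only counts the encoding itself. Any line breaks inside the Base64
text and the surrounding XML elements (point 2) come on top of that.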

Of course there are well-understood downsides to RSYNC too. What we're
looking at is a set of trade-offs, mostly centered on spending CPU
cycles to conserve bandwidth. The point of this message is not to
promote RSYNC but to take a look at the upsides of RSYNC compared to
RRDP v1.

In this particular event (measured at a distance of 182 ms latency),
disabling RRDP caused the time it takes to do a full synchronization to
drop from 30+ minutes to 17 seconds. Subsequent resyncs now take 6
seconds, and congestion seems to have cleared.

Takeaways for RRDP v2
=====================

* The publication point operator should be able to signal in the
  (equivalent of) notification.xml file a set of recommended timers
  for fetching: it would be super nice if during congestion events the
  PP operator can inform the clients to please check back at a slower
  pace (say every 20 or 40 minutes) instead of whatever the RP's timers
  are. This would give the PP operator some tools to ameliorate the
  situation when lacking bandwidth or other resources.

* Replace the concept of distributing signed objects, certs & CRLs
  inside XML documents using Base64-encoding, with something more
  efficient. Perhaps packing all to-be-transferred files in a DER
  SEQUENCE, optionally wrapped in a self-signed CMS container for easy
  checksumming. Another feasible approach would be to design an
  RPKI-application specific compression algorithm: there is a ton of
  duplication in, for example, AuthorityInfoAccess fields or the
  policyQualifier fields.

* RSYNC offers the community individually addressable URIs for
  individual files, while RRDP v1 only offers 'the whole thing'.
  Individually fetchable files are fantastic for debugging, and make it
  easier for CA/RP developers to reason and communicate about events.
  What APNIC is doing here by offering all files on the RSYNC server
  also as HTTPS download is a step in the right direction:
  https://rpki.apnic.net/member_repository/A91A0D9C/9E28300657A811E8B4AC0877C4F9AE02/

* Reconsider compression. I am aware of issues like CVE-2021-43174
  which prompted some developers to remove support for gzip compression
  in implementations, yet at the same time there is a lot to be gained
  from compression:

    $ ls -lahtrS afrinic.*
    -rw-r--r--  1 job  wheel   5.8M May 29 12:15 afrinic.tar.lzma
    -rw-r--r--  1 job  wheel   6.0M May 29 12:15 afrinic.tar.zst
    -rw-r--r--  1 job  wheel   6.1M May 29 12:15 afrinic.tar.bz2
    -rw-r--r--  1 job  wheel   6.3M May 29 12:15 afrinic.tar.gz
    -rw-r--r--  1 job  wheel  13.0M May 20 12:51 afrinic.rrdp.xml.gz
    -rw-r--r--  1 job  wheel  17.8M May 29 12:15 afrinic.tar
    -rw-r--r--  1 job  wheel  35.5M May 20 12:51 afrinic.rrdp.xml

  In the above terminal transcript the 'afrinic.tar' file is simply a
  tar archive of the validated cache directory specific to AfriNIC. This
  tar file contains all currently valid objects. The tar file is half
  the size of the RRDP snapshot, and compressing the tar with lzma,
  zstd, bz2, or gzip yields a considerably smaller result (about 6 MB)
  than gzip compressing the RRDP v1 XML snapshot (13 MB). As noted
  above, an RPKI-application-aware compression algorithm would yield
  even smaller snapshots.
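
A comparison along the lines of the transcript above can be approximated
with Python's standard library. This is a sketch, not the procedure used
to produce the listing: the function name is made up, zstd is left out
to keep to long-standing stdlib modules, and lzma.compress() writes the
.xz container by default rather than the legacy .lzma format, so the
numbers will differ slightly from the command line tools.

```python
import bz2
import gzip
import lzma
import sys

def compressed_sizes(data: bytes) -> dict[str, int]:
    """Size in bytes of `data` as-is and under three stdlib compressors."""
    return {
        "raw": len(data),
        "gzip": len(gzip.compress(data)),
        "bz2": len(bz2.compress(data)),
        "lzma": len(lzma.compress(data)),  # .xz container by default
    }

if __name__ == "__main__":
    # Usage: python3 sizes.py afrinic.tar afrinic.rrdp.xml
    for path in sys.argv[1:]:
        with open(path, "rb") as f:
            sizes = compressed_sizes(f.read())
        print(path, sizes)
```

Running it against both a tar archive of a validated cache and the
matching RRDP snapshot should show how much the Base64-in-XML packaging
costs before and after compression.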

I'm sure other people have learned other lessons over the years working
with both RSYNC and RRDP: your and their input could perhaps contribute
to an RRDP v2 requirements document.

When we have RRDP v2, perhaps either RSYNC or RRDP v1 can be deprecated.

Kind regards,

Job

[1]: AfriNIC disabled RRDP https://status.afrinic.net/notices/caelaru4q1vhqbv2-rrdp-service-availability