Re: [Sidrops] [WG ADOPTION] Adoption call: draft-timbru-sidrops-publication-server-bcp - ENDS 02/08/2024

Ties de Kock <tdekock@ripe.net> Tue, 06 February 2024 07:58 UTC

From: Ties de Kock <tdekock@ripe.net>
In-Reply-To: <ZcFNNfrkMFxKf5hN@snel>
Date: Tue, 06 Feb 2024 08:57:59 +0100
Cc: Russ Housley <housley@vigilsec.com>, IETF SIDRops <sidrops@ietf.org>, IETF SIDRops Chairs <sidrops-chairs@ietf.org>, sidrops-ads@ietf.org, Keyur Patel <keyur@arrcus.com>
Message-Id: <BBE2320C-4525-4713-B4AF-3F00ECD4228A@ripe.net>
References: <87h6j1kug1.wl-morrowc@ops-netman.net> <B60D7B39-FA81-45AF-BCBD-2784F91B43C3@vigilsec.com> <ZcFNNfrkMFxKf5hN@snel>
To: Job Snijders <job=40fastly.com@dmarc.ietf.org>
Archived-At: <https://mailarchive.ietf.org/arch/msg/sidrops/gKpld6uyLNIDTPFrdpjiKbfCM9k>

Hi Job,

Thanks for your feedback. I will comment on two of your suggestions because I
think they deserve more discussion in public.

> On 5 Feb 2024, at 22:03, Job Snijders <job=40fastly.com@dmarc.ietf.org> wrote:
> 
> Dear all,
> 
> On Fri, Jan 26, 2024 at 02:55:55PM -0500, Russ Housley wrote:
>> <no_hats>
>> 
>> I think the WG should adopt this document.
> 
> I am of the same opinion.
> 
> In a previous message on this topic, Di Ma seemed to suggest that it
> would be mighty helpful to have a guide capturing operational lessons
> learned in recent years for the publication side of the house, and I
> concur. Let's blamelessly look back at all 'RPKI incidents' in the last
> few years and see if there were lessons learned to distribute via this
> proposed BCP.
> 
> In some sections the document repeats what's already specified in
> Standards Track RFCs, but I don't necessarily see that as a bad thing: the
> proposed BCP is well positioned to elaborate on what might happen if the
> suggestions in original specifications are not followed.
> 
> Some suggestions for draft-timbru-sidrops-publication-server-bcp-02:
> 
> - Make a recommendation to Publishers to not regenerate RRDP content for
>  every request, nor regenerate previous RRDP deltas when creating a new
>  snapshot. This might help implementers of RRDP publication software
>  perceive and treat older RRDP content as immutable. See
>  https://datatracker.ietf.org/doc/draft-spaghetti-sidrops-rrdp-desynchronization/
>  for more details about mutations in previously published RRDP content.

I think that it is clear to all that the content of a delta should not change
after being written for the first time. I think specifying the "how" for that
process is too prescriptive. For example, your proposed method would rule out
symlink pivoting for RRDP while you still need an atomic mechanism to prevent
partial reads of some files - e.g. notification.xml.
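As an illustration of the symlink pivoting mentioned above, here is a minimal sketch. It assumes a POSIX filesystem and a layout in which each complete RRDP tree is written to its own directory and a symlink selects the one being served. The function name `pivot_symlink` and that layout are assumptions made for this example, not something the draft or any particular publication server prescribes.

```python
import os
import uuid


def pivot_symlink(link_path: str, new_target: str) -> None:
    """Repoint the symlink at link_path to new_target in one rename step."""
    link_dir = os.path.dirname(os.path.abspath(link_path))
    # Create the replacement link under a unique temporary name in the same
    # directory, so the rename below never crosses a filesystem boundary.
    tmp_link = os.path.join(link_dir, f".pivot-{uuid.uuid4().hex}")
    os.symlink(new_target, tmp_link)
    try:
        # rename(2) acts on the link itself, not on what it points to, and
        # replaces an existing link_path atomically on POSIX filesystems.
        os.replace(tmp_link, link_path)
    except OSError:
        os.unlink(tmp_link)
        raise
```

A reader that opens a path through the link sees either the old tree or the new one, never a half-written notification.xml, because the new tree is complete before the link is repointed.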

In practice, RRDP distribution uses multiple layers of caching (generating
software -> origin servers -> CDN internal caching nodes -> edge). Effectively
all write a copy. All can corrupt it. I think the hashes in RRDP are the
perfect mechanism for detecting corruption. Personally, I have found a memory
corruption issue due to those hashes...
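To make the hash check concrete, the sketch below reads the SHA-256 hashes that an RRDP notification file lists for its snapshot and deltas (RFC 8182) and compares a fetched body against the expected value. The helper names `expected_hashes` and `matches_hash` are made up for this example, and a real relying party or monitor would do considerably more validation than this.

```python
import hashlib
import xml.etree.ElementTree as ET

# XML namespace used by RRDP files (RFC 8182).
RRDP_NS = "{http://www.ripe.net/rpki/rrdp}"


def expected_hashes(notification_xml: bytes) -> dict:
    """Map each snapshot/delta URI in a notification file to its hex hash."""
    root = ET.fromstring(notification_xml)
    hashes = {}
    for elem in root:
        if elem.tag in (RRDP_NS + "snapshot", RRDP_NS + "delta"):
            hashes[elem.attrib["uri"]] = elem.attrib["hash"].lower()
    return hashes


def matches_hash(body: bytes, expected_hex: str) -> bool:
    """Return True if the SHA-256 of body equals the advertised hash."""
    return hashlib.sha256(body).hexdigest() == expected_hex.lower()
```

Because the check runs on the bytes as received, it catches a copy corrupted at any caching layer between the generating software and the client.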

> 
> - Section 6 could perhaps (in addition to the phrase 'consistent')
>  describe the desired outcome using the phrase 'atomic'. I think within
>  our community we often describe the cause of 'failed fetches' as the
>  repository being updated in a non-atomic fashion. 
> 
> - The draft might benefit from a "Recommended Reading" section (in
>  addition to the Glossary) to help the reader understand the context.
> 
> - The proposed BCP could perhaps include a section on "Monitoring
>  Publication Points", because I am not sure all stakeholders are aware
>  that it's massively beneficial to deploy multiple instances of
>  different RPKI Cache implementations at a given publication point, and
>  have monitoring software compare various entrypoint permutations:
>    - rsync only
>    - rrdp first, rsync second
>    - rrdp snapshot only
>    - rrdp only (following deltas)
>  The above of course for multiple distinct implementations and versions
>  of implementations.
> 
> - Continuing in context of the previous suggestion - especially at the
>  RIR-level - publication point operators should strive to support all
>  commonly deployed validator implementations (within reason) by making
>  those part of smoketests before deployment. No big website got big by
>  *only* testing with Internet Explorer. I imagine Publication Point
>  operators want to be compatible with a wide range of potential RP
>  implementations, so testing with a wide range is helpful to help find
>  the lowest common denominator: if one RP implementation is stricter
>  than other RP implementations, the Publication Point adhering to the
>  higher standard helps all deployed clients.
> 
> - Perhaps also a section about common hosting problems: if you offer
>  your publication point over IPv6, make sure that port 443 + 873 are
>  open on IPv6. Monitor the validity of TLS HTTPS certificates (for both
>  address families).


> - Perhaps a note could be added about temporarily disabling RRDP being a
>  viable option to conserve bandwidth in (rare) scenarios where the
>  Publication Point operator's network is congested. In low-bandwidth
>  scenarios rsync performs far better than RRDP (a particular race
>  condition in RRDP is avoided, rsync synchronization trivially is
>  resumed). Notwithstanding that both PP and RP side of the house would
>  do well to implement bandwidth conservation strategies such as
>  implementation of HTTP content encoding compression. I do think it is
>  helpful to remind interested readers that ultimately the objective is
>  to /somehow/ serve RPKI data (be it via RRDP or RSYNC) in a timely
>  fashion.

That is a recommendation that comes with big caveats. While the behaviour of
RRDP degrades badly (we describe the negative feedback loop in the document),
in a steady-state situation, rsync uses significantly more bandwidth (and IO).

I wonder what the success rate of rsync clients is during fallback in a
bandwidth-limited scenario. More data on this would help make a recommendation.
The fixed cost of transferring file names is high (e.g. 640KB for
rpki.afrinic.net, 263KB+1.6MB for rpki.apnic.net, 5.2MB for rpki.arin.net,
1.2MB for rpki.lacnic.net, 5.5MB for rpki.ripe.net). Based on that, I would
expect many rsync clients to hit timeouts as well.
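A rough back-of-the-envelope calculation shows why timeouts are plausible. The sketch below takes the file-list sizes quoted above and computes how long the list alone would take at a given per-client throughput. The throughput value is a free parameter, decimal units (1 KB = 1000 bytes) are an assumption, and protocol overhead and the object data itself are ignored.

```python
# File-list sizes quoted in the paragraph above, in kilobytes.
# Assumption: decimal units, 1 KB = 1000 bytes and 1 MB = 1000 KB.
FILE_LIST_KB = {
    "rpki.afrinic.net": 640,
    "rpki.apnic.net": 263 + 1600,
    "rpki.arin.net": 5200,
    "rpki.lacnic.net": 1200,
    "rpki.ripe.net": 5500,
}


def listing_seconds(size_kb: int, bits_per_second: int) -> float:
    """Seconds needed to transfer size_kb kilobytes at bits_per_second."""
    return size_kb * 1000 * 8 / bits_per_second


def listing_times(bits_per_second: int) -> dict:
    """Transfer time of each repository's file list at the given rate."""
    return {
        host: listing_seconds(size_kb, bits_per_second)
        for host, size_kb in FILE_LIST_KB.items()
    }
```

At a hypothetical 100 kbit/s available to a single client, `listing_seconds(5500, 100_000)` gives 440 seconds for rpki.ripe.net before any object is transferred.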

Kind regards,
Ties

> 
> Tiny tiny nits:
> 
> - s/Certificate Authorities/Certification Authorities/
> - s/next update time/nextUpdate time/
> - s/Unique Hostname/Distinct Hostname/I

Thanks! Will incorporate this into the main branch.