Re: [Sidrops] I-D Action: draft-ietf-sidrops-prefer-rrdp-00.txt

Job Snijders <job@fastly.com> Mon, 29 March 2021 19:25 UTC

Date: Mon, 29 Mar 2021 21:25:13 +0200
From: Job Snijders <job@fastly.com>
To: Tim Bruijnzeels <tim@nlnetlabs.nl>
Cc: Randy Bush <randy@psg.com>, SIDR Operations WG <sidrops@ietf.org>, Ties de Kock <tdekock@ripe.net>
Message-ID: <YGIpmVuy0TqZkWVG@snel>
References: <161403751321.2598.9484858333244233389@ietfa.amsl.com> <76D4E3AD-97BD-40D5-804C-3ED6B875044F@nlnetlabs.nl> <2B8DFACB-E2C6-48F6-B2DF-D762FCAF2384@ripe.net> <YF4Irln8qM4w8i3o@snel> <m2tuoxst8t.wl-randy@psg.com> <YF5YxjxVVgjugJHQ@snel> <m28s69s2v2.wl-randy@psg.com> <C0B7D2C0-4328-47D0-8788-108AC4A1D48D@nlnetlabs.nl>
MIME-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Disposition: inline
In-Reply-To: <C0B7D2C0-4328-47D0-8788-108AC4A1D48D@nlnetlabs.nl>
X-Clacks-Overhead: GNU Terry Pratchett
Archived-At: <https://mailarchive.ietf.org/arch/msg/sidrops/YYuMqt6Jxh8K5OSFIF8x9oZ4tE8>
Subject: Re: [Sidrops] I-D Action: draft-ietf-sidrops-prefer-rrdp-00.txt

On Mon, Mar 29, 2021 at 06:33:17PM +0200, Tim Bruijnzeels wrote:
> > On 27 Mar 2021, at 04:16, Randy Bush <randy@psg.com> wrote:
> > 
> >> Is the below the short summary people agree to?
> >> 
> >> 	- Try RRDP first
> >> 	- Immediately fall back to RSYNC if RRDP didn't work
> >> 	- Don't contact RRDP URIs more than once every 10 minutes
> >> 	- Don't contact RSYNC URIs more than once every 30 minutes
> >> 	- Don't contact RRDP or RSYNC URIs less than once every 60 minutes
> > 
> > i certainly do not agree to that last line, and doubt you do
> > 
> > as far as the other numbers, see draft-ietf-sidrops-rpki-rov-timing for
> > my current opinion. but i am willing to listen.

The 24-hour sliding timers we generally observe in certificate issuance,
if combined with fetching RPKI data once a day, can cause needless
brittleness in the face of transient issues between the CA and RP.

With that in mind, I think all of us agree that syncing only once a day
is not good. However, encouraging RP implementers to sync every minute
also appears to result in problematic load.

Perhaps my command of the English language is lacking; what I meant to
say with "Don't contact RRDP or RSYNC URIs less than once every 60
minutes" is that RPs are encouraged to try to sync with the repositories
at least once an hour, or maybe at least every 3 hours, regardless of
the syncing protocol.
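
To make the timers in the summary concrete, here is a minimal sketch of
how an RP might apply them to one repository. It is an illustration
under assumptions, not taken from any validator: the names RepoState,
sync, and is_overdue are hypothetical, the fetch callbacks stand in for
real RRDP and rsync fetchers, and the 10/30/60 minute values are simply
the ones proposed in this thread.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# Values proposed in this thread, in seconds (not normative).
RRDP_MIN_INTERVAL = 10 * 60    # don't contact RRDP more often than this
RSYNC_MIN_INTERVAL = 30 * 60   # don't contact rsync more often than this
MAX_STALENESS = 60 * 60        # try to sync at least this often


@dataclass
class RepoState:
    """Hypothetical per-repository bookkeeping."""
    last_rrdp_attempt: Optional[float] = None
    last_rsync_attempt: Optional[float] = None
    last_success: Optional[float] = None


def _elapsed(now: float, then: Optional[float]) -> float:
    # A repository that was never contacted counts as infinitely old.
    return float("inf") if then is None else now - then


def is_overdue(state: RepoState, now: float) -> bool:
    """True when the last successful sync is at least MAX_STALENESS old."""
    return _elapsed(now, state.last_success) >= MAX_STALENESS


def sync(state: RepoState, now: float,
         fetch_rrdp: Callable[[], bool],
         fetch_rsync: Callable[[], bool]) -> str:
    """Try RRDP first, fall back to rsync immediately if RRDP fails.

    Returns "rrdp", "rsync", "failed", or "skipped".
    """
    if _elapsed(now, state.last_rrdp_attempt) < RRDP_MIN_INTERVAL:
        return "skipped"
    state.last_rrdp_attempt = now
    if fetch_rrdp():
        state.last_success = now
        return "rrdp"
    # Immediate fallback, but still honour the rsync rate limit.
    if _elapsed(now, state.last_rsync_attempt) >= RSYNC_MIN_INTERVAL:
        state.last_rsync_attempt = now
        if fetch_rsync():
            state.last_success = now
            return "rsync"
    return "failed"
```

A scheduler could call is_overdue() to notice repositories that have not
been synced successfully within the hour, and call sync() on whatever
cadence it likes; the rate limits are enforced inside sync() itself.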

In draft-ietf-sidrops-rpki-rov-timing we appear to be addressing two
groups of problems: those who are polling too fast and those who are
polling too slow. If we can get the global RPKI deployment to a state
where operators see effects in the Default-Free Zone of newly issued (or
revoked) objects within 1 or 2 hours, that would be an excellent
achievement.

Historic perspective:
Changes via 'LOA based' BGP prefix filtering would take days or weeks.
With IRR in many places the cycle was brought down to 24 hours.
With RPKI, hopefully we can achieve 'a few hours'.
The industry is making progress compared to years ago!

> @Job, can you elaborate why you believe that RRDP should not be
> contacted more frequently than once every 10 minutes?

> Are you worried about stampeding RPs in case they all check this
> frequently and then fall back together? 

No, that does not worry me.

As suggested here [0], an RP which was previously able to sync with the
CA via RRDP (or rsync) does not need to do a 'full rsync transfer' as
if the local cache were empty. Falling back from RRDP to RSYNC does not
need to be expensive on the RSYNC server, as long as RPs don't pretend
they have zero prior knowledge.
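
As a rough illustration of "not pretending to have zero prior
knowledge", the sketch below builds an rsync command line that targets
the directory the RP already populated, so that rsync can use the files
present there as a basis instead of fetching everything into an empty
directory. The function name, the example URI, and the cache path are
hypothetical; the options used (--recursive, --times, --delete,
--timeout) are standard rsync options, but the exact set a given
validator passes is an assumption here.

```python
from typing import List


def build_rsync_command(uri: str, cache_dir: str,
                        timeout_s: int = 30) -> List[str]:
    """Build (but do not run) an rsync invocation that reuses cache_dir.

    cache_dir is assumed to be the directory that earlier RRDP or rsync
    fetches wrote this repository's objects into.
    """
    return [
        "rsync",
        "--recursive",             # walk the whole repository tree
        "--times",                 # keep mtimes so unchanged files can be skipped
        "--delete",                # drop local files the server no longer has
        f"--timeout={timeout_s}",  # give up on a stalled server
        uri.rstrip("/") + "/",     # trailing slash: copy the directory contents
        cache_dir.rstrip("/") + "/",
    ]


cmd = build_rsync_command("rsync://rpki.example.net/repo",
                          "/var/cache/rpki/rpki.example.net/repo")
print(" ".join(cmd))
```

The resulting list can be handed to subprocess.run() as-is; the point is
only that the destination is the existing cache rather than a fresh
temporary directory.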

> My gut feeling says that the 2 minute range for checking - when
> everything works - is about right. Don't get me wrong, I am not
> fundamentally opposed to 10 minutes, but I want to understand the
> reason.

Some in the community appear to think that a two-minute cycle imposes a
silly level of load for the benefit it brings. Changing the timer to 10
minutes makes load grow 5x slower. I agree with this position.
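
The 5x figure follows directly from the polling arithmetic. A small
worked example, where the RP count of 1000 is a made-up number used
purely for illustration:

```python
MINUTES_PER_DAY = 24 * 60


def polls_per_day(interval_minutes: float, num_rps: int = 1) -> float:
    """Requests per day a repository sees if each RP polls once per interval."""
    return num_rps * MINUTES_PER_DAY / interval_minutes


# Per RP: 720 polls/day at 2 minutes, 144 polls/day at 10 minutes.
print(polls_per_day(2), polls_per_day(10))

# With a hypothetical 1000 RPs the ratio between the two stays 5.
print(polls_per_day(2, 1000) / polls_per_day(10, 1000))
```

Every RP added to the ecosystem therefore costs a repository 720
requests a day at a two-minute timer, versus 144 at a ten-minute timer.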

It seems to me the collective first went 'too slow' in the RSYNC-only
world, and then RRDP came along and the gas pedal was pressed to the
floor. The timers in the RRDP spec being so different from the RSYNC
timers, and it being somewhat underspecified how exactly an RP
implementer should make the RSYNC and RRDP world co-exist, appear to
have led to implementation challenges in most validators.

> Back to now though.. and the more practical question: why 10 minutes?

Because every two minutes seems too fast. Such fast-paced polling
intervals, combined with some other (not yet discovered) software
defect, can easily lead to significant load. The following GitHub issue
describes an example of short poll timers (combined with another issue)
exacerbating a situation: https://github.com/RIPE-NCC/rpki-validator-3/issues/307

I understand the global CDNs can carry any load, but it would be nice if
small operators like myself could also host their RPKI objects and
participate in the Internet from a small device. I think RPs have an
obligation to be as efficient as possible and to minimize the load they
impose in every respect.

Obviously a push protocol like BGP would resolve a lot of the issues
arising from the current polling-based mechanisms. That's for another
year.

Kind regards,

Job

[0]: https://mailarchive.ietf.org/arch/msg/sidrops/GJv01_9Nm_hZwoVNLxwbACfrPPk/