Re: [Sidrops] Benjamin Kaduk's Discuss on draft-ietf-sidrops-6486bis-09: (with DISCUSS and COMMENT)

Job Snijders <job@fastly.com> Thu, 17 March 2022 14:40 UTC

Date: Thu, 17 Mar 2022 15:40:37 +0100
From: Job Snijders <job@fastly.com>
To: Rob Austein <sra@hactrn.net>
Cc: Benjamin Kaduk <kaduk@mit.edu>, The IESG <iesg@ietf.org>, sidrops-chairs@ietf.org, morrowc@ops-netman.net, sidrops@ietf.org, draft-ietf-sidrops-6486bis@ietf.org
Archived-At: <https://mailarchive.ietf.org/arch/msg/sidrops/YhMOdHIW4ACR4wZVMvOMQW__Db4>

Dear Rob, others,


On Sat, Feb 26, 2022 at 07:25:36PM -0500, Rob Austein wrote:
> Apologies for jumping in late here, I've been distracted by $dayjob
> (which has not been RPKI for many years now) and only just saw this.


The ink on the document hasn't dried yet; it hasn't even been sent to
the printer, so there is still time! :-) I consider your feedback most
welcome. See below for my response.


> I fear that I must object, strongly, to Job's proposed change:


Indeed, I was the one who recently suggested replacing "RECOMMENDED"
with "MUST", though I am not the first to do so! :-)


> > > ----- section 4: Manifest Definition ----
> > > 
> > > OLD section 4.2.1 second paragraph:
> > >    Because a "one-time-use" EE certificate is employed to verify a
> > >    manifest, it is RECOMMENDED that the EE certificate have a validity
> > >    period that coincides with the interval from thisUpdate to nextUpdate
> > >    in the manifest, to prevent needless growth of the CA's CRL.
> > > 
> > > NEW Section 4.2.1:
> > >    Because a "one-time-use" EE certificate is employed to verify a
> > >    manifest, the EE certificate MUST be issued with a validity period
> > >    that coincides with the interval from thisUpdate to nextUpdate in the
> > >    manifest, to prevent needless growth of the CA's CRL.
> > > 
> > > ----- Section 5: Manifest Generation ----
> > > 
> > > OLD Section 5.1 last paragraph:
> > >        It is RECOMMENDED that the validity interval of the EE
> > >        certificate exactly match the thisUpdate and nextUpdate times of
> > >        the manifest.
> > >        Note: An RP MUST verify all mandated syntactic constraints, i.e.,
> > >        constraints imposed on a CA via a "MUST".
> > > 
> > > NEW section 5.1:
> > >        The validity interval of the EE certificate MUST exactly match
> > >        the thisUpdate and nextUpdate times specified in the manifest's
> > >        eContent. (An RP MUST NOT consider misalignment of the validity
> > >        interval misalignment in and of itself to be an error.)


For people just now catching up on this thread, here is some background:

https://datatracker.ietf.org/doc/html/rfc6486#section-4.2
Feb 2012 - (paraphrased) "Manifest nextUpdate MUST equal CRL nextUpdate"

The first iteration of the bis-effort draft-ymbk-sidrops-6486bis-00
(July 2020), and the adopted version draft-ietf-sidrops-6486bis-00
(August 2020) up to draft-ietf-sidrops-6486bis-05 (July 2021) all had
something along these lines:

    "Each manifest encompasses a CRL, and the nextUpdate field of the
     manifest SHOULD match that of the CRL's nextUpdate field, as the
     manifest will be re-issued when a new CRL is published."

Then, in draft-ietf-sidrops-6486bis-06 the MUST was watered down to a
RECOMMENDED: https://www.ietf.org/rfcdiff?url2=draft-ietf-sidrops-6486bis-06.txt

The 'watering down' happened because it was observed that if RPs were to
consider validity time window misalignment a hard error, many currently
deployed CAs would flip to invalid for no security benefit. See this
thread: https://mailarchive.ietf.org/arch/msg/sidrops/VNG77j05I2JXOwv4qkSy-DCgRNE/

My sense of the working group's consensus is that there is a desire
to prescribe slightly different behavior for RPs and CAs: ensure that
currently deployed CAs are not pointlessly tossed out, while
encouraging future CAs to align their validity time window. This -bis
effort (un)fortunately deals with a "brownfield" situation.
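
As a minimal sketch of that asymmetry (not taken from any deployed
implementation; the Manifest container, its field names, and both
function names are hypothetical), the CA side can be held to exact
alignment while the RP side only records misalignment as a note:

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List


@dataclass(frozen=True)
class Manifest:
    # Hypothetical container; the field names are assumptions for illustration.
    this_update: datetime
    next_update: datetime
    ee_not_before: datetime
    ee_not_after: datetime


def ca_is_aligned(m: Manifest) -> bool:
    """CA-side check: EE validity exactly matches thisUpdate/nextUpdate."""
    return m.ee_not_before == m.this_update and m.ee_not_after == m.next_update


def rp_alignment_notes(m: Manifest) -> List[str]:
    """RP-side check: misalignment yields a note, never a rejection by itself."""
    if ca_is_aligned(m):
        return []
    return ["EE certificate validity does not match thisUpdate/nextUpdate"]
```

A CA could run ca_is_aligned() before publishing, whereas an RP would
only log the output of rp_alignment_notes() and continue with its other
validation steps.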


> This business of pinning the manifiest EE certificate's validity
> period to the manifest's thisUpdate/nextUpdate interval has always
> been a bad idea, I've objected to it in the WG before, and we have a
> long sad history of operational disasters resulting from
> implementations that followed that recommendation.
> 
> The typical failure mode (demonstrated at least once each by three of
> the RIRs) is to pick a thisUpdate/nextUpdate cycle on the order of a
> day or so (reasonable), then go home for a long weekend, during which
> something breaks and no new updates occur.


I believe that at this point in time, all RIRs have had to "learn the
hard way" that they must take extreme care to properly monitor and
maintain their tree. Whether the cause is CRL expiration, manifest
thisUpdate/nextUpdate, or some other potential issue, there are many
moving parts. In the last 2 years all RIRs have experienced the wrath
of upset Relying Parties (or subordinate CAs) when things went south.

Luckily, nowadays auxiliary tooling such as BGPalerter exists, which
warns (days or even a few hours ahead of time) of the imminent failure
of RPKI trees.  TA operators should take note of this utility, if they
haven't already done so.
https://github.com/nttgin/BGPalerter/blob/main/docs/configuration.md#monitorroas
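
The idea of warning ahead of time can be illustrated with a small,
generic sketch. This is not BGPalerter's code or configuration; the
function name, the return values, and the 48-hour default threshold are
assumptions chosen for the example:

```python
from datetime import datetime, timedelta


def expiry_status(next_update: datetime, now: datetime,
                  threshold: timedelta = timedelta(hours=48)) -> str:
    """Classify how close an object is to its nextUpdate time.

    The 48-hour default is an arbitrary, assumed threshold.
    """
    remaining = next_update - now
    if remaining <= timedelta(0):
        return "expired"   # nextUpdate is already in the past
    if remaining <= threshold:
        return "warning"   # still usable, but the operator should act now
    return "ok"
```

Running such a check periodically against every manifest and CRL in a
tree gives the operator a window in which to intervene before RPs start
rejecting objects.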

At the end of the day, these (and a few other) timers make the event
horizon collapse - at some point. The decisions in the RP decision tree
are composed of "hard decisions", in the sense that ultimately the
verification outcome is quite polarized: valid or invalid.


> The manifest thisUpdate/nextUpdate semantics are (deliberately) very
> much like the CRL semantics: failure to issue a new manifest by the
> nextUpdate time does not automatically invalidate all the existing
> data, it's just a hint that something might be wrong.  Think of it as
> a staleness indication.  Slightly stale, log a warning but no big
> deal (yet).  Gets stale enough, maybe you don't want to eat that.


The above paragraph (for me) is a description of a "difficult to
implement" procedure, because it does not describe a "hard decision
tree" (as is used in today's commonly deployed RP implementations).

I consulted with TA operators who told me they appreciate cutting down
on divergence between RP implementations, and all TA operators nowadays
CLOSELY monitor their trees. 

I do not intend to update the OpenBSD validator to consider a manifest
"current" when the manifest's eContent nextUpdate is in the past.
Nor do I expect other currently deployed RP implementations to change
their behavior from "hard error" to "soft warning" in this context.
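
A "hard decision" of this kind reduces to a single boolean test. The
following is a minimal sketch, not the code of any particular validator;
the function name and the inclusive treatment of both boundaries are
assumptions:

```python
from datetime import datetime


def manifest_is_current(this_update: datetime, next_update: datetime,
                        now: datetime) -> bool:
    """Return True only while 'now' lies within [thisUpdate, nextUpdate].

    There is no "slightly stale" state: once nextUpdate has passed, the
    answer is False. Inclusive boundaries are an assumption of this sketch.
    """
    return this_update <= now <= next_update
```

A staleness-based approach would instead need a third outcome and a
policy for how stale is too stale, which is where implementations could
diverge.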


> Certificate expiration, on the other hand, is a hard failure.  When a
> manifest's EE certificate expires without being replaced, the data
> listed in the manifest is just gone.
> 
> So when an RIR does this with the manifest directly under their root
> certificate, a large portion of the total database goes away.  Oops.


Right, "don't touch the stove when it is hot". :-)


> Manifests only have EE certificates because they're a patch on the
> side of X.509. If X.509 had allowed us to sign manifests using the CA
> certificate, as with CRLs, we would have done that.  But it doesn't,
> so we have the manifest EE certificates, OK, but that's no reason to
> kill large portions of the database every time an operator near the
> root of the tree says "oops".
> 
> All of this has been discussed before, and should be both in WG
> archives and in notes from face to face meetings (probably both the
> SIDR and SIDROPS WGs, not a new topic).
> 
> My own RPKI CA implementation uses the validity interval of the parent
> CA certificate as the validity interval of the manifest EE
> certificate, and has for about fifteen years now.  I'm not aware of
> any operational problems that have arisen as a result.
> 
> Yes, there's a potential issue of a CRL getting too long, but if it
> happens at all, it happens at a predictable rate, and one can always
> rekey if it's really becoming big enough to be a problem.
> 
> Job's proposed change makes the bad idea a requirement rather than
> just bad advice, hence my objection.


Indeed, you have not seen operational issues, and the proposed change
does not require your CA implementation to be changed: you can rely on
the RPs not rejecting the cryptographic products from your CA
implementation based on a validity time window misalignment.

CA operators are free to set the Manifest's eContent nextUpdate and the
CRL's nextUpdate as far into the future as they wish; each CA needs to
make a trade-off about how it schedules on-call on the weekends. CA
operators can nowadays rely on RP implementations only permitting the
narrowest validity window transitively concluded from the entire chain.
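
The "narrowest validity window" can be thought of as the intersection
of every (start, end) pair along the chain: certificates, manifest, and
CRL alike. A minimal sketch follows, assuming each object has already
been reduced to such a pair; the function and type names are
hypothetical:

```python
from datetime import datetime
from typing import Iterable, Optional, Tuple

Window = Tuple[datetime, datetime]  # (not-before / thisUpdate, not-after / nextUpdate)


def narrowest_window(chain: Iterable[Window]) -> Optional[Window]:
    """Intersect all validity windows in a chain.

    Returns None when the chain is empty or the windows do not overlap.
    """
    windows = list(chain)
    if not windows:
        return None
    start = max(w[0] for w in windows)  # latest start wins
    end = min(w[1] for w in windows)    # earliest end wins
    return (start, end) if start <= end else None
```

Under this view a long-lived manifest EE certificate cannot extend the
effective window: the earliest end time anywhere in the chain, such as
the manifest's nextUpdate, still bounds the result.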

I appreciate your concern, but think the danger you warn about should be
addressed through other methods: be it through pain or monitoring. I
hope the above message makes it clear that your concern has been
considered. 

Kind regards,

Job