Re: [Acme] ACME Renewal Information (ARI) API Proposal

Matt Holt <matt@lightcodelabs.com> Tue, 24 March 2020 00:17 UTC


By way of introduction, my perspective is primarily that of an ACME client developer, so you'll notice my bias toward keeping client implementations as simple as possible. However, I am also a web server developer (the Caddy Web Server), so I can appreciate the concerns of server developers too.



First, thanks to Roland and Jacob for submitting such a well-crafted proposal. It is easy to read and understand, and it is mindful of certain complexities and unknowns that will need further discussion.



The proposal identifies two problems that it attempts to solve:

1. Notifying subscribers of impending revocations

2. Scheduling regular certificate renewals



I do think both of these can be problems, but I am not sure if this proposal -- or any ACME extension, for that matter -- is the best solution to them.





## Impending revocations



In terms of trust, what is the difference between a certificate that is known to be revoked soon and a certificate that is already revoked? In a binary sense, if you know a certificate is going to be revoked, it's as good as revoked. Why should you continue to trust a certificate when the CA already knows it shouldn't be trusted?



The proposal treats this endpoint as non-confidential, so we can assume the CA-suggested renewal windows are public information, just as OCSP responses are. Given that some vendors already ship their own revocation lists to their clients ahead of CRLs, it's quite likely that some relying parties will use the proposed endpoint to get ahead of OCSP and CRLs and factor its information into trust decisions.



Fundamentally, the proposed extension isn't too different from OCSP already: it's a (signed? unsigned?) response from the CA that tells you whether the certificate is still believed to be trustworthy.



Before going too deep into implementation details, I think the philosophical paradox this proposal introduces should be resolved.





## Scheduling certificate renewals



I have written a lot of code that renews certificates. The proposal describes two main ways to schedule certificate renewals: 1) run a timer/cron job at static intervals, or 2) choose a renewal time based on the certificate's actual NotBefore and NotAfter dates. I would add at least a third way, which is what Caddy/CertMagic does: 3) scan all managed certificates at short, frequent intervals, and if a certificate's lifetime is N% spent, initiate a renewal right then. This is similar to (2), but with a subtle difference: it is much simpler because it doesn't require setting a timer or scheduling each certificate individually, yet you still get the benefits of (2) and none of the downsides of (1). Method (3) also does not require sleeping or making per-certificate reservations, which are difficult to preempt.



The downside that the proposal seems concerned with is "load clustering for the issuing CA", which I read as "thundering herd"-type problems. This is obviously a problem with (1), but for methods (2) and (3):



1. Staggering the start of ACME clients should disperse this load naturally. In other words, not all ACME clients will start their poller/scanning routine at the same time if they are duration/interval-based. Clients should avoid using wall-clock times like "minute 30" or "hour 12" for the same reasons (1) should be avoided.



2. As certificate lifetimes get shorter, the herds will thunder no matter how staggered they are.



If the problem of load clustering is really the crux of this, then is it possible for ACME servers to reply with a Retry-After header on existing endpoints if they are getting overwhelmed?





## Optional extension



This extension would be very helpful for attentive, responsible clients. But ACME clients that are... I'll say "minimally implemented"... may not take advantage of this endpoint, and unfortunately, it's those clients that will need it the most.





## OCSP stapling sorta works



For the record, a case study: Caddy/CertMagic wasn't impacted by the recent Let's Encrypt revocation event because it attempts certificate renewal immediately upon discovering a "Revoked" OCSP status. (It staples OCSP responses to all certificates by default, caches the responses to disk, and refreshes them about halfway through their validity period.) When it receives a Revoked response, it does not staple it; the current certificate keeps its existing "Good" staple, which remains valid for ~3 more days, while CertMagic attempts renewal. After renewal succeeds, the certificate is replaced, with a fresh OCSP staple of course. No relying party ever sees a Revoked status, even with immediate revocation.



The point is, I think existing infrastructure can work for this problem.





## Vision








In my opinion, the burden is on the clients to just be a little more fault-tolerant. They should staple OCSP responses. They should do so conservatively. They can call `renewCert()` when they see a Revoked response.



Ultimately, revocation is just the means to an end: short certificate lifetimes.