[Acme] ACME Renewal Information (ARI) API Proposal

Roland Shoemaker <roland@letsencrypt.org> Mon, 23 March 2020 22:00 UTC

From: Roland Shoemaker <roland@letsencrypt.org>
Date: Mon, 23 Mar 2020 15:00:31 -0700
Message-ID: <CAF1ySfFWU4Bai7SPyXfF5v5y_zUa89fkt7tEDvfQmzEBE9LkKA@mail.gmail.com>
To: IETF ACME <acme@ietf.org>
Archived-At: <https://mailarchive.ietf.org/arch/msg/acme/b-RddSX8TdGYvO3f9c7Lzg6I2I4>
Subject: [Acme] ACME Renewal Information (ARI) API Proposal

Hey all,

At Let's Encrypt we've been thinking about designing some kind of renewal
information API as an extension to ACME for a while now. Recent events have
brought this back to the forefront of our minds. Below is a write-up
detailing our proposal. I'd really like to get input on it, especially from
those working on ACME clients, as this work mostly represents thoughts from
ACME server developers and as such may not accurately capture issues faced
by clients.

If the working group is interested in this as a work product I'll spend
some time developing an Internet-Draft based on this outline.

Thanks!
Roland

----

This proposal aims to address two issues that affect both ACME and the
wider web PKI.

The first of these issues is how a CA should inform subscribers of a CA- or
third-party-initiated certificate revocation event. In most cases this is
done via email or other out-of-band notification channels, which may be
appropriate for CAs that rely on manual processes but seems clunky for
ACME-based CAs, which rely heavily on automation. For automated ACME clients
the probability that a user will act upon a revocation notice (or even
receive one, if they did not provide an account contact) is lower than for
manually maintained certificates, leading to the possibility of serving a
revoked certificate until the next renewal window. In cases where the CA has
a buffer before performing the revocation, being able to inform the client
of this impending event would allow for seamless renewal before the
revocation takes place, and in the case where the revocation has already
taken place it would help to significantly reduce the impact on the
subscriber.

The second issue is how ACME clients should determine when to renew a
regular non-revoked certificate. Most clients take one of two routes. They
are either manually configured to renew at a specific interval (e.g. via
`cron` or similar) or parse the issued certificate to determine the
expiration date and choose some date preceding it to attempt renewal. While
the latter option is better than the former, each can cause issues for both
the client and the issuing CA. The first option creates a significant
barrier to the issuing CA changing certificate lifetimes, as the static
renewal interval makes assumptions about that lifetime that must be manually
updated. Both options can also cause load clustering for the issuing CA.
Being able to indicate to the client a period in which the issuing CA
suggests renewal would allow dynamic changes to the certificate lifetime and
smearing of load.

## ACME vs. OCSP

The two obvious options for transmitting renewal suggestions to the
subscriber are via an extension to the ACME protocol, or an extension to
the OCSP protocol. Each option has advantages and disadvantages.

For OCSP one of the obvious advantages is that the protocol is already
designed to carry revocation information, which some clients already poll
for. An extension could be added to OCSP responses containing 'recommended
renewal' windows and/or indicators of impending revocation. Using OCSP also
has the advantage of existing serving and caching infrastructure.

The disadvantages of using OCSP mainly revolve around usage of the protocol
by relying parties. In order to avoid intolerance for new OCSP extensions
we would likely want to require clients to indicate that they want this
information, via either an OCSP request extension or an HTTP header, which
could increase caching requirements for the issuing CA and possibly require
changes to their existing caching infrastructure.

For ACME the most obvious advantage is that every ACME client already
understands the protocol, and should have a relatively easy path to being
extended to understand a new API endpoint. Using ACME also allows a more
descriptive, extensible API, rather than requiring us to stuff more
information into a strictly defined ASN.1 extension. Using ACME would allow
for requesting information on multiple certificates in a single request,
which while technically possible via OCSP is in reality rarely supported.

The disadvantages of using ACME mainly revolve around increased load on the
ACME API for the issuing CA. ACME currently has no endpoints that are
designed to be routinely polled, so adding one could introduce a significant
load vector which infrastructure has not been designed for. Another
disadvantage is that if the API were authenticated it wouldn't be possible
to viably cache the renewal information at a CDN layer.

On balance it seems like ACME is the better choice for this API.

## Push vs. Pull

The CA could either push information to ACME clients, for instance via
webhook, or it could rely on clients polling for information.

The push method is challenging because many ACME clients run behind
firewalls or don’t have full access to provide external-facing services.
For instance, an ACME client might only have the ability to provision files
under /.well-known/acme-challenge/, or it might only have access to modify
DNS records.

The pull method, on the other hand, is straightforward. ACME clients, by
necessity, need to send HTTPS requests to the CA. They can use that same
channel to poll periodically.

The disadvantage of polling is that it provides less timely results than
pushing. The most relevant constraint is the Baseline Requirement that CAs
must revoke within 24 hours on key compromise, or when validation
information “cannot be relied on.” Polling must be frequent enough that the
ACME client will receive notification within this 24-hour window, with
enough remaining time for manual escalation if the automated client fails
to act. Polling on a 12-hour interval should provide this.
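
As a rough check of that arithmetic, here is a minimal sketch. It assumes the
worst case, where the client polled just before the CA published the notice,
and that the notice is published at the very start of the 24-hour window. The
function name is purely illustrative.

```python
from datetime import timedelta


def worst_case_slack(revocation_deadline: timedelta,
                     poll_interval: timedelta) -> timedelta:
    """Time left before revocation once the client finally sees the notice.

    Assumes the client's previous poll happened just before the notice was
    published, so it learns of the event one full poll interval later.
    """
    return revocation_deadline - poll_interval


slack = worst_case_slack(timedelta(hours=24), timedelta(hours=12))
print(slack)  # 12:00:00
```

With a 12-hour interval, at least half of the window remains for the client
to renew or for a human to step in.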

## Cacheability

An important question to answer is whether the results of this API need to
be cacheable, and if so what level of cacheability they should have. One
reason for designing the API around cacheability would be high load from
repeated requests. The set of users making repeated identical requests is
likely to have relatively low cardinality, and these requests are not likely
to be made rapidly, suggesting that the API doesn't need to be highly
cacheable. That said, given that the information returned by the API isn't
likely to be dynamic (for instance, over the lifetime of the certificate it
is unlikely to change, barring a revocation event), it seems likely that the
issuing CA would like some way to cache the results in order to reduce
unnecessary resource usage.

## An OCSP-based design (rejected)

Here we’ll sketch out an OCSP-based design for contrast with the design
proposed below.

OCSP is frequently fetched by Relying Parties (RPs). We do not want to
increase the bandwidth usage for normal RP fetches, since that would worsen
performance for many normal web browsing requests. Also, when ACME clients
poll, they will want different caching semantics than RPs. CAs will want
ACME clients to get fresh information about every 12 hours, while OCSP
responses are commonly cacheable up to their NextUpdate, which according to
the Microsoft Root Program can be up to 7 days after ThisUpdate. While CAs
could shorten their NextUpdate interval to accommodate ACME clients, this
would be an unnecessary coupling of concerns.

Under this proposal, ACME clients that poll OCSP for renewal information
MUST add an HTTP Header, “ACME-Renewal: 1” to their requests. CAs that use
CDNs to serve OCSP responses MUST treat the ACME-Renewal header as part of
their cache key, so that responses to ACME clients can have a different
Cache-Control: max-age than those sent to RPs.

In the normal case, when no renewal is needed soon, the OCSP response will
be unchanged. For the “renewal needed soon” case, we have two choices to
convey that information: An OCSP extension, or an HTTP header. The OCSP
extension has the advantage that it’s signed, but has the disadvantage that
it requires extra signatures from a CA’s HSM, at a time when the HSM may
already be burdened by signing bulk revocation responses.

In the case where a CA wants an ACME client to renew a certificate, the CA
responds to all requests that have “ACME-Renewal: 1” in the header with a
response that has the header “ACME-Renewal: renew-by=<datetime>;
key-rotate=<true/false>”. The ACME client then attempts renewal by the
specified datetime.

In both cases the CA MUST include the “Vary” header in its response, and
must include “ACME-Renewal” among the header names listed.
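
To make the header exchange in this (rejected) sketch concrete, here is one
way a client might parse the response header value. This is only an
illustration: the proposal does not say how <datetime> is encoded, so the
code assumes an RFC 3339 timestamp, and the type and function names are
made up for the example.

```python
from datetime import datetime
from typing import NamedTuple


class RenewalHint(NamedTuple):
    renew_by: datetime
    key_rotate: bool


def parse_acme_renewal(value: str) -> RenewalHint:
    """Parse 'renew-by=<datetime>; key-rotate=<true/false>'.

    Assumption: <datetime> is an RFC 3339 timestamp such as
    2020-04-01T00:00:00Z; the proposal leaves the format open.
    """
    params = {}
    for part in value.split(";"):
        name, sep, val = part.strip().partition("=")
        if sep:
            params[name.lower()] = val.strip()
    # Older Python versions of fromisoformat() reject a trailing "Z".
    renew_by = datetime.fromisoformat(params["renew-by"].replace("Z", "+00:00"))
    # Treat a missing key-rotate parameter as false (an assumption).
    key_rotate = params.get("key-rotate", "false").lower() == "true"
    return RenewalHint(renew_by, key_rotate)


hint = parse_acme_renewal("renew-by=2020-04-01T00:00:00Z; key-rotate=true")
```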

Advantages of this proposal:

- It does not require a discovery mechanism for ACME clients to find out
  where to check the status of a certificate; the OCSP URL is already
  available in the certificate itself.
- ACME clients can also check the revocation status of the OCSP response.
  For CAs that don’t support renewal notifications, these clients could
  trigger renewal immediately on noticing a certificate was revoked.

Disadvantages of this proposal:

- It combines two different types of requests with different caching at a
  single URL, inviting subtle mistakes with cache keys.
- Because OCSP URLs embedded in certificates necessarily use HTTP, the
  response triggering renewal is unauthenticated. A MITM attacker could use
  this to trigger early certificate renewal.

We reject this design.

## Proposed API

Here we propose a roughly sketched out ACME API extension, taking into
account the topics discussed above.

Conformant ACME servers should include a new field, “renewalInformation”, in
the JSON objects for finalized orders. The value of this field should be a
unique URL from which renewal information can be retrieved. To request
renewal information, conforming ACME clients should make a GET request to
this URL.

The ACME server should respond to a request with a JSON object containing
renewal hints for the associated certificate.

{
    "suggestedRenewalWindow": {
        "start": "...",
        "end": "..."
    },
    "keyRotate": true
}

The fields of the renewal information object are as follows:

suggestedRenewalWindow (object, required): A JSON object containing two
strings, "start" and "end", which indicate the window in which the CA
recommends renewing the certificate. Conformant ACME clients should pick a
random time within this window at which to renew the certificate. If this
window is in the past, conforming clients SHOULD immediately attempt to
renew the certificate.

keyRotate (boolean, optional): A boolean indicating whether the ACME server
requires the renewed certificate to use a new key pair.
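
The client-side handling of this object could look roughly like the sketch
below. It is not normative: the proposal leaves the timestamp format open
("..."), so RFC 3339 strings are assumed here, the function name is invented
for the example, and clamping a partly elapsed window to "now" is one
possible reading of the text above.

```python
import json
import random
from datetime import datetime, timezone


def pick_renewal_time(body: str, now: datetime) -> datetime:
    """Choose when to renew, given the JSON renewal information object."""
    window = json.loads(body)["suggestedRenewalWindow"]
    # Assumption: timestamps are RFC 3339; older fromisoformat() rejects "Z".
    start = datetime.fromisoformat(window["start"].replace("Z", "+00:00"))
    end = datetime.fromisoformat(window["end"].replace("Z", "+00:00"))
    if end <= now:
        # The whole window is in the past: renew immediately.
        return now
    # Uniformly random point in [start, end) to smear load across clients.
    chosen = start + (end - start) * random.random()
    # If the chosen point has already passed, renew now rather than waiting.
    return max(chosen, now)


example = json.dumps({
    "suggestedRenewalWindow": {
        "start": "2020-05-01T00:00:00Z",
        "end": "2020-05-08T00:00:00Z",
    },
    "keyRotate": True,
})
when = pick_renewal_time(example, datetime(2020, 4, 1, tzinfo=timezone.utc))
```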

The HTTP response should contain a Retry-After header indicating the
polling interval that the ACME server recommends. Conforming ACME clients
SHOULD use this value to determine their polling schedule, using the
returned value as a lower bound for requesting information again, rather
than using a fixed interval.
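
Since Retry-After may carry either a number of seconds or an HTTP-date, a
client needs to handle both forms. A minimal sketch, using only the Python
standard library (the function name is hypothetical):

```python
from datetime import datetime, timedelta, timezone
from email.utils import parsedate_to_datetime


def next_poll_time(retry_after: str, now: datetime) -> datetime:
    """Earliest time at which the client should poll again."""
    value = retry_after.strip()
    if value.isdigit():
        # delta-seconds form, e.g. "43200"
        return now + timedelta(seconds=int(value))
    # HTTP-date form, e.g. "Tue, 24 Mar 2020 10:00:00 GMT".
    # A date in the past simply means "poll whenever you like".
    return max(parsedate_to_datetime(value), now)


now = datetime(2020, 3, 23, 22, 0, tzinfo=timezone.utc)
after_seconds = next_poll_time("43200", now)
after_date = next_poll_time("Tue, 24 Mar 2020 10:00:00 GMT", now)
```

Both calls above yield 2020-03-24 10:00 UTC, i.e. twelve hours after `now`.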

This API is explicitly unauthenticated, and does not use the ACME
POST-as-GET scheme, as none of the information used by this API is
considered confidential.

Conforming ACME servers may construct the renewal URLs included in order
objects in any fashion they wish as long as the URL is stable for the
lifetime of the certificate. Conforming clients should store this URL
locally so that the ACME server does not need to be queried in order to
learn the URL as the server may delete, or otherwise make unavailable, the
related order object while the certificate is still valid.

### Discoverability & URL Construction

Determining how the ACME server offers renewal information, and how the
ACME client discovers this information, is a big question. We’ve proposed
one design above, but acknowledge that there are trade-offs in our design
which may make more sense to ACME server implementers than ACME client
implementers. This section details those trade-offs with our proposed
design, and another initial design we rejected.

The design proposed uses a static URL, the format of which is not
specified. These URLs are provided via the order object and, as they have
no specified structure, cannot be derived from the certificate itself. This
means that clients must either store the URL locally or fetch the order to
learn it; since orders may expire or become inaccessible during the lifetime
of the certificate, the latter is not an ideal approach. This also means
that ACME servers must continue to serve this specific URL for the lifetime
of the certificate and cannot dynamically change where they serve this
information from.

Our initial design specified the construction of the URL, using a directory
resource to point to the API endpoint and the SHA256 hash of the
certificate as the token. The upside of this is that it would allow clients
to construct the URL without any required local state other than the
certificate itself. It would also allow the ACME server to change where it
was serving the API endpoint from dynamically, as the client would need to
query the directory to learn the first portion of the URL to append their
hash to. The main downside here is that it requires clients to make two
requests each time they want to access the API, one to the directory
endpoint, and then another to the API endpoint. Depending on the design of
the ACME server this could cause a significant increase in load,
specifically to the directory endpoint. Another downside is that this
requires that we specify the construction of the token beyond simply being
unique, which adds complexity to the specification.
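
For illustration, URL construction under this initial design might look like
the following. The details are assumptions rather than part of the proposal:
the hash is taken over the DER encoding of the certificate, rendered as
lowercase hex, and appended as a path segment to the endpoint URL learned
from the directory. The endpoint URL shown is a placeholder.

```python
import hashlib


def renewal_info_url(endpoint: str, cert_der: bytes) -> str:
    """Build the renewal information URL from the directory-provided endpoint.

    Assumptions: SHA-256 over the DER-encoded certificate, hex-encoded,
    appended as the final path segment.
    """
    token = hashlib.sha256(cert_der).hexdigest()
    return endpoint.rstrip("/") + "/" + token


# Placeholder endpoint and placeholder certificate bytes.
url = renewal_info_url("https://acme.example/renewal-info/", b"abc")
```

A real specification would have to pin down exactly these encoding choices,
which is the added complexity mentioned above.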

We could possibly merge these two designs, such that the specification
specifies how to construct the URL, and provides a directory entry, but
RECOMMENDS that the ACME client store this URL locally in order to reduce
load on the ACME server. In the case where the client makes a request and
receives a 404, for instance because the server has changed where it serves
the API endpoint from, it would then re-query the directory in order to
reconstruct the URL. This would provide the benefits of both designs while
inducing lower load on the ACME server, but would require a somewhat more
complex client design.
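
The client logic for the merged design could be sketched as follows. This is
an outline under assumptions, not a reference implementation: `fetch` and
`lookup_endpoint` are hypothetical callables injected by the caller (so the
sketch performs no network I/O itself), and the token is assumed to be the
hex SHA-256 of the DER-encoded certificate.

```python
import hashlib
from typing import Callable, Optional, Tuple

# fetch(url) -> (HTTP status code, response body)
Fetch = Callable[[str], Tuple[int, str]]


def get_renewal_info(cert_der: bytes,
                     cached_url: Optional[str],
                     lookup_endpoint: Callable[[], str],
                     fetch: Fetch) -> Tuple[str, str]:
    """Return (url_used, body), preferring the locally stored URL."""
    if cached_url is not None:
        status, body = fetch(cached_url)
        if status == 200:
            return cached_url, body
        if status != 404:
            raise RuntimeError(f"unexpected status {status}")
    # No stored URL, or it returned 404: re-query the directory and rebuild.
    token = hashlib.sha256(cert_der).hexdigest()
    url = lookup_endpoint().rstrip("/") + "/" + token
    status, body = fetch(url)
    if status != 200:
        raise RuntimeError(f"unexpected status {status}")
    return url, body
```

The caller would persist the returned URL so that subsequent polls skip the
directory lookup entirely.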

## Acknowledgements

This document draws heavily from an internal write-up of the issue by Jacob
Hoffman-Andrews.