Re: [Sidrops] Reason for Outage report

Martin Hoffmann <martin@opennetlabs.com> Thu, 27 August 2020 13:40 UTC

Date: Thu, 27 Aug 2020 15:40:45 +0200
From: Martin Hoffmann <martin@opennetlabs.com>
To: Mikael Abrahamsson <swmike=40swm.pp.se@dmarc.ietf.org>
Cc: "sidrops@ietf.org" <sidrops@ietf.org>
Message-ID: <20200827154045.52498a15@glaurung.nlnetlabs.nl>
In-Reply-To: <alpine.DEB.2.20.2008271422560.11025@uplift.swm.pp.se>
References: <DE33EFAE-FBD2-478F-92A9-1FBD81CCC43F@arin.net> <727F6FBD-F73C-4F58-AE2D-0276B2A183A3@arin.net> <20200826160001.GF95612@bench.sobornost.net> <20200826202442.232829fc@grisu.home.partim.org> <alpine.DEB.2.20.2008271422560.11025@uplift.swm.pp.se>
Organization: Open Netlabs
Archived-At: <https://mailarchive.ietf.org/arch/msg/sidrops/ylKZW-JzFu5BZShgAlwiTslAapc>
Subject: Re: [Sidrops] Reason for Outage report
List-Id: A list for the SIDR Operations WG <sidrops.ietf.org>

Mikael Abrahamsson wrote:
> 
> At this point in time, everybody I am aware of who are implementing
> RPKI in the routing system are doing invalid=drop, and nothing else.
> 
> The overall goal right now is to make sure the ROA is correctly
> validated, all the way. If a ROA is gone, it doesn't cause an outage.
> It causes lack of protection. It's not a competition which validator
> can validate the most ROAs, the competition should be to get things
> *right*.
> 
> If something doesn't validate correctly, drop it. Drop all it depends
> on. If all ROAs from a RIR are gone for an hour or a day, it's not
> the end of the world. It's not an outage.
> 
> We need to make sure the entire ecosystem gets things right, correct,
> and have procedures in place to do things right, all the time. Other
> parts of the Internet ecosystem had teething problems in the
> beginning, but they worked it out. RPKI ecosystem needs to do the
> same.
> 
> The goal of the validator should be to validate. It should be picky.
> It should throw things away that doesn't look right. You're
> advocating for something else, and I don't understand why.

Two separate things.

First, you will notice that I talked about delaying modifications, not
refusing modifications. I strongly believe that a change in
invalidation strategy should be discussed in the community and the best
possible strategy found. We are talking about security here, which is
never easy, and the obvious answers are frequently wrong.

For instance: If a ROA is found to be invalid, should all the ROAs
published by the issuing CA and its child CAs be dropped? Or should, in
fact, all statements made for resources owned by that CA be filtered
completely, even those made by unrelated CAs? Only the latter covers
cases where a route might accidentally become invalid because its
prefix was authorized to two or more CAs.
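
To make the difference between the two strategies concrete, here is a
minimal Python sketch. It is not taken from any validator: the Vrp
record, the CA labels "ca-a" and "ca-b", and the two drop functions are
all hypothetical, and origin_state is a simplified form of the route
origin validation described in RFC 6811.

```python
from dataclasses import dataclass
from ipaddress import ip_network


@dataclass(frozen=True)
class Vrp:
    prefix: str
    max_len: int
    asn: int
    ca: str  # hypothetical label for the issuing CA


def origin_state(route_prefix, origin_asn, vrps):
    """Simplified RFC 6811 check: 'valid', 'invalid' or 'not-found'."""
    route = ip_network(route_prefix)
    covered = False
    for v in vrps:
        vp = ip_network(v.prefix)
        if route.version == vp.version and route.subnet_of(vp):
            covered = True
            if route.prefixlen <= v.max_len and v.asn == origin_asn:
                return "valid"
    return "invalid" if covered else "not-found"


def drop_by_issuer(vrps, failed_cas):
    """Strategy 1: drop only what the failing CAs published themselves."""
    return [v for v in vrps if v.ca not in failed_cas]


def drop_by_resources(vrps, failed_resources):
    """Strategy 2: drop every VRP touching the failing CA's resources."""
    nets = [ip_network(r) for r in failed_resources]
    return [
        v for v in vrps
        if not any(ip_network(v.prefix).overlaps(n) for n in nets)
    ]


# Assumed scenario: the same prefix is held by two unrelated CAs.
vrps = [
    Vrp("192.0.2.0/24", 24, 64500, "ca-a"),
    Vrp("192.0.2.0/24", 24, 64501, "ca-b"),
]

before = origin_state("192.0.2.0/24", 64500, vrps)
by_issuer = origin_state(
    "192.0.2.0/24", 64500, drop_by_issuer(vrps, {"ca-a"}))
by_resources = origin_state(
    "192.0.2.0/24", 64500, drop_by_resources(vrps, ["192.0.2.0/24"]))
print(before, by_issuer, by_resources)
```

In this assumed scenario, dropping by issuer leaves the VRP from the
unrelated CA in place, so the route from AS64500 turns invalid, while
dropping by resources leaves the route not-found.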

Further, the current proposal suggests reusing non-expired cached data
instead. Routinator, for example, does not in fact do that and would
need significant changes to its architecture to make that possible. Do
I want to make those changes before the community agrees that using old
objects is a good idea? Reusing old objects may make it easier to block
object revocation. Do we prefer that over just not having certain
objects at all? I don’t think there is consensus yet.

Second, and entirely independent of the first, there is the matter of
the validity of that ARIN manifest. Some of us have concluded that it
didn’t need rejecting because it was not in fact invalid. While it was
indeed encoded in a way that is not allowed by the respective specs for
encoders, the specs instruct decoders to be more forgiving. Such
forgiveness is a decision an implementer makes. The particular mistake
did not alter the meaning of the affected fields or endanger the
cryptographic integrity of the certificate as an RPKI resource
certificate, hence the decision to accept such certificates.

This assessment might be different were the certificate used in another
context, and thus a general-purpose library may indeed need to reject
it, but a library that deals exclusively in RPKI resource certificates
does not.

Should it be picky nonetheless and drop anything that looks vaguely
suspicious? Would you be surprised to hear that if we indeed did that,
out of the current 175,330 VRPs, a mere 3,571 would be left? Even if
we interpreted unclear RFCs in favour of implementers, we would still be
down to 54,258, or about a third.
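
For a sense of scale, the shares implied by those three counts can be
worked out directly. The snippet below is only arithmetic on the
numbers quoted above; the function name share is made up for
illustration.

```python
def share(kept, total):
    """Fraction of VRPs that would survive a given level of pickiness."""
    return kept / total


total = 175_330    # VRPs currently produced
strict = 3_571     # left when dropping anything vaguely suspicious
lenient = 54_258   # left when unclear RFCs are read in favour of implementers

print(f"strict:  {share(strict, total):.1%}")   # 2.0%
print(f"lenient: {share(lenient, total):.1%}")  # 30.9%
```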

Would dropping all these VRPs out of caution make the Internet more
secure? It is kind of hard to argue for that. I believe that whether a
security system in itself is more or less secure is an entirely
irrelevant question. It needs to be assessed in terms of whether
it makes the thing it serves more secure. The same goes for concrete
implementations and all the decisions made along the way.

Kind regards,
Martin