[Sidrops] weak validation is unfit for production (Was: Reason for Outage report)
Job Snijders <job@ntt.net> Thu, 27 August 2020 14:28 UTC
Date: Thu, 27 Aug 2020 14:28:27 +0000
From: Job Snijders <job@ntt.net>
To: sidrops@ietf.org
Message-ID: <20200827142827.GC88356@bench.sobornost.net>
References: <DE33EFAE-FBD2-478F-92A9-1FBD81CCC43F@arin.net> <727F6FBD-F73C-4F58-AE2D-0276B2A183A3@arin.net> <20200826160001.GF95612@bench.sobornost.net> <20200826202442.232829fc@grisu.home.partim.org>
In-Reply-To: <20200826202442.232829fc@grisu.home.partim.org>
X-Clacks-Overhead: GNU Terry Pratchett
Archived-At: <https://mailarchive.ietf.org/arch/msg/sidrops/cpIS9s33ZDVQahxP2nXaifzOPoo>

Dear all,

It pains me to write this email.

It appears there is an increasingly acrimonious situation in which RIPE
NCC, Cloudflare, and NLNetLabs representatives not only produce and
publish insecure software, but also argue towards erosion of the
robustness of the object security RPKI depends on.

I'm drawing harsh conclusions: the reality is that we are now 6 months
into /what should've been/ a simple bug report, but which turned into a
trench war. Folks are digging in their heels deeper. Attempts in this
group to bridge the knowledge gap have failed so far.

On Wed, Aug 26, 2020 at 08:24:42PM +0200, Martin Hoffmann wrote:
> Job Snijders wrote:
> > The current versions of routinator and ripe ncc's validator have weak
> > (lacking) support for manifest handling, there are other issues in
> > both softwares that don't yield errors where they should yield errors
> > related to manifest handling. Neither implementation handles
> > manifests correctly at the moment, so neither software currently can
> > be used to confirm the correct publication of manifest related
> > data. :-(
>
> To the best of my knowledge, Routinator and the RIPE NCC RPKI
> Validator handle manifests according to the specifications laid out in
> the relevant standards track IETF documents.

The implementers of RIPE NCC's validator, Routinator, and OctoRPKI
entirely missed the point of WHY RPKI Manifests exist at all. The bigger
picture is ignored; one can't look at normative terms in a vacuum.

I quote from the INTRODUCTION of RFC6486:

    "A manifest is intended to allow an RP to detect unauthorized object
    removal or the substitution of stale versions of objects at a
    publication point."

A Manifest makes it possible for validator software to react sanely
when data tampering is detected. Manifests exist to *protect* both the
issuing CA and the RP; failure to acknowledge the purpose of manifests
is akin to the famous quote "the operation was successful - but the
patient died".
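
To make the purpose quoted from RFC6486 concrete, here is a minimal
sketch of the check a manifest enables. It is not taken from any
validator: the names ManifestEntry, check_publication_point and
usable_objects are hypothetical, SHA-256 is assumed as the file hash,
and signature, validity-period and manifest-number checks are left out
entirely.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class ManifestEntry:
    """One (file name, hash) pair as listed on a manifest (simplified)."""
    filename: str
    sha256_hex: str


def check_publication_point(manifest, fetched):
    """Compare fetched files against the manifest.

    manifest: list of ManifestEntry
    fetched:  dict mapping file name -> file content (bytes)
    Returns a list of problems; an empty list means every listed file
    is present and matches its listed hash.
    """
    problems = []
    for entry in manifest:
        data = fetched.get(entry.filename)
        if data is None:
            # Listed on the manifest but not retrieved: object removal.
            problems.append(f"missing: {entry.filename}")
        elif hashlib.sha256(data).hexdigest() != entry.sha256_hex:
            # Present but different: stale or substituted object.
            problems.append(f"hash mismatch: {entry.filename}")
    return problems


def usable_objects(manifest, fetched):
    """Fail closed: hand objects onward only if the whole set checks out."""
    if check_publication_point(manifest, fetched):
        return {}
    listed = {entry.filename for entry in manifest}
    return {name: data for name, data in fetched.items() if name in listed}
```

The design choice illustrated is the last function: a detected problem
has a consequence (nothing from that publication point is used), rather
than being logged while the remaining files are processed anyway.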

Did any CA ever wish for an incomplete view of their routing intentions
to be transformed into routing decisions? Zero CAs want this.

One has to look further than the normative terms: one has to realize
what the implications are to routing in the global system, and
inevitably the conclusion is to err on the side of caution, to be
cynical about what data is provided via an untrusted network input
channel.

Why implement a virus scanner which can detect virus files but
subsequently doesn't do anything about them?

Manifests are the *only* mechanism to verify a publication point's
completeness and integrity. Neither Routinator nor RIPE NCC's software
attaches any consequence to integrity issues at a publication point.
Both continue to emit as many VRPs as possible, regardless of whether
the publication point is complete to begin with!

The data structure of Route Origin Authorizations (ROAs) allows only a
single origin ASN per .roa file. This means network operators who wish
to grant permission to multiple ASNs (a common example: their own and
their customers' ASNs) to originate parts of their IP space *have* to
create multiple .roa files. The IP block owner's routing intentions can
only be considered when the full bundle of .roa files is available.

Logically, when some .roa files are missing (files which according to a
valid current manifest must be present), the remaining .roa files at
the publication point become useless, as they represent an *incomplete*
overview of routing intentions; even worse, those files flip from
'useless' to 'dangerous' when they are injected as VRPs into the
operator's routing system.

Manifests are analogous to Debian's "Release + Release.gpg" APT archive
concepts. APT (or yum/dnf) does *not* proceed to install packages when
critical dependencies are missing, or when the SIGNED checksums do not
match the checksum of the downloaded .deb file.
An administrator has to *explicitly* override (-y --force) to install
such packages when dependencies or checksums don't match.

Let me demonstrate what happens when I cherry-pick just a few words you
wrote, and withhold some of your other words. You wrote this email:

    https://mailarchive.ietf.org/arch/msg/sidrops/7JxOCNBvYbwDHL7hcPHsfvxto0Q/

*** start of modified email ***
On Wed, Aug 26, 2020 at 08:24:42PM +0200, Martin Hoffmann wrote:
> Routinator and the RIPE NCC RPKI Validator have issues.
*** end of modified email ***

Do you see the issue now? I didn't even change the order of your words,
I merely withheld some of the text you wrote, and the resulting text is
entirely contradictory to what you intended to write!

Let's be honest, neither RIPE NCC nor NLNetLabs has real experience
using RPKI ROV 'invalid == reject' in their own networks. RIPE NCC so
far has refused to implement ROV in AS 3333 out of fear, and NLNetLabs'
own ASN is a simple single-homed stub network.

Why are both organisations ignoring the community's pleas to fix a
security issue? Why the hubris? Do you really think you know better?
Why does Alexander Band say that fixing this is "not a priority"? Why
is RIPE NCC refusing to commit a one-line patch to fix their validator?
Is loss of face the issue?

The longer the delay to provide a fix, the longer NLNetLabs and RIPE
NCC keep hurting their users (and dependents). Is this what one calls
'good for the Internet'? The issue was brought to attention MONTHS [1]
ago; it should've taken a few days to get it patched.

> Given that this topic is currently discussed in this very working
> group and there wasn’t outright consensus on how software should behave
> in these cases, it seems only prudent to delay modifications until
> after such consensus has been achieved.

The only ones arguing against the consensus are RIPE NCC and NLNetLabs
employees. Go figure.
Staff and knowledge were exchanged between the two software houses, so
a path is visible along which the misconceptions continued to
proliferate.

It is not too late to change course, but catch-up is needed.

Believe it or not, RIPE NCC, Cloudflare, and NLNetLabs are now at an
existential crisis: your credibility is on the line. Are you going to
produce routing security software which actually improves security, or
not? Will you attempt to absorb decades of PKI and X.509 experience, or
throw it all to the wind?

Currently routinator + ripe ncc's validator + octorpki set their users
up for failure. Operators using this software ARE AT NEEDLESS RISK.

Regards,

Job

[1]: https://github.com/NLnetLabs/routinator/issues/319
     https://github.com/RIPE-NCC/rpki-validator-3/issues/232
     https://github.com/RIPE-NCC/rpki-validator-3/issues/158
     https://github.com/cloudflare/cfrpki/issues/38
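
As an illustration of the point above about incomplete bundles of .roa
files, the following is a rough sketch of RFC 6811 style origin
validation. The VRP tuple and the rov_state function are hypothetical
names, the prefixes and ASNs are documentation values, and the logic is
simplified to the covered / matched distinction only.

```python
import ipaddress
from typing import NamedTuple


class VRP(NamedTuple):
    """Validated ROA Payload: one (prefix, maxLength, origin ASN) tuple."""
    prefix: str
    max_length: int
    origin_asn: int


def rov_state(route_prefix, origin_asn, vrps):
    """Classify a route as 'valid', 'invalid' or 'not-found'."""
    route = ipaddress.ip_network(route_prefix)
    covered = False
    for vrp in vrps:
        vrp_net = ipaddress.ip_network(vrp.prefix)
        # A VRP covers the route if the route lies inside the VRP prefix.
        if route.version != vrp_net.version or not route.subnet_of(vrp_net):
            continue
        covered = True
        if route.prefixlen <= vrp.max_length and origin_asn == vrp.origin_asn:
            return "valid"
    # Covered by at least one VRP but matched by none: invalid.
    return "invalid" if covered else "not-found"


# The holder originates the /24 itself and delegates a /25 to a customer,
# which requires two separate .roa files (one origin ASN per ROA).
holder_roa = VRP("192.0.2.0/24", 24, 64496)
customer_roa = VRP("192.0.2.128/25", 25, 64511)

complete = [holder_roa, customer_roa]
incomplete = [holder_roa]  # the customer's .roa file went missing

print(rov_state("192.0.2.128/25", 64511, complete))    # valid
print(rov_state("192.0.2.128/25", 64511, incomplete))  # invalid
print(rov_state("192.0.2.128/25", 64511, []))          # not-found
```

With the customer's ROA missing, the remaining holder ROA turns the
customer's legitimate announcement into an invalid, which a network
applying 'invalid == reject' would drop. Discarding the whole incomplete
set leaves the route at not-found instead.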