Re: [Sidrops] weak validation is unfit for production (Was: Reason for Outage report)

Nathalie Trenaman <> Sat, 29 August 2020 10:22 UTC

To: Job Snijders <>

Dear all,

Last month, we released a “strict mode” for our validator. It takes 6486bis into account, except for the use of cached data, on which I have not yet seen consensus (please correct me if I’m wrong).
We did not make strict mode the default because we first needed a clear picture of which objects we would actually lose and what impact that would have.
Let me be very clear: none of us at RIPE NCC is objecting to any consensus.

Where Job sees around 400 objects missing, we see a lot more. Here you can see an instance of our validator in the default mode: <>
While here you can see an instance running in strict mode: <>

We have been monitoring the number of missing objects and we see a difference of around 4500 objects, mainly from the APNIC region.
This is why Tim’s suggestion for a flag day sounds reasonable to me. We have to inform CAs about the impact that changing the behaviour of RPs has on them.
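
To illustrate why the two modes can report such different object counts, here is a minimal sketch of the idea. It is not the RIPE NCC validator's code: the names `PublicationPoint` and `usable_objects` are hypothetical, the manifest is reduced to a plain mapping of file name to an assumed SHA-256 digest, and signature and validity-time checks are left out entirely.

```python
from __future__ import annotations

import hashlib
from dataclasses import dataclass


@dataclass
class PublicationPoint:
    # Hypothetical, simplified model: a real manifest is a signed object.
    manifest: dict[str, str]  # file name -> expected SHA-256 hex digest
    files: dict[str, bytes]   # file name -> bytes actually fetched


def usable_objects(pp: PublicationPoint, strict: bool) -> list[str]:
    """Return the names of the objects an RP would go on to process."""
    good, bad = [], []
    for name, expected in pp.manifest.items():
        data = pp.files.get(name)
        if data is not None and hashlib.sha256(data).hexdigest() == expected:
            good.append(name)
        else:
            bad.append(name)  # listed on the manifest but missing or altered
    if strict and bad:
        return []  # strict: one bad entry discards the whole publication point
    return good    # default: keep whatever still checks out


# Two ROAs are listed on the manifest, but only one was fetched.
roa_a = b"example ROA for AS64496"
roa_b = b"example ROA for AS64497"
pp = PublicationPoint(
    manifest={
        "a.roa": hashlib.sha256(roa_a).hexdigest(),
        "b.roa": hashlib.sha256(roa_b).hexdigest(),
    },
    files={"a.roa": roa_a},
)
lenient_result = usable_objects(pp, strict=False)  # ["a.roa"]
strict_result = usable_objects(pp, strict=True)    # []
```

Under these assumptions, a single missing file costs one object in the default mode but every object at that publication point in strict mode, which is one way the gap between the two counts can grow.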

Furthermore, I would really appreciate it if we could keep the discussion constructive.

Nathalie Trenaman

> On 27 Aug 2020, at 16:28, Job Snijders <> wrote:
> Dear all,
> It pains me to write this email. It appears there is an increasingly
> acrimonious situation in which RIPE NCC, Cloudflare, and NLNetLabs
> representatives not only produce and publish insecure software, but also
> argue towards erosion of the robustness of the object security RPKI
> depends on.
> I'm drawing harsh conclusions: the reality is that we are now 6 months
> into /what should've been/ a simple bug report, but turned into a trench
> war. Folks are digging in their heels deeper.  Attempts in this group to
> bridge the knowledge gap have failed so far.
> On Wed, Aug 26, 2020 at 08:24:42PM +0200, Martin Hoffmann wrote:
>> Job Snijders wrote:
>>> The current versions of routinator and ripe ncc's validator have weak
>>> (lacking) support for manifest handling, there are other issues in
>>> both softwares that don't yield errors where they should yield errors
>>> related to manifest handling. Neither implementation handles
>>> manifests correctly at the moment, so neither software currently can
>>> be used to confirm the correct publication of manifest related
>>> data. :-(
>> To the best of my knowledge, Routinator and the RIPE NCC RPKI
>> Validator handle manifests according to the specifications laid out in
>> the relevant standards track IETF documents. 
> The implementers of RIPE NCC's validator, Routinator, and OctoRPKI
> entirely missed the point of WHY RPKI Manifests exist at all. The bigger
> picture is ignored, one can't look at normative terms in a vacuum.
> I quote from the INTRODUCTION of RFC6486:
>    "A manifest is intended to allow an RP to detect unauthorized object
>    removal or the substitution of stale versions of objects at a
>    publication point."
> A Manifest makes it possible for a validator software to react sanely
> when data tampering is detected. Manifests exist to *protect* both the
> issuing CA and the RP, failure to acknowledge the purpose of manifests
> is akin to the famous quote "the operation was successful - but the
> patient died". Did any CA ever wish for an incomplete view of their
> routing intentions to be transformed into routing decisions? Zero CAs
> want this.
> One has to look further than the normative terms, one has to realize
> what the implications are to routing in the global system and inevitably
> the conclusion is to err on the side of caution. To be cynical about
> what data is provided via an untrusted network input channel. Why
> implement a virus scanner, which can detect virus files, but
> subsequently doesn't do anything about it?
> Manifests are the *only* mechanism to verify a publication point's
> completeness and integrity. Neither Routinator nor RIPE NCC's software
> attach any consequence to integrity issues at a publication point. Both
> continue to emit as many VRPs as possible, regardless of whether the
> publication point is complete to begin with! 
> The data structure of Route Origin Authorizations (ROAs) allows only a
> single origin ASN per .roa file. This means network operators who wish
> to grant permission to multiple ASNs (a common example: their own and
> their customers' ASNs) to originate parts of their IP space *have*
> to create multiple .roa files. The IP Block owner's routing intentions
> can only be considered when the full bundle of .roa files is available.
> Logically, when some .roa files are missing (which according to a valid
> current manifest must be present), the remaining .roa files at the
> publication point become useless as they represent an *incomplete*
> overview of routing intentions; even worse those files flip from
> 'useless' to 'dangerous' when they are injected as VRPs into the
> operator's routing system.
> Manifests are analogous to Debian's "Release + Release.gpg" APT
> archive concepts. APT (or yum/dnf) do *not* proceed to install packages
> when critical dependencies are missing, or when the SIGNED checksums do
> not match the checksum of the downloaded .deb file.  An administrator
> has to *explicitly* override (-y --force) to install such packages when
> dependencies or checksums don't match.
> Let me demonstrate what happens when I cherry-pick just a few words you
> wrote, and withhold some of your other words. You wrote this email:
> *** start of modified email ***
>    On Wed, Aug 26, 2020 at 08:24:42PM +0200, Martin Hoffmann wrote:
>> Routinator and the RIPE NCC RPKI Validator have issues.
> *** end of modified email ***
> Do you see the issue now? I didn't even change the order of your words,
> I merely withheld some of the text you wrote, and the resulting text is
> entirely contradictory to what you intended to write!
> Let's be honest, neither RIPE NCC nor NLNetLabs have real experience
> using RPKI ROV 'invalid == reject' in their own networks. RIPE NCC so
> far has refused to implement ROV in AS 3333 out of fear, and NLNetLabs'
> own ASN is a simple single-homed stub network. Why are both
> organisations ignoring the community's pleas to fix a security issue?
> Why the hubris? Do you really think you know better? Why does Alexander
> Band say that fixing this is "not a priority", why is RIPE NCC refusing
> to commit a one-line patch to fix their validator?
> Is loss of face the issue? The longer the delay to provide a fix, the
> longer NLNetLabs and RIPE NCC keep hurting their users (and dependents).
> Is this what one calls 'good for the Internet'? The issue was brought to
> attention MONTHS [1] ago, it should've been a few days to get it patched.
>> Given that this topic is currently discussed in this very working
>> group and there wasn’t outright consensus on how software should behave
>> in these cases, it seems only prudent to delay modifications until
>> after such consensus has been achieved.
> The only ones arguing against the consensus are RIPE NCC and NLNetLabs
> employees. Go figure. Staff and knowledge were exchanged between the two
> software houses, a path is visible how the misconceptions continued to
> proliferate. It is not too late to change course, but catch-up is
> needed.
> Believe it or not, RIPE NCC, Cloudflare, and NLNetLabs are now at an
> existential crisis: your credibility is on the line. Are you going to
> produce routing security software which actually improves security, or
> not? Will you attempt to absorb decades of PKI and X.509 experience, or
> throw it all to the wind?
> Currently routinator + ripe ncc's validator + octorpki set their users
> up for failure. Operators using these softwares ARE AT NEEDLESS RISK. 
> Regards,
> Job
> [1]: