Re: [Sidrops] weak validation is unfit for production (Was: Reason for Outage report)

Job Snijders <job@ntt.net> Sat, 29 August 2020 18:52 UTC

Date: Sat, 29 Aug 2020 18:48:19 +0000
From: Job Snijders <job@ntt.net>
To: Nathalie Trenaman <nathalie@ripe.net>
Cc: sidrops@ietf.org
Message-ID: <20200829184819.GL88356@bench.sobornost.net>
References: <DE33EFAE-FBD2-478F-92A9-1FBD81CCC43F@arin.net> <727F6FBD-F73C-4F58-AE2D-0276B2A183A3@arin.net> <20200826160001.GF95612@bench.sobornost.net> <20200826202442.232829fc@grisu.home.partim.org> <20200827142827.GC88356@bench.sobornost.net> <60849B88-EC02-4B86-8FF4-2AD7401567B0@ripe.net>
In-Reply-To: <60849B88-EC02-4B86-8FF4-2AD7401567B0@ripe.net>
Archived-At: <https://mailarchive.ietf.org/arch/msg/sidrops/HtNdbUp26RTL8MdxU16J1ju6vLc>
Subject: Re: [Sidrops] weak validation is unfit for production (Was: Reason for Outage report)

On Sat, Aug 29, 2020 at 12:22:54PM +0200, Nathalie Trenaman wrote:
> Last month, we released a “strict mode” for our validator. It takes
> into account 6486bis except the use of cached data, which I did not
> see consensus on yet (correct me if I’m wrong please).

I think you are mistaken about the consensus on cached data, but it
doesn't matter much: it is a small implementation detail. If an
implementer doesn't want to take advantage of their local file cache,
they don't need to. RPs are expected to be able to function with empty
caches as well.

> Where Job sees around 400 objects missing, we do see a lot more. 
> 
> We have been monitoring the amount of missing objects and we see a
> difference of around 4500 objects. Mainly from the APNIC region.

The difference between rpki-client and the ripe ncc validator in
'insecure-mode=off / rsync-only' mode seems to be about 2,500 VRPs.
However, the ripe ncc validator logs only 5 errors: https://rpki.meerval.net/trust-anchors/monitor/2

One of the errors appears to be logged at "2020-08-29 18:08:48" and
indicates: "CRL next update was expected on or before 2020-08-29T22:46:19.000Z".
It is not clear why it reports an error for a nextUpdate time that is
still in the future. Could the 2,500 VRP difference be caused by a
software issue rather than a specification issue?
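
To make the timestamp point concrete, here is a minimal sketch of the
staleness check one would expect: a CRL only counts as stale once the
current time has passed its nextUpdate. The function name
crl_is_stale() is hypothetical and not taken from either validator, and
the log timestamp is assumed to be UTC (the log line does not say).

```python
from datetime import datetime, timezone


def crl_is_stale(next_update: datetime, now: datetime) -> bool:
    """Hypothetical check: a CRL is stale only after its nextUpdate has passed."""
    return now > next_update


# Log time from the quoted error; ASSUMPTION: it is expressed in UTC.
logged_at = datetime(2020, 8, 29, 18, 8, 48, tzinfo=timezone.utc)

# nextUpdate from the same error message. "Z" is rewritten as "+00:00"
# because datetime.fromisoformat() only accepts "Z" from Python 3.11 on.
next_update = datetime.fromisoformat(
    "2020-08-29T22:46:19.000Z".replace("Z", "+00:00")
)

print(crl_is_stale(next_update, logged_at))  # False: still 4:37:31 to go
print(next_update - logged_at)               # 4:37:31
```

Under that assumption the CRL still had more than four and a half hours
of validity left when the error was logged, so no staleness error should
have fired.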

For example, I can't find a reason why a VRP for Prefix:
203.163.202.0/23 Maxlength: 24, Origin: AS10085, TA: APNIC is not
emitted by the ripe ncc validator. It is also curious that RRDP fetches
are performed while 'rpki.validator.rsync-only=true' is set.
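
One way to chase down examples like this is to treat each validator's
output as a set of (prefix, maxLength, origin ASN, TA) tuples and take
the set difference. The sketch below assumes both outputs have already
been loaded into memory; the VRP type, the vrp_difference() helper and
the two input sets are hypothetical, with only the single VRP quoted
above used as sample data.

```python
from __future__ import annotations

import ipaddress
from typing import NamedTuple


class VRP(NamedTuple):
    """Hypothetical VRP tuple: (prefix, maxLength, origin ASN, trust anchor)."""
    prefix: ipaddress.IPv4Network | ipaddress.IPv6Network
    max_length: int
    asn: int
    ta: str


def make_vrp(prefix: str, max_length: int, asn: int, ta: str) -> VRP:
    # Parse the prefix so textual variants of the same network compare equal.
    return VRP(ipaddress.ip_network(prefix), max_length, asn, ta)


def vrp_difference(a: set[VRP], b: set[VRP]) -> set[VRP]:
    """VRPs emitted by validator A but missing from validator B."""
    return a - b


# Hypothetical inputs: only the VRP mentioned in the mail is real data.
rpki_client = {make_vrp("203.163.202.0/23", 24, 10085, "APNIC")}
ripe_validator: set[VRP] = set()

missing = vrp_difference(rpki_client, ripe_validator)
for vrp in sorted(missing):
    print(f"{vrp.prefix} maxlen {vrp.max_length} AS{vrp.asn} ({vrp.ta})")
```

Each tuple that shows up in the difference can then be traced back to
the ROA and the repository it came from.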

Are you sure the current version implemented things correctly?

> This is why I believe Tim’s suggestion for a flag day sounds
> reasonable to me. We have to inform CAs about the impact changing the
> behaviour of RPs has on them. 

A flag day to release a security update? Flag days are generally used
for other types of events.

Regards,

Job