Re: [Sidrops] weak validation is unfit for production (Was: Reason for Outage report)

Tim Bruijnzeels <tim@nlnetlabs.nl> Mon, 31 August 2020 08:47 UTC

From: Tim Bruijnzeels <tim@nlnetlabs.nl>
In-Reply-To: <20200829184819.GL88356@bench.sobornost.net>
Date: Mon, 31 Aug 2020 10:47:39 +0200
Cc: Nathalie Trenaman <nathalie@ripe.net>, sidrops@ietf.org
Content-Transfer-Encoding: quoted-printable
Message-Id: <5CDFEEF2-C16A-47B7-B520-36A13318BC1A@nlnetlabs.nl>
References: <DE33EFAE-FBD2-478F-92A9-1FBD81CCC43F@arin.net> <727F6FBD-F73C-4F58-AE2D-0276B2A183A3@arin.net> <20200826160001.GF95612@bench.sobornost.net> <20200826202442.232829fc@grisu.home.partim.org> <20200827142827.GC88356@bench.sobornost.net> <60849B88-EC02-4B86-8FF4-2AD7401567B0@ripe.net> <20200829184819.GL88356@bench.sobornost.net>
To: Job Snijders <job@ntt.net>
X-Mailer: Apple Mail (2.3608.120.23.2.1)
Archived-At: <https://mailarchive.ietf.org/arch/msg/sidrops/vh3lXRfeT6pkjBbNatLdROOmXEU>
Subject: Re: [Sidrops] weak validation is unfit for production (Was: Reason for Outage report)

Hi,

> On 29 Aug 2020, at 20:48, Job Snijders <job@ntt.net> wrote:
> 
>> This is why I believe Tim’s suggestion for a flag day sounds
>> reasonable to me. We have to inform CAs about the impact changing the
>> behaviour of RPs has on them. 
> 
> A flag day to release a security update? Flag day's generally are used
> for other types of events.

To be clear: I suggested a flag day for encoding issues which are not *critical*. No CA is too big to fail if there is a critical issue.

Fact is that if you take the position that any object which is not 100% specification compliant MUST be considered invalid, then NONE of the current validators are secure.

The ARIN issue was related to wrong encoding. The relevant RFC is pretty easy to misinterpret. But while there is a mandated encoding here, the error did not change the semantics. Furthermore, the signature algorithm is defined by the RPKI RFCs - and the object as a whole could be validated. There never was a chance of different interpretations. There never was a chance of the signature being valid over modified content. So, wrong as this may have been, qualifying this as a critical issue is somewhat questionable. And given that the RFC was not very clear, it was right that RP implementers questioned the issue raised before implementing.

I could make equally bold statements on how Routinator with the --strict option is in fact the only secure validator, because all others accept BER (not DER) encoded MFT and ROA objects. BER encoding is more complicated to decode - one issue in particular is that you do not necessarily know the size of compound values up front, and this has led to security issues in other applications. Surely, this is at least as bad, quite possibly worse. In practice, of course, we have already downloaded the objects and can check their size before ending up here. So, while it is wrong, I would actually claim that this is not *critical*.
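
To make the BER/DER point concrete, here is a minimal sketch in Python of the kind of check a strict decoder applies to a length field. It is an illustration only, not how Routinator or any other validator implements it. The function name der_length_problem is made up for this example, it looks at a single TLV header only, and it does not handle high-tag-number form. Real DER checking covers much more than this.

```python
from typing import Optional


def der_length_problem(data: bytes, offset: int = 0) -> Optional[str]:
    """Check the length octets of the TLV header at `offset`.

    Returns None if the length is encoded in a DER-legal way, otherwise a
    short reason. Hypothetical helper for illustration only.
    """
    if offset + 2 > len(data):
        return "truncated header"
    if data[offset] & 0x1F == 0x1F:
        # Multi-octet tags are legal but out of scope for this sketch.
        return "high-tag-number form not handled in this sketch"

    first = data[offset + 1]
    if first < 0x80:
        return None  # short form: the octet is the length itself
    if first == 0x80:
        # Indefinite length: the size of the value is not known up front.
        # Allowed in BER, forbidden in DER.
        return "indefinite length (BER only)"
    if first == 0xFF:
        return "reserved length octet"

    count = first & 0x7F  # number of length octets that follow
    body = data[offset + 2 : offset + 2 + count]
    if len(body) < count:
        return "truncated length"
    if body[0] == 0x00:
        return "length has a leading zero octet"
    if int.from_bytes(body, "big") < 0x80:
        return "long form used where short form fits"
    return None
```

The first two rejected cases are the interesting ones: an indefinite length means the decoder cannot know the size of a compound value up front, and a non-minimal length means the same content has more than one valid encoding in BER.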

This should not stop us from improving. On the contrary. We can all learn from each other here. It is a very good goal to aim for 100% compliance, but it's going to require work from everyone to get there. This is why I suggested a flag day. Let's just focus our energy on working with each other, shall we?


- RP compliance checking

It is not easy to get to 100% checking in RPs. You can only test for things that you can think of, or things your library developers (OpenSSL, Bouncy Castle, etc.) have thought of.

In days long past, Rob Austein, Andrew Chi (then BBN), and I (then RIPE NCC) met to discuss validation corner cases. Andrew was able to generate all kinds of broken objects that our library simply could not generate (we could only generate objects that we believed to be valid). I was very grateful that he could do this, because it let us test our validation software. These objects are still used for regression testing in the RIPE NCC validator today. You can find them (100s of corner cases) here:
https://github.com/RIPE-NCC/rpki-commons/tree/master/src/test/resources/conformance

The ARIN encoding issue is not in there. Still, maybe this kind of setup can be useful for RP implementers again. It could be revived, extended, and hopefully be of use to all.


- CA compliance

Most implementations have code bases going back to 2011 or earlier. Most have automated tests in place using rcynic and/or the RIPE NCC validator. Nowadays there are many more options available. My advice would be that all CA developers start testing with all RP implementations they can get their hands on, and use 'strict mode' where available.


As for processes: if we can get RP and CA developers to agree, I think we can move forward on this even without the need for an RFC. It's a matter of organising things, for sure. But I believe that other flag days (e.g. DNS) were just co-ordinated between all parties involved.


Kind regards,
Tim