[Sidrops] Internally inconsistent RRDP publication (Was: another TA oopsie)
Job Snijders <job@fastly.com> Mon, 02 January 2023 16:11 UTC
Date: Mon, 02 Jan 2023 17:11:43 +0100
From: Job Snijders <job@fastly.com>
To: SIDR Operations WG <sidrops@ietf.org>, African Network Operators <afnog@afnog.org>
Message-ID: <Y7MCP20lO2P06V6O@snel>
References: <m28rimrkft.wl-randy@psg.com>
MIME-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Disposition: inline
In-Reply-To: <m28rimrkft.wl-randy@psg.com>
X-Clacks-Overhead: GNU Terry Pratchett
Archived-At: <https://mailarchive.ietf.org/arch/msg/sidrops/cDj_Y_CAw1dGOJZ9oxjcNKlbdTc>
Subject: [Sidrops] Internally inconsistent RRDP publication (Was: another TA oopsie)
Dear all,

I took a look at what might have transpired. It appears there was an
internally-inconsistent RRDP publication. Similar to the RSYNC protocol,
the RRDP protocol does not offer any assurances about internal
consistency.

In this message I offer a step-by-step explanation, and at the end of
the email I theorize on how this could've happened.

Impact:
=======

The problem revolves around a 'top level' manifest [1] which contained
references to files which were not yet available via RRDP. The
K1eJenypZMPIt_e92qek2jSpj4A.mft manifest referencing non-existing files
negatively impacted about 77.33% of ROAs subordinate to the AFRINIC
trust anchor.

Depending on the RRDP refetch timers of a validator, the impact may have
lasted anywhere between 1 and 60 minutes. This impacted all
RFC-compliant validators: the event was 'timing dependent' rather than
'implementation dependent'. Connecting at the wrong time caused
problems.

Step by step replay:
====================

A validator fetching AFRINIC's RRDP Notification file at
2023-01-01T03:21:51Z might have fetched a notification XML file which
contained a listing of deltas up until serial 58617 (in the RRDP session
ID 11218e02-4ae9-4c95-a8fa-49df27f15272).

    https://rrdp.afrinic.net/11218e02-4ae9-4c95-a8fa-49df27f15272/58616/snapshot.xml
    https://rrdp.afrinic.net/11218e02-4ae9-4c95-a8fa-49df27f15272/58617/delta.xml

The SHA256 hash of "K1eJenypZMPIt_e92qek2jSpj4A.mft" at serial 58616 was
435c65e0f7bc43eaea3234b3ad08b849735c1899c8e218ff2395d37cad720493, and
the manifestNumber was 13F1.

At rrdp_serial 58616 / K1eJenypZMPIt_e92qek2jSpj4A.mft / manifestNumber
13F1, the listed SHA256 hash of "vY7ReUeW-s0Fq4qzboGCgYmDQXg.cer" was
1768a7544c15081ddcd358a78b915a7221f3aee6cebb196a743b89a834364ca4.

And indeed, if one downloads the above-mentioned "snapshot.xml" file and
unpacks the RRDP XML, one will find a file by that name which matches
that digest. The state at RRDP serial 58616 was internally consistent.

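The digest check described above can be reproduced with a few lines of Python. This is a minimal sketch, assuming the snapshot follows the RFC 8182 layout in which each <publish uri="..."> element carries the base64-encoded object. The names snapshot_digests, matches_manifest and SAMPLE_SNAPSHOT are hypothetical, and the inline snapshot is made-up sample data, not AFRINIC's file.

```python
import base64
import hashlib
import xml.etree.ElementTree as ET

RRDP_NS = "{http://www.ripe.net/rpki/rrdp}"  # XML namespace from RFC 8182

# Made-up sample data (assumption): one object whose content is b"hello".
SAMPLE_SNAPSHOT = """<snapshot xmlns="http://www.ripe.net/rpki/rrdp" version="1"
  session_id="00000000-0000-0000-0000-000000000000" serial="1">
  <publish uri="rsync://example.net/repo/a.cer">aGVsbG8=</publish>
</snapshot>"""


def snapshot_digests(snapshot_xml: str) -> dict[str, str]:
    """Map each published URI in a snapshot to the SHA-256 of its object."""
    root = ET.fromstring(snapshot_xml)
    digests = {}
    for publish in root.iter(RRDP_NS + "publish"):
        # The element text is the base64-encoded object; whitespace is ignored.
        obj = base64.b64decode(publish.text or "")
        digests[publish.attrib["uri"]] = hashlib.sha256(obj).hexdigest()
    return digests


def matches_manifest(digests: dict[str, str], uri: str, expected_hex: str) -> bool:
    """True if the snapshot holds an object at `uri` with the digest the manifest lists."""
    return digests.get(uri) == expected_hex.lower()
```

To repeat the check from this message, one would feed the downloaded 58616 snapshot.xml to snapshot_digests() and compare the entry for "vY7ReUeW-s0Fq4qzboGCgYmDQXg.cer" against the digest listed on the manifest. Decoding the manifest itself (CMS/ASN.1) is out of scope for this sketch.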
Now, let's unpack the RRDP Delta which would bring the RRDP session to
58617. The delta file contains 4 <publish/> elements:

    58617 rpki.afrinic.net/repository/afrinic/K1eJenypZMPIt_e92qek2jSpj4A.crl
      (a4f73c2009f4095970f0f7cb4bb938eb03ff71e35925cd8bca39a64330f935c1
       replaces
       502d94adf603c4451a912828dfe9d7a46ebf45ec20f901381618fc71323da927)

    58617 rpki.afrinic.net/repository/afrinic/K1eJenypZMPIt_e92qek2jSpj4A.mft
      (e745ccf5741fbe65c2e2b78a74ba3be4a82c9fd5330544e16332e725861f66e5
       replaces
       435c65e0f7bc43eaea3234b3ad08b849735c1899c8e218ff2395d37cad720493)

    58617 rpki.afrinic.net/repository/member_repository/F36D8ADD/99DB6EFC6AC711EBB90AF548F8AEA228/JrOnWLLY0r61xvaBylvZJYx593c.crl
      (331a8991ca11ccd9bbf30e89e8e35d3b6ee0a18c23cca1289dfcf07bdee3d05f
       replaces
       5a7399b06a692dd76e3b94fa52112c12f483db1499e5c899ff27b57952e48635)

    58617 rpki.afrinic.net/repository/member_repository/F36D8ADD/99DB6EFC6AC711EBB90AF548F8AEA228/JrOnWLLY0r61xvaBylvZJYx593c.mft
      (cf22f16de6695f8509a6590f710778cc61a1bbdf1c11ae150dcfff1910032cae
       replaces
       29219ecb0f79922d6f1e5d4b3d4305333d32f33720cf13ae17d84dd2fcdf2ff0)

Let's focus on K1eJenypZMPIt_e92qek2jSpj4A.mft. The eContent of the
manifest files whose SHA256 digests are
435c65e0f7bc43eaea3234b3ad08b849735c1899c8e218ff2395d37cad720493 and
e745ccf5741fbe65c2e2b78a74ba3be4a82c9fd5330544e16332e725861f66e5
decodes as follows:

    K1eJenypZMPIt_e92qek2jSpj4A.mft @ 13F1: https://sobornost.net/~job/manifest-13F1.txt
    K1eJenypZMPIt_e92qek2jSpj4A.mft @ 13F2: https://sobornost.net/~job/manifest-13F2.txt

Thus, we conclude:

    RRDP serial 58616 contained a manifest with number 13F1
    RRDP serial 58617 contained a manifest with number 13F2

Both 13F1 and 13F2 are signed by the proper keys, but manifestNumber
13F2 is higher than 13F1; thus 13F2 is the manifest that must be used.

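A listing like the one above can be produced mechanically. The following is a minimal sketch, assuming the RFC 8182 delta layout in which <publish> elements carry an optional "hash" attribute naming the object being replaced and <withdraw> elements carry the hash of the object being removed. The names Change, delta_changes, newer_manifest and SAMPLE_DELTA are hypothetical; the sample delta is made up, with shortened placeholder hashes.

```python
import xml.etree.ElementTree as ET
from typing import NamedTuple, Optional

RRDP_NS = "{http://www.ripe.net/rpki/rrdp}"  # XML namespace from RFC 8182

# Made-up sample data (assumption); hash values are shortened placeholders.
SAMPLE_DELTA = """<delta xmlns="http://www.ripe.net/rpki/rrdp" version="1"
  session_id="00000000-0000-0000-0000-000000000000" serial="7">
  <publish uri="rsync://example.net/repo/a.mft" hash="aa11">bmV3</publish>
  <publish uri="rsync://example.net/repo/b.roa">bmV3</publish>
  <withdraw uri="rsync://example.net/repo/c.roa" hash="bb22"/>
</delta>"""


class Change(NamedTuple):
    serial: int
    action: str              # "publish" or "withdraw"
    uri: str
    replaces: Optional[str]  # hash of the object replaced or withdrawn, if any


def delta_changes(delta_xml: str) -> list[Change]:
    """List every publish/withdraw element of one RRDP delta file."""
    root = ET.fromstring(delta_xml)
    serial = int(root.attrib["serial"])
    changes = []
    for elem in root:
        action = elem.tag.removeprefix(RRDP_NS)
        if action in ("publish", "withdraw"):
            changes.append(
                Change(serial, action, elem.attrib["uri"], elem.attrib.get("hash"))
            )
    return changes


def newer_manifest(a_hex: str, b_hex: str) -> str:
    """Return the higher of two manifestNumbers written as hex strings."""
    return max(a_hex, b_hex, key=lambda h: int(h, 16))
```

For the two manifests discussed here, newer_manifest("13F1", "13F2") returns "13F2", which is the numeric comparison behind the conclusion above.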
Manifest 13F2 references a new version of
"vY7ReUeW-s0Fq4qzboGCgYmDQXg.cer" by hash
8aa55347427b75faa64fdfd212ca013957f785e18ce887bbe56d0ae20552e66c;
however, at RRDP serial 58617 the delta XML does *NOT* contain any new
version of "vY7ReUeW-s0Fq4qzboGCgYmDQXg.cer"!

In fact, an update for "vY7ReUeW-s0Fq4qzboGCgYmDQXg.cer" only became
visible at a later point in time: at RRDP serial 58618. Looking at
https://rrdp.afrinic.net/11218e02-4ae9-4c95-a8fa-49df27f15272/58618/delta.xml
we finally see a version of "vY7ReUeW-s0Fq4qzboGCgYmDQXg.cer" which
matches the hash on the manifest that was published at serial 58617.

In other words, AFRINIC published an RRDP delta (and snapshot) that were
cryptographically valid, but internally inconsistent. Researchers can
see this for themselves if they analyse:

    https://rrdp.afrinic.net/11218e02-4ae9-4c95-a8fa-49df27f15272/58617/snapshot.xml

The version of "K1eJenypZMPIt_e92qek2jSpj4A.mft" inside the 58617
snapshot points to "vY7ReUeW-s0Fq4qzboGCgYmDQXg.cer", expecting a file
with SHA256 message digest
8aa55347427b75faa64fdfd212ca013957f785e18ce887bbe56d0ae20552e66c,
but the hash of "vY7ReUeW-s0Fq4qzboGCgYmDQXg.cer" in that snapshot
actually is
1768a7544c15081ddcd358a78b915a7221f3aee6cebb196a743b89a834364ca4.

As per RFC 9286 - the above scenario is considered a "publisher error"
or a "substitution attack" (RPs can't know the difference between
publisher errors and attacks); the RP is expected to proceed with the
process described in Section 6.6 of RFC 9286.

    serial 58616 was good
    serial 58617 was bad
    serial 58618 was good

While the issue was 'rectified' in the next publication, any clients
that latched on to 58617 might take between 1 and 60 minutes to return
for new data, completely unaware that the contents of the 58617 update
were cryptographically valid, but logically mostly broken.

How can this happen?
====================

This type of internal inconsistency could arise from deployment
scenarios in which the RRDP XML files are synthesized from a bare
directory on the filesystem - without additional context about internal
consistency (e.g. when exactly the Signer software has written a
coherent state to the filesystem, and it is safe to transform the files
into RRDP).

Software like https://github.com/NLnetLabs/rrdpit inherently is unaware
whether the Signer software has finished writing to the filesystem (or
still is 'half way' in the writing process). This means that a tool like
"rrdpit" MUST only be invoked when the signer software is completely
finished.

Generating RRDP XML files while the Signer software still is 'half way'
done writing can result in accidentally smearing out what should've been
the contents of a single RRDP XML Delta file across multiple RRDP delta
files.

Why am I suspecting that a tool like "rrdpit" is used?
======================================================

The AFRINIC RRDP snapshots contain unexpected files, such as
"rsync://rpki.afrinic.net/repository/AfriNIC-simple.tal"; the signer
implementations I am aware of would not include .tal files in the RRDP
feed. This leads me to believe that a non-atomic/fragile-to-inconsistency
process is used to convert a (rsync?) directory to RRDP files.

Is "rrdpit" bad?
================

No. It is a very useful utility (I myself have used it in various lab
tests), but it needs to be handled with care: the utility is not aware
of internal inconsistencies and cannot compensate for them. The "rrdpit"
utility is not appropriate for all deployment scenarios: it probably is
best to use the native RRDP functionality of a Signer!

How to avoid this?
==================

If AFRINIC is using the "rpki.net" (or a derivative) signer software,
they might benefit most from using the embedded RRDP functionality of
the "rpki.net" software stack.
If AFRINIC does not want to expose a webserver on the signer machine
itself, they can simply rsync the ready-made RRDP XML files (produced by
"rpki.net") to a webserver. This approach contrasts with rsyncing the
rsync files and using "rrdpit" (or equivalent tooling).

Conclusion
==========

For a brief period of time AFRINIC published a set of RRDP files that
led to an inconsistent state, resulting in the temporary loss of 77% of
ROAs.

I don't know the internals of AFRINIC's setup, so the above could all be
a fitting - but wrong - theory. I am speculating with the public
information available to me.

I'm available for any questions, or to advise on this matter and review
the current process workflow.

Kind regards,

Job

[1]: https://console.rpki-client.org/rpki.afrinic.net/repository/afrinic/K1eJenypZMPIt_e92qek2jSpj4A.mft.html

On Sat, Dec 31, 2022 at 07:40:54PM -0800, Randy Bush wrote:
> From: PacketVis <notifications@packetvis.com>
> Subject: bgp ta-malfunction - low severity - PacketVis
>
> Possible TA malfunction: 77.33% of the ROAs disappeared from AFRINIC.
>
> See more details about the event:
> https://packetvis.com/#/bgp/event/2a35a5824772ae3b651293ec5d9b6367-37572a3c-b445-4075-9741-a419b516ca36/6d742c0ae811df9c41ab427a8ac09e07a93388c7
>
> _______________________________________________
> Sidrops mailing list
> Sidrops@ietf.org
> https://www.ietf.org/mailman/listinfo/sidrops