Re: [rfc-i] archiving outlinks in RFCs

tom petch <daedulus@btconnect.com> Wed, 26 April 2023 08:46 UTC

Return-Path: <daedulus@btconnect.com>
X-Original-To: rfc-interest@ietfa.amsl.com
Delivered-To: rfc-interest@ietfa.amsl.com
To: Alexis Rossi <rsce@rfc-editor.org>, rfc-interest@rfc-editor.org
References: <E024D9AC-2B92-4720-9713-519592D2362B@rfc-editor.org>
From: tom petch <daedulus@btconnect.com>
Message-ID: <6448E47C.9030101@btconnect.com>
Date: Wed, 26 Apr 2023 09:44:44 +0100
User-Agent: Mozilla/5.0 (Windows NT 5.1; rv:38.0) Gecko/20100101 Thunderbird/38.5.0
In-Reply-To: <E024D9AC-2B92-4720-9713-519592D2362B@rfc-editor.org>
Content-Type: text/plain; charset="windows-1252"; format="flowed"
Content-Transfer-Encoding: 8bit
MIME-Version: 1.0
Archived-At: <https://mailarchive.ietf.org/arch/msg/rfc-interest/zL1kXkAAQfjDu-l4_DS7wk-5DYA>
Subject: Re: [rfc-i] archiving outlinks in RFCs
X-BeenThere: rfc-interest@rfc-editor.org
X-Mailman-Version: 2.1.39
Precedence: list
List-Id: "A list for discussion of the RFC series and RFC Editor functions." <rfc-interest.rfc-editor.org>
List-Unsubscribe: <https://mailman.rfc-editor.org/mailman/options/rfc-interest>, <mailto:rfc-interest-request@rfc-editor.org?subject=unsubscribe>
List-Archive: <https://mailarchive.ietf.org/arch/browse/rfc-interest/>
List-Post: <mailto:rfc-interest@rfc-editor.org>
List-Help: <mailto:rfc-interest-request@rfc-editor.org?subject=help>
List-Subscribe: <https://mailman.rfc-editor.org/mailman/listinfo/rfc-interest>, <mailto:rfc-interest-request@rfc-editor.org?subject=subscribe>
X-List-Received-Date: Wed, 26 Apr 2023 08:46:58 -0000

On 25/04/2023 19:50, Alexis Rossi wrote:
> Hi all,
>
> I wanted to let the community know about something I’ve been working on. As you might know, one of my previous jobs was running the Wayback Machine, so when I started working with this collection of RFCs one of my first thoughts was, “I wonder how many broken links are in these RFCs from the past few decades?”
>
> In general, the average lifespan of a URL before the content changes or disappears is on the order of 100 days. Fortunately for us, the links used in RFC references seem to be much more stable than that. For instance, so far I’ve only found one broken link in an RFC from the past 6 months [1].
>
> Even though we favor these more stable URLs, some of them will eventually change or go 404 and having archival documents with link rot is something we can take steps to avoid in the future.
>
> The first thing I wanted to do was just make sure we were archiving these outlinks somewhere. This won’t fix a broken link in the RFC, but at least the resource can be saved elsewhere for someone curious enough to go look (and potentially we could fix links in some version of the RFC in future).
>

What is an 'outlink'?

Not a term I can recall seeing in an RFC or on a discussion list before.

Tom Petch


> The main services that are well qualified for this purpose are Archive-It.org <http://archive-it.org/> (run by the Internet Archive) and Perma.cc <http://perma.cc/> (run by Harvard Law School Library). I chose Archive-It, and when I approached them they offered us an account [2] with enough free data storage for our needs. Yay for non-profits supporting each other!
>
> So far I have used Archive-It to:
>
> 1. Archive rfc-editor.org <http://rfc-editor.org/>, iab.org <http://iab.org/>, irtf.org <http://irtf.org/>, and ietf.org <http://ietf.org/> (minus datatracker and the mail archive)
>    - There are lots of references to these sites in RFCs, but I also wanted to preserve the contents for their own sake. I plan to revisit these sites once per year.
>    - I am avoiding datatracker (except for outlinks from RFCs) because of concern about the extra traffic causing problems for the team that maintains the site.
>    - I have not concentrated on archiving the mail archive yet, though I know some of it has been saved incidentally.
> 2. Archive outlinks from RFCs
>    - About once per quarter I’ll grab the outlinks from newly published RFCs and get them crawled.
>    - I am also going backwards through the entire series - I’ve started with the most recent RFCs (links are more likely to still be live) and am working my way back in time.
>
> There may be more room for improvements here, for example including archived links in RFCs from the start where appropriate, or potentially defining a way for links to be self-healing in published RFCs.
>
> Please let me know if you have ideas or feedback on this.
>
> Thanks,
> Alexis
>
>
> [1] RFC9311 published in September 2022, in Section 11 (Informative References) this link is 404: https://www.ietf.org/how/meetings/98/bits-n-bites/ <https://www.ietf.org/how/meetings/98/bits-n-bites/>
>
> [2] https://archive-it.org/organizations/2540 <https://archive-it.org/organizations/2540>
>
>
>
> _______________________________________________
> rfc-interest mailing list
> rfc-interest@rfc-editor.org
> https://mailman.rfc-editor.org/mailman/listinfo/rfc-interest
>
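
For readers following the thread, the step of grabbing the outlinks from a published RFC might be sketched as below. This is only an illustration, not the tooling used by the RFC Editor or Archive-It. It assumes that "outlink" means an http(s) URL in the RFC text that points to some other resource, which is how the quoted message appears to use the term. The function name, the regular expression, and the sample text are all made up for the example; the two URLs in the sample are the ones cited in the footnotes above.

```python
import re
from typing import List

# Hypothetical pattern: an http(s) URL runs until whitespace, an angle
# bracket, a quote, or a closing parenthesis/square bracket.
URL_PATTERN = re.compile(r"https?://[^\s<>\"')\]]+")


def extract_outlinks(rfc_text: str) -> List[str]:
    """Return the distinct http(s) URLs in rfc_text, in first-seen order."""
    seen = set()
    outlinks = []
    for match in URL_PATTERN.finditer(rfc_text):
        # Drop sentence punctuation that trails a URL in running text.
        url = match.group(0).rstrip(".,;:")
        if url not in seen:
            seen.add(url)
            outlinks.append(url)
    return outlinks


# Made-up sample text, not copied from any RFC.
sample = (
    "An informative reference: "
    "<https://www.ietf.org/how/meetings/98/bits-n-bites/>.\n"
    "See also https://archive-it.org/organizations/2540, and again "
    "https://archive-it.org/organizations/2540."
)

for link in extract_outlinks(sample):
    print(link)
```

The resulting list could then be handed to whatever crawler or link checker is in use. The sketch deliberately makes no network requests, so it says nothing about whether a given link is still live.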