Re: [rfc-i] archiving outlinks in RFCs

Alexis Rossi <rsce@rfc-editor.org> Thu, 04 May 2023 00:37 UTC

From: Alexis Rossi <rsce@rfc-editor.org>
In-Reply-To: <028f048c-e474-8081-261e-7cc32c63970d@cs.tcd.ie>
Date: Wed, 03 May 2023 17:37:52 -0700
Cc: RFC Interest <rfc-interest@rfc-editor.org>
Content-Transfer-Encoding: quoted-printable
Message-Id: <3433B3E5-B93E-46E3-A500-4CB531125693@rfc-editor.org>
References: <E024D9AC-2B92-4720-9713-519592D2362B@rfc-editor.org> <30c30c2f-4e96-560a-73dd-a51ba8d04714@comcast.net> <771B7586-FFBB-49E4-9B99-5578863FBD8B@rfc-editor.org> <CABcZeBOevOj8cWY7dacWxzwZS82+iAjf1p+DZWF=7WZ9JydnrQ@mail.gmail.com> <48de4d92-e279-4c26-ab3c-15dd854b56f8@betaapp.fastmail.com> <CABcZeBPqePQwPAq5pWda1pGaY_=kLkcOxCjZWmOv9yRZ_MNb7g@mail.gmail.com> <CA+9kkMBVMTG7Zku4gt_DwCNWArYTauR_O0u70zceCMtN2GNN_Q@mail.gmail.com> <796.1682529129@localhost> <CA+9kkMBiqZCqbDviOVQFmjROYJtViz=S7ZsW6T41mv4XGbZ3=g@mail.gmail.com> <04BE48FA-322D-457A-9D7B-A9DA8FCE8E50@rfc-editor.org> <CA+9kkMCKM7A81+EU0OegtE5UbjLoVwsK7FVig8toddj-1APwxw@mail.gmail.com> <CANMZLAakmafNpe91TGG0eioR_yHt=n=ncV7nKLMCvCaQevoH8A@mail.gmail.com> <1718A586-7CFE-42CB-8206-DD7B18383BC9@ietf.org> <CA+9kkMCm1C762sTXiiP=MLLP9huuzdTbjJ-zROEXXJKGuwoGdg@mail.gmail.com> <93dd2fb8-f986-ed10-9369-529ab6bd320c@huitema.net> <BB283056-9CDA-4B3F-BEC7-BBAA036A3D29@rfc-editor.org> <028f048c-e474-8081-261e-7cc32c63970d@cs.tcd.ie>
To: Stephen Farrell <stephen.farrell@cs.tcd.ie>
X-Mailer: Apple Mail (2.3696.120.41.1.3)
Archived-At: <https://mailarchive.ietf.org/arch/msg/rfc-interest/z179x9YZGE0z-NPd6XFElReciys>
Subject: Re: [rfc-i] archiving outlinks in RFCs


> On May 2, 2023, at 3:56 PM, Stephen Farrell <stephen.farrell@cs.tcd.ie> wrote:
> 
> 
> 
> On 02/05/2023 23:15, Alexis Rossi wrote:
>> Trying to steer clear of implementation details, I think these are
>> the goals from our discussion:
> 
> Apologies if I've lost track, but I thought part of this was
> to look back at what URLs in RFCs no longer work and also
> report on those as a way of figuring out how to meet your goal
> #1 below. If that's the case, great, and maybe that'd be a
> good goal #0. If that's not the case, then it'd be a pity if
> the community don't get a chance to learn from that data.
> 
> I think there's value in reporting on what's found before
> trying to decide how to go about the two goals below, esp
> the 2nd one.
> 
> Cheers,
> S.

I think these can be independent of each other.

The Archive-It workflow is not particularly handy for producing this specific data, unfortunately. I think the appropriate way to get it is for us to decide how we want to detect broken links in an automated way, and then collect the data the first time that method is run. That will only tell us about things like 404s and redirects, not about URLs where the content has changed or gone behind a paywall or similar (I think only human reports can catch those).
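
To make the "automated way" a bit more concrete, here is a minimal sketch in Python of the kind of check I have in mind. Nothing here is decided: the function names and the category labels ("ok", "redirected", "broken", "unreachable") are placeholders I made up for illustration, and a real tool would need to deal with servers that reject HEAD requests, rate limiting, and so on.

```python
import urllib.request


def classify(original_url, status, final_url):
    """Label one check result; the category names are illustrative only."""
    if status is None:
        return "unreachable"  # DNS failure, timeout, refused connection
    if status >= 400:
        return "broken"       # 404 and other client/server errors
    if final_url != original_url:
        return "redirected"   # the request succeeded but landed somewhere else
    return "ok"


def check(url, timeout=10):
    """Fetch url once and return (status, final_url); status is None on failure."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        # urlopen follows redirects, so geturl() reports where we ended up.
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status, response.geturl()
    except urllib.error.HTTPError as err:
        return err.code, url
    except (OSError, ValueError):
        # URLError and timeouts are OSError subclasses; ValueError covers malformed URLs.
        return None, url
```

Calling classify(url, *check(url)) for every URL in the reference sections would give a first-run data set. As noted above, a check like this cannot tell whether a page that still returns 200 has changed its content or moved behind a paywall.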

Since it is pretty straightforward to change the text on authors.ietf.org and in the style guide, I don’t see a need to hold off: we can change the text according to what we believe now, and update it later if we gather more helpful information. Unless you disagree with the goals below, of course.

> 
> 
>> Goal #1: Encourage authors to choose the appropriate URL when
>> creating an RFC.
>> - References to URLs with “live” content that is intended to change
>> over time should point to live URLs.
>> - References to URLs where the information you want to cite is the
>> exact same information you want all future readers to access should
>> use archived URLs in their references (i.e., take a “snapshot” of the
>> info as it is at this point in time and use that archived snapshot as
>> the reference).
>> Goal #2: Allow RPC to fix broken links in a version of published RFCs
>> with appropriate approval.
>> - When the RPC receives notification of a broken link, they can
>> identify a suggested replacement, obtain approval from the
>> appropriate entity, and update an html version of the RFC with the
>> approved link.
>> - Approval of replacement links for a document is provided by the
>> same entities who approve errata for the document.
>> If we think those goals are correct, here’s a proposal for how we
>> might get started:
>> - Update authors.ietf.org [1]  and Web Portion of the Style Guide [2]
>> to provide guidance on choosing live vs archived URLs.  Suggested
>> text: “For URL references, consider whether the resource contains
>> “live” information that updates over time, or whether you are
>> referencing information that should not change. For live information
>> we recommend that you use the live URL in your reference.  For URLs
>> where the information you want to cite is the exact same information
>> you want all future readers to access, you should use an archived
>> URL. An archived URL can be created using the Save Page Now feature
>> of the Wayback Machine [3], or you may ask the RPC to create one for
>> you prior to publication. It is not necessary to create or use
>> archived URLs for links to RFCs, Internet Drafts, RFC errata, or
>> emails on our mailing lists.”
>> - Add issue to 7322-bis [4] so that the updated Style Guide RFC
>> includes guidance on choosing the appropriate URL for your
>> references, per above.
>> - Update Web Portion of the Style Guide (or other place? I’m unclear
>> on where this should live) to allow RPC to fix broken URLs.
>> Suggested text: “When a link in a published RFC no longer leads to
>> the intended content, the RPC will attempt to replace it with either
>> an updated or archived version of the link in an html version of the
>> RFC. The RPC will consult with the appropriate entity to verify that
>> the replacement link is correct before the change is published.
>> Approval of replacement links for a document is provided by the same
>> entities who approve errata for the document. If there is no clear
>> approval entity for the document, RPC may ask RSAB for guidance.
>> Links changed via this process will be clearly marked. If no
>> replacement link can be found, the html version will identify the
>> link as broken.”
>> [1] https://authors.ietf.org/en/references-in-rfcxml
>> [2] https://www.rfc-editor.org/styleguide/part2/
>> [3] https://web.archive.org/
>> [4] draft-rpc-rfc7322bis
>> https://datatracker.ietf.org/doc/draft-rpc-rfc7322bis/ (which lives
>> in github at https://github.com/rfc-editor/draft-rpc-rfc7322bis)
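
For anyone who wants to see what the live vs archived choice in the suggested guidance text could look like in practice, here is a rough Python sketch. It is only an illustration under assumptions: the helper names are made up, the exempt host list is my guess at where RFCs, Internet Drafts, errata, and list mail live, and whether a reference is "live" is a judgment the author supplies. The two Wayback Machine URL shapes (a /save/ prefix for Save Page Now, and /web/ plus a 14-digit UTC timestamp for a snapshot) are the ones the service uses as far as I know.

```python
from datetime import datetime, timezone
from urllib.parse import urlsplit

# Assumption: hosts whose links do not need an archived URL under the guidance.
EXEMPT_HOSTS = {"www.rfc-editor.org", "datatracker.ietf.org", "mailarchive.ietf.org"}


def save_page_now_url(url):
    """URL that asks the Wayback Machine to capture the page."""
    return "https://web.archive.org/save/" + url


def snapshot_url(url, captured_at):
    """Archived URL for a capture taken at captured_at (a timezone-aware datetime)."""
    stamp = captured_at.astimezone(timezone.utc).strftime("%Y%m%d%H%M%S")
    return f"https://web.archive.org/web/{stamp}/{url}"


def reference_url(url, is_live, captured_at):
    """Pick the URL to cite: live content and exempt hosts keep the original URL."""
    host = urlsplit(url).hostname or ""
    if is_live or host in EXEMPT_HOSTS:
        return url
    return snapshot_url(url, captured_at)
```

For example, reference_url("https://example.com/spec", False, when) returns a web.archive.org/web/... URL, while the same call with an rfc-editor.org link returns the link unchanged. The timestamp should come from the capture that was actually made, not from the time the reference was written.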