Re: [rfc-i] archiving outlinks in RFCs

Martin Thomson <mt@lowentropy.net> Thu, 04 May 2023 01:59 UTC

Date: Thu, 04 May 2023 11:59:16 +1000
From: Martin Thomson <mt@lowentropy.net>
To: rfc-interest@rfc-editor.org
Archived-At: <https://mailarchive.ietf.org/arch/msg/rfc-interest/5jIe8WYE3DVGz1nXfqFAW8M-uHg>

I've followed this thread with some amount of discomfort.  I'm going to try to articulate that here.

On Wed, May 3, 2023, at 08:15, Alexis Rossi wrote:
> Goal #1: Encourage authors to choose the appropriate URL when creating an RFC. 

This is existing practice and I don't see any evidence to suggest that people are doing poorly at this.  The RPC is particularly careful in this regard.  I don't see how we could do better in terms of choosing URLs.

> - References to URLs where the information you want to cite is the 
> exact same information you want all future readers to access should use 
> archived URLs in their references (ie take a “snapshot” of the info as 
> it is at this point in time and use that archived snapshot as the 
> reference).

This is an interesting idea, but a challenging one for a couple of reasons:

1. Superficial archival doesn't really work.  https://rfc-editor.org/info/rfcXXXX is our preferred form of link, but that references a cover page, not the important content.  Many sites that host the sorts of publications we often cite (IEEE, IACR, arXiv, ITU) tend to do the same thing.
2. Paywalls, copyrights, etc...  We can't just take a copy of a spec and host it without running afoul of various constraints.
3. Formats.  The reason for cover pages can be to offer alternative formats.  Archival forces us to contend with format choices at the point of archival and then at retrieval time.

These are surmountable if we are willing to make the archival discretionary to some degree.  Anything mandatory will just keep hitting those obstacles.

> Goal #2: Allow RPC to fix broken links in a version of published RFCs 
> with appropriate approval.

I don't think that this is a good idea, as worded, pending conclusions in RSWG about changing XML.  I tend to think that this sort of change is over the line.

However, I do think that there is value in offering an alternative presentation form that clearly shows any archived forms or updated links *as alternatives*, potentially also marking broken links.

For instance, HTML is easy to tweak, and you could have links annotated.  Interacting with the annotation could show an overlay with information.  For example:

🔗 https://broken.link.example/X6.92

might become:

🔗[  https://broken.link.example/X6.92 (marked broken 2021-02-28)
      Alternative link: https://alternative.source.example/ (added 2023-05-04)
      Alternative link: https://another.alternative/ (added 2021-02-28)
      Archived copies of this resource: _HTML_, _PDF_, _TXT_ (added 2023-05-04) ]

(The added links could link to a record of the transaction.)  

This marker could be added easily to HTML renderings, without needing to query the database until the interaction occurs, meaning that you wouldn't need to regenerate the RFC rendering often; realistically, only when the process for rendering this stuff needs to change.
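
To make the shape of that overlay a little more concrete, here is a rough Python sketch of how the text for one annotated link might be produced from a stored record. Nothing here reflects an existing RPC tool: `LinkRecord`, its fields, and `render_overlay` are names invented for illustration, and the record layout is only an assumption about what such a database might hold.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class LinkRecord:
    """Hypothetical per-link metadata; the field names are assumptions."""
    url: str
    marked_broken: Optional[str] = None  # date the link was marked broken
    # (alternative URL, date added) pairs
    alternatives: List[Tuple[str, str]] = field(default_factory=list)
    archived_formats: List[str] = field(default_factory=list)
    archived_added: Optional[str] = None  # date the archived copies were added


def render_overlay(rec: LinkRecord) -> List[str]:
    """Build the overlay lines that would be shown when the marker is used."""
    head = rec.url
    if rec.marked_broken:
        head += f" (marked broken {rec.marked_broken})"
    lines = [head]
    for alt_url, added in rec.alternatives:
        lines.append(f"Alternative link: {alt_url} (added {added})")
    if rec.archived_formats:
        formats = ", ".join(rec.archived_formats)
        lines.append(
            f"Archived copies of this resource: {formats} "
            f"(added {rec.archived_added})"
        )
    return lines


if __name__ == "__main__":
    example = LinkRecord(
        url="https://broken.link.example/X6.92",
        marked_broken="2021-02-28",
        alternatives=[
            ("https://alternative.source.example/", "2023-05-04"),
            ("https://another.alternative/", "2021-02-28"),
        ],
        archived_formats=["HTML", "PDF", "TXT"],
        archived_added="2023-05-04",
    )
    print("\n".join(render_overlay(example)))
```

Because the overlay is built only from the record, the static HTML needs nothing more than the marker; a link with no record would simply show its URL.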

Note that adding archived copies might be part of the RPC service for newly published documents, but this wouldn't require that to happen.

> - When the RPC receives notification of a broken link, they can 
> identify a suggested replacement, obtain approval from the appropriate 
> entity, and update an html version of the RFC with the approved link.


> - Approval of replacement links for a document is provided by the same 
> entities who approve errata for the document.

This makes sense in the abstract, but I'd like to see this improved in this case, because the errata process leans more heavily on area directors than this process would seem to require.  If this only affects rendering and some of the metadata that is maintained for published RFCs (as my above proposal would), then a lighter process might help avoid this turning into a reason that we get even further behind on processing errata.

One thing that might make this easier, but which might also overturn this point, is that the same links are very common across the entire RFC catalogue.  So if https://broken.example/ appears in 15 RFCs, maybe the key to that database isn't the RFCs, but the URL itself.  That has several advantages: new RFCs that cite URLs already in the database don't need new copies of the documents, one change can fix many RFCs, etc...  Maybe you can use any editor from any of those RFCs to approve the change.

That's harder, but I think that a citation-centred database has enough advantages to justify considering it.
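
A minimal sketch of what keying on the URL rather than the RFC could look like, in Python. This is only an illustration under my own assumptions: `CitationIndex` and its methods are made-up names, the RFC labels in the example are placeholders, and a real system would also need approval and audit records.

```python
from collections import defaultdict
from typing import Dict, List, Set


class CitationIndex:
    """Hypothetical store keyed by cited URL instead of by RFC."""

    def __init__(self) -> None:
        self._rfcs_by_url: Dict[str, Set[str]] = defaultdict(set)
        self._alternatives: Dict[str, List[str]] = defaultdict(list)

    def cite(self, rfc: str, url: str) -> None:
        """Record that an RFC cites a URL."""
        self._rfcs_by_url[url].add(rfc)

    def add_alternative(self, url: str, alternative: str) -> List[str]:
        """Attach one alternative link; return every RFC this affects."""
        self._alternatives[url].append(alternative)
        return sorted(self._rfcs_by_url.get(url, set()))

    def alternatives_for(self, url: str) -> List[str]:
        """Alternatives to show for a URL, whichever RFC cites it."""
        return list(self._alternatives.get(url, []))


if __name__ == "__main__":
    index = CitationIndex()
    for rfc in ("RFC-A", "RFC-B", "RFC-C"):  # placeholder labels
        index.cite(rfc, "https://broken.example/")
    affected = index.add_alternative(
        "https://broken.example/", "https://alternative.source.example/"
    )
    print(affected)
    print(index.alternatives_for("https://broken.example/"))
```

The point of the design is that `add_alternative` is called once per URL, and every RFC citing that URL picks up the change at lookup time.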