Re: [rfc-i] Meta decorations in generated HTML

John R Levine <johnl@taugh.com> Thu, 26 May 2022 14:18 UTC

Date: Thu, 26 May 2022 10:18:07 -0400
Message-ID: <0ab66d2e-aa7d-eb17-83dc-2774e9d021a7@taugh.com>
From: John R Levine <johnl@taugh.com>
To: rfc-interest@rfc-editor.org
X-X-Sender: johnl@ary.qy
In-Reply-To: <5afe0f29-ab5a-b79e-cad4-7c18cf8fc5d3@gmx.de>
References: <20220525203826.8606A41A4E93@ary.qy> <f0f92d4c-8cc4-c3bb-0f0d-96c3ad422303@gmx.de> <C826D239-7CCB-404E-9591-B33C34ED82C9@tzi.org> <5afe0f29-ab5a-b79e-cad4-7c18cf8fc5d3@gmx.de>
MIME-Version: 1.0
Archived-At: <https://mailarchive.ietf.org/arch/msg/rfc-interest/EshCYn8lefD0VMtxxozNAWQDoFI>
Subject: Re: [rfc-i] Meta decorations in generated HTML

> On 26.05.2022 at 08:30, Carsten Bormann wrote:
>> On 26. May 2022, at 07:43, Julian Reschke <julian.reschke@gmx.de> wrote:
>>> 
>>> They duplicate a lot of information that is elsewhere, so I think it
>>> would be good to minimize the amount of duplication.
>> 
>> But then, they are buried in inscrutable HTML boilerplate anyway, so I 
>> think the need to minimize is somewhat weak.
>> The need to get indexed prevails, at least from my point of view.
>
> Absolutely. It would just be nice to understand what Google Scholar
> actually *needs* (their lack of transparency is really mind-boggling).

We're talking about seven small pieces of bibliographic metadata that are 
added mechanically to the HTML.  They're all the ones listed on the 
Scholar web site that Google uses to decide where to index the pages they 
include.  While I suppose they might still include us if we left some of 
them out, I don't see why we would want to make it harder to index our 
RFCs accurately.
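
To illustrate how mechanical this is, here is a minimal Python sketch, not 
the actual RFC tooling.  The function name and the example values are 
hypothetical, and the four tag names are the ones I understand Scholar's 
inclusion guidelines to document; this is not necessarily the exact set of 
seven we emit.

```python
from html import escape


def scholar_meta_tags(title, authors, date, pdf_url):
    """Return <meta> lines for a few Scholar-style bibliographic fields.

    Hypothetical helper: the tag names are assumed from Google Scholar's
    published inclusion guidelines, not taken from the RFC tooling.
    """
    fields = [("citation_title", title)]
    # One citation_author tag per author, in document order.
    fields += [("citation_author", author) for author in authors]
    fields += [
        ("citation_publication_date", date),
        ("citation_pdf_url", pdf_url),
    ]
    # Escape attribute values so titles containing & or quotes stay valid HTML.
    return [
        f'<meta name="{name}" content="{escape(value, quote=True)}">'
        for name, value in fields
    ]


if __name__ == "__main__":
    # Placeholder values for illustration only.
    for line in scholar_meta_tags(
        "An Example Document",
        ["J. Doe", "R. Roe"],
        "2022/05",
        "https://example.org/rfc0000.pdf",
    ):
        print(line)
```

The output is a handful of short lines in the HTML head, which is why the 
duplication costs so little.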

There was one undocumented problem: they can't handle tags like
", Ed." in the author name, which we found out by talking to the people who 
run Scholar.  I also added an XML site index that includes all of the 
versions of the RFCs to make spidering easier.
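
The author-name cleanup amounts to something like the sketch below.  It is 
only an illustration under my own assumptions: the function name is 
hypothetical, and it handles just the trailing ", Ed." case mentioned above, 
not whatever other role markers the real code may deal with.

```python
import re

# Matches a trailing ", Ed." role marker (with optional surrounding spaces).
_EDITOR_SUFFIX = re.compile(r",\s*Ed\.\s*$")


def clean_author(name):
    """Strip a trailing ', Ed.' so the name can go in a citation_author tag."""
    return _EDITOR_SUFFIX.sub("", name).strip()


if __name__ == "__main__":
    # Placeholder names for illustration only.
    print(clean_author("J. Doe, Ed."))
    print(clean_author("R. Roe"))
```

Names without the marker pass through unchanged, so the function can be 
applied to every author.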

I would have thought it was self-evident why we have redundant 
bibliographic tags in the HTML (not the XML).  After three decades of 
disorganized evolution, different indexes use different tags.  None of 
them are large or hard to create, so if we add them all, we help get our 
documents indexed better.

Regards,
John Levine, johnl@taugh.com, Taughannock Networks, Trumansburg NY
Please consider the environment before reading this e-mail. https://jl.ly
