Re: [Tools-discuss] Google Scholar not indexing Internet-Drafts

Carsten Bormann <cabo@tzi.org> Fri, 30 July 2021 11:28 UTC

From: Carsten Bormann <cabo@tzi.org>
In-Reply-To: <2f359f5-838c-f1fe-bdb0-156d98ddc0e5@taugh.com>
Date: Fri, 30 Jul 2021 13:28:07 +0200
Cc: Warren Kumari <warren@kumari.net>, Tools Discussion <tools-discuss@ietf.org>
Message-Id: <FE7F9DF4-1C3B-4955-B51B-2EEC9432F9C2@tzi.org>
References: <b69f81cc-b0bc-ba9d-c752-e707d3b9174f@petit-huguenin.org> <20210729035232.EF66B254891D@ary.qy> <CAHw9_iLqS58BqefUVBeYSW22wEZy9LMKkRhaw0pDMNCGbcjo4A@mail.gmail.com> <2f359f5-838c-f1fe-bdb0-156d98ddc0e5@taugh.com>
To: "John R. Levine" <johnl@taugh.com>
Archived-At: <https://mailarchive.ietf.org/arch/msg/tools-discuss/EIwaIKXqZnab4gbqYES1mBM3M3I>
Subject: Re: [Tools-discuss] Google Scholar not indexing Internet-Drafts

On 2021-07-29, at 18:57, John R Levine <johnl@taugh.com> wrote:
> 
> The other issue is that Google Scholar isn't finding any of the three copies of RFCs on our servers.

Ages ago I had copies of all RFCs in some PDF form on our servers and finally got some bibliography servers to index them.  That must have been 15 years or more ago; I don’t remember the details.

Clearly, Google Scholar should be taught to find the canonical RFCs at https://rfc-editor.org/rfc - having many secondary copies is anathema for SEO.

Internet-Drafts, hmm.  Many are not arXiv quality, and we don’t want the I-D repository to become a dumping ground for people who just want to get their garbage into Scholar.  But the main problem is that the references to the drafts will stay active even after the RFC has been published(*) (or the I-D has been replaced), so we should be very careful with what we offer under the URI that will be indexed.

Let’s stop kicking the can on the doc.ietf.org thing!

Grüße, Carsten

(*) E.g., I get https://tools.ietf.org/id/draft-ietf-core-observe-09.html as the first hit for CoAP Observe.  No metadata block, seven versions off, ...