Re: [Tools-discuss] Google Scholar not indexing Internet-Drafts

John R Levine <johnl@taugh.com> Thu, 29 July 2021 16:57 UTC

Date: Thu, 29 Jul 2021 12:57:06 -0400
Message-ID: <2f359f5-838c-f1fe-bdb0-156d98ddc0e5@taugh.com>
From: John R Levine <johnl@taugh.com>
To: Warren Kumari <warren@kumari.net>
Cc: Tools Discussion <tools-discuss@ietf.org>
In-Reply-To: <CAHw9_iLqS58BqefUVBeYSW22wEZy9LMKkRhaw0pDMNCGbcjo4A@mail.gmail.com>
References: <b69f81cc-b0bc-ba9d-c752-e707d3b9174f@petit-huguenin.org> <20210729035232.EF66B254891D@ary.qy> <CAHw9_iLqS58BqefUVBeYSW22wEZy9LMKkRhaw0pDMNCGbcjo4A@mail.gmail.com>
Archived-At: <https://mailarchive.ietf.org/arch/msg/tools-discuss/3TMw8SRRXG2hw5tvFDq2R8a4seQ>
Subject: Re: [Tools-discuss] Google Scholar not indexing Internet-Drafts

>> They're not even indexing RFCs, other than some old ones where it finds
>> copies that the ACM has.  Yow.
>
> Citation needed.

Try site:rfc-editor.org or site:tools.ietf.org in Google Scholar and you get nothing.

> In an incognito window, the RFC Editor result was the first, second or
> third result for all of these...

There are two separate issues here.  While Google does index our web sites,
the results are pretty random.  Once the tools server is retired, some
sitemaps and meta tags would help.  The sitemap on datatracker.ietf.org
only includes IPR disclosures and liaison statements, which seems
suboptimal.  rfc-editor.org has no sitemap at all.
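For illustration, here is a minimal sketch of generating a sitemap in the sitemaps.org XML format, assuming one <url> entry per RFC page with an optional <lastmod> date. The function name build_sitemap, the rfc-editor.org URL pattern, and the dates in the example are hypothetical placeholders, not anything the RFC Editor actually publishes.

```python
import xml.etree.ElementTree as ET
from datetime import date

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(entries):
    """Build sitemap XML from (url, last_modified) pairs.

    last_modified may be a datetime.date or None; None omits <lastmod>.
    (Hypothetical helper, for illustration only.)
    """
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url, last_modified in entries:
        url_el = ET.SubElement(urlset, "url")
        # ElementTree escapes &, < and > in text content for us.
        ET.SubElement(url_el, "loc").text = url
        if last_modified is not None:
            ET.SubElement(url_el, "lastmod").text = last_modified.isoformat()
    body = ET.tostring(urlset, encoding="unicode")
    # Write the declaration by hand so it always says UTF-8.
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


if __name__ == "__main__":
    # Placeholder entries: URL pattern and dates are assumptions.
    rfcs = [
        ("https://www.rfc-editor.org/rfc/rfc8446.html", date(2018, 8, 10)),
        ("https://www.rfc-editor.org/rfc/rfc9000.html", None),
    ]
    print(build_sitemap(rfcs))
```

The protocol caps a single sitemap file at 50,000 URLs, so the full RFC series still fits in one file; adding drafts as well might call for a sitemap index that points at several files.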

The other issue is that Google Scholar isn't finding any of the three
copies of RFCs on our servers.  Our file formats leave something to be
desired (there are industry-standard HTML tags for bibliographic info
which we don't use), but I don't think that's it, because it found a copy
of a recent HTML RFC on Brian Carpenter's institution's site.

I'll try some experiments with sitemaps, and maybe add the
industry-standard citation tags that Google says it prefers.
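As a rough idea of what those citation tags could look like, here is a minimal sketch that emits Highwire Press-style citation_* meta tags for a single RFC. It assumes the tag names citation_title, citation_author (one per author), citation_publication_date, citation_technical_report_institution, citation_technical_report_number and citation_pdf_url; the exact set and date format should be checked against Google Scholar's inclusion guidelines. The function name and the "IETF" institution default are hypothetical choices.

```python
from html import escape


def citation_meta_tags(title, authors, year, report_number,
                       institution="IETF", pdf_url=None):
    """Return citation_* <meta> tags for one document (hypothetical helper)."""
    # Tag names assumed from Google Scholar's Highwire Press conventions.
    tags = [("citation_title", title)]
    tags += [("citation_author", author) for author in authors]  # one tag per author
    tags.append(("citation_publication_date", year))
    tags.append(("citation_technical_report_institution", institution))  # assumed value
    tags.append(("citation_technical_report_number", report_number))
    if pdf_url is not None:
        tags.append(("citation_pdf_url", pdf_url))
    # escape() also escapes double quotes, so values are safe inside content="..."
    return "\n".join(
        f'<meta name="{name}" content="{escape(value)}">' for name, value in tags
    )


if __name__ == "__main__":
    print(citation_meta_tags(
        "The Transport Layer Security (TLS) Protocol Version 1.3",
        ["Eric Rescorla"],
        "2018",
        "RFC 8446",
        pdf_url="https://www.rfc-editor.org/rfc/rfc8446.pdf",  # assumed URL pattern
    ))
```

These lines would go in the <head> of each HTML rendering, so all three copies of an RFC could carry the same bibliographic data.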

Regards,
John Levine, johnl@taugh.com, Taughannock Networks, Trumansburg NY
Please consider the environment before reading this e-mail. https://jl.ly