Re: [Tools-discuss] Google Scholar not indexing Internet-Drafts

Brian E Carpenter <brian.e.carpenter@gmail.com> Fri, 30 July 2021 21:00 UTC

To: John R Levine <johnl@taugh.com>, Carsten Bormann <cabo@tzi.org>
Cc: Tools Discussion <tools-discuss@ietf.org>
References: <b69f81cc-b0bc-ba9d-c752-e707d3b9174f@petit-huguenin.org> <20210729035232.EF66B254891D@ary.qy> <CAHw9_iLqS58BqefUVBeYSW22wEZy9LMKkRhaw0pDMNCGbcjo4A@mail.gmail.com> <2f359f5-838c-f1fe-bdb0-156d98ddc0e5@taugh.com> <FE7F9DF4-1C3B-4955-B51B-2EEC9432F9C2@tzi.org> <e415ab95-9863-78c9-b111-6b0dd2aef@taugh.com>
From: Brian E Carpenter <brian.e.carpenter@gmail.com>
Message-ID: <afe42744-7c5e-254a-065c-8974bd7af605@gmail.com>
Date: Sat, 31 Jul 2021 09:00:14 +1200
In-Reply-To: <e415ab95-9863-78c9-b111-6b0dd2aef@taugh.com>
Archived-At: <https://mailarchive.ietf.org/arch/msg/tools-discuss/McgUk0oRZCZxEDK1AH6jyBhp4TA>
Subject: Re: [Tools-discuss] Google Scholar not indexing Internet-Drafts

Several times, Google Scholar has asked me to claim I-Ds as my own
publications, and I have rejected them. I doubt there's an active feed, but
if they analyze the citations in a publication that cites an I-D, that I-D
will get into their system.

The fact that they point to a University of Auckland site for RFC 8799
is direct proof that they index either the university's publication
repository or all repositories based on symplectic.co.uk.

They do have some very strange things, though. With a little effort,
you can extract this citation:
@article{carpenter1994aeiou,
  title={AEIOU: Address extension by IP option usage},
  author={Carpenter, B},
  journal={Internet draft-carpenter-aeiou-00. txt},
  year={1994}
}


Regards
   Brian

On 30-Jul-21 23:35, John R Levine wrote:
>> Clearly, Google Scholar should be taught to find the canonical RFCs at
>> https://rfc-editor.org/rfc - having many secondary copies is an SEO anathema.
> 
> Right.  I hope some sitemaps will help.
> 
>> Internet-Drafts, hmm.  Many are not arxiv quality, and we don’t want the 
>> I-D repository as a dumping ground for people who just want to get their 
>> garbage into Scholar.  But the main problem is that the references to 
>> the drafts will stay active even when the RFC has been published(*) (or 
>> the I-D replaced), so we should be very careful with what we offer under 
>> the URI that will be indexed.
> 
> I agree they're not what Scholar is looking for.
> 
> I'm not worried about people gaming it, you should be able to get anything 
> into Scholar if you put it on a web server with the right metadata, but
> the last thing I want is yet another reason people will imagine that every
> I-D is an Internet Standard.
> 
> Regards,
> John Levine, johnl@taugh.com, Taughannock Networks, Trumansburg NY
> Please consider the environment before reading this e-mail. https://jl.ly