Re: [rfc-i] Wrong Internet search results for new RFCs

John Levine <johnl@taugh.com> Wed, 04 May 2022 02:27 UTC

Date: Tue, 03 May 2022 22:27:00 -0400
Message-Id: <20220504022700.ACBD33F5687A@ary.qy>
From: John Levine <johnl@taugh.com>
To: rfc-interest@rfc-editor.org
In-Reply-To: <4f682c77-a110-764b-dbf6-4d51695182f9@lear.ch>
Organization: Taughannock Networks
X-Headerized: yes
Cleverness: minimal
Mime-Version: 1.0
Archived-At: <https://mailarchive.ietf.org/arch/msg/rfc-interest/TkqGy7KGZRl6bGdByTkYtU1Le3Y>
Subject: Re: [rfc-i] Wrong Internet search results for new RFCs

It appears that Eliot Lear <lear@lear.ch> said:
>Unless someone made a recent change to robots.txt, someone is harvesting 
>the exclusions. ...

The bogus files are under /rfc/authors, which wasn't in robots.txt
until I noticed and asked the RPC to add it. They did so a few hours
ago, around 22:50 UTC.
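
For illustration only, here is a minimal sketch of what such an exclusion
does, checked with Python's standard urllib.robotparser. The rule text, the
"ExampleBot" agent name, and the example.org URLs are assumptions made up
for the sketch; the actual robots file may be worded differently.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, not the real file.
ROBOTS_TXT = """\
User-agent: *
Disallow: /rfc/authors/
"""


def is_allowed(robots_txt: str, agent: str, url: str) -> bool:
    """Return True if `agent` may fetch `url` under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


# A file under the excluded directory (hypothetical name) is off limits...
in_authors = is_allowed(ROBOTS_TXT, "ExampleBot",
                        "https://example.org/rfc/authors/draft.txt")
# ...while a file elsewhere under /rfc/ (also hypothetical) stays crawlable.
elsewhere = is_allowed(ROBOTS_TXT, "ExampleBot",
                       "https://example.org/rfc/published.txt")
```

Note that this only models a crawler that chooses to honor the file;
robots.txt is advisory and does nothing against one that ignores it.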

I also explained how to add a <meta> tag to the auto-generated index
pages, like the one for /rfc/authors, to tell spiders to go away.
I presume the RPC will add it shortly.  It shouldn't be needed for this
particular directory now that the robots file is updated, but it should
keep any other folders with auto-generated indices from leaking into search
results.
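
As a rough sketch of the kind of tag meant here, the snippet below embeds a
robots <meta> tag in a made-up index page and uses Python's standard
html.parser to confirm the directives are present. The page markup, the
"noindex, nofollow" value, and the helper names are assumptions for
illustration, not the markup the RPC will actually deploy.

```python
from html.parser import HTMLParser

# Hypothetical auto-generated index page; the real markup may differ.
INDEX_PAGE = """<html><head>
<title>Index of /rfc/authors</title>
<meta name="robots" content="noindex, nofollow">
</head><body></body></html>"""


class RobotsMetaFinder(HTMLParser):
    """Collect the directives from any <meta name="robots"> tag."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").lower() != "robots":
            return
        for directive in attr.get("content", "").split(","):
            directive = directive.strip().lower()
            if directive:
                self.directives.add(directive)


def robots_directives(html: str) -> set:
    """Return the set of robots directives declared in `html`."""
    finder = RobotsMetaFinder()
    finder.feed(html)
    return finder.directives


found = robots_directives(INDEX_PAGE)
```

A check like this could be run over each generated index page to spot
any that were produced without the tag.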

Legitimate search engines like Google and Bing all follow the advice
in robots.txt, so the main challenge is to make sure that what's in the
robots file and <meta> tags matches what we want indexed. There are
also lots of junk spiders with names like BLEXbot and Ahrefs that seem to
be for the SEO crowd. Some follow robots.txt, some don't, but I don't
care what their users see.

R's,
John
