Re: [Last-Call] Last Call: <draft-koster-rep-06.txt> (Robots Exclusion Protocol) to Informational RFC

John Levine <johnl@taugh.com> Mon, 28 February 2022 22:29 UTC

Date: 28 Feb 2022 17:29:30 -0500
Message-Id: <20220228222932.825F33844270@ary.qy>
From: "John Levine" <johnl@taugh.com>
To: last-call@ietf.org
In-Reply-To: <2597.1646071002@localhost>
Organization: Taughannock Networks
X-Headerized: yes
Cleverness: minimal
Mime-Version: 1.0
Content-type: text/plain; charset=utf-8
Content-transfer-encoding: 8bit
Archived-At: <https://mailarchive.ietf.org/arch/msg/last-call/1iKMC_7jFMtvK9Q0QADSXied1rs>
Subject: Re: [Last-Call] Last Call: <draft-koster-rep-06.txt> (Robots Exclusion Protocol) to Informational RFC

It appears that Michael Richardson <mcr+ietf@sandelman.ca> said:
>It's good to see robots.txt coming to the IETF.

Agreed. I also agree with Mnot's question about whether we have reports
from other search engines that they follow this spec. Based on reading
my web server's log files, tweaking my robots.txt files, and reading the
web pages where the search engines explain their crawling practices, I
think they do, but surely we know people at a few other search engines
who could confirm. I'd be particularly interested to hear who
interprets the * and $ pattern metacharacters.
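
For anyone comparing crawlers, here is a minimal Python sketch of how I
read the two metacharacters: "*" matches any run of characters and a
trailing "$" anchors the pattern at the end of the path. The function
name rule_matches is made up for illustration, and this covers only
single-pattern matching, not the longest-match rule selection between
Allow and Disallow lines.

```python
import re

def rule_matches(pattern: str, path: str) -> bool:
    """Return True if a robots.txt path pattern matches the given URL path.

    Assumed semantics: '*' matches any sequence of characters, a trailing
    '$' anchors the match at the end of the path, and without '$' the
    pattern only has to match a prefix of the path.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape the literal pieces and join them with '.*' for each '*'.
    regex = ".*".join(re.escape(part) for part in pattern.split("*"))
    if anchored:
        return re.fullmatch(regex, path) is not None
    return re.match(regex, path) is not None

print(rule_matches("/*.php$", "/index.php"))
print(rule_matches("/*.php$", "/index.php?x=1"))
print(rule_matches("/fish", "/fishing"))
```

A crawler that ignores the metacharacters would instead treat "*" and
"$" as literal characters, which is the difference worth asking about.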

Section 2.2.2 has this example of a path with a Unicode character:

   | /foo/bar/U+E38384 | /foo/bar/%E3%83%84    | /foo/bar/%E3%83%84    |

There is no U+E38384 character (code points stop at U+10FFFF), but the
UTF-8 encoding of the Japanese character U+30C4 is hex E3 83 84, so I'm
guessing that's what they meant.
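
This is easy to check with a few lines of Python. The variable names
are mine, and the path is the one from the draft's table:

```python
import sys
from urllib.parse import quote

ch = "\u30c4"                      # the character I assume the draft meant
utf8 = ch.encode("utf-8")          # its UTF-8 byte sequence
encoded_path = quote("/foo/bar/" + ch)  # percent-encode the UTF-8 bytes

print(utf8.hex().upper())          # E383 84 bytes shown as one hex string
print(encoded_path)                # matches the right-hand columns of the table
print(0xE38384 > sys.maxunicode)   # True: U+E38384 is beyond the Unicode range
```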

The "Crawl-Delay" line is ignored by Google but followed by many other
search engines such as Bing and Yandex. I would describe it in the
document, with a note that only some spiders honor it.
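
As one data point on how widespread it is, Python's standard library
robots.txt parser reads the line. A minimal sketch, with a made-up
robots file and a made-up crawler name "ExampleBot":

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, not taken from the draft.
ROBOTS_TXT = """\
User-agent: *
Crawl-delay: 10
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# crawl_delay() returns the delay in seconds, or None if none is given.
delay = parser.crawl_delay("ExampleBot")
blocked = parser.can_fetch("ExampleBot", "/private/page.html")
allowed = parser.can_fetch("ExampleBot", "/public/page.html")

print(delay, blocked, allowed)
```

Whether a crawler then actually sleeps for that many seconds between
requests is up to the crawler, which is the point of the suggested note.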

Most importantly, the copyright license is broken. At the top it has
the "no derivatives" license, which is fine, but it also has code
sections marked with <CODE BEGINS>. The TLP specifically says that the
code license only applies to RFCs that use the regular license, not any
other license. In this case the "code" sections are short snippets of
sample robots files with made-up names and paths, so I would take out
the code flags.

R's,
John