Re: [auth48] AUTH48: RFC-to-be 9309 <draft-koster-rep-12> for your review
rfc-editor@rfc-editor.org Fri, 26 August 2022 06:27 UTC
To: m.koster@greenhills.co.uk, garyillyes@google.com, henner@google.com, lizzi@google.com
From: rfc-editor@rfc-editor.org
Cc: rfc-editor@rfc-editor.org, ted.ietf@gmail.com, superuser@gmail.com, auth48archive@rfc-editor.org
Date: Thu, 25 Aug 2022 23:27:29 -0700
Archived-At: <https://mailarchive.ietf.org/arch/msg/auth48archive/opx4U7RyxthcoMEFJkYgTJRH4JI>
Subject: Re: [auth48] AUTH48: RFC-to-be 9309 <draft-koster-rep-12> for your review
Authors,

While reviewing this document during AUTH48, please resolve (as necessary) the following questions, which are also in the XML file.

1) <!-- [rfced] Abbreviated title (running header in PDF output): Because "REP" as an abbreviation for "Robots Exclusion Protocol" appears to be new to RFCs (we only found "Remote Encode Program (REP)" in RFC 5 (published June 1969)), we changed "REP" to "Robots Exclusion Protocol (REP)". Please let us know if you have any concerns.

Current:
   RFC 9309    Robots Exclusion Protocol (REP)    August 2022 -->

2) <!-- [rfced] Please insert any keywords (beyond those that appear in the title) for use on <https://www.rfc-editor.org/search>. -->

3) <!-- [rfced] Abstract: Please confirm that "1996" is as desired. The first several "hits" for a "what is the Robots Exclusion Protocol?" Google search seem to indicate that the year is 1994, and we see "the original 1994 'A Standard for Robot Exclusion' document" on <https://www.robotstxt.org/robotstxt.html>.

Original:
   This document specifies and extends the "Robots Exclusion Protocol" method originally defined by Martijn Koster in 1996 for service owners to control how content served by their services may be accessed, if at all, by automatic clients known as crawlers. -->

4) <!-- [rfced] Sections 2.2 and subsequent: We were recently informed that, in the context of HTTP, the hyphenated noun form of "user-agent" could be interpreted as referring to the User-Agent header field. We see several instances in this document where usage might create confusion for some. Please review, and let us know if we may (1) change the noun forms of "user-agent" to "user agent" (lowercase and in text only) and (2) change "user-agent" to "User-Agent" where referring to the header (RFC 3568 is the only published RFC that uses "user-agent header").

Also, we see "; for UAs" in the ABNF. "UA(s)" is only used once. May we change this to "; for user agents"?

Original:
   group = startgroupline                ; We start with a user-agent
           *(startgroupline / emptyline) ; ... and possibly more
                                         ; user-agents
           *(rule / emptyline)           ; followed by rules relevant
                                         ; for UAs
   ...
   | user-agent: Mozilla/5.0 | user-agent: |
   ...
   there is more than one group matching the user-agent, the matching
   ...
   | user-agent: ExampleBot | user-agent: |
   ...
   found amongst the rules in a group for a matching user-agent, or
   ...
   * *: A group that's relevant to all user-agents that don't have an
   ...
   * foobot: A regular case. A single user-agent followed by rules. -->

5) <!-- [rfced] Table 1: The title of this table is a bit longer than most. May we move the "Note that ..." sentence to a separate paragraph just after the table?

Original:
   Table 1: Example of a user-agent HTTP header and robots.txt user-agent line for the ExampleBot product token. Note that the product token (ExampleBot) is a substring of the user-agent HTTP header

   Crawlers MUST use ...

Possibly:
   Table 1: Example of a User-Agent HTTP header and robots.txt user-agent line for the ExampleBot product token

   Note that the product token (ExampleBot) is a substring of the User-Agent HTTP header.

   Crawlers MUST use ... -->

6) <!-- [rfced] Section 2.2.1: Does the "MUST" also apply to "and then obey the rules" (in which case the comma should be removed)?

Original:
   Crawlers MUST use case-insensitive matching to find the group that matches the product token, and then obey the rules of the group. -->

7) <!-- [rfced] Please note that we have changed US-ASCII to ASCII, as our understanding is "ASCII" is more correct and it aligns with <https://www.rfc-editor.org/materials/terms-online.txt>.

Original:
   Octets in the URI and robots.txt paths outside the range of the US-ASCII coded character set, and those in the reserved range defined by [RFC3986], MUST be percent-encoded as defined by [RFC3986] prior to comparison.

   If a percent-encoded US-ASCII octet is encountered in the URI, it MUST be unencoded prior to comparison, unless it is a reserved character in the URI as defined by [RFC3986] or the character is outside the unreserved character range. -->

8) <!-- [rfced] Even though these are examples, may we change "http:" and "http%" to "https:" and "https%"?

Suggested:
   | /foo/bar?baz=   | /foo/bar?baz=         | /foo/bar?baz=         |
   | https://foo.bar | https%3A%2F%2Ffoo.bar | https%3A%2F%2Ffoo.bar | -->

9) <!-- [rfced] Table 5: Does "an end of line comment" mean "an end of a line comment", "an EOL comment", or something else? Will the meaning be clear to readers?

Original:
   | "#" | Designates an end | "allow: / # comment in line" |
   |     | of line comment.  |                              |
   |     |                   | "# comment on its own line"  | -->

10) <!-- [rfced] Section 2.2.4: Please confirm that "typos" (and not "types") is correct.

Original (the previous sentence is included for context):
   Crawlers MAY be lenient when interpreting other records. For example, crawlers may accept common typos of the record. -->

11) <!-- [rfced] Section 2.2.4: We found this sentence confusing. Because this text is part of Section 2, should a different subsection be specified? The only other forms of "explicit" are in Table 3 (Section 2.2.1) and Section 5.1.

Original:
   Parsing of other records MUST NOT interfere with the parsing of explicitly defined records in Section 2. -->

12) <!-- [rfced] Section 2.3.1.2: Should "HTTP 301 and HTTP 302" be "HTTP 301 or HTTP 302", or must both status codes be generated?

Also, as written, the comma before "as defined" in the second sentence appears to indicate that "as defined in [RFC1945]" refers to "SHOULD follow at least five consecutive redirects". Please confirm that this is correct. We ask because we see "A user agent should never automatically redirect a request more than 5 times, since such redirections usually indicate an infinite loop" in Section 9.3 of RFC 1945.

Original:
   It's possible that a server responds to a robots.txt fetch request with a redirect, such as HTTP 301 and HTTP 302 in case of HTTP. The crawlers SHOULD follow at least five consecutive redirects, even across authorities (for example, hosts in case of HTTP), as defined in [RFC1945]. -->

13) <!-- [rfced] Section 2.3.1.4: This sentence reads oddly. Should "has a response code" be "would result in a response code", or does the robots.txt file contain these response codes?

Original:
   For example, in the context of HTTP, an unreachable robots.txt has a response code in the 500-599 range. -->

14) <!-- [rfced] [ROBOTSTXT]: We found the original title a bit confusing at first, as it doesn't match what we found when we clicked on the provided link. We had to drill down a bit in order to find mention of the Robots Exclusion Protocol. May we update this listing as follows?

Original:
   [ROBOTSTXT] "Robots Exclusion Protocol", n.d., <http://www.robotstxt.org/>.

Possibly:
   [ROBOTSTXT] "The Web Robots Pages (including /robots.txt)", 2007, <https://www.robotstxt.org/>. -->

15) <!-- [rfced] [SITEMAPS]: The title shown in the original listing does not match the title of the page we found when we clicked on the provided link. Also, we see both "Sitemap protocol" and "Sitemaps protocol" on <https://www.sitemaps.org/protocol.html>; is one form preferred over the other?

Original:
   [SITEMAPS] "Sitemaps Protocol", n.d., <https://www.sitemaps.org/index.html>.

Possibly:
   [SITEMAPS] "What are Sitemaps? (Sitemap protocol)", April 2020, <https://www.sitemaps.org/index.html>. -->

16) <!-- [rfced] Please review the "Inclusive Language" portion of the online Style Guide at <https://www.rfc-editor.org/styleguide/part2/#inclusive_language>, and let us know if any changes are needed (for example, whitespace). -->

17) <!-- [rfced] Please let us know if any changes are needed for the following:

a) The following terms were used inconsistently in this document. We chose to use the latter forms.
Please let us know any objections.

   Rules of type "allow" and "disallow": We added quotes for consistency.

   Original:
      allow and disallow rules
      "disallow" and "allow" rules

   User-agent: (1 instance: example in Section 5.1) / User-Agent:

b) The following term appears to be used inconsistently in this document. Please let us know which form is preferred.

   UTF8 encoded / UTF-8 encoded
   (We also see (for example) "MUST be percent-encoded as defined", so we suggest "UTF8-encoded" for consistency of style.)

c) Single versus double quotes: We changed single quotes to double quotes for consistency. Please review, and let us know any objections.

   Examples from original:
      file named 'robots.txt'
      file named "/robots.txt"
      ; excluding control, space, '#'
      the '*'
      the "*" value
      The * character (added the quotes in this case) -->

Thank you.

RFC Editor


On Aug 25, 2022, at 11:23 PM, rfc-editor@rfc-editor.org wrote:

*****IMPORTANT*****

Updated 2022/08/25

RFC Author(s):
--------------

Instructions for Completing AUTH48

Your document has now entered AUTH48. Once it has been reviewed and approved by you and all coauthors, it will be published as an RFC. If an author is no longer available, there are several remedies available as listed in the FAQ (https://www.rfc-editor.org/faq/).

You and your coauthors are responsible for engaging other parties (e.g., Contributors or Working Group) as necessary before providing your approval.

Planning your review
---------------------

Please review the following aspects of your document:

* RFC Editor questions

   Please review and resolve any questions raised by the RFC Editor that have been included in the XML file as comments marked as follows:

   <!-- [rfced] ... -->

   These questions will also be sent in a subsequent email.

* Changes submitted by coauthors

   Please ensure that you review any changes submitted by your coauthors. We assume that if you do not speak up that you agree to changes submitted by your coauthors.

* Content

   Please review the full content of the document, as this cannot change once the RFC is published. Please pay particular attention to:
   - IANA considerations updates (if applicable)
   - contact information
   - references

* Copyright notices and legends

   Please review the copyright notice and legends as defined in RFC 5378 and the Trust Legal Provisions (TLP – https://trustee.ietf.org/license-info/).

* Semantic markup

   Please review the markup in the XML file to ensure that elements of content are correctly tagged. For example, ensure that <sourcecode> and <artwork> are set correctly. See details at <https://authors.ietf.org/rfcxml-vocabulary>.

* Formatted output

   Please review the PDF, HTML, and TXT files to ensure that the formatted output, as generated from the markup in the XML file, is reasonable. Please note that the TXT will have formatting limitations compared to the PDF and HTML.

Submitting changes
------------------

To submit changes, please reply to this email using ‘REPLY ALL’ as all the parties CCed on this message need to see your changes. The parties include:

* your coauthors

* rfc-editor@rfc-editor.org (the RPC team)

* other document participants, depending on the stream (e.g., IETF Stream participants are your working group chairs, the responsible ADs, and the document shepherd).

* auth48archive@rfc-editor.org, which is a new archival mailing list to preserve AUTH48 conversations; it is not an active discussion list:

   * More info: https://mailarchive.ietf.org/arch/msg/ietf-announce/yb6lpIGh-4Q9l2USxIAe6P8O4Zc

   * The archive itself: https://mailarchive.ietf.org/arch/browse/auth48archive/

   * Note: If only absolutely necessary, you may temporarily opt out of the archiving of messages (e.g., to discuss a sensitive matter). If needed, please add a note at the top of the message that you have dropped the address.
     When the discussion is concluded, auth48archive@rfc-editor.org will be re-added to the CC list and its addition will be noted at the top of the message.

You may submit your changes in one of two ways:

An update to the provided XML file
 - OR -
An explicit list of changes in this format

Section # (or indicate Global)

OLD:
old text

NEW:
new text

You do not need to reply with both an updated XML file and an explicit list of changes, as either form is sufficient.

We will ask a stream manager to review and approve any changes that seem beyond editorial in nature, e.g., addition of new text, deletion of text, and technical changes. Information about stream managers can be found in the FAQ. Editorial changes do not require approval from a stream manager.

Approving for publication
--------------------------

To approve your RFC for publication, please reply to this email stating that you approve this RFC for publication. Please use ‘REPLY ALL’, as all the parties CCed on this message need to see your approval.

Files
-----

The files are available here:
   https://www.rfc-editor.org/authors/rfc9309.xml
   https://www.rfc-editor.org/authors/rfc9309.html
   https://www.rfc-editor.org/authors/rfc9309.pdf
   https://www.rfc-editor.org/authors/rfc9309.txt

Diff file of the text:
   https://www.rfc-editor.org/authors/rfc9309-diff.html
   https://www.rfc-editor.org/authors/rfc9309-rfcdiff.html (side by side)

Diff of the XML:
   https://www.rfc-editor.org/authors/rfc9309-xmldiff1.html

The following files are provided to facilitate creation of your own diff files of the XML.

Initial XMLv3 created using XMLv2 as input:
   https://www.rfc-editor.org/authors/rfc9309.original.v2v3.xml

XMLv3 file that is a best effort to capture v3-related format updates only:
   https://www.rfc-editor.org/authors/rfc9309.form.xml

Tracking progress
-----------------

The details of the AUTH48 status of your document are here:
   https://www.rfc-editor.org/auth48/rfc9309

Please let us know if you have any questions.

Thank you for your cooperation,

RFC Editor

--------------------------------------
RFC9309 (draft-koster-rep-12)

Title            : Robots Exclusion Protocol
Author(s)        : M. Koster, Ed., G. Illyes, Ed., H. Zeller, Ed., L. Sassman, Ed.
WG Chair(s)      :
Area Director(s) :