Re: [Last-Call] Last Call: <draft-koster-rep-08.txt> (Robots Exclusion Protocol) to Proposed Standard

Gary Illyes <garyillyes@google.com> Fri, 20 May 2022 15:23 UTC

MIME-Version: 1.0
References: <165177144451.21157.981700432753087513@ietfa.amsl.com> <20220505203458.5BF303F6C1E9@ary.qy>
In-Reply-To: <20220505203458.5BF303F6C1E9@ary.qy>
From: Gary Illyes <garyillyes@google.com>
Date: Fri, 20 May 2022 17:22:53 +0200
Message-ID: <CADTQi=eb5iKg3q+CK3Jfy_7mK1iUZGhMD90+rzb_1iHhsCbHsw@mail.gmail.com>
To: John Levine <johnl@taugh.com>
Cc: last-call@ietf.org
Content-Type: multipart/alternative; boundary="0000000000001fced705df731128"
Archived-At: <https://mailarchive.ietf.org/arch/msg/last-call/MAwMmgm4DAHCO-AUHBzkxyrqllY>
Subject: Re: [Last-Call] Last Call: <draft-koster-rep-08.txt> (Robots Exclusion Protocol) to Proposed Standard

Hi John,

Thanks for the encouraging words regarding the standardization of this
protocol.

Crawl-delay is an essentially useless signal to deal with from a crawler's
perspective, and a long time ago most crawlers chose instead to offer a
visual setting (i.e. a slider) in tools such as Yandex.Webmaster and Google
Search Console. That's why we decided not to mention it in the draft at
all. We did explicitly allow implementors to define their own custom rules,
and we used the sitemap rule as the example. We can add crawl-delay as a
second example of a custom rule if you think that would be useful.

The Sitemap rule is very useful in general; however, it cannot be a
directive the way the explicitly defined "disallow" and "allow" are. This
is because different implementors approach sitemap ingestion very
differently in order to protect their own systems. For example, one search
engine might not ingest "anonymous" sitemap submissions, such as those made
through robots.txt, at all, or it might not ingest sitemaps from
spam-ridden domains. Basically it's merely a hint, so we decided to give a
nod to sitemaps because it's one of the most prominent rules in existing
robots.txt files, and to let implementors decide how they implement and use
this custom rule.
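
To make the directive-versus-hint distinction concrete, here is a minimal
sketch of a parser that keeps "allow" and "disallow" separate from every
other record. It is not taken from the draft or from any crawler's code:
the function name parse_robots, the SAMPLE file, and the choice to return
two flat lists are all assumptions made for illustration, and it skips
details such as path matching and rule precedence.

```python
# Illustrative sketch only; names and structure are hypothetical.
DIRECTIVES = {"allow", "disallow"}

# Hypothetical robots.txt content used for the example.
SAMPLE = """\
User-agent: *
Disallow: /private/
Allow: /private/public.html
Crawl-delay: 10
Sitemap: https://example.com/sitemap.xml
"""


def parse_robots(text):
    """Split a robots.txt body into directives and other (custom) records.

    Returns (directives, hints):
      directives: list of (user_agent, kind, path) for allow/disallow
      hints:      list of (key, value) for everything else, e.g. sitemap
    """
    directives = []
    hints = []
    agents = []        # user agents of the group currently being read
    in_rules = False   # True once the current group has seen a directive

    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if ":" not in line:
            continue
        key, value = (part.strip() for part in line.split(":", 1))
        key = key.lower()

        if key == "user-agent":
            if in_rules:           # a user-agent line after rules starts a new group
                agents = []
                in_rules = False
            agents.append(value.lower())
        elif key in DIRECTIVES:
            in_rules = True
            directives.extend((agent, key, value) for agent in agents)
        else:
            # Custom rules are collected but never affect group parsing;
            # what to do with them is left to the implementor.
            hints.append((key, value))

    return directives, hints


directives, hints = parse_robots(SAMPLE)
```

With the sample above, the two path rules end up in directives, while
crawl-delay and sitemap end up in hints, where a crawler is free to use or
ignore them.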

Happy to chat more about this; however, each of these decisions was made
after months of discussion and was agreed on by large consumers of
robots.txt files.

On Thu, May 5, 2022 at 10:35 PM John Levine <johnl@taugh.com> wrote:

> It appears that The IESG  <last-call@ietf.org> said:
> >
> >The IESG has received a request from an individual submitter to consider the
> >following document: - 'Robots Exclusion Protocol'
> >  <draft-koster-rep-08.txt> as Proposed Standard
> >
> >The IESG plans to make a decision in the next few weeks, and solicits final
> >comments on this action. Please send substantive comments to the
> >last-call@ietf.org mailing lists by 2022-06-02. Exceptionally, comments may
> >be sent to iesg@ietf.org instead. In either case, please retain the beginning
> >of the Subject line to allow automated sorting.
>
> I'm glad to see that the authors agreed to move this to standards
> track where it belongs, and I encourage you to publish it.
>
> Nits: it would be nice if it explicitly described the Crawl-delay: and
> Sitemap: lines which are widely implemented even if not by the
> authors' employer.
>
> R's,
> John
>