Re: [Last-Call] Last Call: <draft-koster-rep-06.txt> (Robots Exclusion Protocol) to Informational RFC

Mark Nottingham <mnot@mnot.net> Tue, 08 March 2022 11:20 UTC

From: Mark Nottingham <mnot@mnot.net>
In-Reply-To: <CA+9kkMAnmoJ0n3mPscZvc6kbyOZjQU78vb+iA0Pw5Qq=_kKZEw@mail.gmail.com>
Date: Tue, 8 Mar 2022 22:20:14 +1100
Cc: last-call@ietf.org
Content-Transfer-Encoding: quoted-printable
Message-Id: <57BC7375-B52A-4499-91DE-4D3D5F8D755B@mnot.net>
References: <20220228222932.825F33844270@ary.qy> <245C65D2-EC38-4C49-9CA0-3DD687CB37DA@mnot.net> <CA+9kkMAnmoJ0n3mPscZvc6kbyOZjQU78vb+iA0Pw5Qq=_kKZEw@mail.gmail.com>
To: Ted Hardie <ted.ietf@gmail.com>
X-Mailer: Apple Mail (2.3693.60.0.1.1)
Archived-At: <https://mailarchive.ietf.org/arch/msg/last-call/0MgiKsmRveC9RkiwLB1yp6faoPM>
Subject: Re: [Last-Call] Last Call: <draft-koster-rep-06.txt> (Robots Exclusion Protocol) to Informational RFC

Hi Ted,

Thanks for the response, especially since this is not strictly your problem any more. Responses below.


> On 8 Mar 2022, at 8:18 pm, Ted Hardie <ted.ietf@gmail.com> wrote:
> 
> Hi Mark,
> 
> On Mon, Feb 28, 2022 at 10:56 PM Mark Nottingham <mnot@mnot.net> wrote:
>> 
>> 
>> > On 1 Mar 2022, at 9:29 am, John Levine <johnl@taugh.com> wrote:
>> > 
>> > Most importantly, the copyright license is broken. At the top it has
>> > the "no derivatives" license, which is fine,
>> 
>> Ah - I missed that, thanks for pointing it out.
>> 
>> I'm uncomfortable leaving change control for a key interoperability mechanism in the search market in the hands of one competitor, yet blessing it as part of the IETF stream. I think the IETF as a whole should be uncomfortable with that too, given current competition enforcement trends.
> 
> Having the original author of the spec be the principal author here is a bit of a bulwark against that, as I don't believe he is or would be interested in handing change control over to Google. I also believe Gary and the other authors have reached out to the rest of the relevant community (though my change in employer means I no longer have the relevant e-mails to cite).

Hmm. Because Google employees are co-authors in a joint work, my understanding is that they have the ability to publish derivative works in the future, at least in the US. If that's true, it does give them a form of change control -- or at least significant privilege (regarding the future of the spec) over other members of the community.

(Obviously, one would need to talk to a copyright lawyer -- my understanding here is informed by a similar situation in a different venue.)


> On the more general topic of why this has the "no derivatives" clause,  I understand your reluctance, but I think this is a case where the combination is valid.  First, it's important to note that the specification was brought to the IETF for substantive review, to make sure that the elements it uses (like ABNF) were being used in the right way and to eliminate any possibility of ambiguity.  From my perspective, that's been very useful and it would not have occurred to the same extent had this gone directly to the ISE. 

I find this a bit surprising -- surely it's possible to get adequate review for ISE documents without putting them into the IETF stream? Otherwise, the Independent stream would jeopardise the quality of the RFC Series overall... Was the ISE involved in this discussion?


> However, this spec reflects operations which have been stable/backwards compatible for a very long time.  Given that, it is important to the community which deploys this that it be fairly difficult to amend.  One way to achieve that would have been to make this standards track; that would require standards action to update or obsolete it later.  When we discussed that back at the beginning of this process, though, it was pretty clear  that some folks would use the working group discussion around that to try to insert functionality that would result in breaking changes.  While it would have been kind of unlikely for any of those to win out against the need for maintaining interoperability, the result would have been a pretty big increase in the amount of effort needed to get this published.

This is the rub -- depending on your definitions of "the community" and "some folks" in the statement above, the outcome might be completely reasonable and justified, or blatantly illegitimate. 

It's also notably out of step with the direction that pretty much all other Internet and Web standards are taking. HTML, DOM, and many other aspects of the platform have considerable requirements for stability and backwards compatibility, and yet they are not locked behind a no-derivatives clause. 


> Another option for getting an archival spec with a high bar for change was this one:  an IETF informational with a no-derivatives clause. That gave the full benefit of IETF review and made the bar for amendment high  enough to allay the concerns of the original author and the relevant community.  It had this clause when Adam agreed to sponsor it and it has had it in every iteration since, so I thought this was well understood.  As shepherd, my apologies if it was not.
> 
> There is another option that gets the full set of characteristics needed:  AD sponsored on the standards track.  At the time this went through the first set of discussions that was something folks had become very reluctant to do.  If it is on the table, I personally believe that a standards track document with the usual clauses would work as well.  Those can't be superseded or amended without serious work and plenty of time for the relevant community to chip in.  
> 
> But, absent that, I think this kind of document is why BCP78 permits this combination: documents which need and have received significant IETF review but which also have a significant external community for whom the usual clauses result in a risk of inappropriate later amendments. To put this slightly differently, I think you'll see that this falls under the logic in RFC 5378, Section 3, in the penultimate paragraph. 

Assuming that you're referring to this paragraph in s 3.3:

~~~
   The IETF has historically encouraged organizations to publish details
   of their technologies, even when the technologies are proprietary,
   because understanding how existing technology is being used helps
   when developing new technology.  But organizations that publish
   information about proprietary technologies are frequently not willing
   to have the IETF produce revisions of the technologies and then
   possibly claim that the IETF version is the "new version" of the
   organization's technology.  Organizations that feel this way can
   specify that a Contribution be published with the other rights
   granted under this document but may withhold the right to produce
   derivative works other than translations.
~~~

... then the question is whether the robots.txt format is really 'proprietary' technology, or whether it's a public good. Given its wide deployment and use as what amounts to an API for search engines to interoperate with Web sites, I struggle to see it as the former.


> If you and the broader community prefer the standards track approach, now would be a good time to let the sponsoring AD know.

To be clear, I think that this document is almost certainly a reasonable record of how the robots.txt format works today, and that its authors are acting in good faith. However, given the circumstances I'm concerned that from the 'outside', publishing the document in this manner won't look legitimate -- and will therefore call into question the legitimacy of the IETF itself, in some eyes.

I think there are a few different ways we could address that while meeting the authors' goals.

0) I still believe that removing the no-derivatives clause is the most straightforward way to do so. TCP, QUIC, HTTP, and many other Internet specifications remain stable and backwards compatible without the benefit of a no-derivatives clause; I don't see how robots.txt is different.

1) Alternatively, statements from at least some other search engines that they are aware of this work and do not object to it being published would change how this action is perceived considerably. Ideally, this would be represented by adding authors from other search engines to the document.

2) Or, since the document has now been reviewed for ABNF, nothing stops it from being switched to the Independent Stream with a title like "The Google Robots Exclusion Protocol" (to reflect its 'proprietary' nature).

Cheers,


--
Mark Nottingham   https://www.mnot.net/