[ai-control] Re: WG Last Call: draft-ietf-aipref-vocab-03 (Ends 2025-09-18)
Mark Nottingham <mnot@mnot.net> Wed, 10 September 2025 02:55 UTC
From: Mark Nottingham <mnot@mnot.net>
In-Reply-To: <CABcZeBNW4cy1EBXzx1A=qgWS+F4a8zJFyxEP33XEBNSimt9UMw@mail.gmail.com>
Date: Wed, 10 Sep 2025 12:54:51 +1000
Message-Id: <7AFC1626-0E0B-4017-ABA5-3006A8720687@mnot.net>
References: <175703816389.1311.5141574230046433427@dt-datatracker-f7c8fdcb7-pjx77> <CABcZeBNW4cy1EBXzx1A=qgWS+F4a8zJFyxEP33XEBNSimt9UMw@mail.gmail.com>
To: Eric Rescorla <ekr@rtfm.com>
CC: ai-control@ietf.org
Subject: [ai-control] Re: WG Last Call: draft-ietf-aipref-vocab-03 (Ends 2025-09-18)
List-Id: AI Control <ai-control.ietf.org>
Archived-At: <https://mailarchive.ietf.org/arch/msg/ai-control/v-FTNhqoYrlYOaqvEz0KLKRjbP0>
Hi EKR,

I've attempted to capture this in the following issues:

https://github.com/ietf-wg-aipref/drafts/issues/151 - Definition of AI
https://github.com/ietf-wg-aipref/drafts/issues/152 - Use of 'Machine Learning'
https://github.com/ietf-wg-aipref/drafts/issues/153 - Difference between "unknown" and "allowed"
https://github.com/ietf-wg-aipref/drafts/issues/154 - Defaults
https://github.com/ietf-wg-aipref/drafts/issues/155 - Automated Processing is Too Broad
https://github.com/ietf-wg-aipref/drafts/issues/156 - Search's Parent
https://github.com/ietf-wg-aipref/drafts/issues/157 - AI Training and Generative AI are Too Broad
https://github.com/ietf-wg-aipref/drafts/issues/158 - Bots Collect Data for Multiple Purposes

If you see anything I missed, please point it out (either in e-mail, a
comment on those, or a new issue).

I didn't include the OVERALL prelude in each of the issues that section
spawned, but can copy it into a comment on them if you like -- the
information is still in the links back to your message. Or would it be
better as a separate issue?

Cheers,


> On 10 Sep 2025, at 9:09 am, Eric Rescorla <ekr@rtfm.com> wrote:
>
> OVERALL
> I don't think this taxonomy is really going in the right direction. I
> have some critiques of the specific categories but, more broadly, I
> think this is tied to specific technology choices in a way that's
> unlikely to age well. I think instead it would be much more useful to
> focus on the uses to which the data will be put (i.e., the output of the model),
> as that's really the source of unhappiness about these systems.
>
> Just so we're on the same page, modern LLMs work (very approximately)
> by training on an enormous corpus of data which sets the model weights
> (I'm deliberately conflating pre-training and fine-tuning). They are
> then prompted with input and asked to create output based on that
> input, as well as the output they've already generated (collectively the
> context). As part of that process, some models can also collect new
> data and use that as part of the context (RAG). In the context of web
> crawling, we then have two ways for data to get into the system:
>
> - In the training phase and stored in the model weights.
> - In the generation phase as part of the context via RAG.
>
> This is an important technical distinction, but it's not clear why it
> matters from the perspective of the site. Consider an AI system which
> collects the same corpus as is currently used for pre-training but
> only trains on a small portion of it and then when it's asked to
> generate content, uses some kind of "internal RAG" to suck in the
> relevant documents and generate the output (this is a lot more like
> how human brains seem to work in my experience, because we just can't
> remember much stuff). From the perspective of the site this is the
> same: the system is using the site's content to generate new content,
> but in the taxonomy of this draft, this isn't AI training or even
> generative AI training because it doesn't impact the model
> weights. Obviously I have no idea if this is a productive technical
> direction, but neither do you, and conformance shouldn't hinge
> on whether it turns out to be.
>
> In my opinion a much more productive direction would be to focus on
> the application to which this data is being put. I don't have
> anything like a complete story, but it seems to me that we have some
> understanding of the categories at either end of the spectrum:
>
> - Indexing (search): where you attempt to determine which
>   piece of content the user wants and steer them towards it.
>
> - Substitution: where the service generates something that
>   is effectively a substitute good for the content.
>
> I think these are useful conceptually because the first is the
> traditional one that I think people have generally accepted as "good"
> and the second is the one that seems to be of most concern. Obviously,
> it's not as clear as that because even before generative AI search
> systems were producing substitute goods (infoboxes, etc.), but I think
> that's a feature of this analysis, because sites often didn't like
> that and I think this kind of taxonomy captures that intuition without
> worrying about whether the substitute good was generated via some
> LLM, deterministic hand-written code, or something in between.
>
> In between these two, we have some other applications. For instance:
>
> - Summarization (think AI overviews): this is a partial substitute, but
>   often it comes with a source link which can steer traffic to the
>   site.
>
> - Generating original content that isn't a direct substitute. An
>   example here might be code generation: the other day I asked an AI
>   to help write me a scraper for the IETF agenda [0] and while I'm
>   sure it was inspired by a lot of existing scraping scripts, it's not
>   like it plagiarized one of them in total or somehow reduced the
>   demand for them (this is a distinct question from whether it
>   injected some verbatim code).
>
> Again, I don't claim that these are the right dividing lines, but I
> think that this kind of analysis does a better job of capturing what
> is actually concerning to sites about AI. The challenge then becomes
> how to turn these into relatively precise definitions. This is
> probably harder than the current definitions, but I don't think
> precision is a virtue if the definitions aren't actually a good fit to
> the problem they are trying to solve (as they say, "for every complex
> problem there is an answer that is clear, simple, and wrong").
>
>
> To that end...
>
> - As I and others have noted, "automated processing" essentially
>   sweeps in any web crawler. In particular, as noted by Greg Lindahl,
>   parsing the document to find links is plainly "automated
>   processing", so you just can't really have a crawler without
>   "automated processing", which makes this whole category redundant
>   with forbidding crawling in the existing robots.txt framework.
>
> - I don't understand why "search" is somehow a subset of "automated
>   processing" rather than "AI training". In many if not most cases,
>   search will involve training an AI model, especially given the very
>   wide parameters of "AI" (see below).
>
> - AI training is a coherent category but seems like a trap for the
>   unwary, because a lot of people are going to turn it on and thus
>   exclude all kinds of useful applications, when what they really want
>   is much more narrow (something like gen AI). I also have a problem
>   with the term "training" for the reasons indicated above.
>
> - Generative AI training is a remarkably wide category (see
>   aforementioned comments about substitution, summarization, and
>   generating original content).
>
> As an aside, it's obviously possible for a bot to collect data for
> multiple purposes. Maybe I've missed it, but does this draft say what
> the rules are there? I guess you're supposed to somehow only proceed
> if the intersection of all the uses is allowed? S 5.1 seems to be
> about the related but different problem of harmonizing multiple
> statements for a given usage.
>
>
>
> DETAILED
> S 2.
>
>    Artificial Intelligence (AI):
>       An engineered system of sufficient complexity that, for a given
>       set of human-defined objectives, learns from data to generate
>       outputs such as content, predictions, recommendations, or
>       decisions.
>
> As I mentioned previously, I think this definition of AI is extremely
> problematic and would effectively sweep in any statistical
> technique. I understand that "sufficient complexity" is intended to
> somehow reduce the scope, but it's a totally subjective standard.
>
>    AI Training:
>       The application of machine learning to data to produce or improve
>       a model for an artificial intelligence system.
>
> I'm not sure what "machine learning" is doing here. Why not just:
>
>    The use of data to produce or improve an artificial
>    intelligence system.
>
> I would then strike the term "machine learning" entirely.
>
>
> S 3.
>    After processing a statement of preferences the recipient associates
>    each category of use one of three preference values: "allowed",
>    "disallowed", or "unknown". In the absence of a statement of
>    preference, all usage categories are assigned a preference value of
>    "unknown".
>
> What's the semantic difference between "unknown" and "allowed"?
> Eventually, I need to either use or not use a given piece of
> data. Is this just an internal detail of the algorithm?
>
>
> S 3.1.
>
>    An entity that receives usage preferences MAY choose to respect those
>    preferences it has discovered, according to an understanding of how
>    the asset is used, how that usage corresponds to the usage categories
>    where preferences have been stated, and the applicable legal context.
>
>    Usage preferences can be ignored due to express agreements between
>    relevant parties, explicit provisions of law, or the exercise of
>    discretion in situations where widely recognized priorities justify
>    doing so. Priorities that could justify ignoring preferences
>    include—but are not limited to—free expression, safety, education,
>    scholarship, research, preservation, interoperability, and
>    accessibility.
>
> As stated before, I think we should strike this text and the
> following. If the specification doesn't require respecting
> the preferences then it doesn't need to take positions on
> why not respecting them is OK, and despite the text, this
> will inevitably be taken as indicating that other reasons
> aren't as valid.
>
>
> S 5.
>    One approach for dealing with an "unknown" outcome is to assign a
>    default value. This document takes no position on what default might
>    be assigned.
>
> I don't think this makes any sense. The purpose of this document
> is to allow the processing agent to understand the declaring
> party's preferences, and we don't even require the agent
> to respect them, so it's weird to talk about defaults in
> that context.
>
>
> S 6.
> I think it's premature to worry about this.
>
> -Ekr
>
> [0] https://github.com/ekr/ietf-agenda
>
>
> On Thu, Sep 4, 2025 at 7:09 PM Mark Nottingham via Datatracker <noreply@ietf.org> wrote:
>
> Subject: WG Last Call: draft-ietf-aipref-vocab-03 (Ends 2025-09-18)
>
> This message starts a 2-week WG Last Call for this document.
>
> Abstract:
> This document defines a vocabulary for expressing preferences
> regarding how digital assets are used by automated processing
> systems. This vocabulary allows for the declaration of restrictions
> or permissions for use of digital assets by such systems.
>
> File can be retrieved from:
> https://datatracker.ietf.org/doc/draft-ietf-aipref-vocab/
>
> Please review and indicate your support or objection to proceed with the
> publication of this document by replying to this email keeping
> ai-control@ietf.org in copy. Objections should be motivated and suggestions
> to resolve them are highly appreciated.
>
> Authors, and WG participants in general, are reminded again of the
> Intellectual Property Rights (IPR) disclosure obligations described in BCP 79
> [2]. Appropriate IPR disclosures required for full conformance with the
> provisions of BCP 78 [1] and BCP 79 [2] must be filed, if you are aware of
> any. Sanctions available for application to violators of IETF IPR Policy can
> be found at [3].
>
> Thank you.
>
> [1] https://datatracker.ietf.org/doc/bcp78/
> [2] https://datatracker.ietf.org/doc/bcp79/
> [3] https://datatracker.ietf.org/doc/rfc6701/
>
>
>
> --
> ai-control mailing list -- ai-control@ietf.org
> To unsubscribe send an email to ai-control-leave@ietf.org

--
Mark Nottingham   https://www.mnot.net/
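
The three preference values quoted from S 3 of the draft, and the multi-purpose question raised in the OVERALL section of the quoted message, can be illustrated with a minimal sketch. Nothing here is taken from the draft itself: the category labels, the function names `resolve` and `combined`, and the rule that every purpose must be allowed are assumptions made for illustration (the last is the "intersection" reading EKR guesses at), and the draft's category hierarchy is deliberately not modelled.

```python
from enum import Enum


class Preference(Enum):
    ALLOWED = "allowed"
    DISALLOWED = "disallowed"
    UNKNOWN = "unknown"


# Hypothetical labels, loosely named after the categories discussed in the
# thread; they are not the identifiers defined by the draft.
CATEGORIES = ("automated-processing", "ai-training", "genai-training", "search")


def resolve(stated):
    """Give every category a value; anything not stated is 'unknown' (S 3)."""
    return {c: stated.get(c, Preference.UNKNOWN) for c in CATEGORIES}


def combined(prefs, purposes):
    """Assumed rule for a bot with several purposes: all must be allowed.

    A single 'disallowed' wins; otherwise a single 'unknown' leaves the
    overall outcome 'unknown'. The draft does not state this rule.
    """
    values = [prefs[p] for p in purposes]
    if any(v is Preference.DISALLOWED for v in values):
        return Preference.DISALLOWED
    if any(v is Preference.UNKNOWN for v in values):
        return Preference.UNKNOWN
    return Preference.ALLOWED


if __name__ == "__main__":
    prefs = resolve({"search": Preference.ALLOWED,
                     "ai-training": Preference.DISALLOWED})
    print(combined(prefs, ["search"]).value)
    print(combined(prefs, ["search", "ai-training"]).value)
```

Under this reading "unknown" stays distinct from "allowed" until the recipient applies its own default, which is the point questioned under S 3 and S 5 above.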