[Cats] Re: draft-ietf-cats-usecases-requirements-10 ietf last call Artart review
kehan yao <khyao78@gmail.com> Fri, 12 December 2025 09:37 UTC
References: <176539583482.1334877.15163936020458948002@dt-datatracker-5bd94c585b-wk4l4>
In-Reply-To: <176539583482.1334877.15163936020458948002@dt-datatracker-5bd94c585b-wk4l4>
From: kehan yao <khyao78@gmail.com>
Date: Fri, 12 Dec 2025 17:37:26 +0800
Message-ID: <CABYiY4uNOG4FFj5ROisb4usR_8q757XgRKi8iYk4sUfm20Ms_g@mail.gmail.com>
To: Tim Bray <tbray@textuality.com>
CC: art@ietf.org, cats@ietf.org, draft-ietf-cats-usecases-requirements.all@ietf.org, last-call@ietf.org
Subject: [Cats] Re: draft-ietf-cats-usecases-requirements-10 ietf last call Artart review
List-Id: "Computing-Aware Traffic Steering (CATS)" <cats.ietf.org>
Archived-At: <https://mailarchive.ietf.org/arch/msg/cats/52r4Ujvr0h0yzdIHoIaLbLIOOWo>
List-Archive: <https://mailarchive.ietf.org/arch/browse/cats>
List-Help: <mailto:cats-request@ietf.org?subject=help>
List-Owner: <mailto:cats-owner@ietf.org>
List-Post: <mailto:cats@ietf.org>
List-Subscribe: <mailto:cats-join@ietf.org>
List-Unsubscribe: <mailto:cats-leave@ietf.org>
Hi Tim,

Thank you so much for your detailed review. Please see some of my replies inline below. The modifications will be reflected in the next revision.

Best regards,
Kehan

Tim Bray via Datatracker <noreply@ietf.org> wrote on Thu, 11 Dec 2025 at 03:44:

> Document: draft-ietf-cats-usecases-requirements
> Title: Computing-Aware Traffic Steering (CATS) Problem Statement, Use Cases, and Requirements
> Reviewer: Tim Bray
> Review result: Not Ready
>
> This is the ARTART review of draft-ietf-cats-usecases-requirements-10. It has no special standing and is offered as input to further discussion of the subject.
>
> While I have never looked at ALTO, I spent 5+ years as an employee of AWS where a central everyday concern was the design and operation of distributed systems, so I feel I have some exposure to the issues being addressed.
>
> I feel that this document is not suitable for publication as an RFC. Quoting from the Shepherd Report:
>
> The WG milestones only explicitly say to adopt this document (not to publish as an RFC). However, the charter does not preclude this. The working group discussed this point and had strong consensus that publication as an Informational RFC would be helpful for future protocol work.
>
> This document contains a lot of RFC 2119 language, which I don't think belongs in an informational RFC. After my review, I am left dubious of the claim that this "would be helpful for future protocol work". Perhaps this would be suitable for leaving as a draft for guiding the work of the WG?

*[KY] The working group has previously discussed this matter in detail. The authors and contributors are eager to publish this document as an Informational RFC, and this has received substantial support from the working group. On one hand, it provides guidance for the CATS framework, metric definitions, solution design, and other related work.
On the other hand, a more critical reason is that publication of this work by the IETF will make it easier for external parties and other organizations (e.g., 3GPP, ETSI) to reference it. During the document update process, we have also received significant interest and support from outside the working group. Publishing it as an RFC may therefore enhance the influence of both this work and the IETF.*

> I found this draft difficult (and very time-consuming) to read and am not convinced that it offers practical value. Perhaps it is aimed at a class of system or protocol designer who is working on problems different from those I faced, so my experience is not relevant and the comments below are not helpful. If so, sorry.
>
> The draft is extremely verbose, 11K words in length. I found it difficult to read and understand because of this and because the language is often general and nontechnical. (Also the quality of the language needs work, there are many grammatical errors.) It would benefit from the attention of an editor with the goal of reducing its size and increasing its clarity. For example, I think the entirety of Section 1 could be replaced by the following without loss of value:
>
> "It is often desirable to distribute compute workloads across multiple compute resources. These resources can include servers and load balancers in data centers and compute capacity deployed in CDN POPs. Routing requests for service to such nodes with the goals of providing good response to variable loads presents multiple complex problems."

*[KY] Thanks. I acknowledge that the document is relatively lengthy, particularly the sections on problem description and scenario elaboration. I am one of the editors, and English is not my native language, which has indeed led to some grammatical errors. I apologize for this.
Moving forward, I will condense redundant content in subsequent updates to improve readability for audiences with diverse professional backgrounds.*

> 2, 3.1 Edge computing could mean two different things: Resources at CDN POPs, or resources at infrastructure locations which are specialized at mediating access to internal servers and the Internet. These offer functions including load balancing and firewalling. The draft uses the term "edge" in a very generalized way.

*[KY] Regarding the question about "edge computing", please allow me to provide a brief clarification. Similar questions were raised by experts in the working group, including the chair, during the document update. To better distinguish between the two terms, we have defined "edge computing" and "network edge" in Section 2 as follows:

“Network Edge: The network edge is an architectural demarcation point used to identify physical locations where the corporate network connects to third-party networks.

Edge Computing: Edge computing is a computing pattern that moves computing infrastructures, i.e, servers, away from centralized data centers and instead places it close to the end users for low latency communication.

Relations with network edge: edge computing infrastructures connect to corporate network through a network edge entry/exit point.”*

*I believe that the first meaning you describe is "edge computing" while the second is "network edge". I wonder if we can reach a consensus on this point?*

> I am unconvinced that some of the scenarios offered are realistic:
>
> 4.1 "Cloud VR/AR introduces the concept of cloud computing to the rendering of audiovisual assets in such applications. Here, the edge cloud helps encode/decode and render content.” I'm surprised.
> Rendering AR/VR requires considerable compute cycles and typically would be accomplished either on client hardware (mobile phone, AR/VR headset) or in a data center server, the results being cached by the edge. But rendering on edge devices? I don't think so? I haven't worked on AR in a few years so maybe I'm out of date, but this is still surprising.

*[KY] Regarding AR/VR scenarios, large-scale deployment has not yet been achieved. However, with the emergence of more intelligent terminals (e.g., AI headsets, AI smart glasses), computing resources deployed in edge data centers will be required to provide rendering services. While this scenario is evolving relatively slowly, we believe it still has significant deployment potential and value for the future. For example, during the 118th IETF meeting, the BBC shared similar work in the CATS working group, the AI4ME project. For reference:
https://datatracker.ietf.org/doc/slides-118-cats-2a-ai4me-and-bbc-cats-use-cases/*

> 4.2 Repeated discussions of the same problem which could be summarized “try to use the nearest edge PoP to reduce latency, unless it’s overloaded, in which case fall back to somewhere else, while reporting the problem”

*[KY] Thank you. I will refine the wording of this section in the next update.*

> 4.5.2 “Distributed AI training” - Is this really a thing? It’s not my understanding of how model building/training is done in practice. This and the other use cases would benefit from citations to real-world research.

*[KY] First, inference is a typical use case for CATS, as it is highly sensitive to metrics such as latency and memory consumption. Distributed training, federated learning, and related technologies are also strongly relevant to CATS.
Relevant references are listed below:

FedFog: Resource-Aware Federated Learning in Edge and Fog Networks
https://arxiv.org/abs/2507.03952

ARES: Adaptive Resource-Aware Split Learning for Internet of Things
https://dl.acm.org/doi/10.1016/j.comnet.2022.109380

The authors will add relevant citations in the next revision.*

> 5.2, R5 “The Resource Model MUST be implementable in an interoperable manner.“ The use of RFC2119 language on such a vague, general statement feels like mis-use to me. This comment applies to a high proportion of the requirement assertions.

*[KY] This requirement is followed by an explanation: “R5: The Resource Model MUST be implementable in an interoperable manner. That is, independent implementations of the Resource Model must be interoperable.” If you think this is not sufficient, how about the following modification:

_NEW_
“R5: The Resource Model MUST be implementable in an interoperable manner. That is, metrics generated by this resource model MUST be understood and interoperable across independent implementations.”*

> R6: "The Resource Model MUST be executable in a scalable manner. That is, an agent implementing the Resource Model MUST be able to execute it at the required time scale and at an affordable cost (e.g., memory footprint, energy, etc.)” The absence of discussion of scaling metrics such as for example “p99 latencies” is striking. Note that 5.3 is about metrics, but provides no examples nor does it enumerate any specific metrics.

*[KY] How about adding some examples to the first paragraph in section 5.2:*

*_NEW_
“Computing metrics can have many different semantics, particularly for being service-specific.
For example, delay, measured as milliseconds (ms), can gauge packet transmission time as well as service processing time, GPU memory, measured as Gigabytes (GB) or MegaBytes (MB) can represent the computing load that influence how many service requests that can be handled in a pre-defined time duration. These representations may entail information on the semantics of the metric or sometimes metrics may also be purely one or more semantic-free numerals.”*

> R7: "The Resource Model MUST be useful." Once again, the 2119 language feels inapplicable.

*[KY] Thanks for the criticism. If this requirement is not appropriate and compelling, the authors could consider deleting it.*

> R18: "CATS systems MUST maintain instance affinity for stateful sessions and transactions." This may be true in some service scenarios but in large-scale distributed systems it can cause all sorts of problems. I personally was severely bitten by a misguided attempt to provide instance affinity in a large-scale cloud application, see https://www.tbray.org/ongoing/When/201x/2019/09/25/On-Sharding (also have a look at some of the other issues discussed there, which feel like they ought to be relevant to this subject matter)
>
> There is no discussion of shuffle sharding, which is overwhelmingly seen as a best practice to make systems resilient in the face of inevitable server failures. In fact, there is little discussion of resilience in the face of server failures. That feels like one of the big and hard problems in operating real-world distributed systems.

*[KY] Thank you for providing the reference materials. Indeed, we lack practical experience in distributed systems. We will strive to incorporate considerations related to fault resilience in the next update. Regarding instance affinity, after reading your blog, we agree that the current wording is overly absolute.
In situations such as traffic surges, this requirement could restrict system optimization options. Therefore, we propose weakening the wording as follows:*

*_NEW_
R18: "CATS systems are RECOMMENDED to maintain instance affinity for stateful sessions and transactions."*

> The Security Considerations section seems short. One of the functions required of every system is authentication of its users, and not all classes of servers can perform this task; how does authentication figure in the CATS ecosystem?

*[KY] Similar comments were also raised in the DNSdir review. I proposed some revisions in a previous thread; please take a look to see if they are appropriate.*

*https://mailarchive.ietf.org/arch/msg/cats/mpjybHZE9X91EaiY7oJrSKbTFkU/*

> --
> Cats mailing list -- cats@ietf.org
> To unsubscribe send an email to cats-leave@ietf.org