[Cats] Re: draft-ietf-cats-usecases-requirements-10 ietf last call Tsvart review
Zaheduzzaman Sarker <zahed.sarker.ietf@gmail.com> Tue, 13 January 2026 13:23 UTC
From: Zaheduzzaman Sarker <zahed.sarker.ietf@gmail.com>
Date: Tue, 13 Jan 2026 14:22:48 +0100
To: kehan yao <khyao78@gmail.com>
CC: tsv-art@ietf.org, cats@ietf.org, draft-ietf-cats-usecases-requirements.all@ietf.org, last-call@ietf.org
Archived-At: <https://mailarchive.ietf.org/arch/msg/cats/6ml99zf-d1ZyFrcP6aDAxL1ann0>

Thanks for addressing my comments.

//Zahed

On Tue, Jan 13, 2026 at 7:07 AM kehan yao <khyao78@gmail.com> wrote:

> Dear Zahed,
>
> I've just submitted a new revision,
> https://datatracker.ietf.org/doc/draft-ietf-cats-usecases-requirements/
> Please see if it is appropriate, and also see my replies inline below.
>
> Best regards,
> Kehan
>
> Zaheduzzaman Sarker <zahed.sarker.ietf@gmail.com> wrote on Thu, Jan 8, 2026 at 19:36:
>
>> Hello Kehan,
>>
>> Don't worry about late replies, I am even slower than you as I took a longer vacation. See my responses/reflections below.
>>
>> Based on this conversation I am expecting revision(s) of this document.
>>
>> //Zahed
>>
>> On Sat, Dec 20, 2025 at 2:26 PM kehan yao <khyao78@gmail.com> wrote:
>>
>>> Hi Zahed,
>>>
>>> Thank you very much for your quite detailed and professional review. It took me a while to prepare the answers. Sorry for the late reply, and please see replies inline below.
>>>
>>> Best regards,
>>> Kehan
>>>
>>> Zaheduzzaman Sarker via Datatracker <noreply@ietf.org> wrote on Thu, Dec 18, 2025 at 02:29:
>>>
>>>> Document: draft-ietf-cats-usecases-requirements
>>>> Title: Computing-Aware Traffic Steering (CATS) Problem Statement, Use Cases, and Requirements
>>>> Reviewer: Zaheduzzaman Sarker
>>>> Review result: Not Ready
>>>>
>>>> This document has been reviewed as part of the transport area review team's ongoing effort to review key IETF documents. These comments were written primarily for the transport area directors, but are copied to the document's authors and WG to allow them to address any issues raised and also to the IETF discussion list for information.
>>>>
>>>> When done at the time of IETF Last Call, the authors should consider this review as part of the last-call comments they receive. Please always CC tsv-art@ietf.org if you reply to or forward this review.
>>>>
>>>> I didn't find any particular issues related to transport protocols.
>>>> However, this document made me wonder about a number of things, hence I am not sure I understood the context and requirements properly. With better understanding my view on transport protocol issues might change.
>>>>
>>>> As I had to read and understand this document, I encountered lots of issues, which I noted below. I believe addressing those would make this document more understandable. From that point of view I don't think this document is ready to be published.
>>>>
>>>> By reading this document it seems CATS is trying to solve issues from the application layer to the routing layer. However, it didn't give comprehensive hints about where it should converge. Lots of times it describes an application requirement and tries to justify network responsibilities, which confused me a number of times. It does not provide a clear description of ingestion points at the network and assumes the CATS service providers have full control of the service instance deployment.
>>>>
>>>> # Introduction
>>>>
>>>> It says
>>>>
>>>>     Offloading compute intensive processing to the user devices is not acceptable, since it would place pressure on local resources such as the battery and incur some data privacy issues if the needed data for computation is not provided locally.
>>>>
>>>> Is that even an option for CATS or other network resources deployments? If not, then why is it mentioned here?
>>>>
>>>> [KY] Sorry for the confusion. CATS focuses on network-side traffic steering by perceiving computing resources, rather than end-device offloading. This statement emphasizes that end-device offloading cannot fully meet the computing needs of applications. Would it be acceptable to simplify this sentence to: "Relying solely on end-side computing resource enhancement cannot meet the computing requirements of all applications."
>>
>> Yes, something like that would be more useful.
>> [KY] Please see the introduction for revision.
>>
>>>> What is an "edge site"? What is the "edge of a network"? I haven't found any definition or description of those in this document to fully understand the meaning in the context.
>>>>
>>>> [KY] Thank you for pointing out the ambiguity in terminology. The definition of "edge site" will reference the "service site" definition in draft-ietf-cats-framework-19, referring to a physical location (e.g., operator equipment rooms, base station-side nodes) where edge computing resources are deployed and can be accessed via edge routers. "Edge of a network" refers to the architectural demarcation point where a corporate network connects to third-party networks (consistent with the terminology definition in Section 2 of this document). Subsequent revisions will add cross-references in the introduction or terminology section to ensure contextual consistency.
>>
>> Great!!
>> [KY] This document refers to CATS framework on the definition of service site, service instance, and CATS service identifier (CS-ID). This document only newly defines “edge computing”.
>>
>>>> The introduction gives an impression that CATS is only for edge computing. Is that the intention?
>>>>
>>>> [KY] The reference to edge computing in the introduction is intended to illustrate a typical application scenario, not to limit CATS' scope. The core of CATS is "computing-aware traffic steering," which includes edge computing, SD-WAN (mentioned in Section 4.4), and cloud-edge collaboration. Per the charter requirements, only single-domain scenarios are considered for now. Future revisions will clarify that CATS applies to all scenarios requiring computing-aware traffic steering.
>>
>> OK, then let's make it clear that this is one of the uses of CATS.
>> [KY] In the second-to-last paragraph of the introduction, I added the following clarification: “It should be noted that CATS is not limited to edge computing scenarios, however, Section 3 of this document will focus on edge computing scenarios for problem statement.”
>>
>>>> # Definition of terms
>>>>
>>>> It is not very clear to me what the difference is between "Network edge" and "provider network" that is mentioned in the CATS framework draft. What are the differences? And why is "Network edge" defined here and not used in the rest of the document?
>>>>
>>>> [KY] "Network edge" in this document focuses on "physical location demarcation" (e.g., corporate network gateways), while "provider network" in the CATS framework refers to network infrastructure managed by service providers (including core networks, access networks, etc.). They are defined from different dimensions: the former describes location attributes, and the latter describes management attributes. "Network edge" is only used in the terminology definition section and not elsewhere because the core discussion focuses on "edge sites" (physical locations hosting computing resources). Subsequent revisions will add explanatory notes on terminology correlations to avoid confusion.
>>
>> If the term is not used in the document then there is no point in defining it. I would suggest adding some text in the introduction section to make the scope of the document clear, as you described above.
>> [KY] Same as above. This document refers to CATS framework on the definition of service site, service instance, and CATS service identifier (CS-ID). This document only newly defines “edge computing”.
>>
>>>> What kind of service identity are we talking about? Is this a service identity that can be obtained by a TLS client? Or an ALTO endpoint identifier? Or something else?
>>>>
>>>> [KY] The "service identifier" in this document is a unified identifier used by clients to access services (e.g., application-layer service IDs), independent of TLS client identifiers or ALTO endpoint identifiers. Its core function is to enable "mapping a single identifier to multiple service instance addresses" (as specified in Requirement R1), with the specific format defined by upper-layer applications.
>>
>> Please make it clear in the text or define the term "service identifier" the way you want the readers to interpret it.
>> [KY] I refer to CATS framework on the definition of CATS service identifier (CS-ID).
>>
>>>> # Problem statement
>>>>
>>>> It says -
>>>>
>>>>     , a number of representative cities have deployed multi-edge sites and the typical applications, and there are more edge sites to be deployed in the future.
>>>>
>>>> I find this unnecessary, especially when there is no information provided to verify this claim.
>>>>
>>>> [KY] Thank you for the feedback. This descriptive statement lacks practical data support and will be deleted in subsequent revisions. The core logic of the problem statement ("multi-edge site deployment leads to scattered service instances, requiring computing-aware steering to resolve resource mismatches") will be strengthened to eliminate irrelevant content.
>>
>> OK. Then focus on the issues of the multi-edge deployments.
>> [KY] Please see section 3.1 “Multi-deployment of Edge Service Sites and Service” for revision.
>>
>>>> Section 3.1 mentions one expired ALTO draft and the ALTO protocol, but then it does not really say, if ALTO helps with picking the best node in the network to reach, what is left to be fixed and what differentiates the ALTO solution from CATS.
>>>>
>>>> As per charter CATS is for network nodes to pick the right place to deploy the services.
>>>> But I fail to see that part in this problem statement.
>>>>
>>>> [KY] The core of the ALTO protocol is to provide applications with network topology, resource information, and other data to assist applications in independently selecting service instances. However, it does not involve "real-time perception of computing resource metrics or dynamic steering execution" (though ALTO could be extended to support that). Essentially, ALTO empowers applications to make decisions, while CATS’ core value lies in enabling networks to make decisions: ① Networks perceive both network metrics and computing metrics (e.g., CPU/GPU load); ② Traffic steering (including instance discovery and path selection) is implemented through the network; ③ The network proactively supports real-time dynamic adjustments (e.g., user mobility, sudden load changes). Currently, ALTO is cited in Section 3.1 to illustrate that ALTO can help applications better deploy instances by perceiving computing resource information.
>>
>> You need to clarify this in the document.
>> [KY] I’ve explained this in section 3.2 as follows:
>>
>> “Such traffic steering can be initiated either by the application layer or by the network layer: the application layer may actively query for the optimal node and guide traffic using mechanisms such as the ALTO protocol [RFC7285], while the network layer may leverage Anycast routing [RFC4786], where routing systems automatically distribute traffic according to routing tables in an application-transparent manner. However, regardless of whether the steering is performed by the application or the network, the core criteria for selecting "closest" or "close" sites often rely solely on communication metrics (such as physical distance, hop count, or network latency). This decision logic can easily lead to suboptimal choices, meaning that the "closest" site is not always the "best" one.”
>>
>>>> What is the difference between an "edge node" and "edge site"? And what is their relation to "service site" and "service instance" defined in the CATS framework? If they are supposed to mean the same, why aren't the CATS framework defined terms used here?
>>>>
>>>> [KY] We apologize for the confusion caused by terminology. "Edge node" and "edge site" are similar in meaning, referring to service sites physically close to users. Subsequent revisions will uniformly adopt terminology defined in the CATS framework.
>>
>> OK.
>> [KY] Unified on “edge service site”.
>>
>>>> It says -
>>>>
>>>>     If the resources are insufficient to support new instances, the operator can be informed to increase the hardware resources.
>>>>
>>>> OK, fine. Does CATS do that? Do we need a protocol to inform the operator? Who tells them about the need to invest in hardware? Sorry, why are we talking about this here in the problem statement?
>>>>
>>>> We have a section in the problem statement that talks about multi-deployment of the edge sites and services, but then it ends saying "where to locate service instances and when to create new ones in order to provide the right levels of resource to support user demands" is out of scope of CATS. So, what are really the problems here?
>>>>
>>>> [KY] Please allow us to address these two consecutive questions collectively. First, adding hardware computing resources falls under the scope of computing resource deployment, which is not addressed by CATS. The specific emphasis on computing instance deployment in Section 3.1 is due to the document’s evolutionary history. Computing resource deployment, computing instance selection, and computing service assurance form a lifecycle with tight interdependencies: instance deployment significantly impacts and determines service quality.
>>>> During the establishment of the CATS working group, most participants had a network background rather than a computing background, so there was a need to clarify the specific network problems CATS aims to solve. After extensive discussions, the focus narrowed to tasks achievable by the network (routing → computing instance selection). However, the working group agreed that including the closely related instance deployment phase in the problem statement would provide supplementary context and a more comprehensive discussion.
>>
>> I found that statement in the document out of context. As a reader who doesn't have the history of what was discussed, I ask you to please clarify what is relevant for CATS and what is not when you talk about computing resource deployments.
>> [KY] The main relevance of instance deployment to CATS traffic scheduling lies in the definition and selection of compute metrics. This relationship has been revised and clarified in Section 3.1.
>>
>>>> ## Section 3.2 says -
>>>>
>>>>     Traffic is steered to an edge site that is "closest" or to one of a few "close" sites using load-balancing
>>>>
>>>> Who is steering this and is this load-balancing static? Is there support for mid-session steering and load-balancing? If these are dynamic and only done at the beginning of the client sessions then it should already be possible to pick the right edge site rather than the closest one.
>>>>
>>>> [KY] Would it be acceptable to revise the second paragraph of Section 3.2 as follows:
>>>>
>>>> _NEW_
>>>>
>>>> With conventional scheduling mechanisms (e.g., Anycast and Equal Cost Multi Path (ECMP)), the network can select one or more optimal network paths for services (e.g., directing traffic to the nearest service site).
>>>> However, this does not guarantee the selection of the most suitable service instance, as computing resource information is equally critical to instance selection as network information:
>>
>> Not really helping that much. Still I am not sure when this selection is done and how CATS interacts with other scheduling mechanisms. How would the routing decision and the decision based on compute resource information converge (no, the whole solution does not need to be documented here)? It needs a bit more description to understand "the most suitable service instance" case.
>> [KY] Same as above. Please see the revision in section 3.2.
>>
>>>> "we assume" who are we? authors or the wg or the IETF?
>>>>
>>>> [KY] Is it okay to change this paragraph like
>>>>
>>>> _New_
>>>>
>>>> “It's assumed that Service instances are multi-site deployed, and they are reachable through a network infrastructure.”
>>
>> OK.
>
> [KY] See revision.
>
>>>> Please describe an "edge router".
>>>>
>>>> [KY] “Edge router” is avoided since it's only used once.
>>
>> OK.
>
> [KY] “Edge router” was abandoned.
>
>>>> It says
>>>>
>>>>     selection of one of candidate service instances is done using traffic steering methods, where the steering decision may take into account pre-planned policies (assignment of certain clients to certain service instances), realize shortest-path to the 'closest' service instance, or utilize more complex and possibly dynamic metric information, such as load of service instances, latency experienced or similar, for a more dynamic selection of a suitable service instance.
>>>>
>>>> Why can't anycast routing be used here? Or is that the idea here?
>>>>
>>>> [KY] As mentioned earlier, anycast can only schedule traffic based on "network reachability" or "shortest path" and cannot consider computing resource status (e.g., CPU load, GPU availability). In contrast, CATS’ core is "computing awareness," which integrates both computing and network metrics, essentially functioning as computing-aware dynamic anycast.
>>
>> Then why don't we call it that, and provide comparisons with vanilla anycast?
>> [KY] Same as above. Please see the revision in section 3.2.
>>
>>>> It says
>>>>
>>>>     It is important to note that clients may move. This means that the service instance that was "best" at one moment might no longer be best when a new service request is issued.
>>>>
>>>> OK, will CATS solve the issue of mid-session mobility? The service instances will have states, and those states need to be migrated to the new site, so it is not a plug-n-play solution unless the service is completely stateless. Does this mean the services that CATS can entertain need to be stateless? This seems like a requirement on the service instances. I am asking this as, afaik, the big use cases written in the document (AR, VR, or vehicles) maintain lots of state in the servers and at the client and they even need stickiness. Just moving to a low-load service instance might not be the ideal solution. So, what is the main problem CATS is going to solve in this context?
>>>>
>>>> [KY] This is a crucial question, closely related to subsequent use cases and requirements for service stickiness. As a network-layer routing technology, CATS aims to support diverse service requirements, including both stateful and stateless services. However, given the complexity of services (especially mobility), managing service states could become extremely intricate.
>>>> Currently, CATS does not intend to participate in application decision logic but rather to provide applications with a flexible, network-based routing method for instance selection. Specific state management remains the responsibility of the application layer.
>>
>> Please clarify this in the document; it is not clear right now. As you mention, this is very critical to understanding the requirements.
>> [KY] Please see the revision in section 3.2 as follows:
>>
>> From a routing perspective, CATS is an application-transparent routing mechanism that can provide scheduling for both stateful and stateless services. However, in scenarios where clients move and the service is stateful, CATS requires the application to explicitly indicate whether it allows the routing system to enable CATS functionality. Otherwise, mid-session scheduling triggered by CATS may cause application context inconsistency among service sites or even service interruption.
>>
>>>> # Section 4.1
>>>>
>>>> It is not clear to me what "dynamically steer traffic" means. Does it mean the start of a service session, or mid-session steering, or both? Can this be clarified?
>>>>
>>>> [KY] This question is related to the previous one. CATS can undoubtedly be applied at the start of a session. It can also be used for mid-session steering (e.g., user mobility, sudden load surges, or instance failures). However, the specific implementation of mid-session steering depends on application requirements for the network. Applications may require the network to enable CATS capabilities continuously, with state maintenance and context migration handled by the applications themselves. Alternatively, applications may explicitly disable CATS during deployment, retaining full control over state management and instance selection.
>>
>> Again, this needs to be clarified in the document that mid session steering is application dependent.
>> [KY] Please see revisions in section 4.1 as follows:
>>
>> For AR/VR scenarios, traffic should only be steered at the start of a service session, because mid-session steering involves significant context migration, which is costly and requires explicit application participation and approval.
>>
>>>> # Section 4.2
>>>>
>>>> In the first paragraph, it describes the need to increase the transmission capacity, video processing and network bandwidth. I fail to see what is the relation toward CATS.
>>>>
>>>> [KY] The authors will condense this section. It primarily provides background information and is somewhat redundant.
>>
>> OK, I will read it again when it is ready.
>> [KY] Please see revisions in section 4.2.
>>
>>>> It says -
>>>>
>>>>     The notion of sending the request to the "nearest" edge node is important for being able to collate the video information of "nearby" vehicles, using relative location information among the vehicles. Furthermore, data privacy may lead to a requirement to process the data by an edge node (or an adjacent vehicle as a cluster node) as close to the source as possible to limit the data's spread across many network components in the network
>>>>
>>>> So, here the video should not be processed anywhere else so it is kind of fixed with the "nearest edge node" policy. Do we need CATS for this?
>>>>
>>>> [KY] The following paragraph explains that the nearest node may be overloaded and unable to provide services, thereby justifying the need for CATS. However, this paragraph is also redundant and will be appropriately condensed by the authors.
>>
>> OK, I will read it again when it is ready.
>> [KY] Same as above.
>>
>>>> In the 3rd paragraph, it starts to talk about "closest" but it was discussing "nearest" in the previous paragraph. Do they mean the same thing? If yes, then please use the same terminology.
>>>>
>>>> [KY] Thank you. These two terms are synonymous, and "closest" will be used uniformly in subsequent revisions.
>>
>> OK.
>
> [KY] “nearest” is no longer used.
>
>>>> According to my understanding these scenarios can be satisfied by the ALTO protocol, as there the network provides information to the clients to pick the right service instance or even change it. So, it is not clear to me why this is a CATS use case.
>>>>
>>>> [KY] As previously discussed, standard ALTO does not integrate computing metrics or support dynamic steering during vehicle mobility. These capabilities require closer collaboration between network components, and CATS offers a network-layer solution.
>>
>> What is important here is to tell how much the network can do without changing the application (both for stateful and stateless).
>> [KY] See revisions in section 4.2 as follows:
>>
>> As mentioned in the problem statement section, CATS is an application-transparent network-layer solution. Unlike ALTO [RFC7285], it enables coordinated scheduling of network and computing resources without requiring application modifications. For moving vehicles, CATS supports smooth and proactive context migration between edge nodes, provided that the application allows it, to maintain service continuity.
>>
>>>> Also I didn't find any discussion on moving speed. Vehicles usually move fast, which means it is possible that by the time CATS realizes the compute or bandwidth is loaded the vehicle has moved to another base station that might need a new PE/edge site altogether. What are the considerations on the requirements regarding this?
>>>>
>>>> [KY] Since only Section 4.2 (Intelligent Transportation) is closely related to mobility among all use cases, no common technical requirements for speed have been proposed. However, the authors will consider adding relevant discussions in the technical requirements section, for example, addressing how speed impacts metric update frequency in the context of metric distribution requirements.
>>
>> That's needed and would be useful.
>> [KY] A short paragraph was added before the current R14 as follows.
>>
>> For example, in highly mobile scenarios, such as fast-moving vehicles mentioned in Section 4.2, compute metrics can quickly become outdated as the UE moves across base stations and edge service sites, potentially requiring more frequent updates. However, updates should remain stable and avoid excessive overhead.
>>
>>>> # Section 4.3
>>>>
>>>> This section is clear about the use case of having decentralized storage, but previous use cases are not clear about how they relate to CATS.
>>>>
>>>> [KY] Thank you. The PLC in this use case is computing-sensitive. Would it be acceptable to add the following discussion at the end of Section 4.3:
>>>>
>>>> _NEW_
>>>>
>>>> In this use case, CATS is required for selecting the optimal PLC instance and storage node, ensuring low latency and reliability for data processing in industrial scenarios, as well as low latency for data reading/writing during twin control processes.
>>>>
>>>> # Requirements
>>>>
>>>> R1 : is this to inform the clients accessing the service instances? Or is this for the CATS system to decide where to send a client request? Throughout the document so far the discussion on applications made me wonder this.
>>>> What is "real-time system state"?
>>>>
>>>> [KY] It’s for the CATS system to decide where to send a service flow to.
>>>> Usually, it’s hard to gauge what "real-time" is, because there is no benchmark applicable to all use cases.
>>
>> The "real-time system state" still needs to be defined if you want to use it. Otherwise it is hard to understand the requirements.
>>
>> [KY] A short paragraph after R1 and R2 was added to explain “up-to-date”:
>>
>> Note: The term "up-to-date" herein refers to the latest metric information collected by the system in accordance with the preset metric update cycle. The principle for setting the cycle is generally pre-determined by the network. For example, based on historical statistical data, a relatively appropriate update cycle (either second-level or millisecond-level) is selected for a specific type or certain types of services.
>>
>>>> R2: see comments for R1. What is the periodicity for the "up-to-date status"? Without a proper understanding of that, it is hard to understand the requirement.
>>>>
>>>> [KY] Similarly, this requirement doesn’t want to limit a time duration for "up-to-date," considering there are many different use cases: some are sensitive in milliseconds, while others may be in seconds.
>>
>> So how does CATS know about this? Does CATS have ideas about what kind of use case is in play?
>>
>> [KY] Same as above revisions.
>>
>>>> R3: are these service instances in the participating edges administered, implemented and deployed by several entities in the service provider's network? If not, then why is this a requirement? Now, even if we agree on a certain metric, say "CPU load", and my instance has 5 fully loaded CPUs and 5 idle ones, then my instance is 50% loaded. Will it be an understandable metric for another instance about my situation?
>>>
>>> [KY] This requirement applies not only to consistent understanding among computing service providers but also to network equipment vendors, who must share a uniform interpretation of computing resource information. Additionally, per the charter, the current scope is limited to a single domain. However, future scenarios may involve cross-provider deployment of user services. Detailed specifications for ensuring consistent metric interpretation are provided in draft-ietf-cats-metrics-definition.
>>
>> I still don't understand: if a single domain is configured and administered by one entity, isn't it possible to configure those with a common understanding of the metrics by that domain? It also seems like a very hard metric definition to agree on among all the service providers and vendors. Maybe I don't understand the goal here.
>
> [KY] Even within the administrative domain of a single service provider, the computing resources deployed at different service sites may come from different vendors, and there can be significant differences in the models and types of these computing resources. For example, a CPU load of 50% from vendor A may still be able to handle 5 additional service requests, while a CPU load of 50% from vendor B may be able to handle 10 additional service requests. In such cases, a simple CPU load value cannot adequately reflect the actual service capacity. Therefore, it is necessary to abstract these metrics into a scoring system that can be consistently understood across implementations. This abstraction also helps network routing protocols advertise resource information, as routing protocols are generally not well-suited to encapsulating and extending overly complex resource details.
>
>>>> R4: Who are "we" again? And I am not sure I understand the requirements (there are actually two requirements here). What are the requirements on the CATS system?
>>>
>>> [KY] Sorry. How about the following changes:
>>>
>>> _NEW_
>>>
>>> “R4: A model of the compute and network resources MUST be defined. Such a model MUST characterize how metrics required by CATS systems are abstracted out from the compute and network resources.”
>>
>> Is this an information model or a data model?
>>
>> [KY] Information model.
>>
>>>> R5: it seems like a requirement on the resource model, not on CATS.
>>>>
>>>> [KY] Similarly. How about the following changes:
>>>>
>>>> _NEW_
>>>>
>>>> “R5: The Resource Model MUST be implementable in an interoperable manner. That is, metrics generated by this resource model MUST be understood and interoperable across independent CATS implementations.”
>>
>> Somewhat better.
>>
>>>> R6: not clear at all. What is an "agent" here?
>>>>
>>>> [KY] It’s because CATS metrics may have multiple places for aggregation or normalization. So “agent” is used here to refer to this component. The component can be within CATS systems or belong to a computing service provider.
>>
>> You need to define this "agent".
>>
>> [KY] “agent” is no longer used. See the revision of R6:
>>
>> R6: The Resource Model MUST be executable in a scalable manner. That is, the Resource Model MUST be capable of being executed at the required time scale and at an affordable cost (e.g., memory footprint, energy, etc.).
>>
>>>> R7: not executable unless we have someone/something deciding on the usefulness. My resource model may be very useful to me, but can be garbage to anyone else. I picked up my last car just because it has the best sound system; my friend didn't find that useful at all.
>>>>
>>>> [KY] Thanks for the criticism. If it is not appropriate and compelling, the authors could consider deleting this requirement.
>>
>> OK.
>>
>> [KY] Deleted.
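
To make the vendor A / vendor B example from the R3 discussion above concrete, here is a minimal sketch of abstracting a raw CPU load into a comparable score. It is only an illustration, not the method of draft-ietf-cats-metrics-definition: the `SiteProfile` type, the `idle_capacity` calibration value, and the `reference_capacity` scale are all assumptions introduced for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SiteProfile:
    # Hypothetical calibration value: how many service requests this
    # site can serve when completely idle.
    idle_capacity: int


def headroom(profile: SiteProfile, cpu_load: float) -> float:
    """Estimate how many additional requests the site can still serve."""
    if not 0.0 <= cpu_load <= 1.0:
        raise ValueError("cpu_load must be a fraction between 0 and 1")
    return profile.idle_capacity * (1.0 - cpu_load)


def score(profile: SiteProfile, cpu_load: float, reference_capacity: int) -> int:
    """Map headroom onto a 0..100 score relative to an assumed common scale."""
    raw = 100 * headroom(profile, cpu_load) / reference_capacity
    return max(0, min(100, round(raw)))


# Vendor A at 50% load can take 5 more requests, vendor B can take 10,
# matching the numbers used in the thread.
vendor_a = SiteProfile(idle_capacity=10)
vendor_b = SiteProfile(idle_capacity=20)

print(score(vendor_a, 0.5, reference_capacity=20))  # 25
print(score(vendor_b, 0.5, reference_capacity=20))  # 50
```

Both sites report the same 50% CPU load, yet the scores differ, which is the point of the abstraction: a single small integer is also easier for a routing protocol to carry than vendor-specific detail.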
>>
>>>> R8: not sure this is a system requirement; rather it seems like a requirement of how the workgroup should decide if they ever create with metrics.
>>>>
>>>> [KY] How about the following:
>>>>
>>>> _NEW_
>>>>
>>>> R8: Beyond metrics definition, CATS systems MUST be able to deal with the staleness handling of CATS metrics and indicate when to refresh the metrics, so that CATS components can know if a metric value is valid or not.
>>
>> Did you mean:
>>
>> CATS systems MUST refresh the metrics, so that CATS components can know if a metric value is valid or not.
>>
>> [KY] There is a slight difference. I mean the action of refreshing should be separate from the action of staleness handling. I’m not sure refreshing should necessarily be a “MUST” requirement; for example, a metric may sometimes be a static value.
>>
>> R7 is revised as follows:
>>
>> R7: CATS systems MUST support staleness handling for CATS metrics and provide indications of when metrics should be refreshed, so that CATS components can know if a metric value is valid or not.
>>
>>>> R9: And again we have more than one requirement here, and I simply don't get the reason for a SHOULD in the applicability for non-CATS network. By the way, is CATS a system or a network?
>>>>
>>>> [KY] Here, the entity is the CATS system. The latter “SHOULD” requirement was added because, during earlier discussions, it was asked to explain whether CATS could be applied and extended in non-CATS systems. So there was a requirement here.
>>
>> So why SHOULD? Why not MAY? A non-CATS system is by default out of scope, I would say. But if they can use the CATS system to do something, they may. If you start to consider non-CATS system requirements, then will it not be a universal solution?
>>
>> [KY] Thanks. I changed it to MAY.
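
The distinction drawn for the revised R7 above (staleness handling is separate from refreshing, and a static metric never needs a refresh) can be sketched as follows. This is a minimal illustration under stated assumptions: `MetricRecord`, `max_age`, and the method names are hypothetical and do not come from any CATS draft.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class MetricRecord:
    value: int
    collected_at: float        # collection time, in seconds
    max_age: Optional[float]   # validity period in seconds; None means static

    def is_stale(self, now: float) -> bool:
        """A static metric is never stale; others expire after max_age."""
        if self.max_age is None:
            return False
        return (now - self.collected_at) > self.max_age

    def refresh_due_at(self) -> Optional[float]:
        """Indicate when a refresh is expected, or None for static metrics."""
        if self.max_age is None:
            return None
        return self.collected_at + self.max_age


load = MetricRecord(value=42, collected_at=100.0, max_age=5.0)
static = MetricRecord(value=8, collected_at=100.0, max_age=None)

print(load.is_stale(now=103.0))    # False
print(load.is_stale(now=106.0))    # True
print(load.refresh_due_at())       # 105.0
print(static.is_stale(now=1e9))    # False
```

A consumer can decide whether a value is still valid from the record alone, without requiring that every metric be refreshed.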
>>
>>>> R10 and R11 are good requirements as they are easily understandable and executable. However, I am not sure how the fast-changing compute load can be handled to avoid path oscillation. How do they work together?
>>>>
>>>> [KY] This may be a solution-wise question, and is closely related to the selection of CATS metrics. Selecting the most appropriate level of CATS metrics is important for path selection as well as for avoiding routing loops.
>>
>> Can we add such consideration as verbose text?
>>
>> [KY] A short paragraph was added after R11.
>>
>> Compute metrics can change rapidly, which may lead to path oscillation if metrics are updated too frequently or become stale if updated too infrequently. R10 ensures that CATS components can negotiate metric types for consistent interpretation, while R11 requires that metrics be used in a way that avoids routing loops and path instability. Together, they balance responsiveness with stability.
>>
>>>> R12 to R17: you can simply create two requirements from all of them, one for metric collection and one for metric distribution.
>>>>
>>>> [KY] The current separation of requirements is primarily to elaborate on specific behaviors of CATS components. Operations such as aggregation and normalization are further supplements and requirements to the actions of collection and distribution.
>>
>> Not sure this separation is really helping here. I would prefer more concrete requirements. The current approach just increases the number of requirements and they all are related.
>>
>> [KY] Former R12 and R13 were merged into:
>>
>> R12: MUST provide mechanisms for metric collection, including specifying the responsible entity for collection.
>>
>> Former R14 and R17 were deleted since they were SHOULD requirements. Relevant discussions were modified after R12.
>>
>> I’m not sure how to handle the former R16 (currently R14).
>> Since during revision, I found that the current R11 and R14 are related.
>>
>> “The computation and use of metrics in CATS MUST be designed to avoid introducing routing loops or path oscillations when metrics are distributed and used for path selection.” &
>>
>> “R14: MUST NOT be sensitive to the update frequency of the metrics, and MUST NOT be dependent on or vulnerable to the mechanisms used to distribute the metrics.”
>>
>> How about moving R11 to the section ‘Use of CATS Metrics’ and merging it with R14?
>>
>>>> This section uses "service provider" along with GPU utilizations. Is this a CATS service provider or a cloud service provider? If it is CATS then I suggest referencing the CATS framework definition.
>>>>
>>>> [KY] OK.
>>>>
>>>> R18: it says
>>>>
>>>> the affinity to a particular service instance may span more than one request, as in the AR/VR use case, where the previous client input is needed to render subsequent frames.
>>>>
>>>> This is to me more stickiness than affinity. That means when there is strict stickiness the CATS system MUST not migrate the service instance, rather try to pick a site where the stickiness can be preserved. How does CATS know if the session/transaction is stateful?
>>>>
>>>> [KY] As per the CATS framework draft,
>>>>
>>>> “More importantly, the means for identifying a flow for ensuring instance affinity should be application-independent to avoid the need for service-specific instance affinity methods. However, service contact instance affinity information may be configurable on a per-service basis. For each service, the information can include the flow or packets identification type and means, affinity timeout value, etc.
>>>>
>>>> This document does not define any mechanism for defining or enforcing service contact instance affinity.”
>>
>> Not sure I understand your response. Can you clarify?
>
> [KY] The key point is: CATS does not need to know whether a session is stateful or not. Instead, the service or the orchestration layer tells CATS what constitutes a flow that requires affinity. If I understand correctly, the difference between affinity and stickiness is that affinity is “soft (prefer but allow migration)” while stickiness is “hard (must not migrate)”. The requirement chooses to use affinity here, since there can be specific cases allowing instance migration or state synchronization among several coordinated instances, but this is service-specific, and not determined by CATS.
>
>>> R19: what is the difference between R18 and this one?
>>>
>>>> R21: is this a requirement on CATS or the application clients?
>>>>
>>>> [KY] Thanks for pinpointing issues. R21 is a requirement on the client side, thus, I think R19 and R21 could be deleted.
>>
>> OK.
>
> [KY] Former R21 was deleted. Former R18 and R19 were merged into the current R15. “R15: CATS systems MUST maintain instance affinity for stateful sessions and transactions on a per-flow basis.”
>
>>>> # Appendix A:
>>>>
>>>> This might be helpful to some, but I am not sure why they are kept here while the text says: It is a temporary and procedural section which might be deleted or merged in future updates.
>>>>
>>>> [KY] As discussed in previous threads with other review directors, this part (Appendix A) will be deleted since it has fulfilled its intended role.
>>
>> Good.
>>
>>>> # Appendix B:
>>>>
>>>> I find it strange that Appendix B yields some Normative requirements but this is not part of the main body.
>>>> I would suggest either to remove the normative reference, or to move them into the main body of the document use case (and recharter the working group to work on this, as they are so interesting and important that they cannot be removed altogether).
>>>>
>>>> [KY] Per internal working group discussions, ISAC is a promising scenario (supported by liaison statements from organizations like ETSI). However, it involves important requirements that necessitate significant updates to the current CATS framework, and these are unique to this scenario. The working group recommends addressing it in a later phase, hence its placement in the appendix. We will delete these normative requirements as suggested.
>>
>> OK, just keep the description for information only if that is really required.
>
> [KY] The normative requirements were removed.
>
>>>> --
>>>> Cats mailing list -- cats@ietf.org
>>>> To unsubscribe send an email to cats-leave@ietf.org
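
As a footnote to the R15 discussion above, here is a minimal sketch of soft, per-flow instance affinity with a timeout, in the spirit of the "affinity timeout value" mentioned in the quoted framework text. The `AffinityTable` class, the 5-tuple flow key, and the `pick_best` callback are assumptions made for this illustration; the framework draft explicitly does not define such a mechanism.

```python
from typing import Callable, Dict, Hashable, Tuple


class AffinityTable:
    """Pins each flow to a service instance until the pin times out."""

    def __init__(self, timeout: float) -> None:
        self.timeout = timeout
        # flow key -> (pinned instance, time the flow was last seen)
        self._pins: Dict[Hashable, Tuple[str, float]] = {}

    def select(self, flow_key: Hashable, now: float,
               pick_best: Callable[[], str]) -> str:
        pin = self._pins.get(flow_key)
        if pin is not None and (now - pin[1]) <= self.timeout:
            instance = pin[0]        # affinity still valid: keep the instance
        else:
            instance = pick_best()   # new or expired flow: select afresh
        self._pins[flow_key] = (instance, now)
        return instance


table = AffinityTable(timeout=30.0)
flow = ("2001:db8::1", 40000, "2001:db8::100", 443, "udp")

first = table.select(flow, now=0.0, pick_best=lambda: "site-1")
second = table.select(flow, now=10.0, pick_best=lambda: "site-2")
third = table.select(flow, now=100.0, pick_best=lambda: "site-2")
print(first, second, third)  # site-1 site-1 site-2
```

The table never inspects whether the session is stateful. It only needs a flow key and a timeout supplied per service, which matches the point that CATS is told what constitutes a flow requiring affinity.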