[Cats] Re: draft-ietf-cats-usecases-requirements-10 ietf last call Tsvart review

Zaheduzzaman Sarker <zahed.sarker.ietf@gmail.com> Tue, 13 January 2026 13:23 UTC

References: <176599616794.880158.17455224008756727143@dt-datatracker-5bd94c585b-pvtsm> <CABYiY4v7dyDn7Xj-QPZJ5smrMo_FmusNMu8nHan5E1GTbXTPNQ@mail.gmail.com> <CAEh=tcfagtk-d7AxKczfpqSO3GNpbb968xdTeti9dWveq5CB8g@mail.gmail.com> <CABYiY4v4nMBGaj81tdueaFGCPu6d_pwZt5PGr6AUS1prRLYAAA@mail.gmail.com>
In-Reply-To: <CABYiY4v4nMBGaj81tdueaFGCPu6d_pwZt5PGr6AUS1prRLYAAA@mail.gmail.com>
From: Zaheduzzaman Sarker <zahed.sarker.ietf@gmail.com>
Date: Tue, 13 Jan 2026 14:22:48 +0100
Message-ID: <CAEh=tce15i-i5qSGeL5nz2P1WN=qbcWnCaSmuUKC8zZr=XaWVQ@mail.gmail.com>
To: kehan yao <khyao78@gmail.com>
CC: tsv-art@ietf.org, cats@ietf.org, draft-ietf-cats-usecases-requirements.all@ietf.org, last-call@ietf.org
Subject: [Cats] Re: draft-ietf-cats-usecases-requirements-10 ietf last call Tsvart review
Archived-At: <https://mailarchive.ietf.org/arch/msg/cats/6ml99zf-d1ZyFrcP6aDAxL1ann0>

Thanks for addressing my comments.

//Zahed

On Tue, Jan 13, 2026 at 7:07 AM kehan yao <khyao78@gmail.com> wrote:

> Dear Zahed,
>
> I've just submitted a new revision,
> https://datatracker.ietf.org/doc/draft-ietf-cats-usecases-requirements/
> Please see if it is appropriate, and also see my replies inline below.
>
> Best regards,
> Kehan
>
> On Thu, Jan 8, 2026 at 19:36, Zaheduzzaman Sarker <zahed.sarker.ietf@gmail.com> wrote:
>
>> Hello Kehan,
>>
>> Don't worry about late replies, I am even slower than you as I took a
>> longer vacation. See my responses/reflections below.
>>
>> Based on this conversation, I am expecting revision(s) of this document.
>>
>> //Zahed
>>
>> On Sat, Dec 20, 2025 at 2:26 PM kehan yao <khyao78@gmail.com> wrote:
>>
>>> Hi Zahed,
>>>
>>> Thank you very much for your quite detailed and professional review.
>>> It took me a while to prepare the answers. Sorry for the late reply,
>>> and please see my replies inline below.
>>>
>>> Best regards,
>>> Kehan
>>>
>>> On Thu, Dec 18, 2025 at 02:29, Zaheduzzaman Sarker via Datatracker <noreply@ietf.org> wrote:
>>>
>>>> Document: draft-ietf-cats-usecases-requirements
>>>> Title: Computing-Aware Traffic Steering (CATS) Problem Statement, Use Cases,
>>>> and Requirements
>>>> Reviewer: Zaheduzzaman Sarker
>>>> Review result: Not Ready
>>>>
>>>> This document has been reviewed as part of the transport area review team's
>>>> ongoing effort to review key IETF documents. These comments were written
>>>> primarily for the transport area directors, but are copied to the document's
>>>> authors and WG to allow them to address any issues raised and also to the
>>>> IETF discussion list for information.
>>>>
>>>> When done at the time of IETF Last Call, the authors should consider this
>>>> review as part of the last-call comments they receive. Please always CC
>>>> tsv-art@ietf.org if you reply to or forward this review.
>>>>
>>>> I didn't find any particular issues related to transport protocols. However,
>>>> this document made me wonder about a number of things, hence I am not sure I
>>>> understood the context and requirements properly. With better understanding,
>>>> my view on transport protocol issues might change.
>>>>
>>>> While reading and trying to understand this document, I encountered many
>>>> issues, which I note below. I believe addressing those would make this
>>>> document more understandable. From that point of view, I don't think this
>>>> document is ready to be published.
>>>>
>>>> From reading this document, it seems CATS is trying to solve issues ranging
>>>> from the application layer to the routing layer. However, it does not give
>>>> comprehensive hints about where it should converge. Many times it describes
>>>> an application requirement and tries to justify network responsibilities,
>>>> which confused me a number of times. It does not provide a clear description
>>>> of the ingestion points in the network and assumes the CATS service
>>>> providers have full control of the service instance deployment.
>>>>
>>>> # Introduction
>>>>
>>>>      It says
>>>>
>>>>               Offloading compute intensive processing to the user devices
>>>>               is not acceptable, since it would place pressure on local
>>>>               resources such as the battery and incur some data privacy
>>>>               issues if the needed data for computation is not provided
>>>>               locally.
>>>>
>>>>      Is that even an option for CATS or other network resource
>>>>      deployments? If not, then why is it mentioned here?
>>>>
>>>> [KY] Sorry for the confusion. CATS focuses on network-side traffic
>>>> steering by perceiving computing resources, rather than end-device
>>>> offloading. This statement emphasizes that end-device offloading cannot
>>>> fully meet the computing needs of applications. Would it be acceptable to
>>>> simplify this sentence to: "Relying solely on end-side computing resource
>>>> enhancement cannot meet the computing requirements of all applications."
>>>>
>>>
>> Yes, something like that would be more useful.
>>  [KY] Please see the introduction for revision.
>>
>>
>>>
>>>>      What is an "edge site"? What is the "edge of a network"? I haven't
>>>> found
>>>>      any definition or description of those in this document to fully
>>>>      understand the meaning in the context.
>>>>
>>>> [KY] Thank you for pointing out the ambiguity in terminology. The
>>>> definition of "edge site" will reference the "service site" definition in
>>>> draft-ietf-cats-framework-19, referring to a physical location (e.g.,
>>>> operator equipment rooms, base station-side nodes) where edge computing
>>>> resources are deployed and can be accessed via edge routers. "Edge of a
>>>> network" refers to the architectural demarcation point where a corporate
>>>> network connects to third-party networks (consistent with the terminology
>>>> definition in Section 2 of this document). Subsequent revisions will add
>>>> cross-references in the introduction or terminology section to ensure
>>>> contextual consistency.
>>>>
>>>
>> Great!!
>>  [KY] This document refers to the CATS framework for the definitions of
>> service site, service instance, and CATS service identifier (CS-ID). This
>> document only newly defines “edge computing”.
>>
>>
>>>
>>>>      The introduction gives an impression that CATS is only for edge
>>>> computing.
>>>>      Is that the intention?
>>>>
>>>> [KY] The reference to edge computing in the introduction is intended to
>>>> illustrate a typical application scenario, not to limit CATS' scope. The
>>>> core of CATS is "computing-aware traffic steering," which includes edge
>>>> computing, SD-WAN (mentioned in Section 4.4), and cloud-edge collaboration.
>>>> Per the charter requirements, only single-domain scenarios are considered
>>>> for now. Future revisions will clarify that CATS applies to all scenarios
>>>> requiring computing-aware traffic steering.
>>>>
>>>
>> OK, then let's make it clear that this is one of the uses of CATS.
>>  [KY] In the second-to-last paragraph of the introduction, I added the
>> following clarification: “It should be noted that CATS is not limited to
>> edge computing scenarios; however, Section 3 of this document will focus on
>> edge computing scenarios for the problem statement.”
>>
>>
>>>
>>>> # Definition of terms
>>>>
>>>>       It is not very clear to me what the difference is between "Network
>>>>       edge" and "provider network", which is mentioned in the CATS
>>>>       framework draft. What are the differences? And why is "Network edge"
>>>>       defined here and not used in the rest of the document?
>>>>
>>>> [KY] "Network edge" in this document focuses on "physical location
>>>> demarcation" (e.g., corporate network gateways), while "provider network"
>>>> in the CATS framework refers to network infrastructure managed by service
>>>> providers (including core networks, access networks, etc.). They are
>>>> defined from different dimensions: the former describes location
>>>> attributes, and the latter describes management attributes. "Network edge"
>>>> is only used in the terminology definition section and not elsewhere
>>>> because the core discussion focuses on "edge sites" (physical locations
>>>> hosting computing resources). Subsequent revisions will add explanatory
>>>> notes on terminology correlations to avoid confusion.
>>>>
>>>
>> If the term is not used in the document, then there is no point in
>> defining it. I would suggest adding some text in the introduction section
>> to make the scope of the document clear, as you described above.
>>  [KY] Same as above. This document refers to the CATS framework for the
>> definitions of service site, service instance, and CATS service identifier
>> (CS-ID). This document only newly defines “edge computing”.
>>
>>
>>>
>>>>       What kind of service identity are we talking about? Is this the
>>>>       service identity that can be obtained by a TLS client? Or an ALTO
>>>>       endpoint identifier? Or something else?
>>>>
>>>> [KY] The "service identifier" in this document is a unified identifier
>>>> used by clients to access services (e.g., application-layer service IDs),
>>>> independent of TLS client identifiers or ALTO endpoint identifiers. Its
>>>> core function is to enable "mapping a single identifier to multiple service
>>>> instance addresses" (as specified in Requirement R1), with the specific
>>>> format defined by upper-layer applications.
>>>>
>>>
>> Please make it clear in the text or define the term "service identifier"
>> the way you want the readers to interpret it.
>>  [KY] I refer to the CATS framework for the definition of the CATS service
>> identifier (CS-ID).
>>
>>
>>>
>>>> # Problem statement
>>>>
>>>>        It says -
>>>>              , a number of representative cities have deployed multi-edge
>>>>              sites and the typical applications, and there are more edge
>>>>              sites to be deployed in the future.
>>>>
>>>>        I find this unnecessary, especially when no information is
>>>>        provided to verify this claim.
>>>>
>>>> [KY] Thank you for the feedback. This descriptive statement lacks
>>>> practical data support and will be deleted in subsequent revisions. The
>>>> core logic of the problem statement—"multi-edge site deployment leads to
>>>> scattered service instances, requiring computing-aware steering to resolve
>>>> resource mismatches"—will be strengthened to eliminate irrelevant content.
>>>>
>>>
>> OK. Then focus on the issues on the multi-edge deployments.
>>  [KY] Please see section 3.1 “Multi-deployment of Edge Service Sites and
>> Service” for revision.
>>
>>
>>>
>>>>        Section 3.1 mentions one expired ALTO draft and the ALTO protocol,
>>>>        but then it does not really say whether ALTO helps pick the best
>>>>        node in the network to reach, and if so, what is left to be fixed
>>>>        and what differentiates the ALTO solution from CATS.
>>>>
>>>>        As per the charter, CATS is for network nodes to pick the right
>>>>        place to deploy the services. But I fail to see that part in this
>>>>        problem statement.
>>>>
>>>> [KY] The core of the ALTO protocol is to provide applications with
>>>> network topology, resource information, and other data to assist
>>>> applications in independently selecting service instances. However, it does
>>>> not involve "real-time perception of computing resource metrics or dynamic
>>>> steering execution" (though ALTO could be extended to support that).
>>>> Essentially, ALTO empowers applications to make decisions, while CATS’ core
>>>> value lies in enabling networks to make decisions: ① Networks perceive both
>>>> network metrics and computing metrics (e.g., CPU/GPU load); ② Traffic
>>>> steering (including instance discovery and path selection) is implemented
>>>> through the network; ③ The network proactively supports real-time dynamic
>>>> adjustments (e.g., user mobility, sudden load changes). Currently, ALTO is
>>>> cited in Section 3.1 to illustrate that ALTO can help applications better
>>>> deploy instances by perceiving computing resource information.
>>>
>>>
>> You need to clarify this in the document.
>>  [KY] I’ve explained this in Section 3.2 as follows:
>>
>> “Such traffic steering can be initiated either by the application layer
>> or by the network layer: the application layer may actively query for the
>> optimal node and guide traffic using mechanisms such as the ALTO
>> protocol [RFC7285], while the network layer may leverage Anycast
>> routing [RFC4786], where routing systems automatically distribute traffic
>> according to routing tables in an application-transparent manner. However,
>> regardless of whether the steering is performed by the application or the
>> network, the core criteria for selecting "closest" or "close" sites often
>> rely solely on communication metrics (such as physical distance, hop count,
>> or network latency). This decision logic can easily lead to suboptimal
>> choices, meaning that the "closest" site is not always the "best" one.”
>>
>>>
>>>
>>>>
>>>>        What is the difference between an "edge node" and an "edge site"?
>>>>        And what is their relation to the "service site" and "service
>>>>        instance" defined in the CATS framework? If they are supposed to
>>>>        mean the same, why aren't the CATS framework defined terms used
>>>>        here?
>>>>
>>>> [KY] We apologize for the confusion caused by terminology. "Edge node"
>>>> and "edge site" are similar in meaning, referring to service sites
>>>> physically close to users. Subsequent revisions will uniformly adopt
>>>> terminology defined in the CATS framework.
>>>>
>>>
>> OK.
>>  [KY] Unified on “edge service site”.
>>
>>>>          It says -
>>>>
>>>>            If the resources are insufficient to support new instances, the
>>>>            operator can be informed to increase the hardware resources.
>>>>
>>>>         OK, fine. Does CATS do that? Do we need a protocol to inform the
>>>>         operator? Who tells them about the need to invest in hardware?
>>>>         Sorry, why are we talking about this here in the problem statement?
>>>>
>>>>          We have a section in the problem statement that talks about
>>>>          multi-deployment of the edge sites and services, but then it ends
>>>>          by saying that "where to locate service instances and when to
>>>>          create new ones in order to provide the right levels of resource
>>>>          to support user demands" is out of scope of CATS. So, what are
>>>>          really the problems here?
>>>>
>>>> [KY] Please allow us to address these two consecutive questions
>>>> collectively. First, adding hardware computing resources falls under the
>>>> scope of computing resource deployment, which is not addressed by CATS. The
>>>> specific emphasis on computing instance deployment in Section 3.1 is due to
>>>> the document’s evolutionary history. Computing resource deployment,
>>>> computing instance selection, and computing service assurance form a
>>>> lifecycle with tight interdependencies—instance deployment significantly
>>>> impacts and determines service quality. During the establishment of the
>>>> CATS working group, most participants had a network background rather than
>>>> a computing background, so there was a need to clarify the specific network
>>>> problems CATS aims to solve. After extensive discussions, the focus
>>>> narrowed to tasks achievable by the network (routing → computing instance
>>>> selection). However, the working group agreed that including the closely
>>>> related instance deployment phase in the problem statement would provide
>>>> supplementary context and a more comprehensive discussion.
>>>>
>>>
>> As a reader who does not have the history of what was discussed, I found
>> that statement in the document out of context. Please clarify what is
>> relevant for CATS and what is not when you talk about computing resource
>> deployments.
>>  [KY] The main relevance of instance deployment to CATS traffic
>> scheduling lies in the definition and selection of compute metrics. This
>> relationship has been revised and clarified in Section 3.1.
>>
>>
>>>
>>>>          ## Section 3.2 says -
>>>>
>>>>             Traffic is steered to an edge site that is "closest" or to one
>>>>             of a few "close" sites using load-balancing
>>>>
>>>>          Who is steering this, and is this load-balancing static? Is there
>>>>          support for mid-session steering and load-balancing? If these are
>>>>          dynamic and only done at the beginning of the client sessions,
>>>>          then it should already be possible to pick the right edge site,
>>>>          not just the closest one.
>>>>
>>>> [KY] Would it be acceptable to revise the second paragraph of Section
>>>> 3.2 as follows:
>>>>
>>>> _NEW_
>>>>
>>>> With conventional scheduling mechanisms (e.g., Anycast and Equal Cost
>>>> Multi Path (ECMP)), the network can select one or more optimal network
>>>> paths for services (e.g., directing traffic to the nearest service site).
>>>> However, this does not guarantee the selection of the most suitable service
>>>> instance, as computing resource information is equally critical to instance
>>>> selection as network information:
>>>>
>>>
>> Not really helping that much. I am still not sure when this selection is
>> done and how CATS interacts with other scheduling mechanisms. How would the
>> routing decision and the decision based on compute resource information
>> converge (no, the whole solution does not need to be documented here)? It
>> needs a bit more description to understand the "the most suitable service
>> instance" case.
>>  [KY] Same as above. Please see the revision in section 3.2.
>>
>
>
>>>>          "we assume": who are "we"? The authors, the WG, or the IETF?
>>>>
>>>> [KY] Is it okay to change this paragraph as follows:
>>>>
>>>> _New_
>>>>
>>>> “It is assumed that service instances are deployed at multiple sites and
>>>> are reachable through a network infrastructure.”
>>>>
>>>
>> OK.
>>
> [KY] See revision.
>
>>
>>
>>>         Please describe an "edge router".
>>>>
>>>> [KY] “Edge router” is avoided since it is only used once.
>>>>
>>>
>> OK.
>>
>    [KY] “Edge router” was abandoned.
>
>>
>>>>         It says
>>>>               selection of one of candidate service instances is done using
>>>>               traffic steering methods, where the steering decision may take
>>>>               into account pre-planned policies (assignment of certain
>>>>               clients to certain service instances), realize shortest-path
>>>>               to the 'closest' service instance, or utilize more complex and
>>>>               possibly dynamic metric information, such as load of service
>>>>               instances, latency experienced or similar, for a more dynamic
>>>>               selection of a suitable service instance.
>>>>
>>>>          Why can't anycast routing be used here? Or is that the idea here?
>>>>
>>>> [KY] As mentioned earlier, anycast can only schedule traffic based on
>>>> "network reachability" or "shortest path" and cannot consider computing
>>>> resource status (e.g., CPU load, GPU availability). In contrast, CATS’ core
>>>> is "computing awareness," which integrates both computing and network
>>>> metrics—essentially functioning as computing-aware dynamic anycast.
>>>>
>>>
>> Then why don't we call it that, and provide comparisons with vanilla
>> anycast?
>>  [KY] Same as above. Please see the revision in section 3.2.
>>
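The contrast discussed above between vanilla anycast and "computing-aware dynamic anycast" can be illustrated with a hypothetical sketch. It is not taken from the draft or the CATS framework: the `ServiceInstance` fields, the normalized `compute_load`, the exclusion threshold `max_load`, and the latency-equivalent weight `load_weight_ms` are all illustrative assumptions. Vanilla anycast picks the instance with the best network metric only; the computing-aware variant skips overloaded instances and ranks the rest by a combined network-plus-compute cost.

```python
from dataclasses import dataclass


@dataclass
class ServiceInstance:
    """One instance reachable under a single (hypothetical) CS-ID."""
    site: str
    path_latency_ms: float  # network metric, e.g. learned from routing
    compute_load: float     # assumed normalized compute load in [0.0, 1.0]


def vanilla_anycast(instances):
    """Select purely on the network metric (the 'closest' instance)."""
    return min(instances, key=lambda i: i.path_latency_ms)


def computing_aware_anycast(instances, max_load=0.8, load_weight_ms=50.0):
    """Select on a combined cost: path latency plus a compute-load penalty.

    Instances above max_load are excluded; if every instance is excluded,
    fall back to vanilla anycast. Both parameters are illustrative only.
    """
    eligible = [i for i in instances if i.compute_load <= max_load]
    if not eligible:
        return vanilla_anycast(instances)
    return min(eligible,
               key=lambda i: i.path_latency_ms + load_weight_ms * i.compute_load)


# Hypothetical CS-ID mapped to three instances at different edge service sites.
instances = [
    ServiceInstance("site-A", path_latency_ms=5.0, compute_load=0.95),
    ServiceInstance("site-B", path_latency_ms=9.0, compute_load=0.30),
    ServiceInstance("site-C", path_latency_ms=12.0, compute_load=0.10),
]

print(vanilla_anycast(instances).site)          # site-A
print(computing_aware_anycast(instances).site)  # site-C
```

With these sample values, vanilla anycast picks the closest but overloaded site-A, while the combined cost (9 + 15 = 24 for site-B versus 12 + 5 = 17 for site-C) selects site-C, i.e. the "closest" site is not the "best" one.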
>>
>>>
>>>>          It says
>>>>
>>>>                  It is important to note that clients may move. This means
>>>>                  that the service instance that was "best" at one moment
>>>>                  might no longer be best when a new service request is
>>>>                  issued.
>>>>
>>>>         OK, will CATS solve the issue of mid-session mobility? The service
>>>>         instances will have state, and that state needs to be migrated to
>>>>         the new site, so it is not a plug-n-play solution unless the
>>>>         service is completely stateless. Does this mean the services that
>>>>         CATS can entertain need to be stateless? This seems like a
>>>>         requirement on the service instances. I am asking this because,
>>>>         AFAIK, the big use cases written in the document (AR, VR, or
>>>>         vehicles) maintain lots of state in the servers and at the client,
>>>>         and they even need stickiness. Just moving to a low-load service
>>>>         instance might not be the ideal solution. So, what is the main
>>>>         problem CATS is going to solve in this context?
>>>>
>>>> [KY] This is a crucial question, closely related to subsequent use
>>>> cases and requirements for service stickiness. As a network-layer routing
>>>> technology, CATS aims to support diverse service requirements, including
>>>> both stateful and stateless services. However, given the complexity of
>>>> services (especially mobility), managing service states could become
>>>> extremely intricate. Currently, CATS does not intend to participate in
>>>> application decision logic but rather to provide applications with a
>>>> flexible, network-based routing method for instance selection. Specific
>>>> state management remains the responsibility of the application layer.
>>>>
>>>
>> Please clarify this in the document; it is not clear right now. As you
>> mention, this is critical to understanding the requirements.
>>  [KY] Please see the revision in section 3.2 as follows:
>>
>> From a routing perspective, CATS is an application-transparent routing
>> mechanism that can provide scheduling for both stateful and stateless
>> services. However, in scenarios where clients move and the service is
>> stateful, CATS requires the application to explicitly indicate whether it
>> allows the routing system to enable CATS functionality. Otherwise,
>> mid-session scheduling triggered by CATS may cause application context
>> inconsistency among service sites or even service interruption.
>>
>>>
>>>
>>>> # Section 4.1
>>>>
>>>>         It is not clear to me what "dynamically steer traffic" means. Does
>>>>         it mean steering at the start of a service session, mid-session
>>>>         steering, or both? Can this be clarified?
>>>>
>>>> [KY] This question is related to the previous one. CATS can undoubtedly
>>>> be applied at the start of a session. It can also be used for mid-session
>>>> steering (e.g., user mobility, sudden load surges, or instance failures).
>>>> However, the specific implementation of mid-session steering depends on
>>>> application requirements for the network. Applications may require the
>>>> network to enable CATS capabilities continuously, with state maintenance
>>>> and context migration handled by the applications themselves.
>>>> Alternatively, applications may explicitly disable CATS during
>>>> deployment, retaining full control over state management and instance
>>>> selection.
>>>>
>>>
>> Again, the document needs to clarify that mid-session steering is
>> application dependent.
>>  [KY] Please see revisions in section 4.1 as follows:
>>
>> For AR/VR scenarios, traffic should only be steered at the start of a
>> service session, because mid-session steering involves significant context
>> migration, which is costly and requires explicit application participation
>> and approval.
>>
>>>
>>>
>>>> # Section 4.2
>>>>
>>>>         In the first paragraph, it describes the need to increase the
>>>>         transmission capacity, video processing, and network bandwidth. I
>>>>         fail to see how this relates to CATS.
>>>>
>>>> [KY] The authors will condense this section. It primarily provides
>>>> background information and is somewhat redundant.
>>>>
>>>
>> OK, I will read it again when it is ready.
>>  [KY] Please see revisions in section 4.2.
>>
>>
>>>
>>>>         It says -
>>>>
>>>>             The notion of sending the request to the "nearest" edge node is
>>>>             important for being able to collate the video information of
>>>>             "nearby" vehicles, using relative location information among
>>>>             the vehicles. Furthermore, data privacy may lead to a
>>>>             requirement to process the data by an edge node (or an adjacent
>>>>             vehicle as a cluster node ) as close to the source as possible
>>>>             to limit the data's spread across many network components in
>>>>             the network
>>>>
>>>>         So, here the video should not be processed anywhere else, so it is
>>>>         kind of fixed with the "nearest edge node" policy. Do we need CATS
>>>>         for this?
>>>>
>>>> [KY] The following paragraph explains that the nearest node may be
>>>> overloaded and unable to provide services, thereby justifying the need for
>>>> CATS. However, this paragraph is also redundant and will be appropriately
>>>> condensed by the authors.
>>>>
>>>
>> OK, I will read it again when it is ready.
>>  [KY] Same as above.
>>
>>
>>>
>>>>         In the 3rd paragraph, it starts to talk about "closest", but it was
>>>>         discussing "nearest" in the previous paragraph. Do they mean the
>>>>         same thing? If yes, then please use the same terminology.
>>>>
>>>> [KY] Thank you. These two terms are synonymous, and "closest" will be
>>>> used uniformly in subsequent revisions.
>>>>
>>>
>> OK.
>>
>     [KY] “nearest” is no longer used.
>
>
>>>
>>>>         According to my understanding, these scenarios can be satisfied by
>>>>         the ALTO protocol, as there the network provides information to the
>>>>         clients to pick the right service instance, or even to change it.
>>>>         So, it is not clear to me why this is a CATS use case.
>>>>
>>>> [KY] As previously discussed, standard ALTO does not integrate
>>>> computing metrics or support dynamic steering during vehicle mobility.
>>>> These capabilities require closer collaboration between network components,
>>>> and CATS offers a network-layer solution.
>>>>
>>>
>> What is important here is to tell how much the network can do without
>> changing the application (both for stateful and stateless).
>>  [KY] See revisions in section 4.2 as follows:
>>
>> As mentioned in the problem statement section, CATS is an
>> application-transparent network-layer solution. Unlike ALTO[RFC7285], it
>> enables coordinated scheduling of network and computing resources without
>> requiring application modifications. For moving vehicles, CATS supports
>> smooth and proactive context migration between edge nodes, provided that
>> the application allows it, to maintain service continuity.
>>
>>>
>>>
>>>>         Also, I didn't find any discussion on moving speed. Vehicles
>>>>         usually move fast, which means that by the time CATS realizes the
>>>>         compute or bandwidth is loaded, the vehicle may have moved to
>>>>         another base station that might need a new PE/edge site altogether.
>>>>         What are the considerations on the requirements regarding this?
>>>>
>>>> [KY] Since only Section 4.2 (Intelligent Transportation) is closely
>>>> related to mobility among all use cases, no common technical requirements
>>>> for speed have been proposed. However, the authors will consider adding
>>>> relevant discussions in the technical requirements section—for example,
>>>> addressing how speed impacts metric update frequency in the context of
>>>> metric distribution requirements.
>>>>
>>>
>> That's needed and would be useful.
>>  [KY] A short paragraph was added before the current R14 as follows.
>>
>> For example, in highly mobile scenarios, such as fast-moving vehicles
>> mentioned in Section 4.2, compute metrics can quickly become outdated as
>> the UE moves across base stations and edge service sites, potentially
>> requiring more frequent updates.  However, updates should remain stable and
>> avoid excessive overhead.
>>
>>>
>>>
>>>> # Section 4.3
>>>>
>>>>         This section is clear about the use case of having decentralized
>>>>         storage, but previous use cases are not clear about how they
>>>> relate to
>>>>         CATS.
>>>>
>>>> [KY] Thank you. The PLC in this use case is computing-sensitive. Would
>>>> it be acceptable to add the following discussion at the end of Section 4.3:
>>>>
>>>> _NEW_
>>>>
>>>> In this use case, CATS is required for selecting the optimal PLC
>>>> instance and storage node, ensuring low latency and reliability for data
>>>> processing in industrial scenarios, as well as low latency for data
>>>> reading/writing during twin control processes.
>>>>
>>>
>>>
>>>> # Requirements
>>>>
>>>>        R1: Is this to inform the clients accessing the service instances?
>>>>        Or is this for the CATS system to decide where to send a client
>>>>        request? Throughout the document so far, the discussion on
>>>>        applications made me wonder about this.
>>>>        What is "real-time system state"?
>>>>
>>>> [KY] It’s for the CATS system to decide where to send a service flow
>>>> to. Usually, it’s hard to gauge what "real-time" is, because there is no
>>>> benchmark applicable to all use cases.
>>>>
>>>
>> The "real-time system state" still needs to be defined if you want to use
>> it. Otherwise, it is hard to understand the requirements.
>>  [KY] A short paragraph was added after R1 and R2 to explain
>> “up-to-date”:
>>
>> Note: The term "up-to-date" herein refers to the latest metric
>> information collected by the system in accordance with the preset metric
>> update cycle. The principle for setting the cycle is generally
>> pre-determined by the network. For example, based on historical statistical
>> data, a relatively appropriate update cycle (either second-level or
>> millisecond-level) is selected for a specific type or certain types of
>> services.
>>
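As a hypothetical illustration of the "up-to-date" note above, a per-service-type update cycle could be applied as a simple freshness check. The service type names and cycle values below are assumptions chosen only to show one millisecond-level and one second-level cycle; they are not defined in the draft.

```python
# Hypothetical update cycles (in seconds), pre-determined by the network per
# service type, e.g. from historical statistics. Values are illustrative only.
UPDATE_CYCLE_S = {
    "ar-vr": 0.05,           # millisecond-level cycle
    "video-analytics": 1.0,  # second-level cycle
}


def is_up_to_date(collected_at_s: float, now_s: float, service_type: str) -> bool:
    """Return True if a metric collected at collected_at_s is still within
    the update cycle configured for service_type at time now_s."""
    cycle = UPDATE_CYCLE_S[service_type]
    age = now_s - collected_at_s
    return 0.0 <= age <= cycle


# A metric that is 200 ms old is stale for AR/VR but fresh for video analytics.
print(is_up_to_date(10.0, 10.2, "ar-vr"))            # False
print(is_up_to_date(10.0, 10.2, "video-analytics"))  # True
```

The same metric can therefore be "up-to-date" for one service type and stale for another, which is why the note ties the term to a per-service update cycle rather than a single fixed duration.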
>>>
>>>
>>>>        R2: See comments for R1. What is the periodicity for the
>>>>        "up-to-date status"? Without a proper understanding of that, it is
>>>>        hard to understand the requirement.
>>>>
>>>> [KY] Similarly, this requirement doesn’t want to limit a time duration
>>>> for "up-to-date," considering there are many different use cases—some are
>>>> sensitive in milliseconds, while others may be in seconds.
>>>>
>>>
>> So how does CATS know about this? Does CATS have ideas about what kind of
>> use case is in play?
>>  [KY] Same as above revisions.
>>
>>
>>>
>>>>        R3: Are these service instances in the participating edges
>>>>        administered, implemented, and deployed by several entities in the
>>>>        service provider's network? If not, then why is this a requirement?
>>>>        Now, even if we agree on a certain metric, say "CPU load", and my
>>>>        instance has 5 fully loaded CPUs and 5 idle ones, then my instance
>>>>        is 50% loaded; will that be an understandable metric for another
>>>>        instance about my situation?
>>>
>>> [KY] This requirement applies not only to consistent understanding among
>>> computing service providers but also to network equipment vendors, who must
>>> share a uniform interpretation of computing resource information.
>>> Additionally, per the charter, the current scope is limited to a single
>>> domain. However, future scenarios may involve cross-provider deployment of
>>> user services. Detailed specifications for ensuring consistent metric
>>> interpretation are provided in draft-ietf-cats-metrics-definition.
>>>
>>
>> I still don't understand: if a single domain is configured and
>> administered by one entity, isn't it possible to configure those with a
>> common understanding of the metrics in that domain?
>> It also seems like a very hard metric definition to agree with among all
>> the service providers and vendors. Maybe I don't understand the goal here.
>>
>
>     [KY] Even within the administrative domain of a single service
> provider, the computing resources deployed at different service sites may
> come from different vendors, and there can be significant differences in
> the models and types of these computing resources. For example, a CPU load
> of 50% from vendor A may still be able to handle 5 additional service
> requests, while a CPU load of 50% from vendor B may be able to handle 10
> additional service requests. In such cases, a simple CPU load value cannot
> adequately reflect the actual service capacity. Therefore, it is necessary
> to abstract these metrics into a scoring system that can be consistently
> understood across implementations. This abstraction also helps network
> routing protocols advertise resource information, as routing protocols are
> generally not well-suited to encapsulating and extending overly complex
> resource details.
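
A minimal sketch of the kind of abstraction described above, in Python, assuming capacity scales linearly with idle CPU. The report format (`SiteReport`), the per-vendor calibration table (`MAX_REQUESTS_BY_VENDOR`) and its numbers are hypothetical, chosen only to reproduce the vendor A / vendor B example; the draft and draft-ietf-cats-metrics-definition may define such a score quite differently.

```python
from dataclasses import dataclass


@dataclass
class SiteReport:
    """Raw, vendor-specific status from a service site (hypothetical format)."""
    site: str
    vendor: str
    cpu_load: float  # fraction of CPU in use, 0.0 to 1.0


# Hypothetical per-vendor calibration: requests an instance can serve at 0% load.
# Chosen so that 50% load leaves 5 (vendor A) vs. 10 (vendor B) extra requests.
MAX_REQUESTS_BY_VENDOR = {"A": 10, "B": 20}


def capacity_score(report: SiteReport) -> float:
    """Abstract a raw CPU load into a vendor-independent score: the estimated
    number of additional requests the site can accept, assuming capacity
    scales linearly with idle CPU."""
    if not 0.0 <= report.cpu_load <= 1.0:
        raise ValueError("cpu_load must be between 0.0 and 1.0")
    return (1.0 - report.cpu_load) * MAX_REQUESTS_BY_VENDOR[report.vendor]


reports = [SiteReport("site-1", "A", 0.5), SiteReport("site-2", "B", 0.5)]
scores = {r.site: capacity_score(r) for r in reports}
print(scores)                       # {'site-1': 5.0, 'site-2': 10.0}
print(max(scores, key=scores.get))  # site-2
```

Both sites report the same 50% CPU load, yet their abstracted scores differ; a single comparable number of this kind is also easier for a routing protocol to carry than the underlying vendor-specific resource details.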
>
>
>>
>>>
>>>> R4 : Who are "we" again? And I am not sure I understand
>>>>        the requirements (there are actually two requirements here).
>>>>        What are the requirements on the CATS system?
>>>
>>> [KY] Sorry. How about the following changes:
>>> _NEW_
>>>
>>> “ R4: A model of the compute and network resources MUST be defined. Such
>>> a model MUST characterize how metrics required by CATS systems are
>>> abstracted out from the compute and network resources.”
>>>
>>
>> Is this an information model or a data model?
>>  [KY] Information model.
>>
>>>
>>>
>>>> R5 : it seems like a
>>>>        requirement on the resource model not on the CATS.
>>>>
>>>> [KY] Similarly. How about the following changes:
>>>> _NEW_
>>>>
>>>> “R5: The Resource Model MUST be implementable in an interoperable
>>>> manner. That is, metrics generated by this resource model MUST be
>>>> understood and interoperable across independent CATS implementations.”
>>>>
>>>
>> Somewhat better.
>>
>>
>>>
>>>
>>>>        R6 : not clear at all. What is an "agent" here?
>>>>
>>>> [KY] CATS metrics may be aggregated or normalized at multiple places,
>>>> so "agent" is used here to refer to the component that performs this.
>>>> The component can be within CATS systems or belong to the computing
>>>> service provider.
>>>>
>>>
>> You need to define this "agent".
>>  [KY] The term “agent” is no longer used. See the revised R6:
>>
>> R6: The Resource Model MUST be executable in a scalable manner. That is,
>> the Resource Model MUST be capable of being executed at the required time
>> scale and at an affordable cost (e.g., memory footprint, energy, etc.).
>>
>>>
>>>
>>>>        R7 : not executable unless we have someone/something deciding on
>>>>        the usefulness. My resource model may be very useful to me, but
>>>>        can be garbage to anyone else. I picked up my last car just
>>>>        because it has the best sound system; my friend didn't find that
>>>>        useful at all.
>>>>
>>>> [KY] Thanks for the criticism. If it is not appropriate or compelling,
>>>> the authors could consider deleting this requirement.
>>>>
>>>
>> OK.
>>  [KY] Deleted.
>>
>>
>>>
>>>>        R8 : not sure this is a system requirement; rather it seems like a
>>>>        requirement of how the workgroup should decide if they ever create
>>>>        with metrics.
>>>>
>>>> [KY] How about the following:
>>>> _NEW_
>>>>
>>>>  R8: Beyond metrics definition, CATS systems MUST be able to handle
>>>>  staleness of CATS metrics and indicate when to refresh the metrics, so
>>>>  that CATS components can know if a metric value is valid or not.
>>>>
>>>
>> did you mean -
>>
>>  CATS systems MUST refresh the metrics, so that CATS components can know
>> if a metric value is valid or not.
>>
>> [KY] There is a slight difference. I mean that the action of refreshing
>> should be separate from the action of staleness handling. I’m not sure
>> refreshing should necessarily be a “MUST” requirement; for example, a
>> metric may sometimes be a static value.
>>
>> R7 is revised as follows:
>> R7: CATS systems MUST support staleness handling for CATS metrics and
>> provide indications of when metrics should be refreshed, so that CATS
>> components can know if a metric value is valid or not.
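
A minimal sketch of the staleness handling that R7 asks for, assuming each metric carries its generation time and an optional maximum age. The `CatsMetric` class and its field names are hypothetical, not taken from any CATS document. A `max_age` of `None` models the static metrics mentioned above, which never need refreshing, while a per-metric `max_age` leaves room for the millisecond-level versus second-level update cycles discussed earlier in the thread.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CatsMetric:
    """A metric value plus the metadata needed for staleness handling.
    Field names are hypothetical, not taken from any CATS specification."""
    name: str
    value: float
    generated_at: float              # time (seconds) at which the value was produced
    max_age: Optional[float] = None  # validity lifetime in seconds; None = static metric

    def is_valid(self, now: float) -> bool:
        """Static metrics never go stale; others expire max_age seconds
        after they were generated."""
        if self.max_age is None:
            return True
        return now - self.generated_at <= self.max_age

    def refresh_due_at(self) -> Optional[float]:
        """Time by which the producer should refresh the value (None = never)."""
        if self.max_age is None:
            return None
        return self.generated_at + self.max_age


# Usage: a fast-changing load metric with a 0.5 s lifetime and a static one.
load = CatsMetric("compute-load", 0.4, generated_at=1000.0, max_age=0.5)
gpus = CatsMetric("gpu-count", 8, generated_at=1000.0)
print(load.is_valid(1000.2), load.is_valid(1001.0))       # True False
print(load.refresh_due_at(), gpus.is_valid(1_000_000.0))  # 1000.5 True
```

Keeping "is this value still valid?" separate from "when should it be refreshed?" mirrors the distinction drawn above between staleness handling and refreshing.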
>>
>>
>>
>>>
>>>
>>>>        R9 : And again we have more than one requirement here, and I simply
>>>>        don't get the reason for a SHOULD in the applicability for non-CATS
>>>>        network. By the way, is CATS a system or a network?
>>>>
>>>> [KY] Here, the entity is the CATS system. The latter “SHOULD” requirement
>>>> was added because, in earlier discussions, we were asked to explain
>>>> whether CATS could be applied to and extended in non-CATS systems, hence
>>>> the requirement.
>>>>
>>>
>> So why SHOULD? Why not MAY? Non-CATS systems are out of scope by default,
>> I would say. But if they can use the CATS system to do something, they may.
>> If you start to consider non-CATS system requirements, will it not become a
>> universal solution?
>>  [KY] Thanks. I changed it to MAY.
>>
>>
>>>
>>>>        R10 and R11 are good requirements as they are easily understandable
>>>>        and executable. However, I am not sure how the fast-changing
>>>>        compute load can be handled to avoid path oscillation. How do they
>>>>        work together?
>>>>
>>>> [KY] This may be a solution-level question, and is closely related to the
>>>> selection of CATS metrics. Selecting the most appropriate level of CATS
>>>> metrics is important for path selection as well as for avoiding routing loops.
>>>>
>>>
>> Can we add such consideration as verbose text?
>>  [KY] A short paragraph was added after R11.
>>
>> Compute metrics can change rapidly, which may lead to path oscillation if
>> metrics are updated too frequently or become stale if updated too
>> infrequently. R10 ensures that CATS components can negotiate metric types
>> for consistent interpretation, while R11 requires that metrics be used in a
>> way that avoids routing loops and path instability. Together, they balance
>> responsiveness with stability.
>>
>>>
>>>
>>>>        R12 to R17: you can simply create two requirements from all of
>>>>        them, one for metric collection and one for metric distribution.
>>>>
>>>> [KY] The current separation of requirements is primarily to elaborate
>>>> on specific behaviors of CATS components. Operations such as aggregation
>>>> and normalization are further supplements and requirements to the actions
>>>> of collection and distribution.
>>>>
>>>
>> Not sure this separation is really helping here. I would prefer more
>> concrete requirements. The current approach just increases the number of
>> requirements and they all are related.
>>  [KY] Former R12 and R13 were merged into:
>>
>> R12: MUST provide mechanisms for metric collection, including specifying
>> the responsible entity for collection.
>>
>> Former R14 and R17 were deleted since they are SHOULD requirements.
>> The relevant discussion after R12 was modified accordingly.
>>
>> I’m not sure how to handle the former R16 (currently R14), since during
>> the revision I found that the current R11 and R14 are related:
>>
>> “The computation and use of metrics in CATS MUST be designed to avoid
>> introducing routing loops or path oscillations when metrics are distributed
>> and used for path selection.” &
>>
>> “R14: MUST NOT be sensitive to the update frequency of the metrics, and
>> MUST NOT be dependent on or vulnerable to the mechanisms used to distribute
>> the metrics.”
>>
>> *How about moving R11 to the section ‘Use of CATS Metrics’ and merging
>> it with R14?*
>>
>>>
>>>
>>>>            This section uses "service provider" along with GPU
>>>>            utilizations. Is this a CATS service provider or a cloud
>>>>            service provider? If it is CATS, then I suggest referencing the
>>>>            CATS framework definition.
>>>> [KY] OK.
>>>
>>>
>>>>        R18 : it says
>>>>             the affinity to a particular service instance may span more
>>>>             than one request, as in the AR/VR use case, where the previous
>>>>             client input is needed to render subsequent frames.
>>>>
>>>>           This is to me more stickiness than affinity. That means when
>>>>           there is strict stickiness the CATS system MUST not migrate the
>>>>           service instance, but rather try to pick a site where the
>>>>           stickiness can be preserved. How does CATS know if the
>>>>           session/transaction is stateful?
>>>>
>>>> [KY] As per the CATS framework draft,
>>>> “More importantly, the means for identifying a flow for ensuring
>>>> instance affinity should be application-independent to avoid the need for
>>>> service-specific instance affinity methods.  However, service contact
>>>> instance affinity information may be configurable on a per-service basis.
>>>> For each service, the information can include the flow or packets
>>>> identification type and means, affinity timeout value, etc.
>>>>
>>>>    This document does not define any mechanism for defining or
>>>> enforcing service contact instance affinity.”
>>>>
>>>
>> Not sure I understand your response. Can you clarify?
>>
> [KY] The key point is: CATS does not need to know whether a session is
> stateful or not. Instead, the service or the orchestration layer tells
> CATS what constitutes a flow that requires affinity.
> If I understand correctly, the difference between affinity and stickiness
> is that affinity is “soft” (prefer, but allow migration) while stickiness
> is “hard” (must not migrate). The requirement chooses to use affinity
> here, since there can be specific cases allowing instance migration or
> state synchronization among several coordinated instances, but this is
> service-specific, and not determined by CATS.
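
A minimal sketch of per-flow instance affinity in the spirit of R15 and the framework text quoted above, assuming flows are identified by their 5-tuple and that the service supplies an affinity timeout. `FlowKey`, `AffinityTable`, and the `select` callback (a stand-in for metric-based instance selection) are hypothetical names. The table never inspects whether a session is stateful; it only applies the flow definition and timeout it is configured with.

```python
from typing import Callable, Dict, NamedTuple, Tuple


class FlowKey(NamedTuple):
    """Hypothetical flow identifier: the classic 5-tuple."""
    src: str
    dst: str
    src_port: int
    dst_port: int
    proto: str


class AffinityTable:
    """Keeps each flow on the instance chosen for its first packet until the
    flow has been idle for longer than the configured affinity timeout."""

    def __init__(self, timeout: float, select: Callable[[], str]) -> None:
        self.timeout = timeout  # per-service affinity timeout, in seconds
        self.select = select    # stand-in for metric-based instance selection
        self._entries: Dict[FlowKey, Tuple[str, float]] = {}

    def instance_for(self, flow: FlowKey, now: float) -> str:
        entry = self._entries.get(flow)
        if entry is not None and now - entry[1] <= self.timeout:
            instance = entry[0]       # existing flow: keep its binding
        else:
            instance = self.select()  # new flow, or affinity has expired
        self._entries[flow] = (instance, now)  # record last-seen time
        return instance


# Usage: the selector would now prefer site-2, but the live flow stays put.
choices = iter(["site-1", "site-2"])
table = AffinityTable(timeout=30.0, select=lambda: next(choices))
flow = FlowKey("198.51.100.7", "192.0.2.10", 40000, 443, "tcp")
print(table.instance_for(flow, 0.0))    # site-1 (first packet)
print(table.instance_for(flow, 10.0))   # site-1 (within timeout)
print(table.instance_for(flow, 100.0))  # site-2 (idle for 90 s > 30 s)
```

Letting the binding lapse after the timeout is one way to model the "soft" affinity described above; a service needing hard stickiness would configure a timeout long enough to cover the whole session.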
>
>>
>>
>>>        R19 : what is the difference between R18 and this one?
>>>
>>>
>>
>>>
>>>>        R21 : is this a requirement on CATS or the application clients?
>>>>
>>>> [KY] Thanks for pinpointing issues. R21 is a requirement on the client
>>>> side, thus, I think R19 and R21 could be deleted.
>>>>
>>>
>>  OK.
>>
>     [KY] Former R21 was deleted. Former R18 and R19 were merged into the
> current R15. “R15: CATS systems MUST maintain instance affinity for
> stateful sessions and transactions on a per-flow basis.”
>
>
>
>>
>>>> # Appendix A:
>>>>
>>>>        This might be helpful to some, but I am not sure why they are kept
>>>>        here while the text says: It is a temporary and procedural section
>>>>        which might be deleted or merged in future updates.
>>>>
>>>> [KY] As discussed in previous threads with other directorate reviewers,
>>>> this part (Appendix A) will be deleted since it has fulfilled its
>>>> intended role.
>>>>
>>>
>> Good.
>>
>>>
>>>
>>>> # Appendix B :
>>>>
>>>>         I find it strange that Appendix B yields some Normative
>>>>         requirements but this is not part of the main body. I would
>>>>         suggest either to remove the normative reference, or to move them
>>>>         into the main body of the document use case (and recharter the
>>>>         working group to work on this as they are so interesting and
>>>>         important that they cannot be removed altogether).
>>>>
>>>> [KY] Per internal working group discussions, ISAC is a promising
>>>> scenario (supported by liaison statements from organizations like ETSI).
>>>> However, it involves important requirements that necessitate significant
>>>> updates to the current CATS framework, and these are unique to this
>>>> scenario. The working group recommends addressing it in a later phase,
>>>> hence its placement in the appendix. We will delete these normative
>>>> requirements as suggested.
>>>>
>>>
>> OK, just keep the description for information only if that is really
>> required.
>>
>     [KY] Normative requirements were avoided.
>
>>
>>>> --
>>>> Cats mailing list -- cats@ietf.org
>>>> To unsubscribe send an email to cats-leave@ietf.org
>>>>
>>>
