draft-joung-detnet-stateless-fair-queuing-08.txt   draft-joung-detnet-stateless-fair-queuing-09.txt 
DetNet Working Group J. Joung DetNet Working Group J. Joung
Internet-Draft Sangmyung University Internet-Draft Sangmyung University
Intended status: Standards Track J. Ryoo Intended status: Standards Track J. Ryoo
Expires: 1 September 2026 T. Cheung Expires: 1 October 2026 T. Cheung
ETRI ETRI
Y. Li Y. Li
Huawei Huawei
P. Liu P. Liu
China Mobile China Mobile
28 February 2026 30 March 2026
Latency Guarantee with Stateless Fair Queuing Latency Guarantee with Stateless Fair Queuing
draft-joung-detnet-stateless-fair-queuing-08 draft-joung-detnet-stateless-fair-queuing-09
Abstract Abstract
This document specifies the framework and the operational procedure This document specifies the implementation details for the framework
for deterministic networking with a set of rate based work conserving specified in ITU-T Y.3129 [Y.3129] and ITU-T Y.3148 [Y.3148]. The
packet schedulers. The framework guarantees end-to-end (E2E) latency framework guarantees end-to-end (E2E) latency bounds to flows. The
bounds to flows. The schedulers in core nodes do not need to schedulers in core nodes do not need to maintain flow states.
maintain flow states. Instead, the entrance node of a flow marks an Instead, the entrance node of a flow marks an ideal service
ideal service completion time according to a fluid model, called completion time according to a fluid model, called Finish Time (FT),
Finish Time (FT), of a packet in the packet header. The subsequent of a packet in the packet header. The subsequent core nodes update
core nodes update the FT by adding the delay factor, which is a the FT by adding a delay factor, which is a function of the flow and
function of the flow and the nodes. The packets in the queue of the the nodes. The packets in the queue of the scheduler are served in
scheduler are served in the ascending order of FT. This mechanism is the ascending order of FT. This mechanism is called the stateless
called the stateless fair queuing. The result is that flows are fair queuing. The result is that flows are isolated from each other
isolated from each other almost perfectly. The latency bound of a almost perfectly. The latency bound of a flow depends only on the
flow depends only on the flow's intrinsic parameters such as the flow's intrinsic parameters such as the maximum burst size and the
maximum burst size and the service rate, except the link capacities service rate, except the link capacities and the maximum packet
and the maximum packet length among other flows sharing each output length among other flows sharing each output link with the flow.
link with the flow. Furthermore, this document specifies an This document specifies the metadata, formats of metadata, the
approximation of stateless fair queuing implemented via a strict admission control procedure, and an approximation of stateless fair
priority (SP) scheduler. This approach maintains a guaranteed end- queuing implemented via a strict priority (SP) scheduler.
to-end (E2E) latency bound.
Status of This Memo Status of This Memo
This Internet-Draft is submitted in full conformance with the This Internet-Draft is submitted in full conformance with the
provisions of BCP 78 and BCP 79. provisions of BCP 78 and BCP 79.
Internet-Drafts are working documents of the Internet Engineering Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF). Note that other groups may also distribute Task Force (IETF). Note that other groups may also distribute
working documents as Internet-Drafts. The list of current Internet- working documents as Internet-Drafts. The list of current Internet-
Drafts is at https://datatracker.ietf.org/drafts/current/. Drafts is at https://datatracker.ietf.org/drafts/current/.
Internet-Drafts are draft documents valid for a maximum of six months Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress." material or to cite them other than as "work in progress."
This Internet-Draft will expire on 1 September 2026. This Internet-Draft will expire on 1 October 2026.
Copyright Notice Copyright Notice
Copyright (c) 2026 IETF Trust and the persons identified as the Copyright (c) 2026 IETF Trust and the persons identified as the
document authors. All rights reserved. document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents (https://trustee.ietf.org/ Provisions Relating to IETF Documents (https://trustee.ietf.org/
license-info) in effect on the date of publication of this document. license-info) in effect on the date of publication of this document.
Please review these documents carefully, as they describe your rights Please review these documents carefully, as they describe your rights
skipping to change at page 2, line 33 skipping to change at page 2, line 33
described in Section 4.e of the Trust Legal Provisions and are described in Section 4.e of the Trust Legal Provisions and are
provided without warranty as described in the Revised BSD License. provided without warranty as described in the Revised BSD License.
Table of Contents Table of Contents
1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 3 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 3
2. Terminology . . . . . . . . . . . . . . . . . . . . . . . . . 4 2. Terminology . . . . . . . . . . . . . . . . . . . . . . . . . 4
2.1. Terms Used in This Document . . . . . . . . . . . . . . . 4 2.1. Terms Used in This Document . . . . . . . . . . . . . . . 4
2.2. Abbreviations . . . . . . . . . . . . . . . . . . . . . . 4 2.2. Abbreviations . . . . . . . . . . . . . . . . . . . . . . 4
3. Conventions Used in This Document . . . . . . . . . . . . . . 5 3. Conventions Used in This Document . . . . . . . . . . . . . . 5
4. Fair Queuing Schedulers . . . . . . . . . . . . . . . . . . . 5 4. Assumptions . . . . . . . . . . . . . . . . . . . . . . . . . 5
5. Assumptions . . . . . . . . . . . . . . . . . . . . . . . . . 7 5. Framework . . . . . . . . . . . . . . . . . . . . . . . . . . 6
6. Work Conserving Stateless Core Fair Queuing (C-SCORE) . . . . 8 6. Operational Procedures . . . . . . . . . . . . . . . . . . . 7
6.1. Framework . . . . . . . . . . . . . . . . . . . . . . . . 8 6.1. Metadata . . . . . . . . . . . . . . . . . . . . . . . . 7
6.2. E2E Latency Bound . . . . . . . . . . . . . . . . . . . . 9 6.2. Header format . . . . . . . . . . . . . . . . . . . . . . 7
6.3. Operational Procedure . . . . . . . . . . . . . . . . . . 10 6.2.1. IPv6 header format . . . . . . . . . . . . . . . . . 7
6.3.1. Metadata . . . . . . . . . . . . . . . . . . . . . . 10 6.2.2. MPLS label format . . . . . . . . . . . . . . . . . . 8
6.3.2. Header format . . . . . . . . . . . . . . . . . . . . 10 6.3. Admission control for latency guarantee . . . . . . . . . 11
6.3.3. Admission control for latency guarantee . . . . . . . 13 6.4. Compensation algorithm of time difference between
6.3.4. Role of entrance node for generation and update of nodes . . . . . . . . . . . . . . . . . . . . . . . . . . 13
FT . . . . . . . . . . . . . . . . . . . . . . . . . 15 7. Characteristics . . . . . . . . . . . . . . . . . . . . . . . 14
6.3.5. Role of core node for update of FT . . . . . . . . . 15 7.1. Taxonomy . . . . . . . . . . . . . . . . . . . . . . . . 14
6.3.6. Mitigation of complexity in entrance node . . . . . . 16 7.2. Strengths . . . . . . . . . . . . . . . . . . . . . . . . 14
6.3.7. Compensation of time difference between nodes . . . . 16 8. Approximate C-SCORE via strict priority schedulers . . . . . 15
6.4. Characteristics . . . . . . . . . . . . . . . . . . . . . 17 8.1. General description . . . . . . . . . . . . . . . . . . . 15
6.4.1. Taxonomy . . . . . . . . . . . . . . . . . . . . . . 17 8.2. Transit node architecture . . . . . . . . . . . . . . . . 15
6.4.2. Strengths . . . . . . . . . . . . . . . . . . . . . . 17 8.3. Algorithms . . . . . . . . . . . . . . . . . . . . . . . 16
7. Approximate C-SCORE via strict priority schedulers . . . . . 18 8.4. E2E latency bound of the approximate C-SCORE . . . . . . 17
7.1. General description . . . . . . . . . . . . . . . . . . . 18 9. Considerations for non-compliant nodes . . . . . . . . . . . 17
7.2. Transit node architecture . . . . . . . . . . . . . . . . 18 10. Relationships to ITU-T Standards . . . . . . . . . . . . . . 17
7.3. Algorithms . . . . . . . . . . . . . . . . . . . . . . . 19 11. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 18
7.4. E2E latency bound of the approximate C-SCORE . . . . . . 20 12. Security Considerations . . . . . . . . . . . . . . . . . . . 18
8. Considerations for non-compliant nodes . . . . . . . . . . . 20 13. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . 18
9. Relationships to ITU-T Standards . . . . . . . . . . . . . . 20 14. Contributor . . . . . . . . . . . . . . . . . . . . . . . . . 18
10. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 21 15. References . . . . . . . . . . . . . . . . . . . . . . . . . 18
11. Security Considerations . . . . . . . . . . . . . . . . . . . 21 15.1. Normative References . . . . . . . . . . . . . . . . . . 18
12. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . 21 15.2. Informative References . . . . . . . . . . . . . . . . . 19
13. Contributor . . . . . . . . . . . . . . . . . . . . . . . . . 21 Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 19
14. References . . . . . . . . . . . . . . . . . . . . . . . . . 21
14.1. Normative References . . . . . . . . . . . . . . . . . . 21
14.2. Informative References . . . . . . . . . . . . . . . . . 22
Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 23
1. Introduction 1. Introduction
There are emerging applications that require both latency and jitter Recommendation ITU-T Y.3129 [Y.3129] specifies the requirements and
bounds in large-scale networks. One of the key mechanisms to the framework for deterministic networking with a set of work conserving
latency and jitter performance of a network is the packet scheduling packet schedulers that guarantees end-to-end latency bounds to flows.
in the data plane. The objective of this document is to specify a The schedulers in core nodes do not need to maintain flow states.
scheduling mechanism that can isolate flows from each other. An Instead, the entrance node of a flow marks an ideal service
ideal flow isolation would be achieved, if an imaginary link is completion time according to a fluid model, called finish time (FT),
dedicated solely to the flow across the network, with the capacity of a packet in the packet header. The subsequent core nodes update
equal to the allocated service rate to the flow. In this case the FT by adding a delay factor, which is a function of the flow and
latency upper bound is a function of the flow's parameters only, upstream nodes. The packets in the queue of the scheduler are served
including the maximum burst size, maximum packet length, and the in the ascending order of FT. This mechanism is called stateless
service rate. fair queuing. The result is that flows are isolated from each other
almost perfectly. The latency bound of a flow depends only on the
In large-scale networks, end nodes can join and leave, and a large flow's intrinsic parameters, except the maximum packet length among
number of flows are dynamically generated and terminated. Achieving other flows sharing each output link with the flow.
satisfactory deterministic performance in such environments would be
challenging. The current Internet, which has adopted the
differentiated services (DiffServ) architecture, has the problem of
the burst accumulation and the cyclic dependency, which is mainly due
to FIFO queuing and strict priority scheduling. Cyclic dependency is
defined as a situation wherein the graph of interference between flow
paths has cycles [THOMAS]. The existence of such cyclic dependencies
makes the proof of determinism a much more challenging issue and can
lead to system instability, that is, unbounded delays
[ANDREWS][BOUILLARD].
A class of schedulers called Fair Queuing (FQ) limits interference Recommendation ITU-T Y.3148 [Y.3148] further specifies the functional
between flows to the degree of maximum packet size. Packetized architecture, functional entities, operational procedures, and packet
generalized processor sharing (PGPS) and weighted fair queuing (WFQ) metadata to fulfil the requirements and framework specified in
are representative examples of FQ [PAREKH]. In FQ, the ideal service Recommendation ITU-T Y.3129. It enables deterministic networking
completion time, called Finish Time (FT), of a packet is obtained with a set of work conserving packet schedulers in the data plane,
from an imaginary system which provides the ideal flow isolation. which guarantees end-to-end (E2E) latency bounds to flows.
Packets in the buffer are served in an increasing order of the FT.
When this mechanism is applied, the end-to-end (E2E) latency bound of
a flow is similar to that in the ideally isolated system. However,
the FT of the previous packet within a flow has to be remembered for
the calculation of the current packet's FT. This information can be
seen as the flow state. The complexity of managing such information
for a large number of flows can be a burden, so the FQ has not been
usually adopted in practice.
The edge node through which a flow enters a network is called the In the framework specified in [Y.3129], the edge node through which a
entrance node. The entrance node for a flow generates FT for a flow enters a network is called the entrance node. The entrance node
packet and records it in the packet. A core node, based on these for a flow generates FT for a packet and records it in the packet. A
records, updates FT, without per-flow state, by adding a delay factor core node, based on these records, updates FT, without per-flow
that is a function of parameters of the node and the flow. This state, by adding a delay factor that is a function of parameters of
framework is called work conserving stateless core fair queuing the node and the flow. This framework is called work conserving
(C-SCORE) [C-SCORE], which is also specified in [Y.3129] and stateless core fair queuing (C-SCORE) [C-SCORE]. C-SCORE is work
[Y.3148]. C-SCORE is work conserving and has the property that, for conserving and has the property that, for a certain choice of the
a certain choice of the delay factor, the expression for E2E latency delay factor, the expression for E2E latency bound can be found.
bound can be found. This E2E latency bound function is the same as This E2E latency bound function is the same as that of a network with
that of a network with stateful FQ schedulers in all the nodes. stateful fair queuing schedulers in all the nodes.
This document specifies the protocol and implementation details for This document specifies the implementation details for realizing
realizing C-SCORE. It also specifies the operational procedure based C-SCORE. This document specifies the metadata, formats of metadata,
on these details. the admission control procedures, and an approximation of stateless
fair queuing implemented via a strict priority (SP) scheduler.
The key component of C-SCORE is the packet state that is carried as A key component of C-SCORE is the packet state that is carried as
metadata. C-SCORE does not need to maintain flow states at core metadata. C-SCORE does not need to maintain flow states at core
nodes, yet it works as one of the FQ schedulers, which is known to nodes, yet it works as one of the fair queuing schedulers, which is
provide the best flow isolation performance. The metadata to be known to provide the best flow isolation performance. The metadata
carried in the packet header is simple and can be updated during the to be carried in the packet header is simple and can be updated
stay in the queue or before joining the queue. during the stay in the queue or before joining the queue.
2. Terminology 2. Terminology
2.1. Terms Used in This Document 2.1. Terms Used in This Document
2.2. Abbreviations 2.2. Abbreviations
BE: Best Effort BE: Best Effort
C-SCORE: Work Conserving Stateless Core Fair Queuing C-SCORE: Work Conserving Stateless Core Fair Queuing
skipping to change at page 5, line 21 skipping to change at page 5, line 4
NAS: Network Action Sub-Stack NAS: Network Action Sub-Stack
PIFO: Push-In First-Out PIFO: Push-In First-Out
PRPS: Packetized Rate Proportional Servers PRPS: Packetized Rate Proportional Servers
PSD: Post-Stack Data PSD: Post-Stack Data
RSpec: Requested Specifications RSpec: Requested Specifications
TLV: Type-Length-Value TLV: Type-Length-Value
TSpec: Traffic Specifications TSpec: Traffic Specifications
VC: Virtual Clock VC: Virtual Clock
3. Conventions Used in This Document 3. Conventions Used in This Document
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and
"OPTIONAL" in this document are to be interpreted as described in BCP "OPTIONAL" in this document are to be interpreted as described in BCP
14 [RFC2119] [RFC8174] when, and only when, they appear in all 14 [RFC2119] [RFC8174] when, and only when, they appear in all
capitals, as shown here. capitals, as shown here.
4. Fair Queuing Schedulers 4. Assumptions
Generalized processor sharing (GPS) suggested a paradigm for a fair
service for flows as fluid. Packetized GPS (PGPS), which implemented
GPS in the realistic packet-based environment, played a pioneering
role in this type of packet-based schedulers [PAREKH]. PGPS
determines the service order of packets in ascending order of the FT
derived by the following equation.
F(p) = max{F(p-1), V(A(p))}+L(p)/r, (1)
where p and p-1 are the pth and (p-1)th packets of a flow, F(p) is
the FT, A(p) is the arrival time, L(p) is the length of the packet p,
and r is the service rate allocated to the flow. Note that the index
for the flow i is omitted. V(t) is called the virtual time function
[PAREKH]. and is a value representing the current system progress at
time t. If the backlogged flows almost fill the link capacity, then
the system slowly progresses in terms of a flow's view, and the
virtual time increases slowly. If there is only a handful of
backlogged flows, then the virtual time increases with a higher rate.
This behavior of the virtual time function prevents an unfair
situation in which flows that entered late have relatively small FTs
thus receive services earlier for a considerable duration of time,
compared to existing flows.
F(p) represents the time that an ideal fluid system would complete
its service of packet p. In a real packetized FQ system, the packets
are served in the increasing order of the FT. The FT can be
calculated at the moment the packet arrives in the node, and the
value of FT is attached to the packet before the packet is stored in
the buffer. In general, there is a queue for each flow, the queues
are managed by FIFO manner, and the scheduler serves the queue having
the HoQ packet with the smallest FT. Alternatively, it is possible
to put all the packets in one queue and sort them, as packets are
enqueued or dequeued, according to the value of the FT. This
implementation requires a priority queue. The point of (1) is that,
in the worst case, when all the flows are active and the link is
fully used, a flow is served at an interval of L(p)/r. At the same
time, by using the work conserving scheduler, any excessive link
resources are shared among the flows. How fairly it is shared is the
main difference between various FQ schedulers.
In order to obtain F(p) in (1), the FT of the previous packet of the
flow, F(p-1), must be remembered. When a packet is received, it is
necessary to find out which flow it belongs to and find out the FT of
the latest packet of the corresponding flow. F(p-1) of this latest
packet is a value representative of the 'flow state'. The fact that
such state information must be memorized and read means a
considerable complexity at a core node managing millions of flows.
This is the main reason why such FQ schedulers are not actually used
on the Internet. In this document, we pay attention to the fact that
the fair service time interval information between packets is already
included in the FTs of the entrance node of a flow. Instead of
deriving a new FT at each core node, we will specify a method of
deriving the FT at downstream nodes by using the initial FT
calculated at the entrance node.
On the other hand, calculating V(t) also involves the complexity of
calculating the sum of r by tracking the flows currently being
serviced in real time. It is a complex calculation. Therefore,
instead of calculating V(t) accurately, methods for estimating it in
a simple way have been suggested. Among them, Virtual Clock (VC)
[ZHANG] uses the current time t instead of V(t) to determine the FT.
Self-clocked fair queuing [GOLESTANI] uses the FT of the recently
serviced packet of another flow instead of V(t). The document adopts
the VC's approach.
Stiliadis showed that this series of FQ schedulers can belong to the
Packetized Rate Proportional Servers (PRPS) [STILIADIS-RPS]. For
example, PGPS and VC are PRPS, while self-clocked fair queuing is
not. It was proved that a network with all the nodes having one of
these PRPSs guarantee the E2E latency for flow. Moreover, any PRPS
scheduler yields the same E2E latency bound [STILIADIS-RPS].
5. Assumptions
In this document, we assume there are only two classes of traffic. In this document, we assume there are only two classes of traffic.
The high priority or equivalently DetNet traffic requires guarantee The high priority or equivalently DetNet traffic requires guarantee
on latency upper bounds. All the other traffic is considered to be on latency upper bounds. All the other traffic is considered to be
the low priority or Best Effort (BE) traffic, and be completely the low priority or Best Effort (BE) traffic. High priority traffic
preempted by the high priority traffic. High priority traffic is our is our only concern. However, packets of BE traffic cannot be
only concern. preempted by the high priority traffic.
All the flows conform to their traffic specification (TSpec) All the flows conform to their traffic specification (TSpec)
parameters. In other words, with the maximum burst size Bi and the parameters. In other words, with the maximum burst size Bi and the
arrival rate ai, the accumulated arrival from flow i in any arbitrary arrival rate ai, the accumulated arrival from flow i in any arbitrary
time interval [t1, t2], t1 < t2, does not exceed Bi+(t2-t1)ai. An time interval [t1, t2], t1 < t2, does not exceed Bi+(t2-t1)ai. An
actual allocated service rate to a flow, ri, can be larger than or actual allocated service rate to a flow, ri, can be larger than or
equal to the arrival rate of the flow. As it will be shown in (6) equal to the arrival rate of the flow. As it will be shown in (6)
that, by adjusting the service rate to a flow, the E2E latency bound that, by adjusting the service rate to a flow, the E2E latency bound
of the flow can be adjusted. Note that ri is used interchangeably of the flow can be adjusted. Note that ri is used interchangeably
with the symbol r to denote the service rate of a flow. Total with the symbol r to denote the service rate of a flow under
allocated service rate to all the flows in a node does not exceed the observation. Total allocated service rate to all the flows in a node
link capacity of the node. These assumptions make the resource does not exceed the link capacity of the node. These assumptions
reservation and the admission control mandatory. make the resource reservation and the admission control mandatory.
A node, or equivalently a server, means an output port module of a A node, or equivalently a server, means an output port module of a
switching device. switching device.
The entrance node for a flow is the node located at the edge of a The entrance node for a flow is the node located at the edge of a
network, from which the flow enters into the network. A core node network, from which the flow enters into the network. A core node
for a flow is a node in the network, which is traversed by the flow for a flow is a node in the network, which is traversed by the flow
and is not the entrance node. Note that a single node can be both an and is not the entrance node. Note that a single node can be both an
entrance node to a flow and a core node for another flow. entrance node to a flow and a core node for another flow.
A packet is defined as having arrived or been serviced when its final A packet is defined as arrived or serviced when its final bit is
bit has been received by or transmitted from the node, respectively. received by or transmitted from the node, respectively.
6. Work Conserving Stateless Core Fair Queuing (C-SCORE)
6.1. Framework
FQ schedulers utilize the concept of FT that is used as the service
order assigned to a packet. The packet with the minimum FT in a
buffer is served first.
As an example, the VC scheduler [ZHANG] defines the FT to be
F(p) = max{F(p-1), A(p)} + L(p)/r, (2) 5. Framework
where (p-1) and p are consecutive packets of the flow under Fair queuing (FQ) schedulers utilize the concept of finish time (FT)
observation, A(p) is the arrival time of p, L(p) is the length of p, that is used as the service order assigned to a packet. The packet
and r is the flow service rate. The flow index is omitted. with the minimum FT in a buffer is served first.
The key idea of the FQ is to calculate the service completion times The key idea of the FQ is to calculate the ideal service completion
of packets in an imaginary ideal fluid service model and use them as times of packets in an imaginary fluid service model and use them as
the service order in the real packet-based scheduler. the service order in the real packet-based scheduler.
While having the excellent flow isolation property, the FQ needs to While having the excellent flow isolation property, FQ schedulers
maintain the flow state, F(p-1). For every arriving packet, the flow need to maintain the flow state, F(p-1). Upon the arrival of each
it belongs to has to be identified and its previous packet's FT packet, the system must perform flow identification and retrieve the
should be extracted. As the packet departs, the flow state, F(p), FT associated with the preceding packet in that flow. The system is
has to be updated as well. required to update the flow state to be F(p) concurrently with the
packet's departure.
Requirement 1: In the entrance node, it is REQUIRED to obtain the FTs
with the following equation. 0 denotes the entrance node of the flow
under observation.
F0(p) = max{F0(p-1), A0(p)}+L(p)/r. (3)
Note that if the FTs are constructed according to the above equation,
the fair distance of consecutive packets is maintained.
Requirement 2: In a core node h, it is REQUIRED to increase the FT of
a packet by an amount, d(h-1)(p), that depends on the previous node
and the packet.
Fh(p) = F(h-1)(p) + d(h-1)(p). (4)
Requirement 3: It is REQUIRED that dh(p) is a non-decreasing function
of p, within a flow active period.
Requirements 1, 2, and 3 specify how to construct the FT in a
network. The following requirements 4 and 5 specify how the FT is
used for scheduling.
Requirement 4: It is REQUIRED that a node provides service whenever [Y.3129] and [Y.3148] specify a FQ scheduler, which does not need to
there is a packet. maintain the flow state in core nodes. In the entrance node, the FTs
are obtained with the following equation, where 0 denotes the
entrance node of the flow under observation.
Requirement 5: It is REQUIRED that all packets waiting for service in F0(p) = max{F0(p-1), A0(p)}+L(p)/r. (1)
a node are served in the ascending order of their FTs.
This framework is called the work conserving stateless core fair In a core node h, the FT of a packet is increased by an amount,
queuing (C-SCORE) [C-SCORE], [Y.3129], [Y.3148], which can be d(h-1)(p), that depends on the previous node and the packet.
compared to the existing non-work conserving scheme [STOICA].
6.2. E2E Latency Bound Fh(p) = F(h-1)(p) + d(h-1)(p). (2)
For C-SCORE to guarantee E2E latency bound, the dh(p) is RECOMMENDED The framework achieves a bounded E2E latency by defining dh(p) as
to be defined as in the following. follows.
dh(p) = Lh/Rh + L/r + delta_h(p), (5) dh(p) = Lh/Rh + L/r + delta_h(p), (3)
where Lh is the observed maximum packet length in the node h over all where Lh is the observed maximum packet length in the node h over all
the flows, Rh is the link capacity of the node h, L is the maximum the flows, Rh is the link capacity of the node h, L is the maximum
packet length of the flow, r is the service rate of the flow, and packet length of the flow, r is the service rate of the flow, and
delta_h(p) is the time difference function between node h and h+1. delta_h(p) is the function that represents the time difference
between node h and h+1.
The time difference function of packet p is defined as the service The time difference function of packet p, delta_h(p), is defined as
completion time at node h and the arrival time at node h+1 of the the difference between the service completion time at node h and the
packet. It includes the clock discrepancy and the propagation delay arrival time at node h+1 of the packet. It includes the clock
between nodes. Note that this function is relatively stable over discrepancy and the propagation delay between nodes. Note that this
packets, thus should be approximated as a constant value delta_h. function is relatively stable over packets, thus can be approximated
as a constant value delta_h.
When dh(p) is decided by (5), then it is proved that When dh(p) is decided by (3), then it is proven that
Dh(p) <= (B-L)/r + sum_(j=0)^h{Lj/Rj + L/r} + sum_(j=0)^(h-1){delta_j(p)}, (6) Dh(p) <= (B-L)/r + sum_(j=0)^h{Lj/Rj + L/r} + sum_(j=0)^(h-1){delta_j(p)},(4)
where Dh(p) is the latency experienced by p from the arrival at the where Dh(p) is the latency experienced by p from the arrival at the
node 0 to the departure from node h [KAUR], [C-SCORE]. B is the node 0 to the departure from node h [KAUR], [C-SCORE]. B is the
maximum burst size of the flow under observation that p belongs to. maximum burst size of the flow under observation that p belongs to.
The term sum_(j=0)^(h-1){delta_j(p)} includes the clock discrepancies The term sum_(j=0)^(h-1){delta_j(p)} includes the clock discrepancies
and the propagation delays between the nodes. Considering that the and the propagation delays between the nodes. Considering that the
clock discrepancies can be neglected in an absolute time, this term clock discrepancies can be neglected in an absolute time reference,
actually represents the E2E propagation delay. this term effectively represents the E2E propagation delay.
Note that the latency bound in (6) is the same to the network where Note that the latency bound in (4) is the same to the network where
every node has a stateful FQ scheduler, including VC. The parameters every node has a stateful FQ scheduler. The parameters in the
in the latency bound are all intrinsic to the flow, except Lh/Rh. latency bound are all intrinsic to the flow, except Lh/Rh and
propagation delays.
6.3. Operational Procedure 6. Operational Procedures
6.3.1. Metadata 6.1. Metadata
The necessary metadata to be carried by a packet are Fh(p) and L/r. [Y.3129] and [Y.3148] specifies the metadata to be carried by a
Fh(p) is dynamic and needs to be updated every hop, in accordance packet, which are Fh(p), L, and r. Fh(p) is dynamic and needs to be
with (4). L/r is a static value for a flow. Note that these updated every hop, in accordance with (2). L and r are static values
metadata both have the unit of time. for a flow.
As a packet arrives at a core node h, it carries metadata Fh(p) and As a packet arrives at a core node h, it carries metadata Fh(p), L,
L/r. Fh(p) is pre-calculated at node h-1 with d(h-1)(p), which is a and r. Fh(p) is pre-calculated at node h-1 with d(h-1)(p), which is
function of node h-1. dh(p) can be obtained by the summation of the a function of node h-1. dh(p) can be obtained by the summation of L/
metadata L/r, the node specific parameters Lh/Rh and delta_h. L/r r, and the node specific parameters Lh/Rh and delta_h. Lh/Rh and
should be kept in a packet as metadata, to be able to calculate delta_h values should be maintained by the node h. At node h,
dh(p), in accordance with (5). Lh/Rh and delta_h values should be F(h+1)(p) value is calculated and replaces Fh(p).
maintained by the node h.
6.3.2. Header format The detailed operations regarding metadata creation and updates in
the entrance node and core nodes are specified in [Y.3129] and
[Y.3148]. The considerations for the mitigation of the complexity at
the entrance node can also be found in these references.
6.3.2.1. IPv6 header format 6.2. Header format
6.2.1. IPv6 header format
The IPv6 Hop-by-Hop (HbH) Options Header [RFC8200] should be used for The IPv6 Hop-by-Hop (HbH) Options Header [RFC8200] should be used for
carrying the metadata. It is a specific type of Extension Header carrying the metadata. It is a specific type of Extension Header
designed for information that must be examined and processed by all designed for information that must be examined and processed by all
the transit nodes along a packet's delivery path. This header is the transit nodes along a packet's delivery path. This header is
identified by a Next Header value of 0 in the IPv6 main header. The identified by a Next Header value of 0 in the IPv6 main header. The
HbH Options header consists of a sequence of variable-length options. HbH Options header consists of a sequence of variable-length options.
These options are encoded in a Type-Length-Value (TLV) format. In These options are encoded in a Type-Length-Value (TLV) format. In
this transformation, the metadata (Finish Time and L/r) are stored in this transformation, the metadata (Finish Time, L, and r) are stored
the Option Data fields of two separate Options within a single HbH in the Option Data fields of three separate Options within a single
Options header. Length of the HbH Options header should be specified HbH Options header. Length of the HbH Options header should be
by the Hdr Ext Len field. Option Type and Option Data Len fields specified by the Hdr Ext Len field. Option Type and Option Data Len
should specify the C-SCORE metadata identifier and the length of the fields should specify the C-SCORE metadata identifier and the length
metadata, which are to be determined. of the metadata, which are to be determined.
0 1 2 3 0 1 2 3
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Next Header | Hdr Ext Len | Option Type 1 |Opt Data Len 1 | | Next Header | Hdr Ext Len | Option Type 1 |Opt Data Len 1 |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Option Data 1 (L/r) | | Option Data 1 (L) | Option Type 2 |Opt Data Len 2 |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Option Type 2 |Opt Data Len 2 | Option Data 2 (FT) | | Option Data 2 (r) |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Option Data 2 (FT Continued) | | Option Type 3 |Opt Data Len 3 | Option Data 3 (FT) |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Option Data 3 (FT Continued) |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
Figure 1: IPv6 HbH Options Header example for two metadata for Figure 1: IPv6 HbH Options Header example for two metadata for
C-SCORE C-SCORE
The following is the operational procedure in a transit node. The The following is the operational procedure in a transit node. The
hardware parser sees Next Header 0 in the main IPv6 header and hardware parser sees Next Header 0 in the main IPv6 header and
directs the packet to the metadata management function. This directs the packet to the metadata management function. This
functional entity scans for the Option Type corresponding to C-SCORE. functional entity scans for the Option Type corresponding to C-SCORE.
The Finish Time (48-bit, TBD) and L/r (32bit, TBD) are extracted and The Finish Time (48 bit, TBD), L (16 bit, TBD), and r (32 bit, TBD)
processed. The new FT is calculated. Since the offset is fixed are extracted and processed. The new FT is calculated. Since the
relative to the start of the HbH Options header, the functional offset is fixed relative to the start of the HbH Options header, the
entity performs a single cycle write to update the metadata field functional entity performs a single cycle write to update the
before recalculating any necessary checksums (if applicable) and metadata field before recalculating any necessary checksums (if
forwarding. applicable) and forwarding.
6.3.2.2. MPLS label format Note that L, the maximum packet length of the flow, does not need to
be precise at the byte level. Similarly r, the service rate of the
flow, can be of a coarse granularity or even several discrete pre-
defined levels. The encoding of these metadata will be specified in
a later version of this draft.
6.2.2. MPLS label format
[I-D.ietf-mpls-mna-detnet] specifies formats and mechanisms for using [I-D.ietf-mpls-mna-detnet] specifies formats and mechanisms for using
MPLS Network Actions (MNA) to support DetNet services, including MPLS Network Actions (MNA) to support DetNet services, including
bounded latency, low loss and in-order delivery. It specifies three bounded latency, low loss and in-order delivery. It specifies three
information elements of DetNet packets, which are Flow identifier information elements of DetNet packets, which are Flow identifier
(Flow-ID), Sequence information (SeqNum), Latency information (Flow-ID), Sequence information (SeqNum), Latency information
(LatencyInfo). The C-SCORE metadata Fh(p) and L/r are the Latency (LatencyInfo). The C-SCORE metadata Fh(p), L, and r are the Latency
Information, according to this specification. Information, according to this specification.
Two approaches for carrying information elements are specified: In- Two approaches for carrying information elements are specified: In-
Stack and Post-Stack MNAs. With In-Stack MNA, the DetNet-specific Stack and Post-Stack MNAs. With In-Stack MNA, the DetNet-specific
information is embedded directly within the MPLS label stack, as part information is embedded directly within the MPLS label stack, as part
of a Network Action Sub-stack (NAS). The information elements reside of a Network Action Sub-stack (NAS). The information elements reside
before the Bottom of Stack (BOS) bit. It uses a Network Action before the Bottom of Stack (BOS) bit. It uses a Network Action
Indicator (NAI) to signal that the subsequent labels in the sub-stack Indicator (NAI) to signal that the subsequent labels in the sub-stack
are actually ancillary data (Flow-ID, etc.) rather than traditional are actually ancillary data (Flow-ID, etc.) rather than traditional
switching labels. switching labels.
skipping to change at page 12, line 21 skipping to change at page 10, line 8
Bit 20 in Label Stack Entry (LSE) Format B carried in the In-Stack Bit 20 in Label Stack Entry (LSE) Format B carried in the In-Stack
NAS is defined as the P bit to indicate the presence of the Post- NAS is defined as the P bit to indicate the presence of the Post-
Stack MPLS Header in the packet after the BOS bit. Stack MPLS Header in the packet after the BOS bit.
[I-D.ietf-mpls-mna-ps-hdr] LSE Format B refers to a specialized [I-D.ietf-mpls-mna-ps-hdr] LSE Format B refers to a specialized
structure for a LSE that carries ancillary data instead of a structure for a LSE that carries ancillary data instead of a
traditional switching label. traditional switching label.
PSMHT can be located immediately following the BOS label, which PSMHT can be located immediately following the BOS label, which
includes PS-HDR-LEN and Version/Type. PS-HDR-LEN specifies the total includes PS-HDR-LEN and Version/Type. PS-HDR-LEN specifies the total
length of the post-stack metadata. Version/Type identifies the MNA- length of the post-stack metadata. Version/Type identifies the MNA-
POST-STACK-HDR. The metadata for C-SCORE are Finish Time and L/r. POST-STACK-HDR. The metadata for C-SCORE are FT, L and r. The
The number of bits required for these metadata are for further study number of bits required for these metadata are for further study and
and to be specified. FT and L/r are carried across two separate to be specified. FT, L and r are carried across three separate PSNAs
PSNAs to support hop-by-hop scheduling. If FT is of 48 bit length, to support hop-by-hop scheduling. If FT is of 48 bit length, then
then the PSNA for FT should have Post-Stack Network Action Length the PSNA for FT should have Post-Stack Network Action Length (PS-NAL)
(PS-NAL) = 1, as in Figure 2. = 1, as in Figure 2.
0 1 2 3 0 1 2 3
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| MNA Label | TC |0| TTL | | MNA Label | TC |0| TTL |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|Opcode=2=NOOP| 0 |1|HbH|1| NASL=0|U|NAL=0| |Opcode=2=NOOP| 0 |1|HbH|1| NASL=0|U|NAL=0|
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| 0x00 |Reserve| PSMH-LEN=3 | TYPE = MNA-POST-STACK-HDR = 1 | | 0x00 |Reserve| PSMH-LEN=5 | TYPE = MNA-POST-STACK-HDR = 1 |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| MNA-PS-OP1 |R|R| PS-NAL=0 | POST-STACK DATA = L/r | | MNA-PS-OP1 |R|R| PS-NAL=0 | POST-STACK DATA = L |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| MNA-PS-OP2 |R|R| PS-NAL=1 | POST-STACK DATA = FT | | MNA-PS-OP2 |R|R| PS-NAL=1 | POST-STACK DATA = r |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| POST-STACK DATA = r (Continued, Padding necessary) |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| MNA-PS-OP3 |R|R| PS-NAL=1 | POST-STACK DATA = FT |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| POST-STACK DATA = FT (Continued) | | POST-STACK DATA = FT (Continued) |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| | | |
| Optional Payload + Padding | | Optional Payload + Padding |
| | | |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
Figure 2: Post Stack MNA Sub-Stack example with two PSNAs for two Figure 2: Post Stack MNA Sub-Stack example with three PSNAs for
metadata for C-SCORE three metadata for C-SCORE
Figure 2 is an example where the Post-Stack MNA Sub-Stack encodes two Figure 2 is an example where the Post-Stack MNA Sub-Stack encodes
different PSNAs for two metadata. Their details are as follows: three different PSNAs for three metadata. Their details are as
follows:
- The offset of the Hop-By-Hop scoped PSNA is 0. - The offset of the Hop-By-Hop scoped PSNA is 0.
- PSMH-LEN=3: This is the total length of the Post-Stack MPLS Header - PSMH-LEN=5: This is the total length of the Post-Stack MPLS Header
(PSMH). (PSMH).
- MNA-PS-OP1: Post-Stack MNA Opcode (TBD) for L/r. - MNA-PS-OP1: Post-Stack MNA Opcode (TBD) for L
- PS-NAL=0: PSNA does not contain any additional data. - PS-NAL=0: PSNA does not contain any additional data.
- MNA-PS-OP2: Post-Stack MNA Opcode (TBD) for Finish Time. - MNA-PS-OP2: Post-Stack MNA Opcode (TBD) for r
- PS-NAL=1: PSNA contains 1 additional 4-octet Ancillary Data. - PS-NAL=1: PSNA contains one additional 4-octet Ancillary Data.
- MNA-PS-OP3: Post-Stack MNA Opcode (TBD) for Finish Time
- PS-NAL=1: PSNA contains one additional 4-octet Ancillary Data.
The post-stack data r can be of 32 bit length, which can make the
padding necessary. The rule for padding will be specified by
[I-D.ietf-mpls-mna-detnet] in its future version.
Note that C-SCORE does not require Flow-ID to be carried in the Note that C-SCORE does not require Flow-ID to be carried in the
packet. This is because of its stateless nature. packet. This is because of its stateless nature.
6.3.3. Admission control for latency guarantee 6.3. Admission control for latency guarantee
The following is a recommended procedure for admission of a flow in The following is a recommended procedure for admission of a flow in
the C-SCORE framework. the C-SCORE framework.
1) An application requests a flow with T-Spec (Traffic 1) An application requests a flow with T-Spec (Traffic
Specification) and R-Spec (Requested Specification). T-Spec Specification) and R-Spec (Requested Specification). T-Spec
includes service rate r, maximum packet size L, and maximum burst includes service rate r, maximum packet size L, and maximum burst
size B. R-Spec includes the E2E latency and jitter bounds. size B. R-Spec includes the E2E latency and jitter bounds.
2) The entrance node sends a PATH message toward the egress. This 2) The entrance node sends a PATH message toward the egress. This
skipping to change at page 15, line 24 skipping to change at page 13, line 27
Since the core nodes only need to know the final committed rate r in Since the core nodes only need to know the final committed rate r in
the packet header, the complex discovery logic is restricted to the the packet header, the complex discovery logic is restricted to the
control plane and edge nodes. Large-scale networks can use this control plane and edge nodes. Large-scale networks can use this
mechanism to perform the aggregate rate discovery, where the r_path mechanism to perform the aggregate rate discovery, where the r_path
represents the available capacity for an entire bundle of flows of represents the available capacity for an entire bundle of flows of
the same path. By using this approach, the admission process also the same path. By using this approach, the admission process also
effectively maps the current congestion state of the network onto a effectively maps the current congestion state of the network onto a
single service rate parameter. single service rate parameter.
6.3.4. Role of entrance node for generation and update of FT 6.4. Compensation algorithm of time difference between nodes
It is assumed that the packet length of p, L(p), is written in the
packet header. Entrance node maintains the flow state, i.e. FT of
packet (p-1) at node 0 (F0(p-1)), the maximum packet length of the
flow (L), and the service rate allocated to the flow (r). It
operates a clock to identify the arrival time of a packet (A0(p)).
It collects the link information such as the maximum packet length of
all the flow (L0) and link capacity (R0) to calculate the delay
factor at the entrance node (d0(p)).
Upon receiving or generating packet p, it obtains F0(p) =
max{F0(p-1), A0(p)} + L(p)/r, and uses it as the FT in the entrance
node. If the queue is not empty then it puts p in a priority queue.
It also obtains F1(p) = F0(p) + L0/R0 + L/r before or during p is in
the queue. It writes F1(p) and L/r in the packet as metadata for use
in the next node 1. Finally it updates the flow state information
F0(p-1) to F0(p).
6.3.5. Role of core node for update of FT
A core node h collects the link information Lh/Rh. As in an entrance
node, Lh is a rather static value, but still can be changed over
time. Upon receiving packet p, it retrieves metadata Fh(p) and L/r,
and uses Fh(p) as the FT value of the packet. It puts p in a
priority queue. It obtains F(h+1)(p) = Fh(p) + Lh/Rh + L/r and
updates the packet metadata Fh(p) with F(h+1)(p) before or during p
is in the queue.
6.3.6. Mitigation of complexity in entrance node
Flow states still have to be maintained in entrance nodes. When the
number of flows is large, maintaining flow states can be burdensome.
However, this burden can be mitigated as follows. The notion of an
entrance node can be understood as a various edge device, including a
source itself. FT of a packet is decided based on the maximum of
F0(p-1) and A0(p); and L(p)/r. These parameters are flow-specific.
There is no need to know any other external parameters. The arrival
time of p to the network, A0(p), can be defined as the generation
time of p at the source. Then F0(p) is determined at the packet
generation time and can be recorded in the packet. In other words,
the entrance node functionality can reside in the source itself.
Therefore, we can significantly alleviate the complexity of the
proposed framework. The framework is scalable and can be applied to
any network.
6.3.7. Compensation of time difference between nodes
There are time differences between nodes, including the clock There are time differences between nodes, including the clock
discrepancies and the propagation delays. This time difference can discrepancies and the propagation delays. This time difference can
be defined as the difference between the service completion time of a be defined as the difference between the service completion time of a
packet measured at the upstream node and the arrival time of the packet measured at the upstream node and the arrival time of the
packet measured at the current node. In other words, packet measured at the current node. In other words,
delta_h(p) = A(h+1)(p) - Ch(p), delta_h(p) = A(h+1)(p) - Ch(p),
where delta_h(p) is the time difference between node h and h+1, and where delta_h(p) is the time difference between node h and h+1, and
Ch(p) is the service completion time measured at node h, for packet p Ch(p) is the service completion time measured at node h, for packet p
respectively. respectively.
FT does not need to be precise. It is used just to indicate the FT does not need to be precise. It is used just to indicate the
packet service order. Therefore, if we can assume that the packet service order. Therefore, if we can assume that the
propagation delay is constant and the clocks do not drift, then propagation delay is constant and the clocks do not drift, then
delta_h(p) can be simplified to a constant value, delta_h. In this delta_h(p) can be simplified to a constant value, delta_h.
case the delay factor in (4) can be modified to be
dh(p) = Lh/Rh + L/r + delta_h.
The time difference delta_h may be updated only once in a while. The time difference delta_h may be updated only once in a while.
The protocol for obtaining the departure time from node h, Ch(p), at The procedure for obtaining the departure time from node h, Ch(p), at
node h+1 will be elaborated in a later version of this draft. The node h+1 will be elaborated in a later version of this draft. The
Ch(p) information can be obtained on demand, or be reported Ch(p) information can be obtained on demand, or be reported
periodically. The message format will follow that of Network Time periodically. This information exchange can be based on a
Protocol (NTP), but the procedure will be simpler. The message piggybacking mechanism with packet metadata.
itself can be standalone like in NTP, or can be piggybacked on a data
packet. Alternatively, the explicit messages can be exchanged. In this case
the message format will follow that of Network Time Protocol (NTP),
but the procedure will be simpler.
Note that the time difference between non-adjacent nodes can also be Note that the time difference between non-adjacent nodes can also be
obtained similarly. This feature is useful when there are non- obtained similarly. This feature is useful when there are non-
compliant nodes in between. In this case, however, the variable compliant nodes in between. In this case, however, the variable
queuing delay from the non-compliant nodes should be taken into queuing delay from the non-compliant nodes should be taken into
account. One possible solution is to sample the time difference account. One possible solution is to sample the time difference
values over an enough interval, and take the maximum value. values over an enough interval, and take the maximum value.
6.4. Characteristics 7. Characteristics
6.4.1. Taxonomy 7.1. Taxonomy
The framework in this document, C-SCORE, is a flow level, rate based, The framework in this document, C-SCORE, is a flow level, rate based,
work conserving, asynchronous, non-periodic, and in-time solution, work conserving, asynchronous, non-periodic, and in-time solution,
according to the taxonomy suggested by according to the taxonomy suggested by
[I-D.ietf-detnet-dataplane-taxonomy]. [I-D.ietf-detnet-dataplane-taxonomy].
[I-D.ietf-detnet-dataplane-taxonomy] also defines seven suitable [I-D.ietf-detnet-dataplane-taxonomy] also defines seven suitable
categories for deterministic networking. A category is defined to be categories for deterministic networking. A category is defined to be
a set of solutions that is put together by one or more criteria, a set of solutions that is put together by one or more criteria,
where a criterion is a principle or standard by which a solution can where a criterion is a principle or standard by which a solution can
be judged or decided to be put into a certain category. be judged or decided to be put into a certain category.
C-SCORE belongs to the "flow level rate based unbounded category", C-SCORE belongs to the "flow level rate based unbounded category",
which is one of the seven suitable categories, according to this which is one of the seven suitable categories, according to this
categorization. categorization.
6.4.2. Strengths 7.2. Strengths
C-SCORE's per hop latency dominant factor is the maximum packet C-SCORE's per hop latency dominant factor is the maximum packet
length divided by the service rate of the flow. This is independent length divided by the service rate of the flow. This is independent
of other flows' parameters. As such, its most distinguishable of other flows' parameters. As such, its most distinguishable
strength is the flow isolation capability. It can assign a fine strength is the flow isolation capability. It can assign a fine
tuned E2E latency bound to a flow, by controlling the flow's own tuned E2E latency bound to a flow, by controlling the flow's own
parameters such as the service rate. Once the latency bound is parameters such as the service rate. Once the latency bound is
assigned to the flow, then it remains almost the same in spite of the assigned to the flow, then it remains almost the same in spite of the
network situation changes, such as other flows' join and leave. network situation changes, such as other flows' join and leave.
skipping to change at page 18, line 20 skipping to change at page 15, line 16
In contrast, non-work conserving solutions make it common to observe In contrast, non-work conserving solutions make it common to observe
their latency bounds. their latency bounds.
It is rate based, thus the admission condition check process is It is rate based, thus the admission condition check process is
simple, which is dependent only on the service rates of flows. This simple, which is dependent only on the service rates of flows. This
process aligns well with existing protocols. process aligns well with existing protocols.
Overall, C-SCORE suits large scale networks, at any utilization Overall, C-SCORE suits large scale networks, at any utilization
level, with various types of flows join and leave dynamically. level, with various types of flows join and leave dynamically.
7. Approximate C-SCORE via strict priority schedulers 8. Approximate C-SCORE via strict priority schedulers
7.1. General description 8.1. General description
C-SCORE requires a priority queue that sorts the packets in a queue, C-SCORE requires a priority queue that sorts the packets in a queue,
in accordance with their finish times. This may restrict the overall in accordance with their finish times. This may restrict the overall
maximum throughput of a system, when compared to the strict priority maximum throughput of a system, when compared to the strict priority
(SP) schedulers used in the current practices of switching nodes. SP (SP) schedulers used in the current practices of switching nodes. SP
schedulers are usually composed of 8 to 32 queues, and schedule the schedulers are usually composed of 8 to 32 queues, and schedule the
packets of higher priority queue first whenever they are present. packets of higher priority queue first whenever they are present.
The packets in the same queue are served on a first in first out The packets in the same queue are served on a first in first out
(FIFO) basis. It is common to find the SP schedulers in hardware (FIFO) basis. It is common to find the SP schedulers in hardware
chips for current switching nodes. chips for current switching nodes.
It would be desirable if C-SCORE can be implemented with such an SP It would be desirable if C-SCORE can be implemented with such an SP
scheduler. In this section, the architecture and algorithms for the scheduler. In this section, the architecture and algorithms for the
Approximate C-SCORE with rotating SP schedulers are specified. The Approximate C-SCORE with rotating SP schedulers are specified. The
E2E latency bound of the network of the approximate C-SCOREs is also E2E latency bound of the network of the approximate C-SCOREs is also
specified. specified.
7.2. Transit node architecture 8.2. Transit node architecture
The architecture of an approximate C-SCORE transit node, in which the The architecture of an approximate C-SCORE transit node, in which the
SP scheduler with a limited number of queues behaves as an SP scheduler with a limited number of queues behaves as an
approximate priority queue, is specified in this section. approximate priority queue, is specified in this section.
The input port module classifies the deterministic flows and best The input port module classifies the deterministic flows and best
effort (BE) flows. The deterministic flows are put into one of N effort (BE) flows. The deterministic flows are put into one of N
queues that work as rotating SP scheduler queues. If the queue k, 0 queues that work as rotating SP scheduler queues. If the queue k, 0
<= k <= N-1, is the highest priority queue at time t, then the queue <= k <= N-1, is the highest priority queue at time t, then the queue
(k+N-1)(mod N) has the lowest priority at t. BE flows are put into (k+N-1)(mod N) has the lowest priority at t. BE flows are put into
skipping to change at page 19, line 25 skipping to change at page 16, line 17
the terminal boundary of Slot i. An arriving packet p is assigned to the terminal boundary of Slot i. An arriving packet p is assigned to
Slot i if its finish time, F(p), falls within the interval Slot i if its finish time, F(p), falls within the interval
(T_(i-1),T_i]. This mapping corresponds to a discrete set of (T_(i-1),T_i]. This mapping corresponds to a discrete set of
hardware queues where packets are buffered and processed according to hardware queues where packets are buffered and processed according to
a First-In-First-Out (FIFO) discipline. The system utilizes a strict a First-In-First-Out (FIFO) discipline. The system utilizes a strict
priority (SP) scheduler to arbitrate across these queues, granting priority (SP) scheduler to arbitrate across these queues, granting
precedence to those representing the earliest finish-time. The precedence to those representing the earliest finish-time. The
scheduler operates in a work-conserving manner, providing service at scheduler operates in a work-conserving manner, providing service at
line rate whenever the system is backlogged. line rate whenever the system is backlogged.
7.3. Algorithms 8.3. Algorithms
To provide differentiated quality-of-service, the per-hop latency To provide differentiated quality-of-service, the per-hop latency
must be modulated according to the specific service rates of must be modulated according to the specific service rates of
traversing flows. By reducing the scheduling granularity, or slot traversing flows. By reducing the scheduling granularity, or slot
length (S), the system can provide tiered latency bounds based on the length (S), the system can provide tiered latency bounds based on the
number of slots occupied by a flow's maximum virtual service interval number of slots occupied by a flow's maximum virtual service interval
(L/r). Specifically, for a flow f where (n-1)S < L/r <= nS, the per- (L/r). Specifically, for a flow f where (n-1)S < L/r <= nS, the per-
hop latency is bounded by (n+1)S. This mechanism ensures fairness by hop latency is bounded by (n+1)S. This mechanism ensures fairness by
granting lower latency to flows with higher service rates (smaller L/ granting lower latency to flows with higher service rates (smaller L/
r), thereby aligning temporal performance with bandwidth allocation. r), thereby aligning temporal performance with bandwidth allocation.
skipping to change at page 20, line 9 skipping to change at page 16, line 50
packets mapped to a specific slot are fully serviced before its packets mapped to a specific slot are fully serviced before its
terminal boundary, T_i. Consequently, the service interval for any terminal boundary, T_i. Consequently, the service interval for any
packet p is strictly contained within its assigned slot. However, in packet p is strictly contained within its assigned slot. However, in
a realistic non-preemptive environment, the completion time is a realistic non-preemptive environment, the completion time is
extended by a factor of L_h/R_h, where L_h is the maximum packet extended by a factor of L_h/R_h, where L_h is the maximum packet
length and R_h is the link rate of the output link h, respectively. length and R_h is the link rate of the output link h, respectively.
Thus, the service is guaranteed to complete no later than T_i + L_h/ Thus, the service is guaranteed to complete no later than T_i + L_h/
R_h. R_h.
The approximation follows the architecture of C-SCORE. In other The approximation follows the architecture of C-SCORE. In other
words, equations (3) and (4) are still used. However (5) is replaced words, equations (1) and (2) are still used. However (3) is replaced
by the following equation (7). by the following equation (5).
d_h(p) = (L_h^max)/R_h + (n_(f,h)+1)S_h + delta_h(p), (7) d_h(p) = (L_h^max)/R_h + (n_(f,h)+1)S_h + delta_h(p), (5)
7.4. E2E latency bound of the approximate C-SCORE 8.4. E2E latency bound of the approximate C-SCORE
The E2E latency of a network with the approximate C-SCORE schedulers The E2E latency of a network with the approximate C-SCORE schedulers
is upper bounded by is upper bounded by
B/r + sum_(h=0)^H{(n_(f,h)+1)S_h + L_h/R_h} + sum_(h=0)^(H-1){delta_h(p)}, B/r + sum_(h=0)^H{(n_(f,h)+1)S_h + L_h/R_h} + sum_(h=0)^(H-1){delta_h(p)},
where S_h is the slot length at node h, n_(f,h) is an integer where S_h is the slot length at node h, n_(f,h) is an integer
specific to flow f at node h, which meets (n_(f,h)-1)S_h < L_f/r <= specific to flow f at node h, which meets (n_(f,h)-1)S_h < L_f/r <=
n_(f,h)S_h. In other words, n_(f,h)=ceiling[L_f/(r*S_h)]. The term n_(f,h)S_h. In other words, n_(f,h)=ceiling[L_f/(r*S_h)]. The term
sum_(h=0)^(H-1){delta_h(p)} includes the clock discrepancies and the sum_(h=0)^(H-1){delta_h(p)} includes the clock discrepancies and the
propagation delays between the nodes. Considering that the clock propagation delays between the nodes. Considering that the clock
discrepancies can be neglected in an absolute time, this term discrepancies can be neglected in an absolute time, this term
actually represents the E2E propagation delay. actually represents the E2E propagation delay.
8. Considerations for non-compliant nodes 9. Considerations for non-compliant nodes
There can be non-compliant end nodes and relay nodes in the network. There can be non-compliant end nodes and relay nodes in the network.
There can be naturally end nodes without necessary signalling There can be naturally end nodes without necessary signalling
capabilities for admission control. There also can be legacy nodes capabilities for admission control. There also can be legacy nodes
or the nodes with dataplane enhancement solutions other than C-SCORE. or the nodes with dataplane enhancement solutions other than C-SCORE.
How these can be compensated, or how much these nodes degrade the How these can be compensated, or how much these nodes degrade the
performance of C-SCORE will be discussed in a later version. performance of C-SCORE will be discussed in a later version.
9. Relationships to ITU-T Standards 10. Relationships to ITU-T Standards
This document is based on two ITU-T standards, Y.3129 [Y.3129] and This document is based on two ITU-T standards, Y.3129 [Y.3129] and
Y.3148 [Y.3148]. Y.3148 [Y.3148].
Recommendation ITU-T Y.3129 specifies the requirements and framework Recommendation ITU-T Y.3129 specifies the requirements and framework
for C-SCORE. The requirements are for generation, update and for C-SCORE. The requirements are for generation, update and
properties of FT. The framework describes a mechanism to guarantee properties of FT. The framework describes a mechanism to guarantee
E2E latency bounds, while meeting the requirements. It specifies how E2E latency bounds, while meeting the requirements. It specifies how
to obtain FT in core nodes, select the delay factor, and configure a to obtain FT in core nodes, select the delay factor, and configure a
network for latency guarantee. network for latency guarantee.
skipping to change at page 21, line 15 skipping to change at page 18, line 4
Recommendation ITU-T Y.3148 specifies the functional architecture, Recommendation ITU-T Y.3148 specifies the functional architecture,
functional entities, operational procedures, and packet metadata to functional entities, operational procedures, and packet metadata to
fulfil the requirements and framework specified in Recommendation fulfil the requirements and framework specified in Recommendation
ITU-T Y.3129. The operational procedures are for the stateless FQ in ITU-T Y.3129. The operational procedures are for the stateless FQ in
relay nodes, including entrance node and core nodes with creation and relay nodes, including entrance node and core nodes with creation and
update of FT values of packets. update of FT values of packets.
This document bridges the gap between the high-level ITU-T standards This document bridges the gap between the high-level ITU-T standards
and deployable protocols. It achieves this by defining the metadata and deployable protocols. It achieves this by defining the metadata
structures and the corresponding header formats for IPv6 and MPLS. structures and the corresponding header formats for IPv6 and MPLS.
Furthermore, it specifies the protocols required for admission
Furthermore, it specifies the procedures required for admission
control and time-difference compensation, alongside the technical control and time-difference compensation, alongside the technical
methodology for C-SCORE approximation. methodology for C-SCORE approximation.
10. IANA Considerations 11. IANA Considerations
There might be matters that require IANA considerations associated There might be matters that require IANA considerations associated
with metadata. If necessary, relevant text will be added in a later with metadata. If necessary, relevant text will be added in a later
version. version.
11. Security Considerations 12. Security Considerations
This section will be described later. This section will be described later.
12. Acknowledgements 13. Acknowledgements
13. Contributor
14. References 14. Contributor
14.1. Normative References 15. References
[I-D.ietf-detnet-dataplane-taxonomy] 15.1. Normative References
Joung, J., Geng, X., Peng, S., and T. T. Eckert,
"Dataplane Enhancement Taxonomy", Work in Progress,
Internet-Draft, draft-ietf-detnet-dataplane-taxonomy-05, 8
January 2026, <https://datatracker.ietf.org/doc/html/
draft-ietf-detnet-dataplane-taxonomy-05>.
[I-D.ietf-mpls-mna-detnet] [I-D.ietf-mpls-mna-detnet]
Song, X., Mirsky, G., Varga, B., Gandhi, R., and Q. Xiong, Song, X., Mirsky, G., Varga, B., Gandhi, R., and Q. Xiong,
"MPLS Network Action for Deterministic Networking", Work "MPLS Network Action for Deterministic Networking", Work
in Progress, Internet-Draft, draft-ietf-mpls-mna-detnet- in Progress, Internet-Draft, draft-ietf-mpls-mna-detnet-
00, 7 January 2026, 00, 7 January 2026,
<https://datatracker.ietf.org/doc/html/draft-ietf-mpls- <https://datatracker.ietf.org/doc/html/draft-ietf-mpls-
mna-detnet-00>. mna-detnet-00>.
[I-D.ietf-mpls-mna-ps-hdr] [I-D.ietf-mpls-mna-ps-hdr]
Rajamanickam, J., Gandhi, R., Zigler, R., and J. Dong, Rajamanickam, J., Gandhi, R., Zigler, R., Dong, J., and J.
"Post-Stack MPLS Network Action (MNA) Solution", Work in Bhattacharya, "Post-Stack MPLS Network Action (MNA)
Progress, Internet-Draft, draft-ietf-mpls-mna-ps-hdr-06, Solution", Work in Progress, Internet-Draft, draft-ietf-
20 January 2026, <https://datatracker.ietf.org/doc/html/ mpls-mna-ps-hdr-07, 24 February 2026,
draft-ietf-mpls-mna-ps-hdr-06>. <https://datatracker.ietf.org/doc/html/draft-ietf-mpls-
mna-ps-hdr-07>.
[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
Requirement Levels", BCP 14, RFC 2119, Requirement Levels", BCP 14, RFC 2119,
DOI 10.17487/RFC2119, March 1997, DOI 10.17487/RFC2119, March 1997,
<https://www.rfc-editor.org/info/rfc2119>. <https://www.rfc-editor.org/info/rfc2119>.
[RFC8174] Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC [RFC8174] Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC
2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174, 2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174,
May 2017, <https://www.rfc-editor.org/info/rfc8174>. May 2017, <https://www.rfc-editor.org/info/rfc8174>.
[RFC8200] Deering, S. and R. Hinden, "Internet Protocol, Version 6 [RFC8200] Deering, S. and R. Hinden, "Internet Protocol, Version 6
(IPv6) Specification", STD 86, RFC 8200, (IPv6) Specification", STD 86, RFC 8200,
DOI 10.17487/RFC8200, July 2017, DOI 10.17487/RFC8200, July 2017,
<https://www.rfc-editor.org/info/rfc8200>. <https://www.rfc-editor.org/info/rfc8200>.
14.2. Informative References [Y.3129] International Telecommunication Union, "Requirements and
framework for stateless fair queuing in large scale
networks including IMT-2020 and beyond",
ITU-T Recommendation Y.3129, April 2024.
[ANDREWS] Andrews, M., "Instability of FIFO in the permanent [Y.3148] International Telecommunication Union, "Functional
sessions model at arbitrarily small network loads", ACM architecture for stateless fair queuing in large scale
Trans. Algorithms, vol. 5, no. 3, pp. 1-29, doi: networks including IMT-2020 and beyond",
10.1145/1541885.1541894, July 2009. ITU-T Recommendation Y.3148, August 2025.
[BOUILLARD] 15.2. Informative References
Bouillard, A., Boyer, M., and E. Le Corronc,
"Deterministic network calculus: From theory to practical
implementation", in Networks and Telecommunications.
Hoboken, NJ, USA: Wiley, doi: 10.1002/9781119440284, 2018.
[C-SCORE] Joung, J., Kwon, J., Ryoo, J., and T. Cheung, "Scalable [C-SCORE] Joung, J., Kwon, J., Ryoo, J., and T. Cheung, "Scalable
flow isolation with work conserving stateless core fair flow isolation with work conserving stateless core fair
queuing for deterministic networking", IEEE Access, vol. queuing for deterministic networking", IEEE Access, vol.
11, pp. 105225 - 105247, doi:10.1109/ACCESS.2023.3318479, 11, pp. 105225 - 105247, doi:10.1109/ACCESS.2023.3318479,
2023. 2023.
[GOLESTANI] [I-D.ietf-detnet-dataplane-taxonomy]
Golestani, S. J., "A self-clocked fair queueing scheme for Joung, J., Geng, X., Peng, S., and T. T. Eckert,
broadband applications", in Proc. INFOCOM, vol. 1, pp. "Dataplane Enhancement Taxonomy", Work in Progress,
636-646, doi: 10.1109/INFCOM.1994.337677, 1994. Internet-Draft, draft-ietf-detnet-dataplane-taxonomy-05, 8
January 2026, <https://datatracker.ietf.org/doc/html/
draft-ietf-detnet-dataplane-taxonomy-05>.
[KAUR] Kaur, J. and H.M. Vin, "Core-stateless guaranteed rate [KAUR] Kaur, J. and H.M. Vin, "Core-stateless guaranteed rate
scheduling algorithms", in Proc. INFOCOM, vol.3, pp. scheduling algorithms", in Proc. INFOCOM, vol.3, pp.
1484-1492, 2001. 1484-1492, 2001.
[PAREKH] Parekh, A. and R. Gallager, "A generalized processor
sharing approach to flow control in integrated services
networks: the single-node case", IEEE/ACM Trans.
Networking, vol. 1, no. 3, pp. 344-357, June 1993.
[STILIADIS-LRS]
Stiliadis, D. and A. Anujan, "Latency-rate servers: A
general model for analysis of traffic scheduling
algorithms", IEEE/ACM Trans. Networking, vol. 6, no. 5,
pp. 611-624, 1998.
[STILIADIS-RPS]
Stiliadis, D. and A. Anujan, "Rate-proportional servers: A
design methodology for fair queueing algorithms", IEEE/ACM
Trans. Networking, vol. 6, no. 2, pp. 164-174, 1998.
[STOICA] Stoica, I. and H. Zhang, "Providing guaranteed services
without per flow management", ACM SIGCOMM Computer
Communication Review, vol. 29, no. 4, pp. 81-94, 1999.
[THOMAS] Thomas, L., Le Boudec, J., and A. Mifdaoui, "On cyclic
dependencies and regulators in time-sensitive networks",
in Proc. IEEE Real-Time Syst. Symp. (RTSS), York, U.K.,
pp. 299-311, December 2019.
[Y.3129] International Telecommunication Union, "Requirements and
framework for stateless fair queuing in large scale
networks including IMT-2020 and beyond",
ITU-T Recommendation Y.3129, April 2024.
[Y.3148] International Telecommunication Union, "Functional
architecture for stateless fair queuing in large scale
networks including IMT-2020 and beyond",
ITU-T Recommendation Y.3148, August 2025.
[ZHANG] Zhang, L., "Virtual clock: A new traffic control algorithm
for packet switching networks", in Proc. ACM symposium on
Communications architectures & protocols, pp. 19-29, 1990.
Authors' Addresses Authors' Addresses
Jinoo Joung Jinoo Joung
Sangmyung University Sangmyung University
Email: jjoung@smu.ac.kr Email: jjoung@smu.ac.kr
Jeong-dong Ryoo Jeong-dong Ryoo
ETRI ETRI
Email: ryoo@etri.re.kr Email: ryoo@etri.re.kr
 End of changes. 87 change blocks. 
457 lines changed or deleted 266 lines changed or added

This html diff was produced by rfcdiff 1.49. The latest version is available from https://github.com/ietf-tools/rfcdiff