[Detnet] Fwd: review DetNet OAM framework draft

Greg Mirsky <gregimirsky@gmail.com> Tue, 11 May 2021 13:11 UTC

References: <AM9PR07MB7204DF42315052CE9C620692F2409@AM9PR07MB7204.eurprd07.prod.outlook.com> <CO1PR11MB488199682C35E517C0E2A444D85F9@CO1PR11MB4881.namprd11.prod.outlook.com> <CA+RyBmUsO3Qr9SwJELyd7JnkwWPnhCaji2X8WVHgVoD=JYuoNA@mail.gmail.com>
In-Reply-To: <CA+RyBmUsO3Qr9SwJELyd7JnkwWPnhCaji2X8WVHgVoD=JYuoNA@mail.gmail.com>
From: Greg Mirsky <gregimirsky@gmail.com>
Date: Tue, 11 May 2021 06:10:41 -0700
Message-ID: <CA+RyBmUVq8KyL4gC+v_U9YKgx+vawueYukj2D0StD1eXWoUTGQ@mail.gmail.com>
To: DetNet WG <detnet@ietf.org>, DetNet Chairs <detnet-chairs@ietf.org>
Archived-At: <https://mailarchive.ietf.org/arch/msg/detnet/qFUvxSPJTBJCnWiJRStGlWw4E2U>
Subject: [Detnet] Fwd: review DetNet OAM framework draft

Now to the list

Regards,
Greg
---------- Forwarded message ---------
From: Greg Mirsky <gregimirsky@gmail.com>
Date: Sun, May 9, 2021 at 12:12 PM
Subject: Re: review DetNet OAM framework draft
To: Pascal Thubert (pthubert) <pthubert@cisco.com>
Cc: draft-ietf-detnet-oam-framework@ietf.org <
draft-ietf-detnet-oam-framework@ietf.org>, detnet@ietf.org <detnet@ietf.org>


Hi Pascal,
I very much appreciate your thorough review and most helpful comments.
Please find my notes inline below, tagged GIM>>.

Regards,
Greg

On Thu, Apr 29, 2021 at 6:07 AM Pascal Thubert (pthubert) <
pthubert@cisco.com> wrote:

> Dear authors and all
>
>
>
> I do support this document but do not see that in its present shape it is
> ready for WGLC. Please find some review comments below.
>
>
>
> - Sadly RFC 8655 does not define either “latency” or “delay”. Reading this
> draft, I now see it as an oversight. I’d suggest that we define them in the
> terminology section. It is my sense that RFC 8655 uses “delay” with the
> meaning of incremental flight duration, and uses “latency” for the total
> end-to-end time across the network, IOW, minimal_latency + delay <=
> bounded_latency.
>
GIM>> Thank you, Pascal, for bringing this to the discussion. I agree with
you, and I strongly believe that it is important to agree on the
vocabulary so that we can discuss technology and use cases based on a
common understanding of terms. I like your suggestion to provide
definitions of "latency" and "delay" in the Terminology section. Before
going into definitions, I'd like us to define the scope of these terms.
Would they apply to this draft and all DetNet OAM documents, or could they
serve as a clarification of the terminology used in DetNet in general
since RFC 8655?
On the definitions: do you think that your interpretation of "delay" is
close to the definition of residence time in RFC 8169:
   Residence time is the sum of the difference between
   the time of receipt at an ingress interface and the time of
   transmission from an egress interface for each node along the network
   path from an ingress node to an egress node.
Also, in RFC 8169 we've assumed that the e2e delay, i.e., e2e latency,
consists of two parts: a fixed propagation delay determined by the link
speed and a variable portion, i.e., the residence time. In your opinion,
could this model be applied to DetNet?
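
A minimal sketch of the two-part model asked about above, assuming each
node on the path can report a receive timestamp at its ingress interface
and a transmit timestamp at its egress interface. The names
NodeTimestamps, residence_time_us, e2e_latency_us, and meets_bound are
hypothetical and come from neither RFC 8169 nor RFC 8655, and all numbers
are made up for illustration. The last helper reads the relation
minimal_latency + delay <= bounded_latency with "delay" taken as the
residence time, which is only one possible interpretation and is exactly
the open question in this thread.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class NodeTimestamps:
    """Hypothetical per-node record; all times in microseconds."""
    ingress_rx_us: float  # time of receipt at the ingress interface
    egress_tx_us: float   # time of transmission from the egress interface


def residence_time_us(path: Iterable[NodeTimestamps]) -> float:
    """Sum of (egress transmit - ingress receipt) over every node on the path."""
    return sum(n.egress_tx_us - n.ingress_rx_us for n in path)


def e2e_latency_us(path: Iterable[NodeTimestamps], propagation_us: float) -> float:
    """Two-part model: fixed propagation part plus variable residence time."""
    return propagation_us + residence_time_us(path)


def meets_bound(minimal_latency_us: float, delay_us: float,
                bounded_latency_us: float) -> bool:
    """Check minimal_latency + delay <= bounded_latency (assumed reading)."""
    return minimal_latency_us + delay_us <= bounded_latency_us


# Illustrative three-node path; the two links add 8 us each (16 us in total).
path = [
    NodeTimestamps(0.0, 12.0),
    NodeTimestamps(20.0, 27.5),
    NodeTimestamps(35.5, 40.0),
]
residence = residence_time_us(path)                  # variable portion
latency = e2e_latency_us(path, propagation_us=16.0)  # fixed + variable
```

In this toy path the residence time is the only part that OAM would see
change from packet to packet; the propagation part stays constant.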

>
>
>
>
> - Their work -> that work ?
>
> GIM>> Yes, you're right. Thank you.

>
>
> - the abstract and the introduction indicate that the document is a problem statement / functional requirement draft, and indeed the text reads like a requirement document. But the title says framework. Which is it?
>
>
>
> - same question about the intended status (std track). Is that really what we want here?
>
> GIM>> Thank you for pointing this out. Switched to Informational track.

>
>
> - “It is critical for the
>
>       quality of information obtained using an active method that
>
>       generated test packets are in-band with the monitored data flow.
>
> ”. this duplicates text in 3.3 and does not belong in a terminology. Remove?
>
> GIM>> I agree with you. Text in Section 3.3 is stronger but seems out of
place in that section. I think moving it up into Section 3 might be
appropriate. What do you think?

>
>
> - “OAM represents the
>
>    essential elements of the network operation and necessary for OAM
>
>    resources that need to be accounted for to maintain the network
>
>    operational.
>
> ” does not parse too well
>
> GIM>> I propose the following update:
 OLD TEXT:
   OAM represents the
   essential elements of the network operation and necessary for OAM
   resources that need to be accounted for to maintain the network
   operational.
NEW TEXT:
   Because OAM is an
   essential element of network operation, the resources necessary for
   OAM need to be accounted for in addition to DetNet flows.

Does the update read better?

>
>
> - “information about the state of the network being
>
>    collected
>
> ” is being? Maybe remove that text ? it seems to repeat the first sentence of the paragraph.
>
> GIM>> The purpose of the sentence is in its final part, "... and sent to
the controller". Would the following update make it better?
OLD TEXT:
   In either way, information about the state of the network being
   collected and sent to the controller.
NEW TEXT:
   In either way, information is collected and sent to the controller.

>
>
> - “   Also, we can characterize methods of transporting OAM information
>
>    relative to the path of data.  For instance, OAM information may be
>
>    transported out-of-band or in-band with the data flow.
>
> ” this deserves its own subsection with some more explanation and possibly some references to relevant IETF work
>
> GIM>> Thank you for highlighting this question. I propose adding
references to work in the IPPM WG. Here's the updated text:
NEW TEXT:
   Also, we can characterize methods of transporting OAM information
   relative to the path of data.  For instance, OAM information may be
   transported in-band or out-of-band with the data flow.  In case of
   the former, the telemetry information uses resources allocated for
   the monitored DetNet flow.  If an in-band method of transporting
   telemetry is used, the amount of generated information needs to be
   carefully analyzed, and additional resources must be reserved.
   [I-D.ietf-ippm-ioam-data] defines the in-band transport mechanism
   where telemetry information is collected in the data packet on which
   information is generated.  Two tracing methods are described - end-
   to-end, i.e., from the ingress and egress nodes, and hop-by-hop,
   i.e., like end-to-end with additional information from transit nodes.
   [I-D.ietf-ippm-ioam-direct-export] and
   [I-D.mirsky-ippm-hybrid-two-step] are examples of out-of-band
   telemetry transport.  In the former case, information is transported
   by each node traversed by the data packet of the monitored DetNet
   flow in a specially constructed packet.  In the latter, information
   is collected in a sequence of follow-up packets that traverse the
   same path as the data packet of the monitored DetNet flow.  In both
   methods, transport of the telemetry can avoid using resources
   allocated for the DetNet domain.
>
>
>
> “   Similarly, the destination does not receive packets from different
>
>    flows through its interface.
>
> ” sorry I do not get the intended meaning
>
> GIM>> We're pointing to the distinction between the Continuity Check (CC)
and Connectivity Verification (CV). The former only verifies that a path
between two systems exists, while the latter, in addition to that path,
verifies that the system does not receive packets from any flow other than
the one being monitored. Associated with CV is a misconnection error state
and the conditions for raising and clearing it. I propose removing the
paragraph with that sentence and updating the first paragraph as follows:
OLD TEXT:
   In addition to the Continuity Check, DetNet solutions have to verify
   the connectivity.  This verification considers additional
   constraints, i.e., the absence of misconnection.
NEW TEXT:
   In addition to the Continuity Check, DetNet solutions have to verify
   the connectivity.  This verification considers additional
   constraints, i.e., the absence of misconnection.  The misconnection
   error state is entered after several consecutive test packets from
   other DetNet flows are received.  The definition of the conditions of
   entry and exit for the misconnection error state is outside the scope
   of this document.
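
To illustrate the kind of state machine the proposed text hints at, here
is a minimal sketch. Everything specific in it is an assumption: the
class name MisconnectionDetector, the flow identifiers, and both
thresholds of three consecutive packets are hypothetical, since the
proposed text leaves the entry and exit conditions outside the scope of
the draft.

```python
class MisconnectionDetector:
    """Tracks the misconnection error state for one monitored DetNet flow.

    Assumed rule: enter the state after `entry_threshold` consecutive test
    packets from other flows, and leave it after `exit_threshold`
    consecutive test packets from the monitored flow.
    """

    def __init__(self, monitored_flow_id, entry_threshold=3, exit_threshold=3):
        self.monitored_flow_id = monitored_flow_id
        self.entry_threshold = entry_threshold
        self.exit_threshold = exit_threshold
        self.misconnected = False
        self._foreign_run = 0  # consecutive test packets from other flows
        self._good_run = 0     # consecutive test packets from the monitored flow

    def on_test_packet(self, flow_id):
        """Process one received test packet and return the current state."""
        if flow_id == self.monitored_flow_id:
            self._foreign_run = 0
            self._good_run += 1
            if self.misconnected and self._good_run >= self.exit_threshold:
                self.misconnected = False
        else:
            self._good_run = 0
            self._foreign_run += 1
            if self._foreign_run >= self.entry_threshold:
                self.misconnected = True
        return self.misconnected


detector = MisconnectionDetector("flow-A")
arrivals = ["flow-A", "flow-B", "flow-B", "flow-A",
            "flow-B", "flow-B", "flow-B",
            "flow-A", "flow-A", "flow-A"]
states = [detector.on_test_packet(flow_id) for flow_id in arrivals]
```

Two stray packets followed by a good one do not raise the error; only the
run of three foreign packets does, and three good packets in a row clear
it again.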

>
>
> - section 3.3 –“It is worth noting that the test and data packets MUST follow the
>
>    same path, i.e., the connectivity verification has to be conducted
>
>    in-band without impacting the data traffic.”
>
>  there appears to be a tension between the bandwidth that is reserved for the flow and the extra bandwidth that the OAM needs.
>
> Do we need to know in advance how much OAM there will be? Must that be included in the reservation?
>
> GIM>> Yes, the impact of OAM, whether active or hybrid, must be evaluated
and accounted for when reserving resources in the DetNet domain. The update
above is intended to clarify that:
   Because OAM is an
   essential element of network operation, the resources necessary for
   OAM need to be accounted for in addition to DetNet flows.
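
As a back-of-the-envelope illustration of that accounting, here is a
minimal sketch assuming a periodic active OAM session that sends
fixed-size test packets in-band with the flow. The function names and all
figures are hypothetical, and lower-layer framing overhead is ignored.

```python
def oam_rate_bps(test_packet_bytes: int, packets_per_second: int) -> int:
    """Bandwidth consumed by periodic OAM test packets, in bits per second."""
    return test_packet_bytes * 8 * packets_per_second


def reservation_bps(flow_rate_bps: int, test_packet_bytes: int,
                    packets_per_second: int) -> int:
    """Flow bandwidth plus the OAM share that rides in-band with it."""
    return flow_rate_bps + oam_rate_bps(test_packet_bytes, packets_per_second)


# Illustrative numbers only: a 10 Mb/s flow monitored with
# 128-byte test packets sent 100 times per second.
oam = oam_rate_bps(128, 100)
total = reservation_bps(10_000_000, 128, 100)
```

With these made-up figures the OAM share is about 1% of the flow rate,
small but not zero, which is why it has to be part of the reservation.
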
>
>
>
>
>
> - section 3.4. What do we do for PREOF?
>
> GIM>> Excellent question, thank you! I think of two updates:

   - in Section 3.4:

   Also, tracing can be used for the discovery of the Path
   Maximum Transmission Unit or location of elements of
   PREOF for the particular route in the DetNet domain.


   - in Section 6:

   DetNet OAM MUST support the discovery of PREOF along a route in
   the given DetNet domain.


>
> - “The network has isolated and identified the cause of the fault.”
>
> Again I fail to understand the intention.
>
> GIM>> Indeed. How about changing this text as follows:
OLD TEXT:
   The network has isolated and identified the cause of the fault.  For
   instance, the replication process behaves not as expected to a
   specific intermediary router.
NEW TEXT:
   Localizing a network defect and characterizing it are necessary
   elements of network operation.

   * Fault localization, a process of deducing the location of a network
     failure from a set of observed failure indications, might be
     achieved, for example, by tracing the route of the DetNet flow in
     which the network failure was detected.  Another method of fault
     localization can correlate reports of failures from a set of
     interleaving sessions monitoring path continuity.
   * Fault characterization is a process of identifying the root cause
     of the problem.  For instance, misconfiguration or malfunction of
     PREOF elements can be the cause of erroneous packet replication or
     extra packets being flooded in the DetNet domain.


>
>
> -“                                  One of the advantages of the use of
>
>    AMM in a DetNet domain with the IP data plane is that the marking is
>
>    applied to a data flow, thus ensuring that measured metrics are
>
>    directly applicable to the DetNet flow.
>
> ” isn’t this the case for IPPM in general? Isn’t the benefit of AMM more related to conciseness?
>
> GIM>> You are right. Do you think that the update below makes the text
better?
OLD TEXT:
   One of the advantages of the use of AMM in a DetNet domain with
   the IP data plane is that the marking is applied to a data flow, thus
   ensuring that measured metrics are directly applicable to the DetNet
   flow.
NEW TEXT:
   As with all on-path telemetry methods,
   AMM in a DetNet domain with the IP data plane is natively in-band
   with respect to the monitored DetNet flow.  Because the marking is applied
   to a data flow, measured metrics are directly applicable to the
   DetNet flow.  AMM minimizes the additional load on the DetNet domain
   by using nodal collection and computation of performance metrics in
   combination with optionally using out-of-band telemetry collection
   for further network analysis.

>
>
> - Maybe we should discuss the tension between RFC 8939 and the capability to use separate packets for OAM, since the flow is routed on the 5/6 tuple. Does that mean that only in-situ is feasible with RFC 8939 (ref draft-ietf-ippm-ioam-ipv6-options)?
>
> GIM>> I agree that the definition of the DetNet IP flow is an important
aspect when considering the applicability of IOAM in a DetNet domain. That
may be discussed in this document, though, as I think of it, that
discussion also seems to belong in draft-ietf-detnet-ip-oam
<https://datatracker.ietf.org/doc/draft-ietf-detnet-ip-oam/>. What do you
think?

>
>
> Section 4. This section needs more thoughts. For instance there are packets like PTP packets for which it is important to know the transit duration in each hop. Isn’t that true for DetNet as well?
>
> GIM>> We can discuss if and how RFC 8169
<https://datatracker.ietf.org/doc/rfc8169/> may be applicable. Perhaps in a
new discussion thread.

> Radio links have numerous metrics that can be reported (think RFC 8175), we need to know that information per hop as opposed to per path/circuit. We need to know if the hop does store-and-forward or passthrough – e.g., in the former case using smaller frames / packets / segments / fragments and/or network coding may improve the latency. Maybe we could dump the WG collective mind with a dedicated thread?
>
> GIM>> I agree, a new discussion thread will certainly help. I've been
thinking of per-node metrics too. I think that, to be useful, per-node
metrics might as well be kept for each ordered pair of ingress-egress
interfaces. I understand that this inflates the amount of information we
generate and collect, but with on-node PM calculations, only the median
and percentile values would be exported. This definitely needs discussion.
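
A minimal sketch of what such on-node aggregation could look like,
assuming the node records one delay sample per packet for each ordered
(ingress, egress) interface pair and exports only summary statistics. The
class NodeDelayStats, the interface names, and the choice of the 99th
percentile are hypothetical; nothing here is taken from the draft.

```python
import statistics
from collections import defaultdict


class NodeDelayStats:
    """Keeps per-packet delay samples keyed by (ingress, egress) interface."""

    def __init__(self):
        self._samples = defaultdict(list)

    def record(self, ingress_if, egress_if, delay_us):
        """Store one delay sample for the ordered interface pair."""
        self._samples[(ingress_if, egress_if)].append(delay_us)

    def summary(self):
        """Return median and 99th percentile per pair instead of raw samples."""
        report = {}
        for pair, samples in self._samples.items():
            if len(samples) < 2:
                continue  # too few samples to compute percentiles
            # n=100 yields 99 cut points; index 98 is the 99th percentile.
            cuts = statistics.quantiles(samples, n=100, method="inclusive")
            report[pair] = {
                "count": len(samples),
                "median_us": statistics.median(samples),
                "p99_us": cuts[98],
            }
        return report


stats = NodeDelayStats()
for delay in range(1, 102):            # 101 synthetic samples: 1..101 us
    stats.record("eth0", "eth1", float(delay))
stats.record("eth1", "eth0", 5.0)      # a single sample, so no summary yet
report = stats.summary()
```

The exported report holds three numbers per interface pair regardless of
how many packets were observed, which is the trade-off described above.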


>
>
> - “   DetNet needs to implement a self-healing and self-optimization
>
>    approach.”
>
> Though I agree with the intention and the next sentences, I’m not sure this is the best wording. Those words are classical for routing protocols that
>
> (re)compute a path, which is disruptive or consecutive to a disruption. We do not want disruptions in DetNet. There must be enough PREOF to ensure the SLO
>
>  without repairing. The RAW problem is actually NOT to use all the redundancy at all times in order to save battery and spectrum. I’d rather say that
>
> ” in the face of events that impact the network operation (e.g., link up/down, node crash/reboot, flows starting and ending)
>
> the DetNet Controller needs to perform repair and re-optimization actions in order to permanently ensure the SLO of all active flows
>
> with minimal waste of resources”.
>
> GIM>> Pascal, thank you for the suggested text. Accepted with gratitude.

>
>
> - section 5.2: Is it the intention to include the repair action in OAM work?
>
> GIM>> Thank you for the question. Thinking about it, the role of OAM is
to detect the degradation of a performance metric. Restorative actions, in
my opinion, are taken by the control or management plane. Since
performance monitoring and SLOs are discussed in other sections, Section
5.2 may not add enough value to keep it in the document. What do you
think?

>
>
> - section 6. Why only MPLS data plane? It seems that the requirements are not dependent on the underlying technology.
>
> GIM>> That is my cut'n'paste mistake.

>
>
> - I’m not sure about requirement 5, 6, 10, 11, and of the use of “excellent” in 16.
>
> GIM>> Let's remove the "excellent" :)
GIM>> The intention of listing OAM functionality in that particular manner
as requirements is to provide a checklist for the gap analysis and for
defining extensions. Are you concerned with the described OAM functions or
with the manner in which the requirements are expressed?

>
>
> - I read requirement 16 as the only service assurance requirement. A very important requirement in my view, and I think we want more. We also want to ensure that a scheduled transmission happens when scheduled, and that the guaranteed resources like buffers are available per reservation.
>
> GIM>> Thank you for the suggestion. I've tried to capture these resource
reservation parameters in the new requirement:
NEW TEXT:
    18.  DetNet OAM MUST support monitoring levels of resources allocated
        for the particular DetNet flow.  Such resources include, but are
        not limited to, buffer utilization and the scheduler transmission
        calendar.
>
>
>
> Many thanks to the authors for all the great work!
>
> GIM>> Many thanks for your great review, much appreciated!

>
>
> Pascal
>
>
>
>
>
> *From:* detnet <detnet-bounces@ietf.org> *On Behalf Of *Janos Farkas
> *Sent:* Thursday, April 29, 2021 0:28
> *To:* detnet@ietf.org
> *Subject:* [Detnet] review DetNet OAM framework draft
>
>
>
> WG,
>
>
>
> Please review the DetNet OAM framework draft:
> https://datatracker.ietf.org/doc/draft-ietf-detnet-oam-framework/.
>
>
>
> The authors consider the draft to be in pretty good shape and do not see
> any major piece missing.
>
>
>
> WG review and comments would be great in order to develop the draft
> towards WG Last Call.
>
>
>
> Please send your comments to the list before the next informal OAM
> discussion on May 11 or bring them to the informal meeting:
> https://ietf.webex.com/ietf/j.php?MTID=m27bb8ee31f025670e4071bfdfcd47432.
> (More details on the OAM informal meetings:
> https://mailarchive.ietf.org/arch/msg/detnet/4ik3tI-E3uhOFNaKhpkZiWwyiZc/
> .)
>
>
>
> Regards,
>
> Janos
>
>
>