Re: [ippm] Some thoughts on draft-mhmcsfh-ippm-pam

Adrian Farrel <adrian@olddog.co.uk> Thu, 25 August 2022 07:56 UTC

Date: Thu, 25 Aug 2022 08:56:23 +0100
From: Adrian Farrel <adrian@olddog.co.uk>
To: Greg Mirsky <gregimirsky@gmail.com>
Cc: draft-mhmcsfh-ippm-pam@ietf.org, IETF IPPM WG <ippm@ietf.org>
Message-ID: <1736368444.77775.1661414183151@www.getmymail.co.uk>
In-Reply-To: <CA+RyBmXXTZk+LobuRzMCGs5hPOULi85py4GGCDk91a793qDgzw@mail.gmail.com>
References: <0cb301d8a4c9$2eba1b40$8c2e51c0$@olddog.co.uk> <CA+RyBmW9iqpPz0Xxcgki7v3TbMG1=ydcGv4D=ytwCwpwMt9U=g@mail.gmail.com> <070701d8b27f$1b5add00$52109700$@olddog.co.uk> <CA+RyBmXXTZk+LobuRzMCGs5hPOULi85py4GGCDk91a793qDgzw@mail.gmail.com>
Archived-At: <https://mailarchive.ietf.org/arch/msg/ippm/PXnhc8yuZ6I9riRdVTNkZSwwr7o>
Subject: Re: [ippm] Some thoughts on draft-mhmcsfh-ippm-pam

Greg,
This looks to have addressed all of my comments. Thanks.
Adrian
On 24/08/2022 23:51 Greg Mirsky <gregimirsky@gmail.com> wrote:


Hi Adrian,
thank you for your feedback and clarifications. Please find my follow-up notes under the GIM2>> tag. Attached is the new working version of the draft and a diff highlighting the updates.
I'm grateful for your comments and look forward to more discussions.

Regards,
Greg

On Wed, Aug 17, 2022 at 2:20 PM Adrian Farrel <adrian@olddog.co.uk> wrote:

Hi again Greg,

 

If you think I supplied enough text to be named as a Contributor, that’s fine. Otherwise, I’d be happy with just an acknowledgement.

GIM2>> Welcome aboard! 


 

More details below.

 

Cheers,

Adrian

 

From: Greg Mirsky <gregimirsky@gmail.com>
Sent: 16 August 2022 22:58
To: Adrian Farrel <adrian@olddog.co.uk>
Cc: draft-mhmcsfh-ippm-pam@ietf.org; IETF IPPM WG <ippm@ietf.org>
Subject: Re: Some thoughts on draft-mhmcsfh-ippm-pam

 

Hi Adrian,

tons of thanks for your kind words supporting our work. Your comments and proposed updates are greatly appreciated. We've discussed them and prepared updates that are in-lined below under the GIM>> tag. Attached, please find the diff highlighting updates and the new working version.

Adrian, we greatly appreciate your insightful comments and practical proposals and would be honored if you agree to join this work as a contributor.

 

Regards,

Greg

 

On Sun, Jul 31, 2022 at 3:35 AM Adrian Farrel <adrian@olddog.co.uk> wrote:

Hi authors and IPPM working group,

Greg asked me if I would have a look at this draft, and I am happy to do
so because we have an increased "need" to deliver SLOs for advanced uses
including IETF Network Slices.

It seems reasonable, to me, to start this work by defining the metrics
(as this document does) before we get into how to record or distribute
them.

Cheers,
Adrian

==

Passive voice

I would like you to inject some more precision into the text (starting
with the Abstract) so that the reader can know who assesses whether the service is
being delivered in compliance with its specified quality.

GIM>> We propose the following update clarifying that PAM can be used by providers and/or users of a Network Slice service:

OLD TEXT:

   Specifically, PAM can be used to assess whether a service is provided

   in compliance with its specified quality, i.e., in accordance with

   its defined SLOs.

NEW TEXT:

   Specifically, PAM

   can be used by providers and/or users of the Network Slice service to

   assess whether the service is provided in compliance with its

   specified quality, i.e., in accordance with its defined SLOs.

 

[AF] Right, this clarifies your intent. Thanks.

[AF] It opens up the question (for me), however, about whether there is a distinction between measurements performed by the user of a service and measurements performed by the provider of a service. I suspect this is practical as much as philosophical, but I don’t understand enough about the mechanisms to know whether it matters.

GIM2>> That is a very interesting question, thank you. I think that there should exist a view of the service that both operator and user can share. An operator might have more information for the root-cause investigation, problem localization, and troubleshooting. Perhaps we can get to more details in a future version.


 

There is a chicken/egg situation here, I think. If you define SLOs that
cannot be reported against because adequate metrics do not exist, then
you have surely done something wrong! Conversely, if you define metrics
that are interesting, but are not the sort of thing used in the
expression of an SLO, then you may be limiting the value of the work.

So what?

Well, I think that, notwithstanding that you *do* discuss SLOs, you
might make it clearer that your metrics are "useful for the definition
and monitoring of SLOs."

GIM>> A very useful clarification, thank you for your suggestion. Proposed update:

OLD TEXT:

    These metrics, referred to as Precision Availability Metrics (PAM),

   can be used to assess the service levels that are being delivered.

NEW TEXT:

   These metrics, referred to as Precision Availability Metrics (PAM),

   are useful for defining and monitoring of SLOs.

 

[AF] wfm

 

Indeed, 3.1, in its discussion of violation intervals, seems to be
looking at the violation of SLOs rather than simply reporting metrics.

Of course, it is also true that an operator may want to measure the
aspects of the network behaviour to make judgements beyond the delivery
of the SLOs. And a customer might want to measure the behaviour of the
service to determine its suitability for use by applications even when
the SLOs are being met.

---

Your paragraph on the meaning of "precision" at the top of page 4 is
timely. I know that it seems pedantic, but the precision with respect
to the delivery of an SLO only applies as the value of the metric
approaches that specified in the SLO. We do not care (should not care?)
about the precision of the metric when the value of the metric is a
long way removed from that specified in the SLO (for better or worse).

GIM>> That is a very interesting question. It seems unlikely that a different, more accurate measurement method would be used when the value of a performance metric is in the zone of the specified threshold (optimal or critical). On the other hand, it might be helpful for a monitoring system to adjust its monitoring rate depending on how close the metric value is to a threshold. I think that is one of the open questions for further discussion. To highlight it, we propose this update:

OLD TEXT:

   It should be noted that "precision" refers to what is

   being assessed, not to the mechanism used to measure it; in other

   words, it does not refer to the precision of the mechanism with which

   actual service levels are measured.  The specification and

   implementation of methods that provide for accurate measurements is a

   separate topic independent of the definition of the metrics in which

   the results of such measurements would be expressed.

NEW TEXT:

   It should be noted that precision refers to what is

   being assessed, not the mechanism used to measure it; in other words,

   it does not refer to the precision of the mechanism with which actual

   service levels are measured.  Furthermore, the precision, with

   respect to the delivery of an SLO, only applies when the metric value

   approaches the specified threshold levels in the SLO.  The

   specification and implementation of methods that provide for accurate

   measurements is a separate topic independent of the definition of the

   metrics in which the results of such measurements would be expressed.

 

[AF] That’s good. And I like that an implementation might reduce the measurement frequency when things are very good or very bad since measurement (per Heisenberg?) may tend to degrade the performance of the thing being measured.
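The idea of varying the measurement rate with the distance from a threshold can be sketched as follows. This is a hypothetical illustration, not something defined in the draft: the function name `next_probe_interval_s`, the 20% "near" band, and the 1 s and 10 s probe periods are all assumptions chosen only for the example.

```python
def next_probe_interval_s(value: float, threshold: float,
                          near_fraction: float = 0.2,
                          fast_s: float = 1.0, slow_s: float = 10.0) -> float:
    """Return a (hypothetical) time until the next measurement, in seconds.

    Measure often while the metric is within near_fraction of the SLO
    threshold, where precision matters, and less often when it is far from
    the threshold in either direction (very good or very bad).
    """
    if threshold == 0:
        raise ValueError("threshold must be non-zero")
    relative_distance = abs(value - threshold) / abs(threshold)
    return fast_s if relative_distance <= near_fraction else slow_s


# Example: a delay objective of 10 ms.
print(next_probe_interval_s(9.0, 10.0))   # 1.0  (close to the threshold)
print(next_probe_interval_s(2.0, 10.0))   # 10.0 (far better than the threshold)
print(next_probe_interval_s(30.0, 10.0))  # 10.0 (far worse than the threshold)
```

Probing less often in both the "very good" and "very bad" regions reflects the point above that precision only matters near the threshold, and also limits the load that measurement itself places on the service.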

 

In 2.1, I wonder whether you want to make some statement about SLEs
being out of scope.

GIM>> Thank you for the suggestion, agreed. Here's the update:

NEW TEXT:

   Service Level Expectations, as defined in Section 4.1 of

   [I-D.ietf-teas-ietf-network-slices], are outside the scope of this

   document. 

 

[AF] Fine.

[AF] I might add “…because it is in the nature of SLEs that they define parts of the SLA that are not easily measured.”

GIM2>> Accepted, thank you. 


 

In 3.1, your use of "degraded" is doubtful.

A reduction from "exceptionally good" to "very, very good" is a
degradation, but not one we care about with an SLO that says "good
enough".

So perhaps
OLD
   *  VI is a time interval during which at least one of the performance
      parameters degraded compared to its pre-defined optimal level
      threshold.

   *  SVI is a time interval during which at least one the performance
      parameters degraded compared to its pre-defined critical
      threshold.
NEW
   *  VI is a time interval during which at least one of the performance
      parameters degraded below its pre-defined optimal level threshold.

   *  SVI is a time interval during which at least one of the performance
      parameters degraded below its pre-defined critical threshold.
END

GIM>> We agree and accept the proposed text update, thank you. 
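As a sketch of how the agreed VI/SVI wording, together with the VFI definition, could be applied to a single time-unit interval: the code assumes every performance parameter is "lower is better" (for example, delay in ms or loss in percent) and that each has an optimal and a critical threshold with critical >= optimal. The class and function names and the example thresholds are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class IntervalClass(Enum):
    VFI = "violation-free interval"
    VI = "violated interval"
    SVI = "severely violated interval"


@dataclass
class Objective:
    # Hypothetical SLO for one "lower is better" performance parameter.
    optimal: float   # at or below this value the optimal level is met
    critical: float  # above this value the parameter is severely violated


def classify_interval(measured: dict[str, float],
                      objectives: dict[str, Objective]) -> IntervalClass:
    """Classify one time-unit interval from per-parameter measurements."""
    worst = IntervalClass.VFI
    for name, obj in objectives.items():
        value = measured[name]
        if value > obj.critical:
            return IntervalClass.SVI  # one parameter below critical is enough
        if value > obj.optimal:
            worst = IntervalClass.VI  # below optimal but not below critical
    return worst


objectives = {"delay_ms": Objective(optimal=10.0, critical=20.0),
              "loss_pct": Objective(optimal=0.1, critical=1.0)}
print(classify_interval({"delay_ms": 15.0, "loss_pct": 0.0}, objectives).name)  # VI
```

Parameters where "higher is better" (for example, throughput) would need the comparisons reversed; the sketch leaves that out.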

 

In 3.1

   *  Consequently, VFI is a time interval during which all performance
      objectives are at or better than their respective pre-defined
      optimal levels.  In such a case, the service is in compliance with
      its specification.

The last sentence here is debatable! It is true that the service will
be in compliance with its specification during the VFI, but the implied
converse is not true. That is, a service still may be in compliance with
its specification during a VI (and compliance might depend on the VI
count or ratio). Indeed, the service could be in compliance even during
an SVI.

GIM>> Agree and removed the last sentence. 

 

The last sentence of 3.1 could use a pointer to the definition of ratios

GIM>> Added a forward reference to Section 3.2.

 

although it is a bit obvious, is it worth noting (for the benefit of
IPPM and to avoid BMWG) that these metrics are necessarily in-service
metrics?

GIM>> Do you refer to the metrics introduced in Section 3.1 or to all metrics defined in the draft? I think it is the latter. If you agree, I'd add that characterization as a general statement before Section 3.1. WDYT?

 

[AF] Yes, sorry, I was sloppy and made the comment against 3.1 when it does, as you say, apply to the whole document.

 

In 3.2 you have count of packets but not count of bytes. Is that OK?

GIM>> We've discussed your question. It is not obvious how to count violated bytes (octets) and/or severely violated bytes (octets). Simply the number of octets in a violated packet? Another question for further discussion, thank you.

 

[AF] Yes, I think I meant that (in general) all of the bytes in a violated packet are violated bytes. The issue being (of course?) that if packets alternate small and large, and if the large packets all violate, one measure might imply 50% violation while the other might exceed 90%.  “For future discussion” is a fine way forward.
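The alternating-size example can be checked with a few lines of arithmetic. This sketch assumes, as suggested above, that every octet of a violated packet counts as a violated octet; the function name and the 64-octet/1500-octet sizes are illustrative choices.

```python
def violation_ratios(packets: list[tuple[int, bool]]) -> tuple[float, float]:
    """packets holds (size_in_octets, violated) pairs.

    Returns (share of packets violated, share of octets violated), counting
    every octet of a violated packet as violated (an assumption).
    """
    total_pkts = len(packets)
    total_octets = sum(size for size, _ in packets)
    bad_pkts = sum(1 for _, violated in packets if violated)
    bad_octets = sum(size for size, violated in packets if violated)
    return bad_pkts / total_pkts, bad_octets / total_octets


# Alternating 64-octet and 1500-octet packets; only the large ones violate.
pkts = [(64, False), (1500, True)] * 50
pkt_ratio, octet_ratio = violation_ratios(pkts)
print(f"{pkt_ratio:.0%} of packets, {octet_ratio:.1%} of octets")
# 50% of packets, 95.9% of octets
```

The same traffic yields a 50% violation ratio by packet count and roughly 96% by octet count, which is why the choice of unit matters for any SLO expressed as a ratio.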

 

3.2 talks about "EIs". Do you mean "VIs" or do you need to introduce a
definition?

GIM>> Thank you for catching my editorial sloppiness. Fixed. 

 

In 3.2

   Determining the condition in which the path is currently with respect
   to availability/unavailability is helpful.

The use of "path" is worrying in this context. Can you say "service"? Or
at least "connectivity"?

GIM>> Would the following update be acceptable:

OLD TEXT:

   Determining the condition in which the path is currently with respect

   to availability/unavailability is helpful. 

NEW TEXT:

   Determining the condition in which the monitored service is currently

   with respect to availability/unavailability is helpful.

 

[AF] Yes, although to fix the English…

NEW NEW

   Determining the current condition of the monitored service

   with respect to availability/unavailability is helpful.

GIM2>> Thank you. 


 

Then in 3.3 you have

   VI, SVI, and VFI characterize the communication between two nodes
   relative to the level of required and acceptable performance and when
   the performance level degrades below an acceptable level.

This puts the SLO specification very much into the context of a P2P
communication. I think you need to justify this somewhere with a
discussion of how a service is decomposed into connectivity constructs
and how the SLOs are applied to each of these even if they are stated as
applying to the whole service. This is particularly important when you
look at the VIR and SVIR for a service that comprises multiple
connectivity constructs only one of which is under performing.

GIM>> A very accurate observation, thank you. We propose adding a new paragraph to Section 3.3 (it becomes the second paragraph):

NEW TEXT:

   It is worth noting that a service might include a set of connectivity

   constructs.  An SLO might apply to all the constructs, or some

   constructs are assigned different SLO values or even different sets

   of SLOs.  It is worth noting that a composite service might include a

   set of connectivity constructs.  An SLO might apply to all the

   constructs, or some constructs are assigned different sets of SLOs.

   For the purpose of PAM, each connectivity construct that composes the

   service can be monitored for its own SLO conformance as a sub-

   service.  The composition of PAMs of these sub-services can be viewed

   as the PAM of the composite service.  The composition of PAMs of

   these sub-services can be viewed as the PAM of the composite service.

 

[AF] Seems you duplicated the first couple of sentences, here.

[AF] Otherwise, this looks good.

GIM2>> Indeed, thank you for catching that. 
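The text leaves open how sub-service PAMs are composed. One possible rule, shown here purely as an assumption, is worst-case composition: in each time-unit interval the composite service takes the worst class seen on any of its connectivity constructs. The names `SEVERITY`, `compose_worst_case`, and `vir`, and the construct labels, are hypothetical; `vir` counts VIs and SVIs over all listed intervals and ignores availability periods for simplicity.

```python
# Hypothetical severity order used to pick the "worst" interval class.
SEVERITY = {"VFI": 0, "VI": 1, "SVI": 2}


def compose_worst_case(per_construct: dict[str, list[str]]) -> list[str]:
    """Combine per-construct interval classes into composite-service classes.

    Assumed rule: in each time-unit interval, the composite service takes the
    worst class observed on any of its connectivity constructs.
    """
    if not per_construct:
        raise ValueError("a composite service needs at least one construct")
    series = list(per_construct.values())
    if len({len(s) for s in series}) != 1:
        raise ValueError("all constructs must cover the same intervals")
    return [max(column, key=SEVERITY.__getitem__) for column in zip(*series)]


def vir(classes: list[str]) -> float:
    """Share of intervals that are VIs or SVIs."""
    return sum(c != "VFI" for c in classes) / len(classes)


constructs = {
    "A-B": ["VFI", "VFI", "VFI", "VFI"],
    "A-C": ["VFI", "VI", "SVI", "VFI"],  # the only under-performing construct
}
composite = compose_worst_case(constructs)
print(composite)  # ['VFI', 'VI', 'SVI', 'VFI']
print({name: vir(c) for name, c in constructs.items()}, vir(composite))
```

Under this rule a single under-performing construct drives the composite VIR, while the per-construct values still show which construct is responsible; other composition rules (for example, weighting by construct) would give different results.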


 

3.2 has...

   switching
   between periods requires ten consecutive intervals, shorter
   conditions may not be adequately reflected.

No clue here as to where this requirement often comes from, nor what
the process of switching means. 3.3 gives some clues as to what is
going on, so a forward pointer would help. But even 3.3 doesn't explain
why 10.

GIM>> Indeed, ten intervals is only an example, and we update the text to reflect that:

NEW TEXT:

   But because

   the transition between service availability/unavailability periods is

   based on a pre-defined number of consecutive intervals, e.g., ten,

   shorter conditions may not be adequately reflected. 

 

[AF] OK
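The transition "based on a pre-defined number of consecutive intervals" can be sketched as a small state machine. The exact rule is an assumption here, in the spirit of ITU-T G.826: a run of `n` consecutive SVIs makes the service unavailable from the first SVI of that run, and a run of `n` consecutive non-SVIs makes it available again from the first interval of that run. The function name and the choice `n=3` in the example are hypothetical.

```python
def availability_states(classes: list[str], n: int = 10) -> list[bool]:
    """Return True (available) or False (unavailable) per time-unit interval.

    classes holds "VFI", "VI", or "SVI" labels. Runs shorter than n do not
    change the state, so short conditions are not reflected.
    """
    states: list[bool] = []
    available = True
    run_start = 0  # index where the current run of state-changing intervals began
    for i, cls in enumerate(classes):
        # An SVI pushes toward unavailability; a non-SVI toward availability.
        pushes_change = (cls == "SVI") if available else (cls != "SVI")
        if not pushes_change:
            run_start = i + 1
        elif i - run_start + 1 == n:
            available = not available
            # The whole qualifying run belongs to the new state.
            for j in range(run_start, i):
                states[j] = available
            run_start = i + 1
        states.append(available)
    return states


classes = ["VFI", "SVI", "SVI", "VFI", "SVI", "SVI", "SVI",
           "VI", "VFI", "VFI", "SVI"]
print("".join("A" if s else "U" for s in availability_states(classes, n=3)))
# AAAAUUUAAAA
```

In the example the two-SVI run at the start is too short to cause unavailability, which illustrates the point above that shorter conditions may not be adequately reflected.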

 

Is the definition of VIR really what you want? It seems odd that the
existence of an SVI reduces the VIR.  You could define...
   *  violated interval ratio (VIR) is the ratio of the combined number
      of VIs and SVIs to the total number of time unit intervals in a
      time of the availability periods during a fixed measurement
      interval.

GIM>> Agreed and gratefully accepted. 
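The effect of the change can be illustrated by computing both readings of VIR over the same intervals. This sketch assumes the input already contains only the time-unit intervals of the availability periods within one fixed measurement interval; the function name and the dictionary keys are hypothetical.

```python
def interval_ratios(classes: list[str]) -> dict[str, float]:
    """Ratios over the time-unit intervals of the availability periods
    within one fixed measurement interval.

    classes holds one label per interval: "VFI", "VI", or "SVI".
    """
    total = len(classes)
    if total == 0:
        raise ValueError("no intervals in the availability periods")
    vi = classes.count("VI")
    svi = classes.count("SVI")
    return {
        "vir_vi_only": vi / total,   # SVIs not counted as violations
        "vir": (vi + svi) / total,   # agreed reading: VIs and SVIs combined
        "svir": svi / total,
    }


before = ["VFI"] * 6 + ["VI"] * 3 + ["SVI"] * 1
after = ["VFI"] * 6 + ["VI"] * 2 + ["SVI"] * 2  # one VI got worse and became an SVI
print(interval_ratios(before))
print(interval_ratios(after))
```

When one VI degrades into an SVI, the VIs-only ratio falls from 0.3 to 0.2 even though the service got worse, while the combined VIR stays at 0.4 and the SVIR rises from 0.1 to 0.2.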


---

4.

   For example, an SLA might state
   that any given SLO applies only to a certain percentage of packets

Is this really...

   For example, an SLA might state
   that any given SLO applies to at least a certain percentage of
   packets

GIM>> You are correct. Updated the text with your suggestion. 
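A statistical SLO of the form "applies to at least a certain percentage of packets" can be checked as below. The delay bound, the 99% target, the sample values, and the function name are all hypothetical; the sketch only shows the "at least a fraction" comparison.

```python
def statistical_slo_met(delays_ms: list[float], bound_ms: float,
                        min_fraction: float) -> bool:
    """Return True if at least min_fraction of the measured packets have a
    delay at or below bound_ms (hypothetical delay SLO)."""
    if not delays_ms:
        raise ValueError("no packets were measured")
    within = sum(1 for d in delays_ms if d <= bound_ms)
    return within / len(delays_ms) >= min_fraction


# SLO: at least 99% of packets are delivered within 10 ms.
ok = [4.0] * 995 + [25.0] * 5     # 99.5% within the bound
bad = [4.0] * 980 + [25.0] * 20   # 98.0% within the bound
print(statistical_slo_met(ok, 10.0, 0.99), statistical_slo_met(bad, 10.0, 0.99))
# True False
```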


---

4.

s/To support statistical services/To support statistical SLOs/

GIM>> Thank you, accepted. 


---

In 4 you have

   The definition of histogram metrics is for further study.

I wasn't clear whether you intend that to be in a future version of this
document or in a separate document. Section 6 helps. Maybe include a
forward pointer?

GIM>> Added the reference:

NEW TEXT:

   The definition of histogram metrics is for further study

   (see Section 6).