Re: [ippm] Some thoughts on draft-mhmcsfh-ippm-pam

Greg Mirsky <gregimirsky@gmail.com> Wed, 24 August 2022 22:51 UTC

From: Greg Mirsky <gregimirsky@gmail.com>
Date: Wed, 24 Aug 2022 15:51:41 -0700
Message-ID: <CA+RyBmXXTZk+LobuRzMCGs5hPOULi85py4GGCDk91a793qDgzw@mail.gmail.com>
To: Adrian Farrel <adrian@olddog.co.uk>
Cc: draft-mhmcsfh-ippm-pam@ietf.org, IETF IPPM WG <ippm@ietf.org>
Archived-At: <https://mailarchive.ietf.org/arch/msg/ippm/O-qiRp9yVlZ9b_qr1rrsqOEgMMA>

Hi Adrian,
thank you for your feedback and clarifications. Please find my follow-up
notes under the GIM2>> tag. Attached is the new working version of the
draft and diff highlighting the updates.
I'm grateful for your comments and look forward to more discussions.

Regards,
Greg

On Wed, Aug 17, 2022 at 2:20 PM Adrian Farrel <adrian@olddog.co.uk> wrote:

> Hi again Greg,
>
>
>
> If you think I supplied enough text to be named as a Contributor, that’s
> fine. Otherwise, I’d be happy with just an acknowledgement.
>
GIM2>> Welcome aboard!

>
>
> More details below.
>
>
>
> Cheers,
>
> Adrian
>
>
>
> *From:* Greg Mirsky <gregimirsky@gmail.com>
> *Sent:* 16 August 2022 22:58
> *To:* Adrian Farrel <adrian@olddog.co.uk>
> *Cc:* draft-mhmcsfh-ippm-pam@ietf.org; IETF IPPM WG <ippm@ietf.org>
> *Subject:* Re: Some thoughts on draft-mhmcsfh-ippm-pam
>
>
>
> Hi Adrian,
>
> tons of thanks for your kind words supporting our work. Your comments and
> proposed updates are greatly appreciated. We've discussed them and prepared
> updates that are in-lined below under the GIM>> tag. Attached, please find
> the diff highlighting updates and the new working version.
>
> Adrian, we greatly appreciate your insightful comments and practical
> proposals and would be honored if you agree to join this work as a
> contributor.
>
>
>
> Regards,
>
> Greg
>
>
>
> On Sun, Jul 31, 2022 at 3:35 AM Adrian Farrel <adrian@olddog.co.uk> wrote:
>
> Hi authors and IPPM working group,
>
> Greg asked me if I would have a look at this draft, and I am happy to do
> so because we have an increased "need" to deliver SLOs for advanced uses
> including IETF Network Slices.
>
> It seems reasonable, to me, to start this work by defining the metrics
> (as this document does) before we get into how to record or distribute
> them.
>
> Cheers,
> Adrian
>
> ==
>
> Passive voice
>
> I would like you to inject some more precision into the text (starting
> with the Abstract) so that the reader can know who assesses whether the
> service is being delivered in compliance with its specified quality.
>
> GIM>> We propose the following update clarifying that PAM can be used by
> providers and/or users of a Network Slice service:
>
> OLD TEXT:
>
>    Specifically, PAM can be used to assess whether a service is provided
>    in compliance with its specified quality, i.e., in accordance with
>    its defined SLOs.
>
> NEW TEXT:
>
>    Specifically, PAM
>    can be used by providers and/or users of the Network Slice service to
>    assess whether the service is provided in compliance with its
>    specified quality, i.e., in accordance with its defined SLOs.
>
>
>
> [AF] Right, this clarifies your intent. Thanks.
>
> [AF] It opens up the question (for me), however, about whether there is a
> distinction between measurements performed by the user of a service and
> measurements performed by the provider of a service. I suspect this is
> practical as much as philosophical, but I don’t understand enough about the
> mechanisms to know whether it matters.
>
GIM2>> That is a very interesting question, thank you. I think that there
should exist a view of the service that both operator and user can share.
An operator might have more information for the root-cause investigation,
problem localization, and troubleshooting. Perhaps we can get to more
details in a future version.

>
>
> There is a chicken/egg situation here, I think. If you define SLOs that
> cannot be reported against because adequate metrics do not exist, then
> you have surely done something wrong! Conversely, if you define metrics
> that are interesting, but are not the sort of thing used in the
> expression of an SLO, then you may be limiting the value of the work.
>
> So what?
>
> Well, I think that, notwithstanding that you *do* discuss SLOs, you
> might make it clearer that your metrics are "useful for the definition
> and monitoring of SLOs."
>
> GIM>> A very useful clarification, thank you for your suggestion. Proposed
> update:
>
> OLD TEXT:
>
>    These metrics, referred to as Precision Availability Metrics (PAM),
>    can be used to assess the service levels that are being delivered.
>
> NEW TEXT:
>
>    These metrics, referred to as Precision Availability Metrics (PAM),
>    are useful for defining and monitoring SLOs.
>
>
>
> [AF] wfm
>
>
>
> Indeed, 3.1, in its discussion of violation intervals, seems to be
> looking at the violation of SLOs rather than simply reporting metrics.
>
> Of course, it is also true that an operator may want to measure the
> aspects of the network behaviour to make judgements beyond the delivery
> of the SLOs. And a customer might want to measure the behaviour of the
> service to determine its suitability for use by applications even when
> the SLOs are being met.
>
> ---
>
> Your paragraph on the meaning of "precision" at the top of page 4 is
> timely. I know that it seems pedantic, but the precision with respect
> to the delivery of an SLO only applies as the value of the metric
> approaches that specified in the SLO. We do not care (should not care?)
> about the precision of the metric when the value of the metric is a
> long way removed from that specified in the SLO (for better or worse).
>
> GIM>> That is a very interesting question. It seems unlikely that a
> different, more accurate measurement method would be used when the value of
> a performance metric is close to a specified threshold (optimal or
> critical). On the other hand, it might be helpful for a monitoring system
> to adjust its monitoring rate depending on how close the metric value is to
> a threshold. I think that is one of the open questions for further
> discussion. To highlight it, we propose this update:
>
> OLD TEXT:
>
>    It should be noted that "precision" refers to what is
>    being assessed, not to the mechanism used to measure it; in other
>    words, it does not refer to the precision of the mechanism with which
>    actual service levels are measured.  The specification and
>    implementation of methods that provide for accurate measurements is a
>    separate topic independent of the definition of the metrics in which
>    the results of such measurements would be expressed.
>
> NEW TEXT:
>
>    It should be noted that precision refers to what is
>    being assessed, not the mechanism used to measure it; in other words,
>    it does not refer to the precision of the mechanism with which actual
>    service levels are measured.  Furthermore, the precision, with
>    respect to the delivery of an SLO, only applies when the metric value
>    approaches the specified threshold levels in the SLO.  The
>    specification and implementation of methods that provide for accurate
>    measurements is a separate topic independent of the definition of the
>    metrics in which the results of such measurements would be expressed.
>
>
>
> [AF] That’s good. And I like that an implementation might reduce the
> measurement frequency when things are very good or very bad since
> measurement (per Heisenberg?) may tend to degrade the performance of the
> thing being measured.
>
>
>
> In 2.1, I wonder whether you want to make some statement about SLEs
> being out of scope.
>
> GIM>> Thank you for the suggestion, agreed. Here's the update:
>
> NEW TEXT:
>
>    Service Level Expectations, as defined in Section 4.1 of
>    [I-D.ietf-teas-ietf-network-slices], are outside the scope of this
>    document.
>
>
>
> [AF] Fine.
>
> [AF] I might add “…because it is in the nature of SLEs that they define
> parts of the SLA that are not easily measured.”
>
GIM2>> Accepted, thank you.

>
>
> In 3.1, your use of "degraded" is doubtful.
>
> A reduction from "exceptionally good" to "very, very good" is a
> degradation, but not one we care about with an SLO that says "good
> enough".
>
> So perhaps
> OLD
>    *  VI is a time interval during which at least one of the performance
>       parameters degraded compared to its pre-defined optimal level
>       threshold.
>
>    *  SVI is a time interval during which at least one the performance
>       parameters degraded compared to its pre-defined critical
>       threshold.
> NEW
>    *  VI is a time interval during which at least one of the performance
>       parameters degraded below its pre-defined optimal level threshold.
>
>    *  SVI is a time interval during which at least one of the performance
>       parameters degraded below its pre-defined critical threshold.
> END
>
> GIM>> We agree and accept the proposed text update, thank you.
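
As an illustration of the VI/SVI wording above together with the VFI definition quoted from Section 3.1, here is a minimal Python sketch of how one time interval might be classified. It is not taken from the draft: the `Thresholds` type, the metric names, the numeric values, and the assumption that lower values are better (so "degraded below a threshold" means the measured value exceeds it) are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Thresholds:
    optimal: float   # at or better than this value: no violation
    critical: float  # worse than this value: severe violation


def classify_interval(measured, thresholds):
    """Return 'SVI', 'VI', or 'VFI' for one time interval.

    Assumption: every metric is "lower is better" (e.g., delay, loss),
    so a value above a threshold counts as degraded below it.
    """
    if any(measured[name] > t.critical for name, t in thresholds.items()):
        return "SVI"  # at least one parameter is worse than its critical threshold
    if any(measured[name] > t.optimal for name, t in thresholds.items()):
        return "VI"   # at least one parameter is worse than its optimal threshold
    return "VFI"      # all parameters are at or better than their optimal levels


# Hypothetical SLO thresholds, for illustration only.
THRESHOLDS = {
    "delay_ms": Thresholds(optimal=10.0, critical=50.0),
    "loss_pct": Thresholds(optimal=0.1, critical=1.0),
}
```

With these made-up thresholds, an interval with 20 ms delay and no loss would be a VI, while an interval with 5 ms delay and 2% loss would be an SVI.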
>
>
>
> In 3.1
>
>    *  Consequently, VFI is a time interval during which all performance
>       objectives are at or better than their respective pre-defined
>       optimal levels.  In such a case, the service is in compliance with
>       its specification.
>
> The last sentence here is debatable! It is true that the service will
> be in compliance with its specification during the VFI, but the implied
> converse is not true. That is, a service still may be in compliance with
> its specification during a VI (and compliance might depend on the VI
> count or ratio). Indeed, the service could be in compliance even during
> an SVI.
>
> GIM>> Agree and removed the last sentence.
>
>
>
> The last sentence of 3.1 could use a pointer to the definition of ratios
>
> GIM>> Added a forward reference to Section 3.2.
>
>
>
> Although it is a bit obvious, is it worth noting (for the benefit of
> IPPM and to avoid BMWG) that these metrics are necessarily in-service
> metrics?
>
> GIM>> Are you referring to the metrics introduced in Section 3.1 or to all
> metrics defined in the draft? I think it is the latter. If you agree, I'd
> add that characterization as a general statement before Section 3.1. WDYT?
>
>
>
> [AF] Yes, sorry, I was sloppy and made the comment against 3.1 when it
> does, as you say, apply to the whole document.
>
>
>
> In 3.2 you have count of packets but not count of bytes. Is that OK?
>
> GIM>> We've discussed your question. It is not obvious how to count
> violated bytes (octets) and/or severely violated bytes (octets). Would it
> simply be the number of octets in a violated packet? Another question for
> further discussion, thank you.
>
>
>
> [AF] Yes, I think I meant that (in general) all of the bytes in a violated
> packet are violated bytes. The issue being (of course?) that if packets
> alternate small and large, and if the large packets all violate, one
> measure might imply 50% violation while the other might exceed 90%.  “For
> future discussion” is a fine way forward.
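
The small/large packet example above can be worked through numerically. The sketch below assumes, as suggested in the thread, that every octet of a violated packet counts as a violated octet; the packet sizes (64 and 1500 octets) and the function name are hypothetical.

```python
def violation_ratios(packets):
    """Return (violated packet ratio, violated octet ratio).

    packets: iterable of (size_in_octets, violated) pairs.
    Assumption: all octets of a violated packet are violated octets.
    """
    packets = list(packets)
    if not packets:
        raise ValueError("at least one packet is required")
    total_octets = sum(size for size, _ in packets)
    violated_packets = sum(1 for _, violated in packets if violated)
    violated_octets = sum(size for size, violated in packets if violated)
    return violated_packets / len(packets), violated_octets / total_octets


# Hypothetical traffic: small and large packets alternate,
# and only the large packets violate.
traffic = [(64, False), (1500, True)] * 100
packet_ratio, octet_ratio = violation_ratios(traffic)
print(f"{packet_ratio:.1%} of packets, {octet_ratio:.1%} of octets")
```

For this traffic mix, half of the packets are violated, but the violated octets make up 1500/1564 of the total, i.e., more than 90%.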
>
>
>
> 3.2 talks about "EIs". Do you mean "VIs" or do you need to introduce a
> definition?
>
> GIM>> Thank you for catching my editorial sloppiness. Fixed.
>
>
>
> In 3.2
>
>    Determining the condition in which the path is currently with respect
>    to availability/unavailability is helpful.
>
> The use of "path" is worrying in this context. Can you say "service"? Or
> at least "connectivity"?
>
> GIM>> Would the following update be acceptable:
>
> OLD TEXT:
>
>    Determining the condition in which the path is currently with respect
>    to availability/unavailability is helpful.
>
> NEW TEXT:
>
>    Determining the condition in which the monitored service is currently
>    with respect to availability/unavailability is helpful.
>
>
>
> [AF] Yes, although to fix the English…
>
> NEW NEW
>
>    Determining the current condition of the monitored service
>    with respect to availability/unavailability is helpful.
>
GIM2>> Thank you.

>
>
> Then in 3.3 you have
>
>    VI, SVI, and VFI characterize the communication between two nodes
>    relative to the level of required and acceptable performance and when
>    the performance level degrades below an acceptable level.
>
> This puts the SLO specification very much into the context of a P2P
> communication. I think you need to justify this somewhere with a
> discussion of how a service is decomposed into connectivity constructs
> and how the SLOs are applied to each of these even if they are stated as
> applying to the whole service. This is particularly important when you
> look at the VIR and SVIR for a service that comprises multiple
> connectivity constructs only one of which is under performing.
>
> GIM>> A very accurate observation, thank you. We propose adding a new
> paragraph to Section 3.3 (it becomes the second paragraph):
>
> NEW TEXT:
>
>    It is worth noting that a service might include a set of connectivity
>    constructs.  An SLO might apply to all the constructs, or some
>    constructs are assigned different SLO values or even different sets
>    of SLOs.  It is worth noting that a composite service might include a
>    set of connectivity constructs.  An SLO might apply to all the
>    constructs, or some constructs are assigned different sets of SLOs.
>    For the purpose of PAM, each connectivity construct that composes the
>    service can be monitored for its own SLO conformance as a sub-
>    service.  The composition of PAMs of these sub-services can be viewed
>    as the PAM of the composite service.  The composition of PAMs of
>    these sub-services can be viewed as the PAM of the composite service.
>
>
>
> [AF] Seems you duplicated the first couple of sentences, here.
>
> [AF] Otherwise, this looks good.
>
GIM2>> Indeed, thank you for catching that.

>
>
> 3.2 has...
>
>    switching
>    between periods requires ten consecutive intervals, shorter
>    conditions may not be adequately reflected.
>
> No clue here as to where this requirement often comes from, nor what
> the process of switching means. 3.3 gives some clues as to what is
> going on, so a forward pointer would help. But even 3.3 doesn't explain
> why 10.
>
> GIM>> Indeed, ten intervals is only an example, and we update the text to
> reflect that:
>
> NEW TEXT:
>
>    But because
>    the transition between service availability/unavailability periods is
>    based on a pre-defined number of consecutive intervals, e.g., ten,
>    shorter conditions may not be adequately reflected.
>
>
>
> [AF] OK
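
To illustrate why shorter conditions may not be reflected, here is a minimal Python sketch of a transition rule based on consecutive intervals. The rule itself is an assumption made for illustration and is not quoted from the draft: the service is assumed to start as available, to become unavailable after `n` consecutive SVIs, and to become available again after `n` consecutive non-SVI intervals, with the change taking effect at the interval that completes the run.

```python
def availability_states(intervals, n=10):
    """Yield True (available) or False (unavailable) after each interval.

    intervals: iterable of 'VFI', 'VI', or 'SVI' labels.
    n: assumed number of consecutive intervals needed to change state.
    """
    available = True
    run = 0  # consecutive intervals that argue for leaving the current state
    for kind in intervals:
        severe = kind == "SVI"
        if severe == available:
            # An SVI while available, or a non-SVI while unavailable.
            run += 1
        else:
            run = 0
        if run >= n:
            available = not available
            run = 0
        yield available


# Five consecutive SVIs never change the state when n is ten.
short_burst = list(availability_states(["SVI"] * 5 + ["VFI"] * 5))
# Ten consecutive SVIs do, at the tenth interval.
long_burst = list(availability_states(["SVI"] * 10 + ["VFI"] * 10))
```

Under this assumed rule, the five-interval burst of SVIs leaves the service marked as available throughout, which is the kind of shorter condition the text says may not be adequately reflected.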
>
>
>
> Is the definition of VIR really what you want? It seems odd that the
> existence of an SVI reduces the VIR.  You could define...
>    *  violated interval ratio (VIR) is the ratio of the combined number
>       of VIs and SVIs to the total number of time unit intervals in a
>       time of the availability periods during a fixed measurement
>       interval.
>
> GIM>> Agreed and gratefully accepted.
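
The revised VIR definition above can be sketched as a simple count. The code assumes the input already contains only the per-interval labels that fall within availability periods of the measurement interval; the label strings and the function name are hypothetical.

```python
from collections import Counter


def violated_interval_ratio(labels):
    """Return (number of VIs + number of SVIs) / total number of intervals.

    labels: iterable of 'VFI', 'VI', or 'SVI', one per time unit interval
    within the availability periods of a fixed measurement interval.
    """
    counts = Counter(labels)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("at least one interval is required")
    return (counts["VI"] + counts["SVI"]) / total


# Turning one VI into an SVI leaves the ratio unchanged under this definition.
with_svi = violated_interval_ratio(["VFI"] * 7 + ["VI"] * 2 + ["SVI"])
without_svi = violated_interval_ratio(["VFI"] * 7 + ["VI"] * 3)
```

Because SVIs are counted together with VIs in the numerator, the presence of an SVI no longer reduces the VIR, which was the oddity raised in the comment.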
>
>
> ---
>
> 4.
>
>    For example, an SLA might state
>    that any given SLO applies only to a certain percentage of packets
>
> Is this really...
>
>    For example, an SLA might state
>    that any given SLO applies to at least a certain percentage of
>    packets
>
> GIM>> You are correct. Updated the text with your suggestion.
>
>
> ---
>
> 4.
>
> s/To support statistical services/To support statistical SLOs/
>
> GIM>> Thank you, accepted.
>
>
> ---
>
> In 4 you have
>
>    The definition of histogram metrics is for further study.
>
> I wasn't clear whether you intend that to be in a future version of this
> document or in a separate document. Section 6 helps. Maybe include a
> forward pointer?
>
> GIM>> Added the reference:
>
> NEW TEXT:
>
>    The definition of histogram metrics is for further study
>    (see Section 6).
>