Re: [Teas-ns-dt] Availability

"Dongjie (Jimmy)" <jie.dong@huawei.com> Wed, 03 June 2020 15:55 UTC

From: "Dongjie (Jimmy)" <jie.dong@huawei.com>
To: Eric Gray <eric.gray=40ericsson.com@dmarc.ietf.org>, Greg Mirsky <gregimirsky@gmail.com>
CC: Jeff Tantsura <jefftant.ietf@gmail.com>, Kiran Makhijani <kiranm@futurewei.com>, "teas-ns-dt@ietf.org" <teas-ns-dt@ietf.org>
Date: Wed, 3 Jun 2020 15:55:46 +0000
Message-ID: <a638071fbe9c491cba692f8d98dfbea5@huawei.com>
Archived-At: <https://mailarchive.ietf.org/arch/msg/teas-ns-dt/qpoara9CltoaA9BAynhSQrKRCZ4>
Subject: Re: [Teas-ns-dt] Availability

Hi Eric and Greg,

I also think that service availability should be based on conformance with all SLOs listed in the SLA. Failure to meet any one of the SLOs should mean that the service is considered unavailable.

In addition to the failure or outage case, some SLOs of one network slice may be affected by misbehaving traffic in other network slices. Thus, to meet the required availability, the impact between slices also needs to be taken into consideration.

Best regards,
Jie

From: Teas-ns-dt [mailto:teas-ns-dt-bounces@ietf.org] On Behalf Of Eric Gray
Sent: Wednesday, June 3, 2020 12:39 AM
To: Greg Mirsky <gregimirsky@gmail.com>
Cc: Eric Gray <eric.gray=40ericsson.com@dmarc.ietf.org>; Jeff Tantsura <jefftant.ietf@gmail.com>; Kiran Makhijani <kiranm@futurewei.com>; teas-ns-dt@ietf.org
Subject: Re: [Teas-ns-dt] Availability

Greg,

              This agrees with what Jeff said.

              That is, the set of conditions under which ALL SLOs are met is the intersection of service availability for each individual SLO.

              As I said (separately and very recently), however, there is a very high likelihood of correlated failures for many SLO values (e.g., excessive packet loss and delay variation are very likely to occur in the same outage).

              For any service having a relatively high availability, it is extremely likely that, outside of infrequent outages, the observed behavior with respect to all SLOs will be that the service is well within bounds.  If this is not the case, and that fact is detected by the service (provider), chances are very good that there will be some corrective response aimed at returning the service to the condition where its performance with respect to all SLOs is again well within bounds.

--
Eric

From: Greg Mirsky <gregimirsky@gmail.com>
Sent: Tuesday, June 2, 2020 12:00 PM
To: Eric Gray <eric.gray@ericsson.com>
Cc: Jeff Tantsura <jefftant.ietf@gmail.com>; Kiran Makhijani <kiranm@futurewei.com>; teas-ns-dt@ietf.org; Eric Gray <eric.gray=40ericsson.com@dmarc.ietf.org>
Subject: Re: [Teas-ns-dt] Availability
Importance: High

Hi Eric,
it is much better to have a use case to discuss, thank you. Do you think that availability includes all performance-related parameters, e.g., packet loss ratio, latency, and jitter? My interpretation of the definition in the draft is that availability is the intersection of the time periods when each performance-related SLO is within the agreed range. And I think it is important to see it as an intersection because if, during the same period, the packet loss ratio is 0% but latency is above its threshold, that period of time is considered an unavailability period for the given TS. What do you think?

Regards,
Greg

On Tue, Jun 2, 2020 at 8:48 AM Eric Gray <eric.gray@ericsson.com> wrote:
Jeff,

              I have to disagree.

              In a simple example, if availability requires a service to be conformant with all SLO parameters at least 99.999% of the time, then it is possible to have 100% packet loss for the remaining 0.001% of the time and still have the service be within SLA.

              To help people get their heads around this percentage, 0.001% of a year is a little over 5 minutes and 15 seconds.
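
The arithmetic behind that figure can be checked with a short sketch. The function name `downtime_budget_seconds` and the 365-day year are assumptions made for illustration; neither comes from the draft.

```python
SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # assumption: a 365-day year


def downtime_budget_seconds(availability_percent: float,
                            period_seconds: float = SECONDS_PER_YEAR) -> float:
    """Time within the period during which the service may be non-conformant."""
    return period_seconds * (100.0 - availability_percent) / 100.0


budget = downtime_budget_seconds(99.999)
minutes, seconds = divmod(budget, 60)
print(f"{int(minutes)} min {seconds:.1f} s per year")
```

With a 365-day year the budget comes to roughly 315 seconds, consistent with the "little over 5 minutes and 15 seconds" above.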

              So, a service may fail to meet any or all of its other SLO parameters and, as long as it is still meeting its availability SLO parameter, still be within SLA.

              Obviously, under those circumstances, availability does matter.

--
Eric

From: Teas-ns-dt <teas-ns-dt-bounces@ietf.org> On Behalf Of Jeff Tantsura
Sent: Monday, June 1, 2020 6:07 PM
To: Greg Mirsky <gregimirsky@gmail.com>
Cc: Kiran Makhijani <kiranm@futurewei.com>; teas-ns-dt@ietf.org; Eric Gray <eric.gray=40ericsson.com@dmarc.ietf.org>
Subject: Re: [Teas-ns-dt] Availability

Hi Greg,

A service (SLA) could have one or more SLOs associated with it.
An SLO is met (TRUE) when its objective is met (within the boundaries specified).
Usually an SLA is composed of a set of SLOs combined with logical AND, e.g., if any of the SLOs is FALSE -> SLA (or else)

Example:
If SLO (availability) is met but SLO (packet_loss) isn’t, availability becomes an irrelevant objective.
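
A minimal sketch of that logical AND, assuming each SLO has already been evaluated to a boolean. The function name `sla_met` and the dictionary keys are hypothetical, chosen only to mirror the example above.

```python
from typing import Dict


def sla_met(slo_results: Dict[str, bool]) -> bool:
    """The SLA holds only if every SLO evaluated to True (logical AND)."""
    return all(slo_results.values())


# Availability is met but packet loss is not, so the SLA as a whole is not met.
results = {"availability": True, "packet_loss": False}
print(sla_met(results))
```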

Cheers,
Jeff
On Jun 1, 2020, 2:55 PM -0700, Greg Mirsky <gregimirsky@gmail.com>, wrote:
Hi Jeff,
thank you for the clarification. Does measuring uptime consider whether all metrics included in the SLO are within their respective acceptable limits? In other words, if the quality of the TS degraded below the requested threshold due to, for example, excessive packet loss, would that time period be attributed to the service uptime period? In my experience, the uptime of a node (router, server) is easy to express. Uptime of a service? I would much appreciate it if you could help with an example or a reference to the definition.

Regards,
Greg

On Mon, Jun 1, 2020 at 2:46 PM Jeff Tantsura <jefftant.ietf@gmail.com> wrote:
Hi Greg,

An SLO is an objective (as the name suggests), not a metric. A metric without context is meaningless.
An SLO makes use of the metrics gathered to determine whether the objective has been met.

Example:
SLO (availability) = uptime 90% over 10 hours
total_time=10h
uptime=8h

Using the metrics above, we can conclude that total_availability = 80%, which is less than the service objective set (90%) -> SLA (or else)
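
The same example as a small sketch. The function names `availability` and `slo_met` are illustrative only, and the objective is assumed to be a simple lower bound on the measured ratio.

```python
def availability(uptime_hours: float, total_hours: float) -> float:
    """Availability as the fraction of the period during which the service was up."""
    return uptime_hours / total_hours


def slo_met(measured: float, objective: float) -> bool:
    """The objective is met when the measured value reaches the target."""
    return measured >= objective


measured = availability(uptime_hours=8, total_hours=10)
print(f"availability = {measured:.0%}, SLO met: {slo_met(measured, 0.90)}")
```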

Hope this clarifies

Cheers,
Jeff
On Jun 1, 2020, 2:32 PM -0700, Greg Mirsky <gregimirsky@gmail.com>, wrote:
Hi Jeff,
in my reading of the definition, it is the intersection of metrics already listed in the SLO. If that is the case, how useful is another metric that is only a reflection of other metrics?

Regards,
Greg

On Mon, Jun 1, 2020 at 2:24 PM Jeff Tantsura <jefftant.ietf@gmail.com> wrote:
Greg,

I thought the definition provided was pretty clear and comprehensible. Why do we need to rephrase it?

Cheers,
Jeff
On Jun 1, 2020, 2:00 PM -0700, Greg Mirsky <gregimirsky@gmail.com>, wrote:
Hi Jeff,
if we define availability as the ratio of the period during which all metrics requested in the SLO are within an acceptable range to the time since the service was handed to the customer (I propose to refer to this metric as "availability ratio"), then I think it can be expressed as
\bigcap_{i=1}^{n} A_i, where A_i is the time period during which a particular metric remained within its acceptable boundary.
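
One way to make this concrete is to treat each A_i as a sorted list of non-overlapping time intervals, intersect them, and divide the surviving duration by the total service time. This is only a sketch under that assumption; the names `intersect` and `availability_ratio` and the sample data are hypothetical.

```python
from typing import List, Tuple

Interval = Tuple[float, float]  # (start, end) of a period in which a metric was in range


def intersect(a: List[Interval], b: List[Interval]) -> List[Interval]:
    """Intersect two sorted lists of non-overlapping intervals."""
    out: List[Interval] = []
    i = j = 0
    while i < len(a) and j < len(b):
        start = max(a[i][0], b[j][0])
        end = min(a[i][1], b[j][1])
        if start < end:
            out.append((start, end))
        # Advance whichever interval ends first.
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return out


def availability_ratio(per_metric: List[List[Interval]], total_time: float) -> float:
    """Duration during which every metric was in range, divided by total time."""
    common = per_metric[0]
    for intervals in per_metric[1:]:
        common = intersect(common, intervals)
    return sum(end - start for start, end in common) / total_time


# Hours, over a 10-hour service period, in which each metric stayed in range.
loss_ok = [(0, 10)]
latency_ok = [(0, 4), (6, 10)]
jitter_ok = [(0, 7), (8, 10)]
ratio = availability_ratio([loss_ok, latency_ok, jitter_ok], total_time=10)
print(ratio)
```

In this made-up data the packet loss metric is in range for the whole period, yet the hours in which latency or jitter is out of range still count against the ratio.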

Regards,
Greg

On Mon, Jun 1, 2020 at 1:35 PM Jeff Tantsura <jefftant.ietf@gmail.com> wrote:
Mostly agree with Eric/Kiran

It should not be removed, but further clarified.
Network/service availability is a measurable metric: availability = uptime / total_time, where total_time = uptime + downtime.
Rule of thumb: a service is deemed available when all the SLOs associated with it are met (TRUE).
In a complex/multidimensional service, different objects might have different availability metrics.
For simplicity's sake, total_availability (normalized metric) = Σ(subservice-1..subservice-n), so both per-SLO and composite metrics can be used.

Cheers,
Jeff
On Jun 1, 2020, 10:08 AM -0700, Kiran Makhijani <kiranm@futurewei.com>, wrote:
Thanks! I support not removing it.
Sticking with an individual SLO seems to be the right decision, but it can be deferred to the NBI document. We need not state that here.
-Kiran

From: Teas-ns-dt <teas-ns-dt-bounces@ietf.org> On Behalf Of Eric Gray
Sent: Monday, June 1, 2020 7:35 AM
To: teas-ns-dt@ietf.org
Subject: [Teas-ns-dt] Availability

I agree that the definition needs to be cleaned up, but I disagree that it should be omitted.

A part of what probably should be cleaned up is the part that talks about service degradation.  In general, this is an important factor in determining availability, but it is a bit vague for the purpose of definition.

I also disagree that availability is not measurable.

As a proof of concept for measuring availability: if there are any mandatory measurable objectives, then failing to meet any of those objectives makes the service measurably unavailable.  That is, if you can determine that specific mandatory objectives are being met, then you can determine when they are not being met and therefore determine when the service is unavailable.

Availability is an important aspect of any service, because it is understood that the higher the required availability, the more difficult (and thus expensive) it is to provide that service.

Defining availability as a fraction, as we have done in the draft, allows for services that may experience a certain amount of outage over a service period.  A service request may ask for as high an availability as the provider and requester have agreed to (under the terms they agreed to) in advance.

Note that this elevates the importance of having (at least mostly) measurable objectives, simply because you cannot determine if a non-measurable objective is being met – hence you cannot (necessarily) determine the availability of any service that depends on that objective.

It is further interesting to note that the notion of a service depending on objectives for which it cannot determine whether they are being met is a non sequitur.

Measuring availability in terms of mandatory objectives – as a whole – is the simplest approach; one could group one or more mandatory objectives and define an availability separately for the group – thus allowing for a higher degree of acceptance for failing to meet one set of service objectives compared to others.

If we were going to do that, it would probably be better to define availability as a parameter that applies individually to service objectives.

In my opinion we should at least initially stick to the simple case, where availability is defined as a service objective, rather than as a parameter of every service objective – but I am willing to go either way.
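
To illustrate the difference between the two options, here is a rough sketch that computes availability over all mandatory objectives as a whole and, separately, over groups of objectives. The sample records, objective names, and the `availability` function are assumptions made purely for illustration; equal-length measurement intervals are also assumed.

```python
from typing import Dict, Sequence

# One record per measurement interval: was each objective met in that interval?
Sample = Dict[str, bool]


def availability(samples: Sequence[Sample], objectives: Sequence[str]) -> float:
    """Fraction of intervals in which every listed objective was met."""
    met = sum(all(sample[name] for name in objectives) for sample in samples)
    return met / len(samples)


samples = [
    {"loss": True, "latency": True, "jitter": True},
    {"loss": True, "latency": False, "jitter": True},
    {"loss": True, "latency": True, "jitter": False},
    {"loss": True, "latency": True, "jitter": True},
]

whole = availability(samples, ["loss", "latency", "jitter"])  # simple case
loss_group = availability(samples, ["loss"])                  # grouped case
delay_group = availability(samples, ["latency", "jitter"])    # grouped case
print(whole, loss_group, delay_group)
```

In the simple case a single figure covers the whole service; in the grouped case each group could carry its own availability target.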

--
Eric
--
Teas-ns-dt mailing list
Teas-ns-dt@ietf.org
https://www.ietf.org/mailman/listinfo/teas-ns-dt