Re: [Teas] Network Slicing design team definitions - isolation and resolution

Zhenghaomian <zhenghaomian@huawei.com> Thu, 30 April 2020 01:54 UTC

From: Zhenghaomian <zhenghaomian@huawei.com>
To: Kiran Makhijani <kiranm@futurewei.com>, "Joel M. Halpern" <jmh@joelhalpern.com>, "teas@ietf.org" <teas@ietf.org>, "LUIS MIGUEL CONTRERAS MURILLO" <luismiguel.contrerasmurillo@telefonica.com>
Thread-Topic: [Teas] Network Slicing design team definitions - isolation and resolution
Thread-Index: AdYekjK31Pp41NdDQuOKHmSg8Ld2/w==
Date: Thu, 30 Apr 2020 01:54:02 +0000
Message-ID: <E0C26CAA2504C84093A49B2CAC3261A43F8377F6@dggeml531-mbs.china.huawei.com>
Archived-At: <https://mailarchive.ietf.org/arch/msg/teas/CAMrXayp7A7lNnFB-_Urnf5nkLs>
Subject: Re: [Teas] Network Slicing design team definitions - isolation and resolution
List-Id: Traffic Engineering Architecture and Signaling working group discussion list <teas.ietf.org>

Hi, Kiran, 

Thank you for the reply, and +1 to thanking Luis for sharing his useful input. More feedback inline...

Best wishes,
Haomian

-----Original Message-----
From: Kiran Makhijani [mailto:kiranm@futurewei.com] 
Sent: April 28, 2020 2:40
To: Zhenghaomian <zhenghaomian@huawei.com>; Joel M. Halpern <jmh@joelhalpern.com>; teas@ietf.org
Subject: RE: [Teas] Network Slicing design team definitions - isolation and resolution

Haomian,
Just catching up with this thread. +1 to Luis's email. Some clarifications inline.

-----Original Message-----
From: Teas <teas-bounces@ietf.org> On Behalf Of Zhenghaomian
Sent: Saturday, April 25, 2020 7:31 PM
To: Joel M. Halpern <jmh@joelhalpern.com>; Kiran Makhijani <kiranm@futurewei.com>; teas@ietf.org
Subject: Re: [Teas] Network Slicing design team definitions - isolation and resolution

Hi, Kiran, Joel, 

This is a really interesting discussion. I thought I understood SLOs clearly, but now I am confused again... There may be two different understandings of an SLO:
1) The service level threshold. In this case, when the measured value goes beyond a certain threshold, the delivery has failed. For example, if the user asks for an availability of '>99%' and the delivery is '98%', the user may refuse to pay for the service. This is very much like an SLA, i.e., a 'contract'. 
2) The service level objective function. In this case the user may demand that the transport slice run under a policy such as 'some parameter is maximized/minimized'. For example, if the user wants the slice to have 'minimal latency', then it does not matter whether the latency is 7ms or 9ms; it should simply be minimized. 
For both 1) and 2), a combination of multiple parameters should of course be allowed. 
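A minimal sketch of reading 1), the threshold interpretation, assuming an SLO is nothing more than a metric name, a threshold and a direction. The class and function names are hypothetical and are not taken from any draft; a combined SLO is treated as met only when every component is met.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ThresholdSLO:
    """Hypothetical threshold-style SLO, e.g. availability '>99%'."""
    metric: str
    threshold: float
    higher_is_better: bool = True

    def is_met(self, delivered: float) -> bool:
        # Strictly beyond the threshold counts as met, mirroring '>99%'.
        if self.higher_is_better:
            return delivered > self.threshold
        return delivered < self.threshold


def all_met(slos, delivered):
    """A combination of SLOs is met only if every individual SLO is met."""
    return all(slo.is_met(delivered[slo.metric]) for slo in slos)


availability = ThresholdSLO("availability_pct", 99.0)
latency = ThresholdSLO("latency_ms", 10.0, higher_is_better=False)

print(availability.is_met(98.0))  # False: 98% does not satisfy '>99%'
print(all_met([availability, latency], {"availability_pct": 99.5, "latency_ms": 8.0}))  # True
```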

[KM] I think 1) is spot on. But it is at the business layer, not something a transport slice is concerned with. 
For 2) I think of industrial control, e.g. a robot cannot take a command too early because of assembly line timing (let's ignore this for now).
But an objective cannot simply be "maximized/minimized". How does a network deal with two flows, one that can tolerate 15 ms and the other 5 ms?
[Haomian] Good, let's go with option 1). You mentioned the 'business layer' a few times in your reply, but I did not find it in the existing drafts. I understand the business layer as neither 'a part of the TSC NBI about the TS request' nor 'a part of the TSC SBI about the TS realization', but as some non-technical agreement between the user and the provider, i.e., a rule (or contract?) governing delivery. Please correct me if I am wrong. 
For option 2) I find it more mathematical, as every demand can be described by an 'objective function'. Multiple SLOs or multiple flows can be represented with weights in a single objective function. As we are not pursuing option 2) for SLOs, let's discard it...
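To make the 'objective function' reading concrete, here is a sketch that folds several metrics into one weighted score and picks the candidate path with the lowest score. The metric names, values and weights are invented for illustration, and a weighted sum is only one possible way to build such a function.

```python
def weighted_objective(metrics, weights):
    """Fold several metrics into a single score to be minimized.

    Every metric is assumed to be oriented so that lower is better;
    the weights absorb differences in scale between metrics.
    """
    return sum(weights[name] * metrics[name] for name in weights)


def best_candidate(candidates, weights):
    """Pick the candidate (e.g. a path) with the smallest weighted score."""
    return min(candidates, key=lambda name: weighted_objective(candidates[name], weights))


# Hypothetical candidate paths and weights, for illustration only.
candidates = {
    "path_a": {"latency_ms": 7.0, "loss_pct": 0.5},
    "path_b": {"latency_ms": 9.0, "loss_pct": 0.1},
}
weights = {"latency_ms": 1.0, "loss_pct": 10.0}

print(best_candidate(candidates, weights))               # path_b (score 10.0 vs 12.0)
print(best_candidate(candidates, {"latency_ms": 1.0}))   # path_a (latency only)
```

Changing the weights changes the winner, which is exactly why this reading is harder to turn into a verifiable commitment than a threshold.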

The discussion so far reminds me that neither of the above is accurate. When introducing 'ranging', for example 'latency between 7ms and 9ms', does it imply that 'a 5ms-latency slice is a bad slice'? It should not...
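The question hinges on how a range is read. This sketch (hypothetical helper names) contrasts a two-sided window with a plain upper bound for the same 5 ms measurement: under the window reading the result is reported as a violation, which matches Kiran's answer below, while under the bound reading it is simply better than required.

```python
def within_range(value, low, high):
    """Two-sided window: the objective is met only inside [low, high]."""
    return low <= value <= high


def within_bound(value, high):
    """Upper-bound reading: anything at or below the bound is met."""
    return value <= high


# Latency SLO of '7 ms to 9 ms' evaluated against a 5 ms measurement.
measured_ms = 5.0
print(within_range(measured_ms, 7.0, 9.0))  # False: reported as a violation
print(within_bound(measured_ms, 9.0))       # True: better than required
```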

[KM] Then an SLO violation has occurred. 
In our definitions we want to emphasize (a) what the requested SLOs are, (b) continuously monitoring and making an effort to meet them, through the TSC and NC NBI, and (c) if violations still happen after all that, making them known to the upper layer. The impact is part of the business layer.
[Haomian] From your reply I understand that (a) the SLO needs to be in the TSC NBI model for requesting; (b) publication/subscription would be needed through the NBI to monitor the TS status; (c) events are reported if any occur. I think this is a good scope for applying SLOs, i.e., focusing on the TSC NBI. 
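The (a)/(b)/(c) split can be sketched as a small monitor that holds the requested bound, consumes telemetry samples, and reports events to the upper layer. This is an assumption-laden illustration: the event names, the single "maximum value" SLO and the callback-style notification are hypothetical and do not correspond to any defined NBI model.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class SloMonitor:
    """Hypothetical TSC-side monitor.

    (a) holds the requested SLO, (b) consumes telemetry samples,
    (c) reports violation/restoration events to the upper layer.
    """
    metric: str
    max_value: float
    notify: Callable[[dict], None]
    violated: bool = False

    def on_sample(self, value: float) -> None:
        breach = value > self.max_value
        # Report only on state transitions so the upper layer is not flooded.
        if breach and not self.violated:
            self.notify({"event": "slo-violated", "metric": self.metric, "value": value})
        elif not breach and self.violated:
            self.notify({"event": "slo-restored", "metric": self.metric, "value": value})
        self.violated = breach


events: List[dict] = []
monitor = SloMonitor("latency_ms", 9.0, events.append)
for sample in [8.0, 9.5, 10.0, 8.5]:
    monitor.on_sample(sample)
print([e["event"] for e in events])  # ['slo-violated', 'slo-restored']
```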

The 'lifetime'/'lifecycle' statement is also confusing, when you say 'A transport slice would ask: do not exceed 95gbps for the lifetime of the service'. I am curious what will happen if 95gbps is exceeded: does it mean the slice should be cut down? When the quality of the slice exceeds the user's request (or the user's tolerance), what would the outcome be? 

[KM] By lifetime, I meant that slices share resources over the same infrastructure and can be withdrawn/deployed on demand. When the service is not active, no resources should be allocated to the slice associated with that service. For example, a customer may ask for a high-bandwidth replication slice between midnight and 3:00 am, so the SLOs apply only during that time.
[Haomian] Maybe this is a terminology issue: what you describe is usually called a 'scheduled SLO', which is only valid in given time intervals. Lifecycle/lifetime, to me, describes the TS status at a certain time, such as 'created/under maintenance/shut down'. The good news is that the term is not used in the draft so far.  
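A 'scheduled SLO' only needs a check of whether the current time falls inside the agreed window. The sketch below assumes a daily window expressed in local time and also handles windows that wrap past midnight; the function name is hypothetical.

```python
from datetime import datetime, time


def slo_active(now: datetime, start: time, end: time) -> bool:
    """Return True if a daily scheduled SLO applies at 'now'.

    Handles same-day windows such as 00:00-03:00 as well as windows
    that wrap past midnight, such as 23:00-01:00.
    """
    t = now.time()
    if start <= end:
        return start <= t < end
    return t >= start or t < end


window = (time(0, 0), time(3, 0))  # midnight to 3:00 am
print(slo_active(datetime(2020, 4, 30, 1, 30), *window))  # True
print(slo_active(datetime(2020, 4, 30, 9, 0), *window))   # False
```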

IMHO, when we specify an SLO, besides how to represent it we also need to understand the criteria for judging whether the SLO is achieved, and what happens if the SLO is not satisfied. For SLAs these are clear, but for SLOs they are not. 

[KM] We (or I) do not consider "what happens if the SLO is not satisfied" to be in the scope of the transport slice work. Let the business layer handle this. But given known SLOs, a transport slice should use monitoring and remedy tools to prevent SLO degradation. For example, if a link starts to get congested, either move the slice traffic to a different path or offload other traffic from that link. Each network makes its own choice (SBI) based on an instruction (NBI) from the transport controller.
[Haomian] I like your statement about SLO satisfaction, but not your example. "Getting congested" is a phenomenon, not a result measurable in the network. Congestion may affect latency, but in that case the SLO should focus on latency, not congestion. You may be aware of another thread discussing whether 'isolation as a kind of SLO' is necessary; I believe other people have a similar concern about it. 

Thanks
-Kiran

BTW, regarding isolation, I don't see the need to argue whether it should be in the SLO or not. Isolation can either be requested by the user of the transport slice (then via the NBI of the TSC) to express a reliability demand, or be offered by the provider of the transport slice (then via the SBI of the TSC) to achieve the SLO requested by the user. In other words, if the user requests a certain level of isolation in an SLO, that isolation should be provided; if the user does not request any isolation (no isolation request in the SLO), some isolation may still be provided to satisfy the user's request. 
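The two paths for isolation can be sketched as follows, assuming a request that may or may not carry an explicit isolation level. The field names and the 5 ms cut-off the provider uses when nothing is requested are made up purely for illustration; real realization choices would depend on the operator's own policies.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SliceRequest:
    """What the TSC NBI receives (hypothetical fields)."""
    max_latency_ms: float
    isolation: Optional[str] = None  # e.g. "dedicated" or "shared"; None = not requested


def choose_isolation(req: SliceRequest) -> str:
    """Isolation applied on the TSC SBI when realizing the slice."""
    if req.isolation is not None:
        # The user asked for it in the SLO, so it must be honored as requested.
        return req.isolation
    # Otherwise the provider decides; this latency cut-off is a made-up policy.
    return "dedicated" if req.max_latency_ms < 5.0 else "shared"


print(choose_isolation(SliceRequest(10.0, "dedicated")))  # dedicated (user-requested)
print(choose_isolation(SliceRequest(2.0)))                # dedicated (provider's choice)
print(choose_isolation(SliceRequest(10.0)))               # shared (provider's choice)
```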

Best wishes,
Haomian

-----Original Message-----
From: Teas [mailto:teas-bounces@ietf.org] On Behalf Of Joel M. Halpern
Sent: April 25, 2020 3:55
To: Kiran Makhijani <kiranm@futurewei.com>; teas@ietf.org
Subject: Re: [Teas] Network Slicing design team definitions - isolation and resolution

Focusing first on your last comment, but retaining the rest for context.

As I understand a Transport slice, it is all about giving a well-defined traffic behavior to the set of traffic the user injects into the slice.

This has many dimensions in order for it to be deliverable.  It has to include the limits on what the user is allowed to inject.  It has to include how reliably / often / ... the given level will be met.
In common usage, the SLOs are the numbers used to fill these things in. 
They are coupled with reliability / assurance levels, customer limits, and other things to construct an SLA.  An SLO of 95 gigabits does not mean anything without the concomitant knowledge of how often it will be met. 
This gets even messier with things like availability, as you have to actually define what it means for the service to be available.  (E.g. an IP service that drops half the packets at random is probably not actually available.)

Saying 95 gigabits for 9500 flows (not sure that is even a meaningful parameter for packet service delivery) without a reliability level is meaningless.  It cannot be 100%.  Even the existing text notes that things break.  If it is 50%, it is not much of a commitment.  That is why you tie the objectives to the other parameters, and thus need SLAs even if there is no money changing hands.
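The point that an objective needs a frequency attached can be expressed as a compliance ratio over measurement intervals. This sketch assumes per-interval measurements of the rate the user could actually send, and reuses the "95 gigabits per second 95% of the month" example that appears later in the thread; the function names and sample data are hypothetical.

```python
def compliance_ratio(samples_gbps, target_gbps):
    """Fraction of measurement intervals in which the target rate was deliverable."""
    if not samples_gbps:
        raise ValueError("no measurements")
    met = sum(1 for s in samples_gbps if s >= target_gbps)
    return met / len(samples_gbps)


def commitment_met(samples_gbps, target_gbps, required_ratio):
    """An objective (e.g. 95 Gbps) only means something with a frequency (e.g. 95%)."""
    return compliance_ratio(samples_gbps, target_gbps) >= required_ratio


# Hypothetical per-interval measurements: 20 intervals, one of them degraded.
samples = [96.0] * 19 + [40.0]
print(compliance_ratio(samples, 95.0))      # 0.95
print(commitment_met(samples, 95.0, 0.95))  # True
print(commitment_met(samples, 95.0, 0.99))  # False
```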

Your text on ranges also left me confused.
In the typical case where one wants a delay range, one actually wants a guarantee of a minimum delay and a separate bound on delay variation. 
I do not know of a case where the customer asked the provider to slow down his traffic.  Nor do I know of a case where a customer said "do not give me more than the stated bandwidth."  The customer may want to control the rate at which he sends, but that is separate.

Hence, I am left very confused by your answers.

Yours,
Joel

PS: It is very well known, and frankly sensible, that operators making these promises look at the cost of meeting the agreement, the operational complexity of delivering it often enough, and the price of failure.  They then choose different techniques to deliver the service based on those trade-offs.  In practice, this suggests that the orchestration likely needs visibility into other parameters, such as how often it can fail.  (For example, you use FRR if you need to recover faster, based on your analysis of likely failures, non-FRR recovery times, ....)  We leave these kinds of things to the operators, as discussion of the business parameters themselves is clearly out of scope for the IETF.  But we have to recognize in our definitions that they are an intrinsic part of the task.

On 4/24/2020 3:40 PM, Kiran Makhijani wrote:
> Hi Joel,
> Please see inline.
> 
> -----Original Message-----
> From: Teas <teas-bounces@ietf.org> On Behalf Of Joel M. Halpern
> Sent: Thursday, April 23, 2020 5:56 PM
> To: teas@ietf.org
> Subject: [Teas] Network Slicing design team definitions - isolation and resolution
> 
> I have serious problems with these aspects of the document.
> 
> Let's start with the first discussion of isolation.  This occurs in the description of the security objective.  I have no problem with including observable security properties in objectives.  Telling the customer "your traffic will be encrypted across this slice" seems a useful thing to say.  We could even discuss how the security properties of the encryption should be described, although we need to be careful about how we do so.  But isolation of flows in the network is not an external observable, not something a customer can even ask about.  Fundamentally, isolation is a way for an operator to meet a set of agreements around a set of objectives.  And this document says repeatedly that it is not about how, only what.
> [KM] There's a particular challenge with this document. Our first goal is to define transport-network slices in an independent, standalone manner, while at the same time reusing or building on existing/in-progress IETF work. It easily gets off-balance.
> But yours is a fair justification. Isolation can be removed from security. I wanted this explanation so that when we develop the next level of detail, we have some idea of what should be considered.
> 
> This gets worse when we get to the section on resolution of guarantees.
> The text begins by talking about hard vs soft guarantees.  And then explains that it means the difference between effects from interfering traffic and effects from hardware (or, presumably, software) failures.
> Except that from the perspective of the user of the slice, that is irrelevant.  If I have been given a commitment that I can send 95 gigabits per second 95% of the month, and I discover that I was unable to do so for more than 5% of the time, I as a customer will expect to invoke whatever remedy my contract gives me.  If the contract had tried to draw a distinction based on cause, I would have said "no, I am paying for the service.  If you can't give it to me, I'll get it somewhere else."  The customer wants the commitment met, not excuses about which cause occurred.
> [KM] I do not see a transport slice as a "contract". A typical transport slice expresses how much of an SLO is needed and what failures the service can endure. For the user of the transport slice (who is offering a service), the ability to describe this is not irrelevant. When the network provider receives the request, it knows what kind of effort it should make to keep the service running (e.g. packet loss for AR/VR should be far lower than for a 4K media service).
> 
> The resolution then wanders over into the issue of tolerance.  As the section correctly observes, things break.  Agreements are generally written to allow a certain amount of that.  The classic 5-9s was exactly for this purpose.  It told you how well the service would be delivered.
> IP operators can and do make promises with commitments and frequencies.
> But it is not expressed as a tolerance the way this describes it in my experience.  Rather, it is expressed by having several objectives and differing levels of commitment for them.  As a purely fictional example to try to be clear, one might have a series of loss commitments:
>      No more than 10% loss - 99.999% of the time
>      No more than 1% loss - 99.99% of the time
>      ...
> That does not match the description of "Resolution of guarantee" in this definition document.
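The fictional tiered loss commitments above can be checked tier by tier: each (loss bound, fraction of time) pair is evaluated independently against the same measurement intervals. The tier values are copied from the example; the function name and sample data are hypothetical.

```python
# Tiers from the fictional example: (maximum loss fraction, required fraction of time).
TIERS = [(0.10, 0.99999), (0.01, 0.9999)]


def tiers_met(loss_samples, tiers=TIERS):
    """Return (max_loss, met) for each tier, evaluated over equal-length intervals."""
    if not loss_samples:
        raise ValueError("no measurements")
    n = len(loss_samples)
    results = []
    for max_loss, required in tiers:
        within = sum(1 for loss in loss_samples if loss <= max_loss)
        results.append((max_loss, within / n >= required))
    return results


# Hypothetical month of 10,000 intervals: three with 5% loss, the rest loss-free.
samples = [0.0] * 9997 + [0.05] * 3
print(tiers_met(samples))  # [(0.1, True), (0.01, False)]
```

Each tier is an objective paired with its own level of commitment, which is the structure the example describes.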
> [KM] This may have been misunderstood. The text says "certain tolerance level of that objective", which can alternatively be read as a 'range', for example: bounded latency (delivery time not to exceed 9 ms, or between 7-9 ms; delay variation in the 0.5 to 1 ms range; and so on).
> 
> And then we get section 4.1.1, which is a discussion of isolation.  It is a discussion which assumes the previous two elements are correct.  My recommendation, rather than debating whether it is correct, is to simply remove all of 4.1.1.  If some folks involved wanted it, which is what was said on the call, then those folks need to speak up and explain what value it adds.  Currently, it misleads and confuses the reader.
> 
> As a final note, I suspect that the usage of SLO and SLA in this document does not match industry usage.  Some effort to address the mismatch might help us avoid further disconnects such as the above.
> [KM] I have always seen SLAs at the business level, but a transport slice is something that can still be described in terms of tangible resource requests. Therefore, 95gbps per month is not a description of a transport slice. A transport slice would ask: do not exceed 95gbps for max 9500 flows, where each flow gets 1gbps, from pt 'a' to 'b' or 'c' to 'd' for the lifetime of the service.
> -Kiran
> 
> Yours,
> Joel M. Halpern
> 
> _______________________________________________
> Teas mailing list
> Teas@ietf.org
> https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fwww.
> ietf.org%2Fmailman%2Flistinfo%2Fteas&amp;data=02%7C01%7Ckiranm%40futur
> ewei.com%7C3a14db2095eb4646145808d7e7ea6c9f%7C0fee8ff2a3b240189c753a1d
> 5591fedc%7C1%7C0%7C637232866346233496&amp;sdata=cFEYu%2BcEy3YEHTjFK7tq
> maVZghv%2F4VLFDaoX07OZRXg%3D&amp;reserved=0