Re: [OPSAWG] Comments on Service Assurance for Intent-Based Networking Architecture (e.g. draft-claise-opsawg-service-assurance-architecture)
Benoit Claise <bclaise@cisco.com> Tue, 04 August 2020 12:39 UTC
To: Alexander Clemm <alex@futurewei.com>, "draft-claise-opsawg-service-assurance-architecture@ietf.org" <draft-claise-opsawg-service-assurance-architecture@ietf.org>
Cc: "opsawg@ietf.org" <opsawg@ietf.org>, "nmrg@irtf.org" <nmrg@irtf.org>
References: <BY5PR13MB37936AF1D0EEAA2C9C7749FEDB710@BY5PR13MB3793.namprd13.prod.outlook.com> <76ecb76b-efe8-499a-95ec-2602fa0248ef@cisco.com> <CH2PR13MB379932FF300A9903F83E8898DB4D0@CH2PR13MB3799.namprd13.prod.outlook.com>
From: Benoit Claise <bclaise@cisco.com>
Message-ID: <d5bca505-5b44-3ae2-ce3d-5371e27b6e93@cisco.com>
Date: Tue, 04 Aug 2020 14:39:00 +0200
In-Reply-To: <CH2PR13MB379932FF300A9903F83E8898DB4D0@CH2PR13MB3799.namprd13.prod.outlook.com>
Archived-At: <https://mailarchive.ietf.org/arch/msg/opsawg/_wLEwNemG5ZQDSmPbe3Yx4VqHD4>
Subject: Re: [OPSAWG] Comments on Service Assurance for Intent-Based Networking Architecture (e.g. draft-claise-opsawg-service-assurance-architecture)
Thanks Alex,

We'll make sure to introduce the required text in the next draft versions.

Regards, Benoit

> Hi Benoit,
>
> thanks for the response. By and large we are on the same page and I support this work. And, as you know, I am clearly of the school that believes in exception-driven management and providing actionable information, not raw data.
>
> Anyway, as mentioned, there should perhaps be greater emphasis on the value of maintaining a dependency graph in general, and on explaining how it can complement / aid operational tasks from troubleshooting to impact analysis. It would be good to add some bits on how and where to instrument this effectively (not necessarily all pushed onto device agents; there will also be a role for controllers etc. in this). I remain sceptical regarding the specific use case of continuously maintaining a synthetically derived health score, but am looking forward to the progression of this work in further iterations of the drafts.
>
> --- Alex
>
> *From:* Benoit Claise <bclaise@cisco.com>
> *Sent:* Friday, July 31, 2020 3:42 AM
> *To:* Alexander Clemm <alex@futurewei.com>; draft-claise-opsawg-service-assurance-architecture@ietf.org
> *Cc:* opsawg@ietf.org; nmrg@irtf.org
> *Subject:* Re: Comments on Service Assurance for Intent-Based Networking Architecture (e.g. draft-claise-opsawg-service-assurance-architecture)
>
> Hi Alex,
>
> Thanks for engaging.
>
> Hi Benoit,
>
> I have seen your presentations on Service Assurance for Intent-Based Networking Architecture and read your drafts with interest (draft-claise-opsawg-service-assurance-yang-05 and draft-claise-opsawg-service-assurance-architecture-03). Interesting stuff on which I do have a couple of comments.
>
> The basis for the drafts is in essence a proposal for Model-Based Reasoning (MBR), in which you capture dependencies between objects and make inferences by traversing the corresponding graph. MBR based on dependency graphs allows one to reason about the impact and propagation of the status or health of one object on the status or health of dependent objects “downstream” from it. Likewise, traversing the same graph in the opposite direction (from the “downstream” or dependent objects) allows one to identify potential root causes for symptoms observed by those objects, although this seems not to be your main focus.
>
> While MBR as a concept makes sense and has a long tradition in network management, there are also a number of considerable issues with it, and I was wondering about your perspective and mitigation strategies for these. For one, the effectiveness of MBR depends on the model being “complete”. In most cases, there are myriads of interdependencies which are difficult to capture comprehensively. The model is still useful for many applications as a starting point, but rarely captures the full reality. As long as users are clear about that, this is not an issue.
>
> Point taken about the myriads of interdependencies and graph completeness. As you observe, even if the graph is not complete, this is useful, especially when we can assure (networking) components within the assurance graph. That way, the graph will tell us where the problem is not, which is as important as telling where the problem is/might be... assuming, obviously, that we have complete heuristics for that component's assurance... which implies that the heuristics need to improve over time.
>
> However, the one thing where I have a bit of concern in your model is that you use it to draw conclusions about the health of the dependent objects (for example, your end-to-end service). It seems that a derived health score will be no substitute for monitoring the actual health, and should not lull users into a false sense of security that, as long as they monitor the components of a system or service, they don’t need to be concerned with monitoring the system or service as a whole. In reality, I believe the value (although there still is a value) is more limited than that. I believe that this should be clearly acknowledged and discussed in the drafts.
>
> This is the exact reason why I wrote in the slides: "This complements the end-to-end synthetic testing". Indeed, service assurance is usually done with end-to-end probing: OWAMP/TWAMP/IP SLA with threshold-based delay, packet loss, jitter, etc. When the SLA degrades, end-to-end probing can't really tell which components in the network degrade (granted, there are exceptions). The network is viewed as a black box. Combining the inferred health score from the assurance graph with the end-to-end probing provides the required correlation to get more of a crystal-clear view of the network.
>
> Point very well taken: the "This complements the end-to-end synthetic testing" concept is not mentioned in the draft. I will add it. Thanks.
>
> A second set of issues concerns the intensity of maintaining the graph and of continuously updating the dependencies. In a realistic system you will have many objects with even more interdependencies. Maintaining derived health state can become computationally very expensive, which suggests a number of mitigation strategies: for one, don’t continuously maintain this, but compute it only “on demand”.
>
> Yes. That's one way.
>
> Second, perhaps don’t maintain this on the server at all, at least to the extent that you expect the server to be a networking device. It seems much more feasible to perform this type of Model-Based Reasoning computation in an Operations Support System (OSS) or application outside the network, not within the network. However, it is not clear that YANG models and NETCONF/RESTCONF would be applied there. It seems to me the drafts should add clarification on where those models would be expected to be deployed and how they would be kept updated. As an OSS tool, your proposal makes sense, but trying to process this on networking devices strikes me as very heavy, in particular given the limitations described in the earlier point. So, IMHO, you may want to consider adding a corresponding section that discusses these aspects in the draft, specifically the architecture draft.
>
> The architecture, with the YANG module, is actually designed to cover distributed graphs. We can stream all metrics (whether YANG leaf, MIB variable, CLI, syslog, what have you) to an OSS, sure. However, I believe in data aggregation, as we know that we're going to quickly reach the limits of streaming capabilities. And I also believe in each component being responsible for its own assurance, to the best of its knowledge. Hence the proposal to go via a SAIN agent, inside or outside a router, to send the inferred health score and symptoms to the OSS.
>
> In the end, what do operational teams care about?
> 1. knowing that an interface, a router, or part of the network works fine ... until they tell me otherwise
> 2. collecting all the metrics in a big data lake to draw the same or a better conclusion
>
> Ideally we need both, but we face two schools here. I'm more in the school of providing information, as opposed to too much data. This would reduce the cost of managing networks.
>
> Regards, Benoit
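The Model-Based Reasoning discussed in this thread (inferring a dependent object's health from the objects it depends on, and walking the dependency graph toward those objects to locate potential root causes) can be illustrated with a minimal Python sketch. It is not the algorithm defined in the drafts: the hypothetical `Node` structure, the assumed 0..100 score range, and the assumed rule that an object is only as healthy as its weakest dependency are choices made for illustration, and all node names are made up. Health is computed on demand rather than continuously maintained (one of the mitigation strategies mentioned above), and healthy subgraphs are pruned during the root-cause walk, reflecting the point that the graph also tells where the problem is not.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Node:
    """Hypothetical assurance-graph node: a device, interface, tunnel or service."""

    name: str
    own_health: int = 100  # health measured locally, assumed range 0..100
    symptoms: List[str] = field(default_factory=list)
    depends_on: List[Node] = field(default_factory=list)


def inferred_health(node: Node, cache: Optional[Dict[str, int]] = None) -> int:
    """Compute a node's health on demand by walking its dependencies.

    Assumed combination rule (illustrative only): a node is only as healthy
    as its weakest dependency. The cache avoids re-walking shared subgraphs
    within a single query.
    """
    if cache is None:
        cache = {}
    if node.name not in cache:
        dep_scores = [inferred_health(dep, cache) for dep in node.depends_on]
        cache[node.name] = min([node.own_health, *dep_scores])
    return cache[node.name]


def root_cause_candidates(service: Node, threshold: int = 100) -> Dict[str, List[str]]:
    """Walk from a dependent object toward what it depends on and return
    the locally degraded nodes together with their symptoms.

    Subgraphs whose inferred health meets the threshold are pruned: the
    graph tells us where the problem is *not*.
    """
    cache: Dict[str, int] = {}
    seen = set()
    found: Dict[str, List[str]] = {}
    stack = [service]
    while stack:
        node = stack.pop()
        if node.name in seen or inferred_health(node, cache) >= threshold:
            continue  # already visited, or the whole subgraph is healthy
        seen.add(node.name)
        if node.own_health < threshold:
            found[node.name] = node.symptoms
        stack.extend(node.depends_on)
    return found


if __name__ == "__main__":
    # Hypothetical topology: a VPN service over two tunnels sharing router pe1.
    pe1 = Node("pe1")
    uplink = Node("pe2-uplink", own_health=40, symptoms=["input CRC errors"])
    pe2 = Node("pe2", depends_on=[uplink])
    tunnel_a = Node("tunnel-a", depends_on=[pe1, pe2])
    tunnel_b = Node("tunnel-b", depends_on=[pe1])
    vpn = Node("l3vpn-blue", depends_on=[tunnel_a, tunnel_b])

    print(inferred_health(vpn))        # 40
    print(root_cause_candidates(vpn))  # {'pe2-uplink': ['input CRC errors']}
```

The service inherits the degraded uplink's score, while `tunnel-b` and `pe1` are never explored because their inferred health is intact. Where such a computation runs (a device agent, a controller, or an OSS) is independent of the sketch.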