[OPSAWG] Service Assurance for Intent-based Networking Architecture

stefan vallin <stefan@wallan.se> Wed, 20 November 2019 10:38 UTC

Archived-At: <https://mailarchive.ietf.org/arch/msg/opsawg/XM1LJ1R3uTtCg-HKq2RvsQnv1Ew>

Hi Benoit!
Thanks for bringing the issue of service assurance to the table. More work is needed on this topic.

Some high-level comments on the drafts:

The drafts present a YANG module for performing reasoning across a “generic” service tree.
This has existed in assurance systems for a long time: inventory-based systems as well as fault managers had modules for this, such as Micromuse Impact and OpenView Service Navigator.
The overall idea is to feed KPIs, events, alarms, etc. into the tree and reason upwards to do service impact analysis and downwards to do root-cause analysis.
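
As a rough illustration of that idea, here is a minimal Python sketch. The node names, the DEPENDS_ON table, and both functions are hypothetical and are not taken from the drafts or from any of the tools mentioned; the sketch only shows upward reasoning for impact and downward reasoning for root-cause candidates over a plain dependency graph.

```python
from collections import defaultdict

# Hypothetical service tree: each node maps to the subservices it depends on.
DEPENDS_ON = {
    "l3vpn-customer-a": ["tunnel-1", "pe1-interface"],
    "l3vpn-customer-b": ["tunnel-1"],
    "tunnel-1": ["p1-router"],
    "pe1-interface": [],
    "p1-router": [],
}


def impacted_services(failed, depends_on):
    """Reason upwards: every node that transitively depends on `failed`."""
    parents = defaultdict(set)
    for node, children in depends_on.items():
        for child in children:
            parents[child].add(node)
    impacted, stack = set(), [failed]
    while stack:
        for parent in parents[stack.pop()]:
            if parent not in impacted:
                impacted.add(parent)
                stack.append(parent)
    return impacted


def root_cause_candidates(service, unhealthy, depends_on):
    """Reason downwards: unhealthy nodes under `service` that have no
    unhealthy dependency of their own."""
    candidates, seen, stack = set(), set(), [service]
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        children = depends_on.get(node, [])
        if node in unhealthy and not any(c in unhealthy for c in children):
            candidates.add(node)
        stack.extend(children)
    return candidates


print(impacted_services("p1-router", DEPENDS_ON))
print(root_cause_candidates(
    "l3vpn-customer-a",
    {"l3vpn-customer-a", "tunnel-1", "p1-router"},
    DEPENDS_ON,
))
```

With this toy data, a failure of p1-router impacts tunnel-1 and both customer VPNs, and the only root-cause candidate below customer A is p1-router.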

I have some concerns based on the above:
1) The draft should be renamed. Claiming that a service tree is *the* architecture for intent-based network assurance is perhaps too ambitious. Many other things are needed for service assurance in intent-based networking:
- how to represent service tests and service SLA monitoring as part of the intent
- how to monitor the data plane as such
- how to represent closed-loop policies
- and more
So I think the draft should be renamed to reflect its relevant, more limited scope: "A YANG data model for representing generic service trees", or something similar.

2) Looking back at practical experience with the above-mentioned tools, it was very hard to get the approach to work (or is someone else out there smarter?).
The dependencies between services/subservices are subtle. Either the dependencies get too coarse-grained, so that everything turns red, or too fine-grained, so that there are too few hits. Who is to express this knowledge in a large multi-vendor network? The classical service impact tools had fairly advanced algorithms attached to the graph to try to capture this, not just a dependency link. It is more or less as complex as the knowledge acquisition problem for classical rule-based AI: how many dependencies do we need to express before we can do a good job in the service tree?
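
To make the coarse-grained problem concrete, here is a minimal Python sketch. The REDUNDANT_GROUPS table and both functions are hypothetical, and the redundancy rule is just one possible example of logic attached to the graph, not what the classical tools actually implemented. It contrasts plain dependency links with a rule that knows about redundant paths.

```python
# Hypothetical example: a VPN service that needs its PE router and at least
# one of two redundant paths. Each inner list is one redundancy group.
REDUNDANT_GROUPS = {
    "vpn-service": [["path-a", "path-b"], ["pe-router"]],
}


def naive_impacted(service, failed, groups):
    """Plain dependency links: any failed dependency turns the service red."""
    return any(dep in failed for group in groups[service] for dep in group)


def redundancy_aware_impacted(service, failed, groups):
    """The service is impacted only if all members of some group have failed."""
    return any(all(dep in failed for dep in group) for group in groups[service])


for failed in ({"path-a"}, {"path-a", "path-b"}, {"pe-router"}):
    print(
        sorted(failed),
        naive_impacted("vpn-service", failed, REDUNDANT_GROUPS),
        redundancy_aware_impacted("vpn-service", failed, REDUNDANT_GROUPS),
    )
```

With only path-a down, the plain-link model already reports the service as impacted, while the redundancy-aware rule does not. Someone still has to know and maintain the grouping, which is exactly the knowledge acquisition problem.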

Since it is configuration data in the model, I guess it is assumed that the orchestrator should set up all of this. But in many cases the dependency knowledge is "hidden" in other domains, such as networking protocols and vendor-specific details. Also, with networks becoming more dynamic and virtual, it is hard to see how the service structures can be statically maintained as *configuration*.

Another underlying challenge is that most network problems are configuration-related and are not detected by device instrumentation.
Non-optimal QoS configuration, firewall rules, etc. are not detected by the firewall or router, so no alarms are raised.

I know it is hard for you to comment on this; it is just a reality check.

3) Relationship to other YANG service models.
In your approach, the service tree is a separate tree from the concrete service trees, such as the L3VPN service model.
Have you considered augmenting these concrete models with generic assurance state and dependency information instead of maintaining a separate tree? Maintaining parallel trees might result in inconsistencies in the end.

4) Relationship to the Alarm YANG RFC 8632
There are several opportunities to reuse definitions and concepts from RFC 8632:
- You could add alarms in your module according to the service tree; see especially Section 3.6, "Root Cause, Impacted Resources".
- You could use alarm-types as one kind of symptom (there are many others, such as active measurements with TWAMP).

Hope this helps to flesh out more details in your work.
br Stefan