Re: [OPSAWG] review of draft-ietf-opsawg-service-assurance-architecture-08

Michael Richardson <mcr+ietf@sandelman.ca> Thu, 22 September 2022 12:10 UTC

From: Michael Richardson <mcr+ietf@sandelman.ca>
To: Benoit Claise <benoit.claise@huawei.com>, opsawg@ietf.org
In-reply-to: <1a2b8d3a-d353-2065-e2f5-1b8c7fe32560@huawei.com>
References: <74351.1663022748@dooku> <1a2b8d3a-d353-2065-e2f5-1b8c7fe32560@huawei.com>
Comments: In-reply-to Benoit Claise <benoit.claise@huawei.com> message dated "Tue, 20 Sep 2022 14:41:42 +0200."
X-Mailer: MH-E 8.6+git; nmh 1.7.1; GNU Emacs 27.1
MIME-Version: 1.0
Content-Type: multipart/signed; boundary="=-=-="; micalg="pgp-sha512"; protocol="application/pgp-signature"
Date: Thu, 22 Sep 2022 14:10:48 +0200
Message-ID: <286119.1663848648@dooku>
Archived-At: <https://mailarchive.ietf.org/arch/msg/opsawg/3fC1gCU_uDUfTlpezR0ElQI7nFc>
Subject: Re: [OPSAWG] review of draft-ietf-opsawg-service-assurance-architecture-08

Benoit Claise <benoit.claise@huawei.com> wrote:
    > Thanks for your review.
    > And sorry for the delay: I was not too sure how to react to this
    > review. Another review after WGLC, to be integrated in IETF LC?
    > Document

meh, sorry.

    > On 9/13/2022 12:45 AM, Michael Richardson wrote:
    >> I have read draft-ietf-opsawg-service-assurance-architecture at the request
    >> of a few people.  This is not part of any directorate review (that I
    >> remember, or that shows up in my review list).  If it's useful for me to plug
    >> this in somewhere, let me know.
    >>
    >> I find the document well written, and to me rather ambitious.
    >> That might be because my level of understanding of modern network management
    >> is poor.
    >>
    >> I found section 3.1.1. Circular Dependencies to be interesting, and I think
    >> telling.   As soon as I saw "DAG" in the previous section, I was all, "yeah, but..."
    >> I'm not convinced that the process described in 3.1.1 is something that a
    >> computer program can do, versus that it (the service and the components that
    >> build the service) has to be designed to be cycle free from the beginning.
    >> It seems to me that this document either has to constrain what services can
    >> be built by deciding upon a canonical way to describe many things, or that
    >> different vendors will create interoperable models only by chance.

    > Typically, it's only when assurance graphs are combined that we might have
    > circular dependencies. So in practice, we don't believe we are going to see
    > many instances of those.

okay, that's reasonable.  It seems like a lot of text to deal with a problem
that won't occur very often.
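
For illustration only, here is a minimal sketch of how cycles could be detected when two assurance graphs are combined. It is not the process from section 3.1.1 of the draft: the graph representation (a dict mapping each node to the set of nodes it depends on), the function names, and the node names are all assumptions made for this example.

```python
from typing import Dict, List, Optional, Set

# Hypothetical representation: node name -> names of nodes it depends on.
Graph = Dict[str, Set[str]]


def merge_graphs(a: Graph, b: Graph) -> Graph:
    """Union the dependency edges of two assurance graphs."""
    merged: Graph = {}
    for graph in (a, b):
        for node, deps in graph.items():
            merged.setdefault(node, set()).update(deps)
    return merged


def find_cycle(graph: Graph) -> Optional[List[str]]:
    """Return one dependency cycle as a list of nodes, or None if acyclic."""
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on the current path, finished
    nodes = set(graph)
    for deps in graph.values():
        nodes |= deps
    colour = {n: WHITE for n in nodes}
    path: List[str] = []

    def visit(node: str) -> Optional[List[str]]:
        colour[node] = GREY
        path.append(node)
        for dep in sorted(graph.get(node, ())):
            if colour[dep] == GREY:
                # dep is already on the current path: that closes a cycle.
                return path[path.index(dep):] + [dep]
            if colour[dep] == WHITE:
                found = visit(dep)
                if found:
                    return found
        path.pop()
        colour[node] = BLACK
        return None

    for node in sorted(nodes):
        if colour[node] == WHITE:
            found = visit(node)
            if found:
                return found
    return None


# Two graphs that are each acyclic, but not once combined.
g1 = {"service-a": {"tunnel"}, "tunnel": {"interface"}}
g2 = {"interface": {"service-a"}}
combined = merge_graphs(g1, g2)
cycle = find_cycle(combined)
```

Detecting the cycle is the easy part. Deciding which edge to remove, or how to restructure the subservices, is the part I doubt a program can do on its own.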

    >> overlooked later on.  The broken thing never gets repaired, and then
    >> some other fault or maintenance causes an actual failure.

    > Actually, it depends on the intent.
    > If the intent is to have a backup link all the time, then yes, the
    > service continues to operate with a lower score.

got it.

    >> b) components are marked for maintenance, which have service impacting
    >> effects, but during which, other components fail.  To make an analogy,
    >> you don't care so much if your car steering system does not operate
    >> while the starter motor is not operational.  But, as soon as you fix the
    >> starter motor (taking hours to days), you find that you still cannot
    >> go.   You could have fixed both systems in parallel/concurrently, if only
    >> you'd known.

    > There are two cases here.
    > 1. you knew (from the assurance graph) that the car steering system did not
    > operate when going for maintenance for the starter motor.
    >     In such a case, you could be solving both in parallel during maintenance

    > 2. you don't know, and you will learn about the broken down car steering
    > system when back from the starter motor maintenance
    >     ... at the time of recomputing the assurance graph and looking at the
    > health of each subservice

Yes... so I guess I wonder how to always be in case 1.

    >> (c) is in many ways that the DAG *itself* might need to be updated.
    >> How do you transition from one dependency DAG to another dependency DAG?
    >> I guess that section 3.9 gets into this, but it seems rather weak.

    > Proposal:
    > 1. we need to add the concept that services depending on the under-maintenance
    > subservices will receive the "under maintenance" symptom and have to take it
    > into account in their health computation. How? We don't want to go into the
    > specifics of health aggregation in this specification.

okay.  Where would that occur?  Or is it really vendor dependent?

    > 2. add some text that the DAG might have to be recomputed after a subservice
    > comes out of maintenance.

Doesn't that go without saying?
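
To make item 1 concrete, here is a rough sketch of working out which services would receive the "under maintenance" symptom. It assumes the symptom propagates transitively to everything that depends on an under-maintenance subservice, which the proposal above does not actually state. The function name, the dict-of-sets graph representation, and the node names are hypothetical, and what each service then does with the symptom in its health computation is left open, as in the proposal.

```python
from typing import Dict, List, Set


def affected_by_maintenance(depends_on: Dict[str, Set[str]],
                            under_maintenance: Set[str]) -> Set[str]:
    """Return every node that directly or indirectly depends on a
    subservice that is under maintenance (assumed transitive)."""
    # Invert the edges: subservice -> nodes that depend on it.
    dependents: Dict[str, Set[str]] = {}
    for node, deps in depends_on.items():
        for dep in deps:
            dependents.setdefault(dep, set()).add(node)

    affected: Set[str] = set()
    todo: List[str] = list(under_maintenance)
    while todo:
        current = todo.pop()
        for parent in dependents.get(current, ()):
            if parent not in affected:
                affected.add(parent)
                todo.append(parent)
    # Report only the dependents, not the maintained subservices themselves.
    return affected - under_maintenance


graph = {"service": {"tunnel"}, "tunnel": {"if-a", "if-b"}}
marked = affected_by_maintenance(graph, {"if-a"})
```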

    >> 3.8. Timing
    >> Starts talking about NTP, and synchronization.
    >> Then goes into garbage collection, and I think that maybe this transition in
    >> the text could be better presented.

    > You are right.
    > We propose to move the following text (which is not substantial enough to
    > deserve its own section) just before 3.1

    > The SAIN architecture requires time synchronization, with Network
    > Time Protocol (NTP) [RFC5905  <https://datatracker.ietf.org/doc/html/rfc5905>] as a candidate, between all elements:
    > monitored entities, SAIN agents, Service orchestrator, the SAIN
    > collector, as well as the SAIN orchestrator.  This guarantees the
    > correlations of all symptoms in the system, correlated with the right
    > assurance graph version.

good.
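
A small sketch of why the synchronization matters for that correlation. It assumes each assurance graph version has a known activation timestamp and that a symptom belongs to whichever version was active when it was observed. Neither the data layout nor the function name comes from the draft.

```python
from bisect import bisect_right
from typing import List, Optional, Tuple


def graph_version_at(versions: List[Tuple[float, str]],
                     timestamp: float) -> Optional[str]:
    """Return the graph version active at `timestamp`.

    `versions` is a list of (activation_time, version_id) pairs sorted by
    activation time. Returns None if the timestamp predates every version.
    """
    starts = [start for start, _ in versions]
    index = bisect_right(starts, timestamp)
    return versions[index - 1][1] if index else None


versions = [(100.0, "v1"), (200.0, "v2")]
symptom_version = graph_version_at(versions, 150.0)
```

A symptom stamped by an agent whose clock is a few seconds off, close to an activation boundary, lands on the wrong graph version.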

    > And rename section 3.8 "Timing" to "Garbage Collection"
    >>
    >>
    >> I feel that this SAIN architecture is quite ambitious, and I'm not sure that
    >> there is enough here to actually create interoperable implementations.

    > My group created a prototype. I know of another one.
    > And there is an opensource implementation (presented by Prof Benoit Donnet in
    > the past).
    > The interop part will be with linking YANG modules, which we addressed with
    > the circular dependencies.

Cool.... I suggest an implementation experience section for the IESG review.
But, are these implementations involving multi-vendor systems under management?

--
Michael Richardson <mcr+IETF@sandelman.ca>, Sandelman Software Works
 -= IPv6 IoT consulting =-