Re: [OPSAWG] review of draft-ietf-opsawg-service-assurance-architecture-08

Benoit Claise <benoit.claise@huawei.com> Thu, 22 September 2022 20:37 UTC

To: Michael Richardson <mcr+ietf@sandelman.ca>, opsawg@ietf.org
References: <74351.1663022748@dooku> <1a2b8d3a-d353-2065-e2f5-1b8c7fe32560@huawei.com> <286119.1663848648@dooku>
From: Benoit Claise <benoit.claise@huawei.com>
In-Reply-To: <286119.1663848648@dooku>
Archived-At: <https://mailarchive.ietf.org/arch/msg/opsawg/IqAuY2WWPwDc5W9K5LgZHCZfSfw>

Hi Michael,

See inline.

On 9/22/2022 2:10 PM, Michael Richardson wrote:
> Benoit Claise<benoit.claise@huawei.com>  wrote:
>      > Thanks for your review.
>      > And sorry for the delay: I was not too sure how to react to this
>      > review. Another review after WGLC, to be integrated in IETF LC?
>      > Document
>
> meh, sorry.
>
>      > On 9/13/2022 12:45 AM, Michael Richardson wrote:
>      >> I have read draft-ietf-opsawg-service-assurance-architecture at the request
>      >> of a few people.  This is not part of any directorate review (that I
>      >> remember, or that shows up in my review list).  If it's useful for me to plug
>      >> this in somewhere, let me know.
>      >>
>      >> I find the document well written, and to me rather ambitious.
>      >> That might be because my level of understanding of modern network management
>      >> is poor.
>      >>
>      >> I found section 3.1.1. Circular Dependencies to be interesting, and I think
>      >> telling.   As soon as I saw "DAG" in the previous section, I was all, "yeah, but..."
>      >> I'm not convinced that the process described in 3.1.1 is something that a
>      >> computer program can do, versus that it (the service and the components that
>      >> build the service) has to be designed to be cycle-free from the beginning.
>      >> It seems to me that this document either has to constrain what services can
>      >> be built by deciding upon a canonical way to describe many things, or that
>      >> different vendors will create interoperable models only by chance.
>
>      > Typically, it's only when assurance graphs are combined that we might have
>      > circular dependencies. So in practice, we don't believe we are going to see
>      > many instances of those.
>
> okay, that's reasonable.  It seems like a lot of text to deal with a problem
> that won't occur very often.
I don't disagree but that specific point was provided as feedback.
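To illustrate the point about combined graphs, here is a minimal Python sketch (not taken from the draft) of how a cycle could be detected once two assurance graphs are merged. The graph representation (a dict mapping each node to the set of nodes it depends on), the function names `merge_graphs` and `find_cycle`, and the node names are all assumptions made for this example.

```python
from typing import Dict, List, Optional, Set

# Hypothetical representation: node name -> names of the nodes it depends on.
Graph = Dict[str, Set[str]]


def merge_graphs(*graphs: Graph) -> Graph:
    """Union several dependency graphs; every referenced node gets an entry."""
    merged: Graph = {}
    for graph in graphs:
        for node, deps in graph.items():
            merged.setdefault(node, set()).update(deps)
            for dep in deps:
                merged.setdefault(dep, set())
    return merged


def find_cycle(graph: Graph) -> Optional[List[str]]:
    """Return one dependency cycle as a list of nodes, or None if acyclic.

    Expects a graph in which every dependency also appears as a key
    (merge_graphs guarantees this).
    """
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on current path, fully explored
    colour = {node: WHITE for node in graph}
    path: List[str] = []

    def visit(node: str) -> Optional[List[str]]:
        colour[node] = GREY
        path.append(node)
        for dep in sorted(graph[node]):
            if colour[dep] == GREY:  # back edge: dep is already on the path
                return path[path.index(dep):] + [dep]
            if colour[dep] == WHITE:
                cycle = visit(dep)
                if cycle:
                    return cycle
        path.pop()
        colour[node] = BLACK
        return None

    for node in sorted(graph):
        if colour[node] == WHITE:
            cycle = visit(node)
            if cycle:
                return cycle
    return None


# Two graphs that are each acyclic, but whose combination is not.
graph_a: Graph = {"service-a": {"tunnel"}, "tunnel": {"interface-1"}}
graph_b: Graph = {"interface-1": {"service-a"}}
cycle = find_cycle(merge_graphs(graph_a, graph_b))
print(cycle)
```

Each input graph passes the check on its own; only the merged graph reports a cycle, which matches the expectation that cycles mostly appear when graphs are combined.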
>
>      >> overlooked later on.  The broken thing never gets repaired, and then
>      >> some other fault or maintenance causes an actual failure.
>
>      > Actually, it depends on the intent.
>      > If the intent is to have a backup link all the time, then yes, the
>      > service continues to operate with a lower score.
>
> got it.
>
>      >> b) components are marked for maintenance, which have service impacting
>      >> effects, but during which, other components fail.  To make analogy,
>      >> you don't care so much if your car steering system does not operate
>      >> while the starter motor is not operational.  But, as soon as you fix the
>      >> starter motor (taking hours to days), you find that you still can not
>      >> go.   You could have fixed both systems in parallel/concurrently, if only
>      >> you'd known.
>
>      > There are two cases here.
>      > 1. you knew (from the assurance graph) that the car steering system did not
>      > operate when going for maintenance for the starter motor.
>      >     In such a case, you could be solving both in parallel during maintenance
>
>      > 2. you don't know, and you will learn about the broken down car steering
>      > system when back from the starter motor maintenance
>      >     ... at the time of recomputing the assurance graph and looking at the
>      > health of each subservice
>
> Yes... so I guess I wonder how to always be in case 1.
>
>      >> (c) is in many ways that the DAG *itself* might need to be updated.
>      >> How do you transition from one dependency DAG to another dependency DAG?
>      >> I guess that section 3.9 gets into this, but it seems rather weak.
>
>      > Proposal:
>      > 1. we need to add the concept that a service depending on under-maintenance
>      > subservices will receive the "under maintenance" symptom and has to take it
>      > into account in its health computation. How? We don't want to go into the
>      > specifics of health aggregation in this specification.
>
> okay.  Where would that occur?
In the SAIN collector (see figure 1), whose scope is not covered by this 
spec.
> Or is it really vendor dependent?
>
>      > 2. add some text that the DAG might have to be recomputed after a subservice
>      > comes out of maintenance.
>
> Doesn't that go without saying?
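As a rough illustration of proposal item 1, the following Python sketch computes which nodes of a dependency graph would receive the "under maintenance" symptom. It is only an assumption-laden example: the dict-based graph, the function name `affected_by_maintenance`, and the node names are invented here, and what a service then does with the symptom in its health computation is deliberately left out, as in the specification.

```python
from typing import Dict, Set

# Hypothetical representation: node name -> names of the nodes it depends on.
Graph = Dict[str, Set[str]]


def affected_by_maintenance(graph: Graph, under_maintenance: Set[str]) -> Set[str]:
    """Return every node that depends, directly or transitively, on a node
    that is under maintenance (the maintained nodes themselves excluded)."""
    # Invert the edges: for each node, who depends on it?
    dependents: Dict[str, Set[str]] = {}
    for node, deps in graph.items():
        for dep in deps:
            dependents.setdefault(dep, set()).add(node)

    affected: Set[str] = set()
    stack = list(under_maintenance)
    while stack:
        current = stack.pop()
        for parent in dependents.get(current, ()):
            if parent not in affected:  # also protects against cycles
                affected.add(parent)
                stack.append(parent)
    return affected - under_maintenance


graph: Graph = {
    "service": {"tunnel"},
    "tunnel": {"interface-a", "interface-b"},
    "interface-a": set(),
    "interface-b": set(),
}
symptom_receivers = affected_by_maintenance(graph, {"interface-a"})
print(sorted(symptom_receivers))
```

Re-running the same computation after a subservice comes out of maintenance, with the updated set of maintained nodes, corresponds to the recomputation mentioned in item 2.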
>
>      >> 3.8. Timing
>      >> Starts talking about NTP, and synchronization.
>      >> Then goes into garbage collection, and I think that maybe this transition in
>      >> the text could be better presented.
>
>      > You are right.
>      > We propose to move the following text (which is not substantial enough to
>      > deserve its own section) just before 3.1
>
>      > The SAIN architecture requires time synchronization, with Network
>      > Time Protocol (NTP) [RFC5905<https://datatracker.ietf.org/doc/html/rfc5905>] as a candidate, between all elements:
>      > monitored entities, SAIN agents, Service orchestrator, the SAIN
>      > collector, as well as the SAIN orchestrator.  This guarantees the
>      > correlations of all symptoms in the system, correlated with the right
>      > assurance graph version.
>
> good.
>
>      > And rename section 3.8 "Timing" to "Garbage Collection"
>      >>
>      >>
>      >> I feel that this SAIN architecture is quite ambitious, and I'm not sure that
>      >> there is enough here to actually create interoperable implementations.
>
>      > My group created a prototype. I know of another one.
>      > And there is an opensource implementation (presented by Prof Benoit Donnet in
>      > the past).
>      > The interop part will be with linking YANG modules, which we addressed with
>      > the circular dependencies.
>
> Cool.... I suggest an implementation experience section for the IESG review.
If you speak about RFC 7942, it mentions:

    We recommend that the Implementation Status section should be removed
    from Internet-Drafts before they are published as RFCs.


So isn't it sufficient to have this information in the write-up?
You can write down: "Huawei has a prototype implementation of this 
architecture and specifically of the YANG module."

Regards, Benoit
> But, are these implementations involving multi-vendor systems under management?
>
> --
> Michael Richardson<mcr+IETF@sandelman.ca>, Sandelman Software Works
>   -= IPv6 IoT consulting =-
>
>
>