Re: [OPSAWG] review of draft-ietf-opsawg-service-assurance-architecture-08

Benoit Claise <benoit.claise@huawei.com> Thu, 22 September 2022 20:37 UTC

Message-ID: <c9b79a22-810b-f458-43d5-8a1ebbea5bfa@huawei.com>
Date: Thu, 22 Sep 2022 22:37:03 +0200
To: Michael Richardson <mcr+ietf@sandelman.ca>, opsawg@ietf.org
References: <74351.1663022748@dooku> <1a2b8d3a-d353-2065-e2f5-1b8c7fe32560@huawei.com> <286119.1663848648@dooku>
From: Benoit Claise <benoit.claise@huawei.com>
In-Reply-To: <286119.1663848648@dooku>
Archived-At: <https://mailarchive.ietf.org/arch/msg/opsawg/IqAuY2WWPwDc5W9K5LgZHCZfSfw>
Subject: Re: [OPSAWG] review of draft-ietf-opsawg-service-assurance-architecture-08
Precedence: list

Hi Michael,

See inline.

On 9/22/2022 2:10 PM, Michael Richardson wrote:
> Benoit Claise<benoit.claise@huawei.com>  wrote:
>      > Thanks for your review.
>      > And sorry for the delay: I was not too sure how to react to this
>      > review. Another review after WGLC, to be integrated in IETF LC?
>      > Document
>
> meh, sorry.
>
>      > On 9/13/2022 12:45 AM, Michael Richardson wrote:
>      >> I have read draft-ietf-opsawg-service-assurance-architecture at the request
>      >> of a few people.  This is not part of any directorate review (that I
>      >> remember, or that shows up in my review list).  If it's useful for me to plug
>      >> this in somewhere, let me know.
>      >>
>      >> I find the document well written, and to me rather ambitious.
>      >> That might be because my level of understanding of modern network management
>      >> is poor.
>      >>
>      >> I found section 3.1.1. Circular Dependencies to be interesting, and I think
>      >> telling.   As soon as I saw "DAG" in the previous section, I was all, "yeah, but..."
>      >> I'm not convinced that the process described in 3.1.1 is something that a
>      >> computer program can do, versus that it (the service and the components that
>      >> build the service) has to be designed to be cycle-free from the beginning.
>      >> It seems to me that this document either has to constrain what services can
>      >> be built by deciding upon a canonical way to describe many things, or that
>      >> different vendors will create interoperable models only by chance.
>
>      > Typically, it's only when assurance graphs are combined that we might have
>      > circular dependencies. So in practice, we don't believe we are going to see
>      > many instances of those.
>
> okay, that's reasonable.  It seems like a lot of text to deal with a problem
> that won't occur very often.
I don't disagree, but that specific point was raised as feedback on the draft.
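
For illustration only (not from the draft): a minimal Python sketch of how a
cycle could show up when two assurance graphs, each acyclic on its own, are
combined. The dict-of-sets graph representation, the function names
`combine` and `find_cycle`, and the subservice names are all assumptions
made up for this example.

```python
from typing import Dict, List, Optional, Set

# Hypothetical representation: subservice name -> names it depends on.
Graph = Dict[str, Set[str]]


def combine(a: Graph, b: Graph) -> Graph:
    """Union of two assurance graphs (nodes and dependency edges)."""
    merged: Graph = {}
    for graph in (a, b):
        for node, deps in graph.items():
            merged.setdefault(node, set()).update(deps)
            for dep in deps:
                merged.setdefault(dep, set())
    return merged


def find_cycle(graph: Graph) -> Optional[List[str]]:
    """Return one dependency cycle as a list of names, or None if acyclic."""
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on current path, finished
    nodes = set(graph) | {d for deps in graph.values() for d in deps}
    colour = {n: WHITE for n in nodes}
    path: List[str] = []

    def visit(node: str) -> Optional[List[str]]:
        colour[node] = GREY
        path.append(node)
        for dep in sorted(graph.get(node, ())):
            if colour[dep] == GREY:  # back edge: dep is already on the path
                return path[path.index(dep):] + [dep]
            if colour[dep] == WHITE:
                found = visit(dep)
                if found:
                    return found
        path.pop()
        colour[node] = BLACK
        return None

    for node in sorted(nodes):
        if colour[node] == WHITE:
            found = visit(node)
            if found:
                return found
    return None


# Two hypothetical graphs, each acyclic on its own.
graph_a = {"service-A": {"tunnel"}, "tunnel": {"interface"}}
graph_b = {"interface": {"service-A"}}

print(find_cycle(graph_a), find_cycle(graph_b))
print(find_cycle(combine(graph_a, graph_b)))
```

The sketch only detects the cycle. How to break it is a separate question,
which this code does not attempt to answer.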
>
>      >> overlooked later on.  The broken thing never gets repaired, and then
>      >> some other fault or maintenance causes an actual failure.
>
>      > Actually, it depends on the intent.
>      > If the intent is to have a backup link all the time, then yes, the
>      > service continues to operate with a lower score.
>
> got it.
>
>      >> b) components are marked for maintenance, which have service impacting
>      >> effects, but during which, other components fail.  To make an analogy,
>      >> you don't care so much if your car steering system does not operate
>      >> while the starter motor is not operational.  But, as soon as you fix the
>      >> starter motor (taking hours to days), you find that you still cannot
>      >> go.   You could have fixed both systems in parallel/concurrently, if only
>      >> you'd known.
>
>      > There are two cases here.
>      > 1. you knew (from the assurance graph) that the car steering system did not
>      > operate when going for maintenance for the starter motor.
>      >     In such a case, you could be solving both in parallel during maintenance
>
>      > 2. you don't know, and you will learn about the broken down car steering
>      > system when back from the starter motor maintenance
>      >     ... at the time of recomputing the assurance graph and looking at the
>      > health of each subservice
>
> Yes... so I guess I wonder how to always be in case 1.
>
>      >> (c) is in many ways that the DAG *itself* might need to be updated.
>      >> How do you transition from one dependency DAG to another dependency DAG?
>      >> I guess that section 3.9 gets into this, but it seems rather weak.
>
>      > Proposal:
>      > 1. we need to add the concept that a service depending on under-maintenance
>      > subservices will receive the "under maintenance" symptom and has to take it
>      > into account in its health computation. How? We don't want to go into the
>      > specifics of health aggregation in this specification.
>
> okay.  Where would that occur?
In the SAIN collector (see figure 1), whose scope is not covered by this 
spec.
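
As a rough illustration only: the draft deliberately leaves health
aggregation out of scope, so the following Python sketch is one possible
reading, not the specified behaviour. It works out which services would
receive the "under maintenance" symptom given a dependency graph. The
function name, the graph representation, the subservice names, and the
choice to propagate the symptom transitively are all assumptions.

```python
from collections import deque
from typing import Dict, Set


def services_to_notify(depends_on: Dict[str, Set[str]],
                       under_maintenance: Set[str]) -> Set[str]:
    """Names that depend, directly or indirectly, on a subservice under maintenance."""
    # Invert the edges: subservice -> services that depend on it.
    dependents: Dict[str, Set[str]] = {}
    for node, deps in depends_on.items():
        for dep in deps:
            dependents.setdefault(dep, set()).add(node)

    affected: Set[str] = set()
    queue = deque(under_maintenance)
    while queue:
        current = queue.popleft()
        for parent in dependents.get(current, ()):
            # Assumption: the symptom propagates upwards transitively.
            if parent not in affected and parent not in under_maintenance:
                affected.add(parent)
                queue.append(parent)
    return affected


# Hypothetical assurance graph.
depends_on = {
    "l3vpn": {"tunnel"},
    "tunnel": {"link-1", "link-2"},
    "other-service": {"link-3"},
}
print(sorted(services_to_notify(depends_on, {"link-1"})))
```

What each notified service then does with the symptom (ignore it, lower its
score, and so on) is exactly the part left to the collector.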
> Or is it really vendor dependent?
>
>      > 2. add some text that the DAG might have to be recomputed after a subservice
>      > comes out of maintenance.
>
> Doesn't that go without saying?
>
>      >> 3.8. Timing
>      >> Starts talking about NTP, and synchronization.
>      >> Then goes into garbage collection, and I think that maybe this transition in
>      >> the text could be better presented.
>
>      > You are right.
>      > We propose to move the following text (which is not substantial enough to
>      > deserve its own section) just before 3.1
>
>      > The SAIN architecture requires time synchronization, with Network
>      > Time Protocol (NTP) [RFC5905<https://datatracker.ietf.org/doc/html/rfc5905>] as a candidate, between all elements:
>      > monitored entities, SAIN agents, Service orchestrator, the SAIN
>      > collector, as well as the SAIN orchestrator.  This guarantees the
>      > correlations of all symptoms in the system, correlated with the right
>      > assurance graph version.
>
> good.
>
>      > And rename section 3.8 "Timing" to "Garbage Collection"
>      >>
>      >>
>      >> I feel that this SAIN architecture is quite ambitious, and I'm not sure that
>      >> there is enough here to actually create interoperable implementations.
>
>      > My group created a prototype. I know of another one.
>      > And there is an opensource implementation (presented by Prof Benoit Donnet in
>      > the past).
>      > The interop part will be with linking YANG modules, which we addressed with
>      > the circular dependencies.
>
> Cool.... i suggest an implementation experience section for the IESG review.
If you mean RFC 7942, it says:

    We recommend that the Implementation Status section should be removed
    from Internet-Drafts before they are published as RFCs.


So isn't it sufficient to have this information in the write-up?
You can write down: "Huawei has a prototype implementation of this 
architecture and specifically of the YANG module"

Regards, Benoit
> But, are these implementations involving multi-vendor systems under management?
>
> --
> Michael Richardson<mcr+IETF@sandelman.ca>, Sandelman Software Works
>   -= IPv6 IoT consulting =-
>
>
>
