Re: [CCAMP] I-D Action: draft-ietf-ccamp-alarm-module-01 / purging alarms

NICK HANCOCK <nick.hancock@adtran.com> Thu, 01 November 2018 15:55 UTC

From: NICK HANCOCK <nick.hancock@adtran.com>
To: stefan vallin <stefan@wallan.se>
CC: "CCAMP (ccamp@ietf.org)" <ccamp@ietf.org>
Date: Thu, 1 Nov 2018 15:55:35 +0000
Message-ID: <BD6D193629F47C479266C0985F16AAC7011EA322C0@ex-mb3.corp.adtran.com>
References: <BD6D193629F47C479266C0985F16AAC7F06E09B2@ex-mb1.corp.adtran.com> <CBBAA9B4-8F31-479B-9616-F67FB6AB6D85@wallan.se>
In-Reply-To: <CBBAA9B4-8F31-479B-9616-F67FB6AB6D85@wallan.se>
Archived-At: <https://mailarchive.ietf.org/arch/msg/ccamp/n1x6-YymbKR5AVR0C4hLtuGvx88>
Subject: Re: [CCAMP] I-D Action: draft-ietf-ccamp-alarm-module-01 / purging alarms

Hi Stefan,

I would like to briefly come back to a topic that we discussed back in May: purging alarms.

Specifically, I am concerned with the subsequent behaviour of the system when a client purges active alarms from the alarm list (or shelved alarm list).
You clarified this situation with your reply:
“If the instrumentation state of the alarm changes at a later point a new alarm entry, (same keys of course), would appear.”

However, I do not see any reference to this expected behaviour when describing purging alarms in either the Internet draft or the YANG module itself.
I believe it is important to describe the expected behaviour explicitly for interoperability reasons; otherwise, vendors may implement differing behaviours in their systems in this regard.
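
As an illustration only, the behaviour described in the reply quoted above could be sketched as follows in Python. All names here (AlarmList, report, purge, the key layout) are hypothetical and are not taken from the draft or the YANG module; the sketch simply assumes that the server remembers the last state reported by the instrumentation, so that a purged alarm reappears only when that state changes.

```python
from dataclasses import dataclass, field

# Hypothetical key layout; the real alarm-list keys are defined by the YANG module.
AlarmKey = tuple[str, str, str]  # (resource, alarm-type-id, alarm-type-qualifier)


@dataclass
class AlarmEntry:
    is_cleared: bool
    # Operator history starts out empty for every newly created entry.
    operator_state_change: list[str] = field(default_factory=list)


class AlarmList:
    """Assumed model: a purge removes the entry but not the remembered state."""

    def __init__(self) -> None:
        self.entries: dict[AlarmKey, AlarmEntry] = {}
        # Last state seen from the instrumentation, kept across purges.
        self._last_reported: dict[AlarmKey, bool] = {}

    def report(self, key: AlarmKey, is_cleared: bool) -> None:
        """Called by the instrumentation with its current view of an alarm."""
        changed = self._last_reported.get(key) != is_cleared
        self._last_reported[key] = is_cleared
        if key in self.entries:
            self.entries[key].is_cleared = is_cleared
        elif changed:
            # First report, or a state change after a purge:
            # a new entry with the same keys and no operator history.
            self.entries[key] = AlarmEntry(is_cleared)
        # Unchanged state after a purge: the alarm stays purged.

    def purge(self, key: AlarmKey) -> None:
        """Administrative removal of an entry, active or not."""
        self.entries.pop(key, None)
```

Under these assumptions, purging an active alarm and then receiving the same state again leaves the list empty, whereas a later state change creates a fresh entry whose operator-state-change list is empty, so a previously closed alarm is no longer closed.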

What is your opinion?

Best regards
Nick

From: stefan vallin [mailto:stefan@wallan.se]
Sent: Wednesday, May 02, 2018 4:30 PM
To: NICK HANCOCK
Cc: CCAMP (ccamp@ietf.org)
Subject: Re: [CCAMP] I-D Action: draft-ietf-ccamp-alarm-module-01 / purging alarms

Hi Nick!
Thanks for your comments, really important feedback.
See inline:

I am trying to understand the expected behavior of the ‘purge-alarms’ RPC, specifically when the client requests alarms that are active to be purged from the alarm list.

If you delete an alarm that is active for a resource, does the alarm immediately reappear for that resource after the RPC has executed, assuming that the undesirable state in that resource still exists and the instrumentation still detects it? This would be my expectation.

I would expect not. The client should know what it is doing; this is an administrative action, and it is a matter of defining the right filters.
Really annoying if I deliberately delete an alarm and it pops up.
If the instrumentation state of the alarm changes at a later point a new alarm entry, (same keys of course), would appear.
Furthermore, as you mention, alarms without a clear would not work in your proposal.

On the other hand your question made me think of another mechanism which would address your issue and also the case of "mid level managers" in SNMP lingo.
Say you have an NMS with an alarm list for a large set of devices. You might want to be able to “synchronise” the alarm list in the NMS versus the devices. Or just the instrumentation in the same device for that matter. We could think of having an action/RPC “synchronise-alarms”. In your case it would update the alarm list versus the instrumentation state, in the NMS case it would reach out to the devices and make sure the alarm list represents what is out there.
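
A rough sketch of what such a "synchronise-alarms" operation might do in the single-device case, in Python. This is only an assumption about the idea floated above: the function name, the dictionary representation and the rule that entries unknown to the instrumentation are left untouched are all hypothetical and not part of the draft.

```python
def synchronise_alarms(alarm_list: dict, instrumentation: dict) -> dict:
    """Return a copy of the alarm list aligned with the instrumentation state.

    Both arguments map a hypothetical alarm key to an is_cleared flag.
    """
    synced = dict(alarm_list)  # do not modify the caller's list in place
    for key, is_cleared in instrumentation.items():
        if key in synced:
            # Entry exists: make it reflect what the instrumentation sees now.
            synced[key] = is_cleared
        elif not is_cleared:
            # Condition is active but the entry is missing (for example,
            # because it was purged): bring it back.
            synced[key] = is_cleared
    # Assumption: entries the instrumentation knows nothing about are kept.
    return synced
```

For the NMS case the second argument would be the state collected from the managed devices rather than from local instrumentation; the comparison itself would look the same.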

Opinions from Nick and the list?




The description statement of ‘purge-alarms’ explicitly says that the RPC can typically be used to delete alarms that are in a closed operator state. In the Internet-draft you also write “Closed alarms are good candidates for being deleted.”
But a closed alarm may still be active from the resource life-cycle point of view and thus should IMHO immediately reappear, i.e., a new entry in the list be created. However, if this were the case the list ‘operator-state-change’ would be initially empty, with the effect that an alarm previously closed by the operator would no longer be in the operator state closed. This may, however, not necessarily be undesirable.

In the case of an active alarm that also lacks a corresponding clear, i.e., has-clear = 'false', if the instrumentation is no longer able to detect the undesirable state that previously existed, then such an alarm would indeed be removed from the alarm list and would not reappear.

In the YANG descriptions and in the Internet-draft only the alarm list is discussed with respect to purging. What about the shelved-alarm list? There may, for example, be cleared alarms in that list that could also be purged, without necessarily having to first un-shelve those alarms.

Good catch, need to update the draft to support administrative actions on the shelved alarms as well!

br Stefan




Could you clarify?

Best regards
Nick

From: CCAMP [mailto:ccamp-bounces@ietf.org] On Behalf Of stefan vallin
Sent: Thursday, February 08, 2018 1:42 PM
To: CCAMP (ccamp@ietf.org)
Subject: Re: [CCAMP] I-D Action: draft-ietf-ccamp-alarm-module-01.txt

Hi All!
We have now posted a new version of the draft. Updates based on comments from Balazs, Joey and Nick.

The major changes are:
1) Changed typedef operator-state into
- writeable-operator-state (does not include shelved and un-shelved)
   can be set by an operator
- operator-state that can be read (includes shelved and un-shelved)

2) Added a leaf to the alarms in the shelf that states the name of the shelf.

3) Shelf criteria resource changed from leaf to leaf-list

4) Shelf criteria alarm type qualifier changed to regexp

5) Added text: "A server SHOULD describe how long it retains cleared/closed alarms: until manually purged or if it has an automatic removal policy."

6) Clarified that shelving/unshelving is done by shelf configuration and cannot be performed on individual alarms.

7) Clarified precedence order of resource naming ( 1 YANG instance identifier, 2 SNMP OID, 3 string )

8) Clarified that the alarm summary numbers do not include shelved alarms

9) Added a typedef resource-match which is a flexible way to identify resources.
Used in shelf criteria and alarm inventory

10) Clarified that an empty shelf indicates "shelf all".

11) Added security considerations

12) Added reviewers to ack
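
To make items 3, 4, 8 and 10 a little more concrete, here is a minimal Python sketch of how shelf criteria could be evaluated. It is not derived from the YANG module: the class and field names are hypothetical, resources are compared by exact string equality rather than through the resource-match typedef, and the regexp is assumed to have to match the whole qualifier.

```python
import re
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Shelf:
    name: str
    # Item 3: a list of resources rather than a single one (empty = any).
    resources: list = field(default_factory=list)
    # Item 4: the alarm type qualifier criterion is a regexp (None = any).
    qualifier_regexp: Optional[str] = None

    def matches(self, resource: str, qualifier: str) -> bool:
        if self.resources and resource not in self.resources:
            return False
        if self.qualifier_regexp is not None and not re.fullmatch(
            self.qualifier_regexp, qualifier
        ):
            return False
        # Item 10: a shelf with no criteria at all shelves everything.
        return True


def summary_count(alarms, shelves) -> int:
    """Item 8: count only alarms that no shelf matches.

    `alarms` is an iterable of (resource, qualifier) pairs.
    """
    return sum(
        1
        for resource, qualifier in alarms
        if not any(shelf.matches(resource, qualifier) for shelf in shelves)
    )
```

With a shelf restricted to one resource, alarms on other resources still count towards the summary; adding a shelf without any criteria brings the count to zero.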

There are still some outstanding items on the mailing list (from Nick and Joey).
But we felt it was time to publish the changes from some earlier discussions, see below.
Please let us know if we forgot something. I will top-post including these issues, summarising the discussions.
- severity filtering/shelving, needs further discussion
- temporal ordering, needs further discussion
- tagging of alarm types in alarm inventory, needs further discussion
- notifications for changed shelves, will add this in a later version
- probable cause description field, will add this in a later version

/stefan & martin


Stefan Vallin
stefan@wallan.se
+46705233262

On 08 Feb 2018, at 12:33, internet-drafts@ietf.org wrote:


A New Internet-Draft is available from the on-line Internet-Drafts directories.
This draft is a work item of the Common Control and Measurement Plane WG of the IETF.

       Title           : YANG Alarm Module
       Authors         : Stefan Vallin
                         Martin Bjorklund
       Filename        : draft-ietf-ccamp-alarm-module-01.txt
       Pages           : 62
       Date            : 2018-02-08

Abstract:
  This document defines a YANG module for alarm management.  It
  includes functions for alarm list management, alarm shelving and
  notifications to inform management systems.  There are also RPCs to
  manage the operator state of an alarm and administrative alarm
  procedures.  The module carefully maps to relevant alarm standards.


The IETF datatracker status page for this draft is:
https://datatracker.ietf.org/doc/draft-ietf-ccamp-alarm-module/

There are also htmlized versions available at:
https://tools.ietf.org/html/draft-ietf-ccamp-alarm-module-01
https://datatracker.ietf.org/doc/html/draft-ietf-ccamp-alarm-module-01

A diff from the previous version is available at:
https://www.ietf.org/rfcdiff?url2=draft-ietf-ccamp-alarm-module-01


Please note that it may take a couple of minutes from the time of submission
until the htmlized version and diff are available at tools.ietf.org.

Internet-Drafts are also available by anonymous FTP at:
ftp://ftp.ietf.org/internet-drafts/

_______________________________________________
CCAMP mailing list
CCAMP@ietf.org
https://www.ietf.org/mailman/listinfo/ccamp