Re: [CCAMP] Comments on draft-vallin-ccamp-alarm-module-01

stefan vallin <stefan@wallan.se> Tue, 14 November 2017 11:47 UTC

Cc: ccamp@ietf.org
To: Balazs Lengyel <balazs.lengyel@ericsson.com>
Archived-At: <https://mailarchive.ietf.org/arch/msg/ccamp/uWInrSvLsSp9-bdju7NVLPicvT0>

Hi Balazs!
Thanks for your comments!

See answers inline

br Stefan and Martin

> On 14 Nov 2017, at 03:36, Balazs Lengyel <balazs.lengyel@ericsson.com> wrote:
> 
> Hello,
> 
> Generally the module is good, but some comments:
> 
> 4.5.1) Add as a recommendation: An alarm agent SHOULD describe how long it retains cleared/closed alarms: until they are manually purged, or according to an automatic removal policy.
> 
> 
OK, will add and clarify.
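As a rough illustration of the two retention policies mentioned above, here is a minimal Python sketch. The `Alarm` fields, the `purge_cleared` helper and the `max_age` parameter are hypothetical and not part of the draft module; `max_age=None` stands for "retain until manually purged". Closed alarms could be treated the same way as cleared ones.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import List, Optional


@dataclass
class Alarm:
    resource: str
    is_cleared: bool
    last_changed: datetime  # hypothetical: time of the latest status change


def purge_cleared(alarms: List[Alarm], now: datetime,
                  max_age: Optional[timedelta]) -> List[Alarm]:
    """Return the alarms kept under a hypothetical retention policy.

    max_age=None models "retain until manually purged"; otherwise cleared
    alarms whose last change is older than max_age are dropped.
    Alarms that are not cleared are always kept.
    """
    if max_age is None:
        return list(alarms)
    return [a for a in alarms
            if not a.is_cleared or now - a.last_changed <= max_age]


now = datetime(2017, 11, 14, 12, 0, tzinfo=timezone.utc)
alarms = [
    Alarm("eth0", is_cleared=True, last_changed=now - timedelta(days=10)),
    Alarm("eth1", is_cleared=True, last_changed=now - timedelta(hours=1)),
    Alarm("eth2", is_cleared=False, last_changed=now - timedelta(days=30)),
]
kept = purge_cleared(alarms, now, max_age=timedelta(days=7))
print([a.resource for a in kept])  # ['eth1', 'eth2']
```

Whichever policy a server picks, the point of the recommendation is that it is documented, so a client knows whether old cleared alarms will disappear on their own.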
> 5.1.1) State that shelving is effectively a way for the operator to suppress an alarm type/resource. Using the word suppress or block would make it easier to understand why we do shelving.
> 
> 
Ok.
> For us it is a common situation that the operator does not want to suppress an alarm type, just change the severity level, e.g. stating that a particular alarm type is not critical, just minor. I am missing this functionality.
> 
> 
I have kept this out by design. I consider this to be a function of the alarm application, i.e. the client.
If the system (the server) has “wrong” severity levels, that should be managed by the system through local means.

Furthermore, this notion assumes that there is a one-to-one mapping between alarm types and severity levels.
This is not the case: an alarm can go through [(warning), (minor), (minor, cleared), (major)],
for example a utilisation alarm for a link at 70%, 80% and 90%.

(Same reason that I omitted the ITU Alarm Severity Assignment Profile)
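To illustrate why a fixed alarm-type-to-severity mapping does not fit, here is a minimal Python sketch of a single alarm instance (one resource, alarm-type-id and alarm-type-qualifier key) whose severity changes with link utilisation. The thresholds reuse the 70/80/90% figures above; the names `severity_for` and `AlarmInstance` are hypothetical, and the history list is a simplification of the alarm's status changes, not the draft's actual data model.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

# Hypothetical thresholds, highest first, taken from the 70/80/90 % example.
THRESHOLDS: List[Tuple[float, str]] = [
    (90.0, "major"),
    (80.0, "minor"),
    (70.0, "warning"),
]


def severity_for(utilisation: float) -> Optional[str]:
    """Severity for a utilisation level, or None when below every threshold."""
    for limit, severity in THRESHOLDS:
        if utilisation >= limit:
            return severity
    return None


@dataclass
class AlarmInstance:
    # The list key: (resource, alarm-type-id, alarm-type-qualifier).
    key: Tuple[str, str, str]
    # Simplified status changes: (severity, is_cleared).
    history: List[Tuple[str, bool]] = field(default_factory=list)

    def report(self, utilisation: float) -> None:
        severity = severity_for(utilisation)
        last = self.history[-1] if self.history else None
        if severity is None:
            # Clearing keeps the last severity, as in (minor, cleared).
            if last is not None and not last[1]:
                self.history.append((last[0], True))
        elif last != (severity, False):
            # New or changed severity, or raised again after a clear.
            self.history.append((severity, False))


inst = AlarmInstance(key=("eth0", "link-utilisation", ""))
for u in (72.0, 85.0, 40.0, 95.0):
    inst.report(u)
print(inst.history)
# [('warning', False), ('minor', False), ('minor', True), ('major', False)]
```

The key never changes while the severity does, so a client-side rule of the form "alarm type X is always minor" has nothing stable to attach to.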


> typedef resource) It should be stated what should appear in the resource leaf/type when the resource is modeled both in YANG and in SNMP and possibly has other addresses (simple string, distinguished name). Is there a fixed precedence between these address types? IMHO there should be.
> 
> 
Ok, will clarify.
The priority order is:

     typedef resource {
       type union {
         type instance-identifier {     // 1) highest priority
           require-instance false;
         }
         type yang:object-identifier;   // 2)
         type string;                   // 3) lowest priority
       }
     }
Note also the alt-resource* leaf:
       |  +--ro alarm* [resource alarm-type-id alarm-type-qualifier]
       |     +--ro time-created             yang:date-and-time
       |     +--ro resource                 resource
       |     +--ro alarm-type-id            alarm-type-id
       |     +--ro alarm-type-qualifier     alarm-type-qualifier
       |     +--ro alt-resource*            resource


This lets a server, for example, put the instance-identifier in the resource leaf (which is used as a list key) and the OID in the alt-resource leaf.
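A server-side choice between the union members could look like the following Python sketch, assuming a server that has several names for one resource and applies the priority order above. The helper names are hypothetical, and the string checks are crude heuristics: a real server would know from its own data model which kind of identifier each name is, and the OID pattern here is looser than the yang:object-identifier pattern.

```python
import re
from typing import List, Tuple

# Simplified check for a dotted OID such as 1.3.6.1.2.1.2.2.1.1.3
# (looser than the yang:object-identifier pattern).
OID_RE = re.compile(r"[0-2](\.(0|[1-9][0-9]*))+")


def rank(name: str) -> int:
    """Priority of a name: 1 = instance-identifier, 2 = OID, 3 = string.

    Hypothetical heuristic based on string syntax only.
    """
    if name.startswith("/"):
        return 1
    if OID_RE.fullmatch(name):
        return 2
    return 3


def choose_resource(names: List[str]) -> Tuple[str, List[str]]:
    """Split names into the 'resource' key value and the alt-resource values."""
    if not names:
        raise ValueError("at least one resource name is required")
    ordered = sorted(names, key=rank)  # stable: ties keep their input order
    return ordered[0], ordered[1:]


names = [
    "1.3.6.1.2.1.2.2.1.1.3",
    "eth0",
    "/if:interfaces/if:interface[if:name='eth0']",
]
resource, alt = choose_resource(names)
print(resource)  # /if:interfaces/if:interface[if:name='eth0']
print(alt)       # ['1.3.6.1.2.1.2.2.1.1.3', 'eth0']
```

Keeping the highest-priority name in the key and the others in alt-resource means a client correlating by SNMP OID can still find the alarm without the key depending on which management protocol it uses.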

> typedef severity) Which severity should be assigned to a loss-of-redundancy type of problem? E.g. the link to one of the two license servers is down.
> 
> 
I think that revisiting the original texts in X.733 gives us good tactics for setting severity levels; they reason about whether a fault is service affecting or not.
See below: this would be minor or warning.
         enum minor {
           value 3;
           description
             "The 'minor' severity level indicates the existence of a
              non-service affecting fault condition and that corrective
              action should be taken in order to prevent a more serious
              (for example, service affecting) fault.  Such a severity
              can be reported, for example, when the detected alarm
              condition is not currently degrading the capacity of the
              resource.";
         }
         enum warning {
           value 4;
           description
             "The 'warning' severity level indicates the detection of
              a potential or impending service affecting fault, before
              any significant effects have been felt.  Action should be
              taken to further diagnose (if necessary) and correct the
              problem in order to prevent it from becoming a more
              serious service affecting fault.";
         }

Note also that consistent use of the severity level descriptions makes a “service affecting” flag redundant.

> container alarm-shelving)
> 
> - the description starts with "This list ..."
> 
> 
Will fix :)
> - Is it possible to unshelve an alarm? When will an alarm have operator-state un-shelved? If an un-shelved alarm is acknowledged, will the system still remember that it was unshelved? IMHO, if we support unshelving, it should be a separate property, not part of the operator state. I would need to know that the alarm was unshelved AND acknowledged, both.
> 
> 
Shelving and unshelving are done by configuration, in alarm-shelving/shelf. See my response to the email from Joey and Nick from Adtran on this as well.
Unshelving means deleting the corresponding shelf configuration.

Operator states shelved/unshelved can *not* be set by an operator, only read.
These states are set by the instrumentation.
An ack will stay in the operator-state history:
       +--ro operator-state-change* [time] {operator-actions}?
       |  +--ro time        yang:date-and-time
       |  +--ro operator    string
       |  +--ro state       operator-state
       |  +--ro text?       string

For example, the operator-state-change* list could contain:
* T1: joe, ack
* T2: joe, shelved (set by instrumentation)
* T3: joe, un-shelved (set by instrumentation)
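The flow described above, where shelving and unshelving happen only through shelf configuration and the server appends the shelved/un-shelved states, can be sketched in Python as below. This is a hypothetical model: shelves match on alarm-type-id only, times are plain strings, and the operator recorded for the shelved/un-shelved entries is assumed to be the one who changed the shelf configuration.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Alarm:
    resource: str
    alarm_type_id: str
    # operator-state-change entries, simplified to (time, operator, state).
    history: List[Tuple[str, str, str]] = field(default_factory=list)


class AlarmServer:
    """Hypothetical server: shelves are configuration, states are server-set."""

    def __init__(self) -> None:
        self.alarm_list: List[Alarm] = []
        self.shelved_alarms: List[Alarm] = []
        self.shelves: Dict[str, str] = {}  # shelf name -> alarm-type-id

    def _shelved(self, alarm: Alarm) -> bool:
        return alarm.alarm_type_id in self.shelves.values()

    def ack(self, alarm: Alarm, time: str, operator: str) -> None:
        # An operator action; shelved/un-shelved are never set this way.
        alarm.history.append((time, operator, "ack"))

    def configure_shelf(self, name: str, alarm_type_id: str,
                        time: str, operator: str) -> None:
        self.shelves[name] = alarm_type_id
        moving = [a for a in self.alarm_list if self._shelved(a)]
        self.alarm_list = [a for a in self.alarm_list if not self._shelved(a)]
        for alarm in moving:
            self.shelved_alarms.append(alarm)
            alarm.history.append((time, operator, "shelved"))  # set by server

    def delete_shelf(self, name: str, time: str, operator: str) -> None:
        del self.shelves[name]
        moving = [a for a in self.shelved_alarms if not self._shelved(a)]
        self.shelved_alarms = [a for a in self.shelved_alarms if self._shelved(a)]
        for alarm in moving:
            self.alarm_list.append(alarm)
            alarm.history.append((time, operator, "un-shelved"))  # set by server


server = AlarmServer()
alarm = Alarm("eth0", "link-down")
server.alarm_list.append(alarm)
server.ack(alarm, "T1", "joe")
server.configure_shelf("maintenance", "link-down", "T2", "joe")
server.delete_shelf("maintenance", "T3", "joe")
print([state for _, _, state in alarm.history])  # ['ack', 'shelved', 'un-shelved']
```

Because the history is append-only, both facts Balazs asks about (acknowledged, and later un-shelved) remain readable from the list.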


> container alarm inventory)
> 
> - the description starts with "This list ..."
> 
> 
Will fix :)
> As this container contains alarm-types, it should be called "container alarm-TYPE-inventory"
> 
> 
Possibly, will consider changing the name
> container summary)
> 
> "This container gives a summary of number of alarms and shelved alarms"
> If the operator decided to suppress (shelve) an alarm, it 
> SHOULD NOT be part of the summary counters. I shelved it because 
> I am not interested, so don't count it. 
> If I try to assess my network node's health and I decided that typeX alarms are 
> not relevant for me, then an alarm count indicating I have 100 typeX suppressed alarms is misleading.
> We need counters that exclude shelved alarms.

I will improve the descriptions; the intent is exactly as you describe. The description above refers to the fact that there are
summary counters for alarms (not shelved) and a leaf indicating shelved alarms:
         leaf shelves-active {
           if-feature alarm-shelving;
           type empty;
           description
             "This is a hint to the operator that there are active
              alarm shelves.  This leaf MUST exist if the
              alarms/shelved-alarms/number-of-shelved-alarms is > 0.";
         }

Will improve descriptions.
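A minimal Python sketch of the intended summary behaviour, assuming hypothetical field and key names: shelved alarms are left out of the per-severity counters, and a `shelves-active` marker is present whenever there are shelved alarms, as the leaf description requires (this sketch also omits it otherwise).

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Alarm:
    severity: str
    is_cleared: bool


def summary(alarm_list: List[Alarm], shelved_alarms: List[Alarm]) -> Dict[str, object]:
    """Hypothetical summary: counters cover the alarm list only, not shelved alarms."""
    counts: Dict[str, int] = {}
    for alarm in alarm_list:
        if not alarm.is_cleared:
            counts[alarm.severity] = counts.get(alarm.severity, 0) + 1
    result: Dict[str, object] = {"active-per-severity": counts}
    if len(shelved_alarms) > 0:
        # Modelled on the 'type empty' leaf: only its presence carries meaning.
        result["shelves-active"] = None
    return result


alarm_list = [Alarm("major", False), Alarm("minor", False), Alarm("minor", True)]
shelved = [Alarm("warning", False) for _ in range(100)]  # e.g. 100 shelved typeX alarms
s = summary(alarm_list, shelved)
print(s["active-per-severity"])  # {'major': 1, 'minor': 1}
print("shelves-active" in s)     # True
```

The 100 shelved alarms only show up as a hint that shelves are active; they do not inflate the counters used to judge the node's health.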

> 
> container shelved-alarms) The description does not allow for unshelving. Then why do we have such an operator state?
See my response to the email from Joey and Nick from Adtran on this as well. The mechanism is not clear as currently described.
Shelving and un-shelving can *only* be done by configuring/deleting a shelf.
The operator-state shelved and un-shelved can only be *read* not set.

I will work on improving this in the next version.

         enum shelved {
           value 4;
           description
             "Alarm shelved.  Alarms in alarms/shelved-alarms/
              MUST be assigned this operator state by the server as
              the last entry in the operator-state-change list.";
         }
         enum un-shelved {
           value 5;
           description
             "Alarm moved back to alarm-list from shelf.
              Alarms 'moved' from /alarms/shelved-alarms/
              to /alarms/alarm-list MUST be assigned this
              state by the server as the last entry in the
              operator-state-change list.";
         }



> 
> regards Balazs
> -- 
> Balazs Lengyel                       Ericsson Hungary Ltd.
> Senior Specialist
> Mobile: +36-70-330-7909              email: Balazs.Lengyel@ericsson.com