Re: [CCAMP] Alarm Module work

"De La Marche, Dirk (Nokia - BE/Antwerp)" <dirk.de_la_marche@nokia.com> Fri, 01 June 2018 14:11 UTC

From: "De La Marche, Dirk (Nokia - BE/Antwerp)" <dirk.de_la_marche@nokia.com>
To: stefan vallin <stefan@wallan.se>, NICK HANCOCK <nick.hancock@adtran.com>
CC: Common Control and Measurement Plane Discussion List <ccamp@ietf.org>
Date: Fri, 01 Jun 2018 14:11:35 +0000
Message-ID: <AM5PR0701MB23385334660EC433FD678B0AF1620@AM5PR0701MB2338.eurprd07.prod.outlook.com>
References: <D174588E-1233-4B53-B5BB-D29DE14B3888@wallan.se> <BD6D193629F47C479266C0985F16AAC7F07058E9@ex-mb1.corp.adtran.com> <7906650D-4E83-4386-AA08-43B120CD6866@wallan.se>
In-Reply-To: <7906650D-4E83-4386-AA08-43B120CD6866@wallan.se>
Archived-At: <https://mailarchive.ietf.org/arch/msg/ccamp/bvSVRgfU-b0d9mOmge4YR_dlIc4>
Subject: Re: [CCAMP] Alarm Module work

Stefan, Nick,

Thank you for this interesting exchange of alarm ideas.

I have two questions related to the extended notification filtering proposal.

(1) In the two examples in this mail exchange, the alarm is always cleared while it is above the severity threshold line. What if the clear happens below the threshold? Will we also send an alarm state change notification in that case? Similarly, if an alarm stays below the severity threshold for its whole lifetime, we do not expect any notification, right? Wouldn't this mean that the alarm application needs to remember the history of the alarm, i.e. whether a notification was ever sent (e.g. because the severity was briefly above the threshold), in order to decide whether to generate a clear notification?
I would imagine that an external alarm manager loses interest in an alarm once it drops below the severity threshold line (as indicated by the T4 notification). Would it still be interested in a notification once the alarm is actually cleared on the box?
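
The bookkeeping raised in question (1) can be sketched as follows. This is only an illustration of the alternative being asked about (send a clear only if some earlier notification was sent); the proposed draft text says clear notifications shall always be sent. The names `AlarmTracker`, `SEVERITY_RANK` and `on_change` are hypothetical and do not come from the draft, and severity drops below the threshold are deliberately ignored to keep the focus on the clear decision.

```python
# Hypothetical ranking of severities, lowest to highest.
SEVERITY_RANK = {"warning": 1, "minor": 2, "major": 3, "critical": 4}


class AlarmTracker:
    """Per-alarm state a server would need in order to suppress clears
    for alarms that were never notified (an assumed policy, not the draft's)."""

    def __init__(self, threshold):
        self.threshold = SEVERITY_RANK[threshold]
        self.notified = False  # was any notification sent during this lifetime?

    def on_change(self, severity, cleared=False):
        """Return True if this state change would be notified."""
        if cleared:
            send = self.notified
            self.notified = False  # alarm lifetime ends, forget the history
            return send
        if SEVERITY_RANK[severity] >= self.threshold:
            self.notified = True
            return True
        return False  # below the threshold: not notified in this sketch


always_below = AlarmTracker("major")
print([always_below.on_change("minor"),
       always_below.on_change("warning"),
       always_below.on_change("warning", cleared=True)])

brief_spike = AlarmTracker("major")
print([brief_spike.on_change("minor"),
       brief_spike.on_change("major"),
       brief_spike.on_change("minor"),
       brief_spike.on_change("minor", cleared=True)])
```

The point of the sketch is the `notified` flag: without it the server cannot tell the two lifetimes apart at clear time.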

(2) If the severity filtering functionality indeed overlaps with the shelving feature, does this mean that the same rules apply to shelved alarms as to severity-filtered alarms, i.e. that a clear of the alarm always results in a clear notification? The case I am thinking of is an active alarm that is shelved. No notification is defined for the moment the alarm is shelved, which is inconsistent from the point of view of an external alarm manager. When the alarm clears while shelved, don't we also need a clear notification? How else will the external alarm manager stay in sync with the box's alarm situation?

Based on this mail exchange, I would agree that alarm shelving is a workable method that can replace severity filtering, provided we have a means (e.g. notifications) to keep the alarm applications on the NC server and the NC client in sync. In my opinion this is an important criterion for validating any alarm model.

Concerning the second proposal, a workable Alarm Severity Assignment Profile is very valuable (it is actually used by several, if not all, of the operators I have worked with). If we can override vendor-specific alarm severities per alarm-id and/or per object-id using wildcards, we are fine.

Kind regards,
Dirk
From: CCAMP [mailto:ccamp-bounces@ietf.org] On Behalf Of stefan vallin
Sent: Thursday, May 31, 2018 9:03 PM
To: NICK HANCOCK <nick.hancock@adtran.com>
Cc: Common Control and Measurement Plane Discussion List <ccamp@ietf.org>
Subject: Re: [CCAMP] Alarm Module work

Hi Nick!
Thanks for your comments, really appreciated!

On 31 May 2018, at 18:24, NICK HANCOCK <nick.hancock@adtran.com> wrote:

Hi Stefan,

Thanks for sharing this for discussion.

I absolutely agree that it is most important that we can progress the YANG Alarm Module to RFC asap, so that support for it can begin to be implemented in the industry.


1) Extended notification filtering.

The introduction of 'notify-severity-level' here has one drawback: it is a global configuration applying to all alarm types, so the behaviour cannot be assigned based on the type of resource, such as interface or object. For example, you cannot suppress alarms below a certain severity for some less important interface types while keeping normal alarm behaviour for other, more important interfaces.
Hmm, we could add resource and alarm type to notification filtering, but then we have alarm shelving and notification filtering as overlapping mechanisms.
I could argue that the manager/client could do the notification filtering.
My experience is that severity level filtering is of less importance. You normally want to shelve based on resource and alarm type.

Also, just 'Crossing the specified level' may fulfil the desired behaviour for alarm types with a 1:1 mapping to severity level, but not for alarm types with severity life cycles. If the severity of the alarm continues to increase above the configured severity level, alarm notifications would also need to be sent.
That was the intention, will clarify.


Consider the following example assuming the severity-level is set to 'major':
           [(Time, severity, clear)]:
           [(T1, major, -), (T2, critical, -), (T3, major, -), (T4, minor, -), (T5, major), (T6, major, clear)]
I would expect alarm notifications at T1, T2, T3, and T6.
Adding some severity changes to your scenario
[(T1, major, -), (T2, critical, -), (T3, major, -), (T4, minor, -), (T5, warning), (T6, major), (T7, clear)]
[Image: image003.png]

Notifications will be sent at T1, T2, T3, T4, T6, T7
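
One possible reading of option 3 that reproduces the notification times listed above is sketched here in Python. The rule is an assumption inferred from this thread: notify when the new severity is at or above the configured level, when the severity drops from at or above the level to below it, and on every clear. The names `notified_times` and `SEVERITY_RANK` are hypothetical.

```python
# Hypothetical ranking of severities, lowest to highest.
SEVERITY_RANK = {"warning": 1, "minor": 2, "major": 3, "critical": 4}


def notified_times(events, level):
    """Return the time labels at which a notification would be sent.

    events: chronological list of (time, severity, cleared) tuples;
            severity may be None on a clear.
    level:  the configured severity level.
    """
    threshold = SEVERITY_RANK[level]
    sent = []
    was_at_or_above = False
    for time, severity, cleared in events:
        if cleared:
            sent.append(time)  # clears are always notified in this reading
            was_at_or_above = False
            continue
        at_or_above = SEVERITY_RANK[severity] >= threshold
        # Notify at or above the level, and once when dropping below it.
        if at_or_above or was_at_or_above:
            sent.append(time)
        was_at_or_above = at_or_above
    return sent


scenario = [
    ("T1", "major", False), ("T2", "critical", False), ("T3", "major", False),
    ("T4", "minor", False), ("T5", "warning", False), ("T6", "major", False),
    ("T7", None, True),
]
print(notified_times(scenario, "major"))
```

With the level set to major, the scenario yields T1, T2, T3, T4, T6 and T7: T4 is the single "dropped below the level" notification, and T5 is suppressed because the alarm was already below the level.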



Changing this configuration to a choice does have its advantages, such as to allow the addition of other cases later or allow other SDOs or vendors to augment other cases.
So maybe a pragmatic approach, so as not to block progress of this draft, could be to keep the choice format but omit 'notify-severity-level' for now and continue discussions.
Or keep it :)



2) Alarm Severity Assignment Profile

The proposed implementation provides the means to override severities for alarm types with severity life cycles, but at the same time the implementation is relatively simple.
Also, the use of criteria to assign the severity to alarms (I am assuming it would work as for the shelf) allows resource-independent overriding of factory default severities by specifying an alarm type only, while also allowing additional overrides for specific resources. The only question is: if multiple assignments apply to a specific alarm instance, which one should apply? For example, if I create entries for
[(alarm-type-id, alarm-type-qualifier, resource, severity-level)]
[(los,,,major),(los,,"interface 1",critical)]
which severity assignment applies to interface 1? The more specific?
Good comment!
Priority order:
1) the more specific
  1.1) resource
  1.2) alarm type (remember hierarchical)
2) order in list
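
The priority order above could be sketched as follows. This is a minimal Python illustration under several assumptions that are not in the draft: an empty resource means "any resource", resources are compared by plain equality (the real `resource-match` type is richer), the alarm-type qualifier is left out, and the alarm-type hierarchy is a hypothetical child-to-parent map. `Assignment`, `PARENT`, `type_distance` and `select` are illustrative names only.

```python
from typing import NamedTuple


class Assignment(NamedTuple):
    alarm_type: str
    resource: str  # "" means "any resource" (assumption)
    severity: str


# Hypothetical alarm-type hierarchy: child -> parent.
PARENT = {"los": "communications-alarm"}


def type_distance(entry_type, alarm_type):
    """0 for an exact type match, 1 for the parent, and so on; None if unrelated."""
    distance = 0
    current = alarm_type
    while current is not None:
        if current == entry_type:
            return distance
        current = PARENT.get(current)
        distance += 1
    return None


def select(assignments, alarm_type, resource):
    """Pick the assignment by: specific resource, then closest type, then list order."""
    best, best_key = None, None
    for index, entry in enumerate(assignments):
        distance = type_distance(entry.alarm_type, alarm_type)
        if distance is None or entry.resource not in ("", resource):
            continue  # entry does not match this alarm instance
        key = (0 if entry.resource else 1, distance, index)
        if best_key is None or key < best_key:
            best, best_key = entry, key
    return best


entries = [Assignment("los", "", "major"),
           Assignment("los", "interface 1", "critical")]
print(select(entries, "los", "interface 1").severity)
print(select(entries, "los", "interface 2").severity)
```

For the example in question, interface 1 gets the resource-specific entry (critical) while any other interface falls back to the type-only entry (major).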



The implementation is also very specific to alarm severity assignment. The mechanism itself, though, is relatively generic: it maps information to alarms, in this case alarm severities. Other SDOs or vendors may wish to augment the list with other data nodes, using this mechanism to associate other data with alarm types and avoid having to implement multiple lists. So I believe there would be a great advantage and added value if this list were made more generic, for example by renaming it to just 'alarm-profile'.
We are circling back and forth here.
The list has expressed the need to support ITU Alarm Severity Assignment Profile, this is exactly what the suggested model does.


And although it does fulfil the requirements of M.3100/M.3160,
as requested :)

including the list within the module ietf-alarms-itu would basically restrict the use and possible extension of the ASAP to ITU requirements only.
Not sure what you mean by "restrict".


Since possible augmentations could originate from requirements coming from other SDOs and vendors, it would IMHO not be prudent to include it in this module.
Nothing stops augmentations and additions of other features. Just felt there was a high pressure for ASAP which the suggestion captures.

best regards Stefan


Best regards
Nick




From: CCAMP [mailto:ccamp-bounces@ietf.org] On Behalf Of stefan vallin
Sent: Monday, May 28, 2018 2:12 PM
To: Common Control and Measurement Plane Discussion List
Subject: [CCAMP] Alarm Module work

Hi!
Martin and I are working on an updated version of the alarm module, addressing several smaller things pointed out by reviewers. Thank you all.

We would like to share 2 things for discussion:
1) Extended notification filtering
2) Alarm Severity Assignment Profile

We are now stretching the limit for a first core module containing only relevant features.
At this point I think it is more important to start getting implementation support than to add even more features, which might scare people off implementing it.


1) Extended notification filtering
========================
See suggestion below, added the capability to filter on severity.
We did not include resource filtering since that would be too much overlap with shelving.

      choice notify-status-changes {
        description
          "This choice controls the notifications sent for alarm status
           updates. There are three options:
           1. notifications are sent for all updates, severity level
              changes and alarm text changes
           2. notifications are only sent for alarm raise and clear
           3. notifications are sent for status changes equal to or
              above the specified severity level. Clear notifications
              shall always be sent.
              Notifications shall also be sent for state changes that
              make an alarm less severe than the specified level.
           In option 3, assuming the severity level is set to major,
           and that the alarm has the following state changes
           [(Time, severity, clear)]:
           [(T1, major, -), (T2, minor, -), (T3, warning, -),
            (T4, minor, -), (T5, major), (T6, major, clear)]
           In that case, notifications will be sent at
           T1, T2, T5 and T6";
        leaf notify-all-state-changes {
          type empty;
          description
            "Send notifications for all status changes.";
        }
        leaf notify-raise-and-clear {
          type empty;
          description
            "Send notifications only for raise, clear, and re-raise.
             Notifications for severity level changes or alarm text
             changes are not sent.";
        }
        leaf notify-severity-level {
          type severity;
          description
            "Only send notifications for alarm state changes
             crossing the specified level. Always send clear
             notifications.";
        }
      }



2) Alarm Severity Assignment Profile
============================

We have renamed ietf-alarms-x733 to ietf-alarms-itu since it now includes X.733 as well as M.3100/M.3160 features

  list alarm-severity-assignment-profile {
      if-feature alarm-severity-assignment-profile;
      key "alarm-type-id alarm-type-qualifier resource";
      ordered-by user;
      description
        "If an alarm matches the criteria in one of the entries
         in this list the configured severity levels shall be
         used instead of the system default. Note well that the
         mapping allows for several severity levels since this
         alarm module uses a stateful alarm model where
         the same alarm can have the following states:
         [(warning, not cleared),(minor, not cleared),
          (minor, cleared)]

         The configuration of this list shall update the
         /al:alarms/al:alarm-inventory/al:alarm-type list so that a
         client can always get a full picture of the possible alarms
         by reading the alarm inventory. If an alarm matches several
         entries in this list, the first match is used.";
      reference
        "M.3160/M.3100 Alarm Severity Assignment Profile, ASAP";
      leaf alarm-type-id {
        type al:alarm-type-id;
        description
          "The alarm type identifier to match for severity
           assignment.";
      }
      leaf alarm-type-qualifier {
        type string;
        description
          "A W3C regular expression that is used to match
           an alarm type qualifier.";
      }
      leaf resource {
        type al:resource-match;
        description
          "Specifies which resources to match for severity
           assignment.";
      }
      leaf-list severity-levels {
        type al:severity;
        ordered-by user;
        description
          "Specifies the configured severity level(s) for the
           matching alarm. If the alarm has several severity
           levels the leaf-list shall be given in rising severity
           order. The original M.3100/M.3160 ASAP function only
           allows for a one-to-one mapping between alarm type and
           severity but since the IETF alarm module supports stateful
           alarms the mapping must allow for several severity levels.

           Assume a high-utilisation alarm type with two
           thresholds with the system default severity levels of
           threshold1 = warning and threshold2 = minor. Setting this
           leaf-list to (minor, major) will assign the severity
           levels threshold1 = minor and threshold2 = major";
      }
      leaf description {
        type string;
        mandatory true;
        description
          "A description of the alarm severity profile.";
      }
    }
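
The high-utilisation example in the `severity-levels` description can be read as a positional mapping, sketched below in Python. This is an assumed interpretation: the n-th system default level, in rising order, is replaced by the n-th configured level. The function name `remap_severity` and the error on a length mismatch are illustrative choices, not behaviour stated in the module.

```python
def remap_severity(default_levels, configured_levels, raised_severity):
    """Map a system-default severity to its configured replacement by position.

    default_levels:    system default severities in rising order.
    configured_levels: contents of the severity-levels leaf-list, rising order.
    """
    if len(default_levels) != len(configured_levels):
        # Assumption: a profile must cover every default level.
        raise ValueError("configured levels must match the number of default levels")
    return configured_levels[default_levels.index(raised_severity)]


defaults = ["warning", "minor"]   # threshold1, threshold2
configured = ["minor", "major"]   # severity-levels leaf-list
print(remap_severity(defaults, configured, "warning"))  # threshold1
print(remap_severity(defaults, configured, "minor"))    # threshold2
```

With these inputs, threshold1 is reported as minor and threshold2 as major, matching the example in the description.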


Stefan Vallin
stefan@wallan.se
+46705233262

Attachment: image003.png
