[CCAMP] draft-vallin-ccamp-alarm-module-00.txt

stefan vallin <stefan@wallan.se> Wed, 11 October 2017 08:46 UTC


Hi All!

There have been discussions in NETMOD about our YANG module.
I would like to share two important responses on this mailing list as well.

I think these responses are a good kick-start for the discussions in this group.


* NETMOD mailing list, 31 Oct 2016: Comparison of the Vallin and Sharma drafts: https://tools.ietf.org/html/draft-sharma-netmod-fault-model-01
* NETMOD mailing list, 12 Dec 2016: Comment on the ITU-T reference.

Vallin vs Sharma
**********************

Overall comments
===============
* The sharma draft is a subset of the functionality in the vallin draft.
On top of the alarm-list, the vallin draft covers:
- alarm quality/usability requirements
- alarm summary
- alarm history (optional YANG feature)
- alarm operator actions like ack (optional YANG feature)
- alarm inventory (which are the possible alarms)
- alarm shelving (filtering)
- no mandatory X733; X733 mapping is an optional add-on

* Stateful vs notification-based definition of alarms.
 
The vallin draft focuses on representing alarms as an alarm state on a resource.
The notifications refer to alarm state changes for a specific alarm on a specific resource.
The alarm list represents the alarm state for a given resource and alarm type.

The sharma draft takes a notification-focused view of alarms.
The alarm list represents the alarm notifications. The management system has to
correlate the notifications into an actual alarm state.

* On X733
The vallin draft does not use X733 as a mandatory *core* model; rather, it allows for X733 mapping when needed.
While X733 has been the root of most telecom-oriented alarm systems, it adds a bit of historic overhead.
For example, globally standardised probable cause values have not proven useful in some cases.
X733 represents notifications and not state (see above).
Therefore, in X733 an alarm is either cleared or, for example, minor: cleared is just another severity value.
The vallin draft clearly separates this. Severity is one thing, clearance is another.
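
The difference can be sketched in a few lines of Python. This is only an illustration of the idea: the names X733Severity, Severity, AlarmState and to_x733 are made up for this mail and are not taken from either draft or from X733 itself.

```python
from dataclasses import dataclass
from enum import Enum


class X733Severity(Enum):
    """X733-style perceived severity: 'cleared' is one of the values."""
    CLEARED = "cleared"
    INDETERMINATE = "indeterminate"
    WARNING = "warning"
    MINOR = "minor"
    MAJOR = "major"
    CRITICAL = "critical"


class Severity(Enum):
    """Severity without a 'cleared' value (hypothetical, for illustration)."""
    INDETERMINATE = "indeterminate"
    WARNING = "warning"
    MINOR = "minor"
    MAJOR = "major"
    CRITICAL = "critical"


@dataclass
class AlarmState:
    """Severity and clearance kept as two separate pieces of state."""
    severity: Severity
    is_cleared: bool


def to_x733(state: AlarmState) -> X733Severity:
    """Map the separated state onto a single X733-style value.

    The mapping loses information: once cleared, the last severity is gone.
    """
    if state.is_cleared:
        return X733Severity.CLEARED
    return X733Severity(state.severity.value)


state = AlarmState(severity=Severity.MAJOR, is_cleared=True)
print(state.severity.value, state.is_cleared)  # major True
print(to_x733(state).value)                    # cleared
```

With the two fields kept apart, a client can still see that the cleared alarm was a major one. The single X733-style value only says "cleared".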

* Terminology
The sharma module is named "fault" although it models alarms.
See for example X733 for definitions of fault, error and alarm.



Detailed comments
================

Section 2
"  New network architectures that include controllers, orchestrators,
   PCE, applications, etc., require new alarm types and probable causes
   to be defined.  These new alarm types and probable causes will be
   defined in the next version of the model."

We doubt that having globally defined probable causes is a scalable way forward.
This did not work well in X733 or RFC3877.

Probable cause values from standards have been of more historical than real
value to alarm operators and alarm systems. Most telecom-oriented alarm systems
require this alarm attribute. However, the actual values differ between management
systems. It is a confusing area with conflicting enum values. Our approach is different.
We consider this to be a configurable mapping to match the needs of the management
system, and no values are defined in the alarm module.
It is also kept separate as an X733 mapping rather than part of the core model.
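
As a rough sketch of what "configurable mapping" means here, assume each management system supplies its own table from alarm type to probable cause. Everything in the snippet is hypothetical: the alarm type names, the qualifier, the numeric values and the helper probable_cause are invented for illustration and do not come from the draft or from any standard.

```python
from typing import Dict, Optional, Tuple

# Key: (alarm-type-id, alarm-type-qualifier).
# Value: the probable cause number that one particular management system
# expects. All entries below are invented examples.
ProbableCauseMap = Dict[Tuple[str, str], int]

nms_a: ProbableCauseMap = {
    ("link-alarm", ""): 29,
    ("power-alarm", "fan"): 123,
}
nms_b: ProbableCauseMap = {
    ("link-alarm", ""): 1001,
}


def probable_cause(mapping: ProbableCauseMap,
                   alarm_type_id: str,
                   qualifier: str = "") -> Optional[int]:
    """Look up the configured probable cause, or None if none is configured."""
    return mapping.get((alarm_type_id, qualifier))


print(probable_cause(nms_a, "link-alarm"))          # 29
print(probable_cause(nms_b, "link-alarm"))          # 1001
print(probable_cause(nms_b, "power-alarm", "fan"))  # None
```

The same alarm type gets different probable cause values in the two systems, and the alarm module itself does not need to know any of them.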


Section 2.2

The definition of 'alarm-id' is unclear. 
"In most cases this will be a combination of entity-type, entity-id,
probable-cause and severity"

entity-type: string
entity-id: inet:uri
probable-cause: identity-ref
severity: enum

Several questions on this:
a) Why does the entity (called resource in the vallin draft) not refer to a path in the YANG data tree?
In the vallin draft we use an instance-identifier. We also allow for other resource instances based on SNMP or even a string as a last resort.

b) Alarm list key
- Assume you have a threshold alarm with the following life-cycle,
  where T1, T2 and T3 are the times of the alarm state changes:
  T1: raise, minor
  T2: major
  T3: clear

In the sharma draft this will be *three different entries* in the alarm list. The client
would have to correlate those.

In the vallin draft this is *one entry*, one alarm, with three different states.
The vallin draft also clearly separates the severity from the clearance state.
The final state is
(major, cleared)
This is important: what was the severity of the alarm, and is it cleared or not?
This is not easily seen in the sharma draft.

This implies that the sharma draft alarm list is more of a notification log than an
alarm list that shows the current state. The vallin draft alarm list focuses on the current state
of the alarms. Notifications represent changes of the alarm state.
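
The T1/T2/T3 life-cycle above can be played through both list models in a small Python sketch. It is a simplification under stated assumptions: the resource path and alarm type are hypothetical, the notification-style key is reduced to (resource, alarm type, severity) as a stand-in for the entity-type, entity-id, probable-cause and severity combination, and the Alarm class is invented for this mail.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

# (time, severity or "clear") for the threshold alarm above.
EVENTS = [("T1", "minor"), ("T2", "major"), ("T3", "clear")]
RESOURCE = "/interfaces/interface[name='eth0']"  # hypothetical resource
ALARM_TYPE = "threshold-alarm"                   # hypothetical alarm type


def notification_keyed(events) -> Dict[Tuple[str, str, str], str]:
    """List keyed per notification: the severity is part of the key."""
    entries = {}
    for time, severity in events:
        entries[(RESOURCE, ALARM_TYPE, severity)] = time
    return entries


@dataclass
class Alarm:
    severity: str = ""
    is_cleared: bool = False
    changes: List[Tuple[str, str]] = field(default_factory=list)


def state_keyed(events) -> Dict[Tuple[str, str], Alarm]:
    """List keyed per alarm: every event updates the same entry."""
    alarms: Dict[Tuple[str, str], Alarm] = {}
    for time, severity in events:
        alarm = alarms.setdefault((RESOURCE, ALARM_TYPE), Alarm())
        if severity == "clear":
            alarm.is_cleared = True  # the last severity is kept
        else:
            alarm.severity = severity
            alarm.is_cleared = False
        alarm.changes.append((time, severity))
    return alarms


log = notification_keyed(EVENTS)
current = state_keyed(EVENTS)
final = current[(RESOURCE, ALARM_TYPE)]
print(len(log))                            # 3
print(len(current))                        # 1
print((final.severity, final.is_cleared))  # ('major', True)
```

The first model leaves three entries that the client has to correlate. The second leaves one alarm whose final state is (major, cleared), with the three state changes kept as its history.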

c) the service-affecting flag
This is superfluous. If the X733 severity levels are set correctly, they already tell
whether the alarm is service-affecting or not. See the X733 definition of severity levels.
The vallin draft also has a leaf impacted-resources. In this leaf an alarm can refer to
affected/impacted services.

d) alarm-sequence number
Not needed: NETCONF has notification replay and uses SSH sessions.
Furthermore, since the sharma alarm list key is the identification of an alarm notification, why is this needed at all?
There have been var-binds of this kind in SNMP alarm MIBs as well, and they were a bit flawed there too: you could use informs instead of trap PDUs. How does this work with filtering mechanisms?
Do not try to do protocol work in the model.


e) House-keeping
It is unclear how the list is managed. When are entries removed?


f) Other considerations:
- The vallin draft allows for more flexible resource (entity) identification, see below:

The primary mechanism is an instance-identifier, so that a node in the data tree can be referenced.
The sharma draft uses a string/uri, which is in another domain than the module. This is a bit strange: say you have
an interface alarm, why should you not send the alarm on the path to the interface in the data model?

The Vallin draft also allows for other naming schemes, for example SNMP OIDs:
  typedef resource {
    type union {
      type instance-identifier {
        require-instance false;
      }
      type yang:object-identifier;
      type string;
    }
  }


The alarm also has an optional leaf-list for referring to the alarming resource using an alternate naming scheme:
        +--ro alarm* [resource alarm-type-id alarm-type-qualifier]
           +--ro resource                      resource
           +--ro alarm-type-id                 alarm-type-id
           +--ro alarm-type-qualifier          alarm-type-qualifier
           +--ro alt-resource*                 resource

So the alarm can use both the instance-identifier and the SNMP OID for the alarming interface, for example.
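
To illustrate how one resource type can carry several naming schemes, here is a small Python sketch that tries the union members in the order they are listed in the typedef. The patterns are deliberately simplified and are my assumption for this example (real YANG validation is stricter), and the function name resource_kind as well as the example values are hypothetical.

```python
import re

# Simplified pattern for a dotted-decimal OID. Illustration only.
_OID = re.compile(r"^\d+(\.\d+)+$")


def resource_kind(value: str) -> str:
    """Return the first union member that a resource value matches.

    The members are tried in the order they appear in the typedef:
    instance-identifier, then object-identifier, then plain string.
    """
    if value.startswith("/"):
        return "instance-identifier"
    if _OID.match(value):
        return "object-identifier"
    return "string"


# One alarm can carry both naming schemes (hypothetical values):
alarm = {
    "resource": "/if:interfaces/if:interface[if:name='eth0']",
    "alt-resource": ["1.3.6.1.2.1.2.2.1.1.1"],
}

print(resource_kind(alarm["resource"]))         # instance-identifier
print(resource_kind(alarm["alt-resource"][0]))  # object-identifier
print(resource_kind("shelf-1/slot-3"))          # string
```

The plain string comes last so that it only catches values that fit neither of the structured schemes.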


ITU-T
****************
First of all, I need to stress the fact that we carefully *did not* do the module design based on X.733.
There is a *separate*, optional, module to map to X733 which is still relevant for *some* management platforms.
So I do not really agree with that top-level comment.

But also note that G.7710 and G.806 are not replacing X.733. X.733 defines the information model
for *alarm reporting* along with the ASN.1 data definitions. G.7710 and G.806 define the equipment fault
management functions that lead to alarm reporting at the end. The alarm list functions and alarm notifications
are not defined by G.7710 and G.806 to my knowledge.


G.806
This is not so much related to general alarm management; it rather specifies instrumentation and
behaviour for transmission alarm types like LOS, AIS etc.

G.7710
* Alarm Report Control maps to alarm shelving in the vallin alarm YANG model. We could rename
  it to ARC. We will study in detail whether it maps 100% or whether there are missing pieces.
* Alarm Severity Assignment Profile: configurable severity levels could be introduced.
  I doubt this would be useful, however. That is more for the management application itself than
  something to configure on the device. But if you see requirements and scenarios for that, we can add it.


RFC7260
We need to make sure all requirements are covered. I think we are fine.
Requirements on alarm filters, avoiding spurious alarms etc. are covered.

RFC5860
See above.


So in summary, there would be very little work to reference 
RFC7260, RFC5860, G.7710, G.806.
There are no big missing pieces; the remaining work is mostly on terminology, especially ARC.


Stefan Vallin
stefan@wallan.se
+46705233262