Re: [CCAMP] draft-vallin-netmod-alarm-module again

stefan vallin <stefan@wallan.se> Mon, 09 October 2017 16:57 UTC

From: stefan vallin <stefan@wallan.se>
In-Reply-To: <E0C26CAA2504C84093A49B2CAC3261A43A2C83DA@DGGEML503-MBX.china.huawei.com>
Date: Mon, 09 Oct 2017 18:57:19 +0200
Cc: Martin Bjorklund <mbj@tail-f.com>, "ccamp@ietf.org" <ccamp@ietf.org>, "bclaise@cisco.com" <bclaise@cisco.com>, "draft-vallin-netmod-alarm-module@ietf.org" <draft-vallin-netmod-alarm-module@ietf.org>
Message-Id: <132EEA7F-6D1F-4F92-B311-B82799C2147D@wallan.se>
References: <20171009.111354.2211060789355821652.mbj@tail-f.com> <E0C26CAA2504C84093A49B2CAC3261A43A2C83DA@DGGEML503-MBX.china.huawei.com>
To: Zhenghaomian <zhenghaomian@huawei.com>
Archived-At: <https://mailarchive.ietf.org/arch/msg/ccamp/oBTv2U1ODNUSpax23LZQFGXBqiI>
Subject: Re: [CCAMP] draft-vallin-netmod-alarm-module again

Hi All!
Please see two responses I wrote last year:
* 31 oct 2016: Comparison of Vallin and Sharma Draft
* 12 dec 2016: Comment on the ITU-T reference.

Vallin vs Sharma
**********************

Overall comments
===============
* The sharma draft covers a subset of the functionality in the vallin draft.
On top of the alarm-list, the vallin draft covers:
- alarm quality/usability requirements
- alarm summary
- alarm history (optional YANG feature)
- alarm operator actions like ack (optional YANG feature)
- alarm inventory (which alarms are possible)
- alarm shelving (filtering)
- X733 support as an optional add-on rather than an imposed model

* Stateful vs notification-based definition of alarms.
 
The vallin draft focuses on representing alarms as an alarm state on a resource.
The notifications refer to alarm state changes for a specific alarm on a specific resource.
The alarm list represents the alarm state for a given resource and alarm type.

The sharma draft takes a notification-focused view of alarms.
The alarm list represents the alarm notifications. The management system has to
correlate the notifications into an actual alarm state.

* On X733
The vallin draft does not use X733 as a mandatory *core* model; rather, it allows for X733 mapping when needed.
While X733 has been the root of most telecom-oriented alarm systems, it adds a bit of historic overhead.
For example, globally standardised probable cause values have not proven useful in some cases.
X733 represents notifications and not state (see above).
In X733, cleared is one of the severity values, so an alarm is, for example, either cleared or minor, never both.
The vallin draft clearly separates this. Severity is one thing, clearance is another.

* Terminology
The sharma module is named “fault” while it models alarms.
See for example X733 for definitions of fault, error and alarm.



Detailed comments
================

Section 2
"  New network architectures that include controllers, orchestrators,
   PCE, applications, etc., require new alarm types and probable causes
   to be defined.  These new alarm types and probable causes will be
   defined in the next version of the model.”

We doubt that having globally defined probable causes is a scalable way forward.
It did not work well in X733 or RFC3877.

Probable cause values from standards have been more of historical value than real
value to alarm operators and alarm systems. Most telecom-oriented alarm systems
require this alarm attribute. However, the actual values differ between management
systems. It is a confusing area with conflicting enum values. Our approach is different.
We consider this to be a configurable mapping to match the needs of the management
system, and no values are defined in the alarm module.
It is also kept separate as an X733 mapping rather than part of the core model.
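
To make the idea of a configurable mapping concrete, here is a minimal Python sketch. It assumes a per-deployment table keyed by alarm type and qualifier. The table name, the alarm type names and the numeric values are all placeholders invented for illustration; none of them come from either draft.

```python
from typing import Dict, Tuple

# Hypothetical per-deployment table: (alarm type, qualifier) -> probable-cause value.
# The numbers are arbitrary placeholders chosen by the management system,
# not values taken from X733 or from any draft.
X733_MAPPING: Dict[Tuple[str, str], int] = {
    ("link-alarm", ""): 1001,
    ("high-temperature", ""): 1002,
}


def probable_cause(
    alarm_type: str,
    qualifier: str = "",
    mapping: Dict[Tuple[str, str], int] = X733_MAPPING,
    default: int = 0,
) -> int:
    """Look up the configured probable cause; fall back to a default when unmapped."""
    return mapping.get((alarm_type, qualifier), default)
```

A management system that needs different values would swap in its own table, and the alarm module itself stays free of hard-coded probable causes.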


Section 2.2

The definition of 'alarm-id' is unclear. 
"In most cases this will be a combination of entity-type, entity-id,
probable-cause and severity”

entity-type: string
entity-id: inet:uri
probable-cause: identity-ref
severity: enum

Several questions on this:
a) Why does the entity (called resource in the vallin draft) not refer to a path in the YANG data tree?
In the vallin draft we use an instance-identifier. We also allow for other resource instances based on SNMP or even a string as a last resort.

b) Alarm list key
- assume you have a threshold alarm with the following life-cycle:
  T1, T2, T3, are the times for the alarm state changes.
  T1: raise, minor
  T2: major
  T3: clear

In the sharma draft this will be *three different entries* in the alarm list. The client
would have to correlate those.

In the vallin draft this is *one entry*, one alarm, with three different states.
The vallin draft also clearly separates the severity from the clearance state.
The final state is
(major, cleared)
This matters: what was the severity of the alarm, and is it cleared or not?
This is not easily seen in the sharma draft.

This implies that the sharma draft alarm list is more of a notification log than an
alarm-list that shows the current state. The vallin draft alarm-list focuses on the current state
of the alarms. Notifications represent changes of the alarm state.
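
As a rough illustration of point b), the following Python sketch replays the T1/T2/T3 life-cycle against two hypothetical alarm-list keys. The first key includes the severity, as the quoted alarm-id definition suggests. The second uses only the resource and alarm type. The function names, the resource path and the dictionary layout are assumptions made for this example; they are not taken from either draft.

```python
from typing import Dict, List, Tuple

RESOURCE = "/interfaces/interface[name='eth0']"  # illustrative path only

# (time, resource, alarm type, reported severity); "cleared" is reported
# as a severity value, the way a notification-oriented model would do it.
EVENTS: List[Tuple[str, str, str, str]] = [
    ("T1", RESOURCE, "threshold", "minor"),
    ("T2", RESOURCE, "threshold", "major"),
    ("T3", RESOURCE, "threshold", "cleared"),
]


def keyed_with_severity(events) -> Dict[tuple, str]:
    """Severity is part of the key, so every state change adds a new entry."""
    entries: Dict[tuple, str] = {}
    for time, resource, alarm_type, severity in events:
        entries[(resource, alarm_type, severity)] = time
    return entries


def keyed_by_alarm(events) -> Dict[tuple, dict]:
    """Key is (resource, alarm type); every state change updates the same entry."""
    entries: Dict[tuple, dict] = {}
    for time, resource, alarm_type, severity in events:
        entry = entries.setdefault(
            (resource, alarm_type),
            {"severity": None, "is_cleared": False, "changes": []},
        )
        if severity == "cleared":
            # Clearing keeps the last known severity.
            entry["is_cleared"] = True
        else:
            entry["severity"] = severity
            entry["is_cleared"] = False
        entry["changes"].append(time)
    return entries
```

With the first key the list ends up holding three entries that the client must correlate. With the second it holds a single entry whose final state is major and cleared.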

c) the service-affecting flag
This is superfluous. If the X733 severity levels are set correctly, they are enough to tell
whether an alarm is service-affecting or not. See the X733 definition of severity levels.
The vallin draft also has a leaf impacted-resources. In this leaf an alarm can refer to
affected/impacted services.

d) alarm-sequence number
Not needed, NETCONF has notification replay and uses SSH sessions.
Furthermore, since the sharma alarm list key is the identification of an alarm notification, why is this needed at all?
There have been these kinds of var-binds in SNMP Alarm MIBs as well, and they were a bit flawed there too: you could use informs instead of trap PDUs. How does this work with filtering mechanisms?
Do not try to do protocol stuff in the model.


e) House-keeping
It is unclear how the list is managed. When are entries removed?


f) Other considerations:
- The vallin draft allows for more flexible resource (entity) identification, see below:

The primary mechanism is an instance-identifier so that a node in the data tree can be referenced.
The sharma draft uses a string/URI, which is in another domain than the module. This is a bit strange: say you have
an interface alarm, why should you not send the alarm on the path to the interface in the data model?

The vallin draft also allows for other naming schemes, for example SNMP OIDs:
  typedef resource {
    type union {
      type instance-identifier {
        require-instance false;
      }
      type yang:object-identifier;
      type string;
    }
  }


The alarm also has an optional leaf-list for referring to the alarming resource using an alternate naming scheme:
        +--ro alarm* [resource alarm-type-id alarm-type-qualifier]
           +--ro resource                      resource
           +--ro alarm-type-id                 alarm-type-id
           +--ro alarm-type-qualifier          alarm-type-qualifier
           +--ro alt-resource*                 resource

So the alarm can use both the instance-identifier and the SNMP OID for the alarming interface, for example.
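
The following Python sketch shows how a client might tell the three members of the resource union apart. It is only a rough approximation under stated assumptions: anything starting with "/" is treated as an instance-identifier, and the dotted-decimal check is a simplification of the stricter pattern that yang:object-identifier has in RFC 6991. The function name classify_resource is hypothetical.

```python
import re

# Simplified dotted-decimal check; the real yang:object-identifier
# pattern is stricter about the first two arcs and leading zeros.
_OID = re.compile(r"^\d+(\.\d+)+$")


def classify_resource(value: str) -> str:
    """Guess which union member a resource value matches, in union order."""
    if value.startswith("/"):
        # Assumption: instance-identifiers are absolute paths in the data tree.
        return "instance-identifier"
    if _OID.match(value):
        return "object-identifier"
    # Last resort: any other string.
    return "string"
```

For example, a path such as /if:interfaces/if:interface[if:name='eth0'] would be classified as an instance-identifier, 1.3.6.1.2.1.2.2.1.1.1 as an object-identifier, and a free-form name as a string.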


ITU-T
****************
First of all, I need to stress the fact that we carefully *did not* base the module design on X.733.
There is a *separate*, optional module to map to X733, which is still relevant for *some* management platforms.
So I do not really agree with that top-level comment.

But also note that G.7710 and G.806 do not replace X.733. X.733 defines the information model
for *alarm reporting* along with the ASN.1 data definitions. G.7710 and G.806 define the equipment fault
management functions that lead to alarm reporting at the end. The alarm list functions and alarm notifications
are not defined by G.7710 and G.806 to my knowledge.


G.806
Not so much related to general alarm management; this is more about specifying instrumentation and
behaviour for transmission alarm types like LOS, AIS etc.

G.7710
* Alarm Report Control maps to alarm shelving in the vallin alarm YANG model. We could rename
  it to ARC. We will do a detailed study of whether it maps 100% or whether there are missing pieces.
* Alarm Severity Assignment Profile: configurable severity levels could be introduced.
I doubt this would be useful, however. That is more for the management application itself than
for configuration on the device. But if you see requirements and scenarios for that, we can add it.


RFC7260
We need to make sure all requirements are covered. I think we are fine.
Requirements on alarm filters, avoiding spurious alarms etc. are covered.

RFC5860
See above.


So in summary, there would be very little work to reference
RFC7260, RFC5860, G.7710, G.806.
There are no big missing pieces; it is more about terminology, especially ARC.

Stefan Vallin
stefan@wallan.se
+46705233262

> On 09 Oct 2017, at 11:28, Zhenghaomian <zhenghaomian@huawei.com> wrote:
> 
> Hello Martin, 
> 
> Great to see this work brought back, actually we are interested and looking forward to see this work progress in IETF 99 but there was no presentation and we failed to find any of the co-authors of this draft. 
> 
> We had similar idea about the alarm work and there was one draft published as https://tools.ietf.org/html/draft-sharma-netmod-fault-model-01 . we have reviewed both this draft and draft-vallin-netmod-alarm-module, and would like to drive on merging into one. It was observed that draft-vallin is more complete than draft-sharma, but the only problem is that the reference to ITU-T work is too old. We are also seeking for some support from ITU side, so that we can update the draft after there were agreement. 
> 
> Please feel free to comment if you have other ideas, thank you. 
> 
> Best wishes,
> Haomian
> 
> -----Original Message-----
> From: CCAMP [mailto:ccamp-bounces@ietf.org] On Behalf Of Martin Bjorklund
> Sent: 9 October 2017 17:14
> To: ccamp@ietf.org
> Cc: bclaise@cisco.com; draft-vallin-netmod-alarm-module@ietf.org
> Subject: [CCAMP] draft-vallin-netmod-alarm-module again
> 
> Hi,
> 
> We have received quite a lot of emails indicating support for the draft "draft-vallin-netmod-alarm-module".  People are asking why nothing happens with this draft.
> 
> It was decided that CCAMP might be the best place for this draft, but since then people have expressed that they rather want NETMOD to work on it.
> 
> I sent this email to CCAMP but never got any feedback:
> https://mailarchive.ietf.org/arch/msg/ccamp/f4qUNkpsT5dVF_1l9Ibd3hH1YA4/?qid=2187bb6a3443f12d15a29a03486f0754
> 
> It would be much appreciated if anyone in CCAMP that are also working on the generic alarm functionality in ITU could send feedback on this document.
> 
> The document can be found here:
> https://datatracker.ietf.org/doc/draft-vallin-netmod-alarm-module
> 
> 
> /martin
> 