[CCAMP] Second review of draft-ietf-ccamp-alarm-module-01

Qin Wu <bill.wu@huawei.com> Sat, 21 July 2018 12:17 UTC

To: stefan vallin <stefan@wallan.se>
CC: "ccamp@ietf.org" <ccamp@ietf.org>
Archived-At: <https://mailarchive.ietf.org/arch/msg/ccamp/8s2-q2RZyYm4TmtZI6ItDngpVQg>

Hi, Stefan:
Before the next version of the alarm model comes out, I would like to offer the following suggestions and comments:

1.       UUID support for the type of resource under alarm list
Last time you said:
“
Good point, will consider adding it in the next revision.

However, there is a danger here in that developers might escape throwing UUIDs to operators. As an operator in a NOC it is hard to know what to do with a UUID.

In many cases UUID are a sign of using the alarms as a log/debug thing for developers.

typedef resource {
        type union {
          type instance-identifier {
            require-instance false;
          }
          type yang:object-identifier;
          type string;
        }
”
However, in our implementation we do allow an operator in a NOC to use a UUID to correlate resource objects in the alarm-inventory, don't we?
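As a rough sketch of what UUID support could look like, the resource union quoted above might gain one more member. The module name, namespace and prefix below are hypothetical; yang:uuid is the existing typedef from ietf-yang-types (RFC 6991).

```yang
module example-alarm-resource {
  yang-version 1.1;
  namespace "urn:example:alarm-resource";   // hypothetical namespace
  prefix exr;

  import ietf-yang-types {
    prefix yang;
  }

  typedef resource {
    type union {
      type instance-identifier {
        require-instance false;
      }
      type yang:object-identifier;
      // Assumed addition: listed before 'string' because union members
      // are matched in order and 'string' accepts any value.
      type yang:uuid;
      type string;
    }
    description
      "Sketch of the resource type extended with a UUID member.";
  }
}
```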


2.       Dependency between root-cause-resource, impacted-resource, related-alarm
Under the alarm list there are three dependent parameters: root-cause-resource, impacted-resource, and related-alarm.
It is still not clear to me how root-cause-resource and impacted-resource are used together with the resource parameter under related-alarm. Why are root-cause-resource and impacted-resource not part of related-alarm?
If they should not be part of related-alarm, then for the root-cause-resource leaf-list I am wondering why not add an is-root-cause parameter to indicate that a specific alarm under the alarm list is the root cause alarm. root-cause-resource would then be provided only when is-root-cause is set to true. Does this make sense?
In our practice, we usually design one root cause alarm and several derived alarms, and each derived alarm uses a leafref to point to the root cause alarm. I am wondering whether we assume that each alarm under the alarm list is a root cause alarm and the related-alarm entries are derived alarms. If the answer is no, I think we should add one new parameter under the related-alarm list to reference the root cause alarm.
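To make the is-root-cause idea concrete, here is a minimal sketch. It is not the draft's structure: the list is keyed by a single hypothetical alarm-no leaf so that the leafref stays short, and the module name, namespace and the root-cause-alarm leaf are all assumptions made for illustration.

```yang
module example-alarm-root-cause {
  yang-version 1.1;
  namespace "urn:example:alarm-root-cause";   // hypothetical namespace
  prefix exrc;

  container alarms {
    config false;
    list alarm {
      key "alarm-no";
      leaf alarm-no {
        type uint32;          // hypothetical single key, for brevity only
      }
      leaf is-root-cause {
        type boolean;
        description
          "True if this alarm is a root cause alarm.";
      }
      leaf-list root-cause-resource {
        // Only present for alarms flagged as root cause.
        when "../is-root-cause = 'true'";
        type string;
      }
      leaf root-cause-alarm {
        // A derived alarm points back to its root cause alarm.
        when "../is-root-cause = 'false'";
        type leafref {
          path "../../alarm/alarm-no";
        }
      }
    }
  }
}
```

With the draft's three-leaf key, the leafref would instead need one reference per key leaf.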


3.       Consolidate tuple corresponding to a single alarm instance into pair
This YANG alarm module uses the tuple (resource, alarm type identifier, alarm type qualifier) to identify a single alarm instance. I am wondering whether the tuple can be reduced to the pair (resource, alarm-type identifier), with the alarm-type identifier allowed to be a union of identity and string. The reason is that deriving identities from the base identity for alarm-type-identifier is not sufficient when alarm types can be classified at fine granularity into hundreds of types.
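A minimal sketch of such a union type follows. The module name, namespace, typedef name and the local base identity are hypothetical stand-ins; in the real model the base would be the draft's own alarm type identity.

```yang
module example-alarm-type {
  yang-version 1.1;
  namespace "urn:example:alarm-type";   // hypothetical namespace
  prefix exat;

  identity alarm-type-id {
    description
      "Stand-in for the base identity of alarm types.";
  }

  typedef alarm-type-id-or-string {
    type union {
      // Standardized alarm types remain identities ...
      type identityref {
        base alarm-type-id;
      }
      // ... while fine-grained, dynamically defined types use a string.
      type string;
    }
  }
}
```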


4.       Semantic difference between description under alarm-inventory and alarm-text under alarm list

See description definition and alarm-text definition as follows:

“
description: A description of the possible alarm.  It SHOULD include information on possible underlying root causes and corrective actions.

alarm-text: The string used to inform operators about the alarm. This MUST contain enough information for an operator to be able to understand the problem and how to resolve it.  If this string contains structure, this format should be clearly documented for programs to be able to parse that information.
”
I do not see any semantic difference between description and alarm-text, so why not replace one with the other? Alternatively, we could break description/alarm-text down further into root-cause and corrective-actions, which I believe are the key information we want to convey through description/alarm-text.
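One possible shape for that breakdown, as a sketch only: the grouping name, module name and namespace are hypothetical, and the two leaf names are simply taken from the suggestion above.

```yang
module example-alarm-guidance {
  yang-version 1.1;
  namespace "urn:example:alarm-guidance";   // hypothetical namespace
  prefix exag;

  grouping alarm-guidance {
    leaf root-cause {
      type string;
      description
        "Possible underlying root causes of the alarm.";
    }
    leaf corrective-actions {
      type string;
      description
        "Actions an operator can take to resolve the alarm.";
    }
  }
}
```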


5.       Alarm arrival time support
Under operator-state-change we have the time parameter to represent the timestamp of an operator action on an alarm. I am wondering whether we need to add alarm-arrive-time to represent the time when the alarm arrives at the management system.
This is useful information for alarm management.


6.       Alarm-name field support for alarm and alarm inventory
In the current model, each alarm under the alarm list is uniquely identified by three key leaves (resource, alarm type identifier, alarm type qualifier). Would it be more desirable to define a single key leaf, e.g., add an alarm name or alarm-no to uniquely identify each alarm? That would simplify alarm management from the management system's perspective. Make sense?
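A sketch of the single-key variant is shown below. The module name, namespace and the plain string types are simplifying assumptions; the unique statement is one way to keep the current tuple unambiguous while keying the list by alarm-no.

```yang
module example-alarm-key {
  yang-version 1.1;
  namespace "urn:example:alarm-key";   // hypothetical namespace
  prefix exak;

  container alarm-list {
    config false;
    list alarm {
      key "alarm-no";
      // The existing tuple stays unique across list entries.
      unique "resource alarm-type-id alarm-type-qualifier";
      leaf alarm-no {
        type uint32;
      }
      leaf resource {
        type string;          // simplified; the draft uses a union type
      }
      leaf alarm-type-id {
        type string;          // simplified; the draft uses an identityref
      }
      leaf alarm-type-qualifier {
        type string;
      }
    }
  }
}
```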


7.       Reason-id support for alarm list and alarm inventory
In the current model, is the root cause resource the reason each alarm is generated? If not, I propose adding a reason-id for each alarm under the alarm list and alarm inventory.


8.       Alarm generating device or location support for alarm list and alarm inventory
In the current model, it seems the resource type can potentially indicate the device or location where the alarm is generated, but not explicitly. I am wondering why not add two parameters, alarm-generating-device and alarm-generating-location, to explicitly indicate the device or location where the alarm is generated. That would simplify alarm management. Make sense?


9.       Alarm notification category support
In the current model, alarm notification is defined as follows:
“
This notification is used to report a state change for an alarm. The same notification is used for reporting a newly raised alarm, a cleared alarm or changing the text and/or severity of an existing alarm.
”
However, it is not clear how to distinguish an alarm notification for a newly raised alarm from an alarm notification for a cleared alarm. Would it be more sensible to add alarm notification category support, something like the following:
“
leaf category {
         type enumeration {
           enum fault {
             description
               "Alarm raised.";
           }
           enum recovery {
             description
               "Alarm cleared.";
           }
           enum change {
             description
               "Alarm changed.";
           }
         }
}
”

10.   Consistency between alarm list construct and alarm notification construct
The difference we see between the alarm list construct and the alarm notification construct is that operator action is defined under the alarm notification construct, while operator state change is defined under the alarm list construct.
As specified in RFC 7950,
“
An action MUST NOT be defined within an rpc, another action, or a notification
”
I am not sure an action can be allowed within the alarm-notification construct; if it cannot, I would propose removing the operator action from the alarm notification construct.
In addition, the operator parameter under operator-state-change can be removed or consolidated into the set-operator-state action.


11.   Additional-info support for alarm list
I think we should allow vendor-specific extensions to be added as part of the alarm list; such extensions can be defined in TLV format.
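As a sketch of what a TLV-style extension point might look like, consider the grouping below. All names (module, namespace, grouping, leaf names) and the integer widths are hypothetical and chosen only for illustration.

```yang
module example-alarm-additional-info {
  yang-version 1.1;
  namespace "urn:example:alarm-additional-info";   // hypothetical namespace
  prefix exai;

  grouping additional-info {
    list additional-info {
      key "info-type";
      leaf info-type {
        type uint16;
        description
          "Vendor-assigned type code of this entry.";
      }
      leaf info-length {
        type uint16;
        description
          "Length of info-value in octets.";
      }
      leaf info-value {
        type binary;
        description
          "Opaque vendor-specific value.";
      }
    }
  }
}
```

Keying the list by info-type allows one entry per type code; a different key would be needed if the same type can occur several times for one alarm.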


12.   Alarm-no support for set-operator-state
If we believe set-operator-state is a useful action under the alarm list, I am wondering whether we can add alarm-no or alarm-name to identify each alarm in set-operator-state. This would help a lot for alarm ack operations based on the alarm number.


13.   Is-acked for alarm list
Since we have the is-cleared parameter under the alarm list to indicate the current clearance state of the alarm, why not add an is-acked parameter under the alarm list to indicate the current acknowledgement state of the alarm? Make sense?

-Qin