[CCAMP] Review of draft-ietf-ccamp-alarm-module-01

Qin Wu <bill.wu@huawei.com> Thu, 10 May 2018 11:02 UTC

From: Qin Wu <bill.wu@huawei.com>
To: "stefan@wallan.se" <stefan@wallan.se>, "ccamp@ietf.org" <ccamp@ietf.org>
CC: Zhangcuimin <zhangcuimin@huawei.com>, "Zhangmingyu (Jason)" <jason.zhangmingyu@huawei.com>
Archived-At: <https://mailarchive.ietf.org/arch/msg/ccamp/7-bV-3nOeJ92VRQJYcj2cMh2OyU>

Hi Stefan,
Thanks for the updated draft (-01). I found some time to re-read it. Here are a few comments and suggestions:

1. Section 3.2
Section 3.2 provides two kinds of concrete alarm types. The first uses a YANG identity derived from the base YANG identity to describe the concrete alarm type. The second uses a derived YANG identity combined with a string-typed identifier (the qualifier) to describe the concrete alarm type. However, I don't think these two kinds of concrete alarm type can be supported at the same time in this model: if you choose one, you should give up the other. Also, consider the concrete-alarm-type examples:
"

     // Alternative 1: concrete alarm type identity
     import ietf-alarms {
       prefix al;
     }
     identity environmental-alarm {
       base al:alarm-type;
       description "Abstract alarm type";
     }
     identity smoke {
       base environmental-alarm;
       description "Concrete alarm type";
     }

     // Alternative 2: concrete alarm type qualifier
     import ietf-alarms {
       prefix al;
     }
     identity environmental-alarm {
       base al:alarm-type;
       description "Abstract alarm type";
     }
     identity external-detector {
       base environmental-alarm;
       description
         "Abstract alarm type, a run-time configuration
          procedure sets the type of alarm detected. This will
          be reported in the alarm-type-qualifier.";
     }

"
From a YANG language perspective, I don't see a clear difference between the concrete alarm type identity in alternative 1 and the concrete alarm type qualifier in alternative 2.
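To make the comparison concrete, here is a hypothetical Python sketch of how each alternative would populate the alarm-list key [resource alarm-type-id alarm-type-qualifier]. It is not from the draft. Identities are modeled as plain strings, the resource path is made up, and the sketch assumes an empty string is used as the qualifier when alternative 1 is chosen.

```python
from typing import NamedTuple


class AlarmKey(NamedTuple):
    """Hypothetical model of the key [resource alarm-type-id alarm-type-qualifier]."""
    resource: str
    alarm_type_id: str         # identity name, e.g. "smoke" or "external-detector"
    alarm_type_qualifier: str  # assumed to be "" when no qualifier is used


def key_alternative_1(resource: str) -> AlarmKey:
    # Alternative 1: the concrete type is the identity itself; the qualifier adds nothing.
    return AlarmKey(resource, "smoke", "")


def key_alternative_2(resource: str, detected: str) -> AlarmKey:
    # Alternative 2: the identity stays abstract; the concrete type comes from
    # run-time configuration and is carried in the qualifier.
    return AlarmKey(resource, "external-detector", detected)


if __name__ == "__main__":
    res = "/hypothetical:sensors/sensor[name='room-1']"
    k1 = key_alternative_1(res)
    k2 = key_alternative_2(res, "smoke")
    print(k1)
    print(k2)
    print(k1 == k2)  # False: the same smoke condition yields two different keys
```

Both alternatives describe the same smoke condition but produce different keys. A client correlating alarms from a server that mixes them would therefore have to understand both encodings.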
In the alarm list:

      +--ro alarm* [resource alarm-type-id alarm-type-qualifier]
         +--ro resource                 resource
         +--ro alarm-type-id            alarm-type-id
         +--ro alarm-type-qualifier     alarm-type-qualifier
We can see that alarm-type-id and alarm-type-qualifier are both mandatory and are used as list keys.
So it looks like we go with the second choice, using both the alarm-type identity and the alarm-type-qualifier to describe the concrete alarm type. Therefore, I am not sure there is value in keeping alarm-inventory:

     +--ro alarm-inventory
        +--ro alarm-type* [alarm-type-id alarm-type-qualifier]
           +--ro alarm-type-id           alarm-type-id
           +--ro alarm-type-qualifier    alarm-type-qualifier
           +--ro resource*               resource-match
           +--ro has-clear               boolean
           +--ro severity-levels*        severity
           +--ro description             string

This is because a new alarm type can be added at any time, not only at design time but also at run time.
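The run-time point can be illustrated with a small hypothetical sketch. If the inventory is populated only from design-time identities, a qualifier configured later produces alarms whose (alarm-type-id, alarm-type-qualifier) pair is missing from the inventory. The names and values below are illustrative assumptions and are not taken from the draft.

```python
from typing import Dict, Set, Tuple

AlarmType = Tuple[str, str]  # (alarm-type-id, alarm-type-qualifier)


def undeclared_types(inventory: Set[AlarmType],
                     raised: Dict[str, AlarmType]) -> Set[AlarmType]:
    """Return alarm types seen in raised alarms but absent from the inventory."""
    return {atype for atype in raised.values() if atype not in inventory}


if __name__ == "__main__":
    # Hypothetical design-time inventory: only the abstract detector, no qualifier.
    inventory = {("external-detector", "")}
    # Run-time configuration later sets the detected type via the qualifier.
    raised = {
        "sensor-1": ("external-detector", "smoke"),
        "sensor-2": ("external-detector", "water"),
    }
    print(sorted(undeclared_types(inventory, raised)))
    # [('external-detector', 'smoke'), ('external-detector', 'water')]
```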


2. Section 4.1
I doubt we need the alarm control parameters. We already have NETCONF subtree filtering, so why is a NETCONF subtree filter not sufficient? Why should we define an additional alarm control mechanism to filter out unnecessary alarms and to decide when filtered alarms are moved in and when they are moved out?
Even if we need these schemes, I think we could consider defining alarm control in a separate draft, called something like "alarm policy control" :). Also, it is not clear how we distinguish alarms that have been blocked or filtered from alarms that the system suppressed automatically when it detected a duplicate alarm.
If so, I think the shelved alarm list should also be taken out.
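For comparison, a NETCONF subtree filter (RFC 6241, Section 6) can already select, for example, only critical alarms at retrieval time. Such a filter is applied per retrieval request. The sketch below builds one with Python's standard xml.etree.ElementTree. The element names (alarms, alarm-list, alarm, perceived-severity) and the namespace URI are assumptions about the module's data tree and are not quoted from draft -01.

```python
import xml.etree.ElementTree as ET

# Assumed namespace of the alarm module; check against the draft.
ALARMS_NS = "urn:ietf:params:xml:ns:yang:ietf-alarms"


def severity_filter(severity: str) -> str:
    """Build a NETCONF <filter type="subtree"> selecting alarms of one severity."""
    flt = ET.Element("filter", {"type": "subtree"})
    # Setting xmlns as a plain attribute makes ElementTree emit it verbatim.
    alarms = ET.SubElement(flt, "alarms", {"xmlns": ALARMS_NS})
    alarm_list = ET.SubElement(alarms, "alarm-list")
    alarm = ET.SubElement(alarm_list, "alarm")
    # Content-match node: only <alarm> entries whose leaf equals this value
    # are selected (returned with all their child nodes).
    ET.SubElement(alarm, "perceived-severity").text = severity
    return ET.tostring(flt, encoding="unicode")


if __name__ == "__main__":
    print(severity_filter("critical"))
```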


3. Does the alarm list include alarms that have been deleted, now that you introduce support for the purge-alarm RPC?

Apparently not. But since alarms can be removed or deleted, why not count them in the alarm-summary?

Also, the difference between remove and delete is not clear. I am wondering whether we really need this functionality: if alarms can be managed by the system, why introduce so much human involvement, even if it is by an administrator?
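If purged alarms were counted, the summary could keep a per-severity purge counter next to the usual totals. The following is a hypothetical Python sketch of that suggestion. The field names (total, cleared, purged) and the purge rule (remove cleared alarms) are assumptions for illustration and do not come from the draft.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Dict, Tuple

Key = Tuple[str, str, str]  # (resource, alarm-type-id, alarm-type-qualifier)


@dataclass
class AlarmStore:
    """Hypothetical alarm list plus a summary that remembers purged alarms."""
    alarms: Dict[Key, dict] = field(default_factory=dict)
    purged: Counter = field(default_factory=Counter)  # per-severity purge counts

    def raise_alarm(self, key: Key, severity: str) -> None:
        self.alarms[key] = {"severity": severity, "cleared": False}

    def clear_alarm(self, key: Key) -> None:
        self.alarms[key]["cleared"] = True

    def purge_cleared(self) -> int:
        """Remove cleared alarms from the list but keep them counted in the summary."""
        doomed = [k for k, a in self.alarms.items() if a["cleared"]]
        for k in doomed:
            self.purged[self.alarms.pop(k)["severity"]] += 1
        return len(doomed)

    def summary(self) -> Dict[str, Dict[str, int]]:
        out: Dict[str, Dict[str, int]] = {}
        for a in self.alarms.values():
            row = out.setdefault(a["severity"], {"total": 0, "cleared": 0, "purged": 0})
            row["total"] += 1
            row["cleared"] += int(a["cleared"])
        for sev, n in self.purged.items():
            out.setdefault(sev, {"total": 0, "cleared": 0, "purged": 0})["purged"] = n
        return out
```

For example, raising two "major" alarms, clearing one, and purging yields a summary of {"major": {"total": 1, "cleared": 0, "purged": 1}}. The purged alarm no longer appears in the list, but it still shows up in the counts.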


4. How is the root-cause resource related to the resource associated with the current alarm in the alarm list? Is it a one-to-one or a many-to-one relationship?

5. How is the impacted resource related to the resource associated with the current alarm in the alarm list?

6. Is there any relationship between the impacted resources and the alarms listed in related-alarm?

7. How is each alarm in the alarm list related to the alarms in related-alarm? Which alarm is impacted by which, and which alarm triggers another? Is there a nesting relationship among them?

How do related-alarm, impacted-resource, and root-cause-resource help to diagnose the network and identify the root cause of a problem? Can you provide an example to explain this?

8. I am not sure we should introduce operator lifecycle management or administrative lifecycle management. It looks to me like they rely on many manual provisioning steps or actions. Why not focus only on system-generated and system-cleared alarms? Why do we still follow traditional alarm management and require operators to engage with alarms, acknowledge them, and close them?

9. Even if we still require operator lifecycle management, I don't understand why we need administrative alarm lifecycle management. A cleared alarm can be re-raised by the system, so when the system clears an alarm or an operator closes an alarm, that closed or cleared alarm can be raised again. So why do we need administrative alarm lifecycle management? I am struggling with this.

10. What I would hope is to consider only system-generated alarms. The alarms in the alarm list can be classified into newly raised alarms, cleared alarms, and re-raised alarms, and alarm severity can also be increased or decreased. How do we keep track of alarm severity changes? I know that in the alarm list we have alarm text indicating which alarm is raised and which is cleared. This tells us whether an alarm is active or has been cleared by the system. But we may also be interested in all historical alarms: e.g., in operator lifecycle management, closed alarms are also historical alarms, and maybe shelved alarms are too? But I am not sure it is a good idea to introduce alarms related to manual actions in this model.
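One way to track severity changes for system-generated alarms is an append-only history per alarm. Each change can then be classified as new, cleared, re-raised, or a severity increase or decrease. The Python sketch below assumes a hypothetical severity ordering and uses the classification names from the paragraph above. It is an illustration only and does not reproduce the draft's model.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

# Assumed ordering, lowest to highest; not quoted from the draft.
SEVERITY_RANK = {"warning": 1, "minor": 2, "major": 3, "critical": 4}


@dataclass
class AlarmHistory:
    """Append-only record of (time, severity) changes; severity None means cleared."""
    changes: List[Tuple[float, Optional[str]]] = field(default_factory=list)

    def record(self, t: float, severity: Optional[str]) -> str:
        """Append a change and classify it relative to the previous state."""
        prev = self.changes[-1][1] if self.changes else None
        ever_raised = any(s is not None for _, s in self.changes)
        self.changes.append((t, severity))
        if severity is None:
            return "cleared" if prev is not None else "unchanged"
        if prev is None:
            return "re-raised" if ever_raised else "new"
        if SEVERITY_RANK[severity] > SEVERITY_RANK[prev]:
            return "severity-increased"
        if SEVERITY_RANK[severity] < SEVERITY_RANK[prev]:
            return "severity-decreased"
        return "unchanged"
```

Feeding the changes minor, major, minor, cleared, critical into one history yields new, severity-increased, severity-decreased, cleared, re-raised. The full history stays available for later inspection.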

11. As for the common type resource:
"
     typedef resource {
       type union {
         type instance-identifier {
           require-instance false;
         }
         type yang:object-identifier;
         type string;
       }
"
I am wondering whether the resource common type could be extended to support UUID, which is also a common type defined in RFC 6991.
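If UUID support were added, the position of the new member type would matter. YANG validates a union value against its member types in the order they are listed (RFC 7950, Section 9.12), so a yang:uuid member placed after string would never be selected. The Python sketch below mimics that ordered resolution. The object-identifier and UUID patterns follow those in RFC 6991. The instance-identifier check (a leading "/") is a deliberate simplification, and the WITH_UUID ordering is a hypothetical extension.

```python
import re
from typing import Callable, List, Tuple

# Patterns as in RFC 6991 (ietf-yang-types). YANG patterns are implicitly
# anchored, hence re.fullmatch below.
OID_RE = re.compile(r"(([0-1](\.[1-3]?[0-9]))|(2\.(0|([1-9]\d*))))(\.(0|([1-9]\d*)))*")
UUID_RE = re.compile(
    r"[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}")

Member = Tuple[str, Callable[[str], bool]]


def is_instance_identifier(v: str) -> bool:
    # Simplification: a real validator would parse the full path syntax.
    return v.startswith("/")


def resolve(value: str, members: List[Member]) -> str:
    """Return the first member type that accepts value, as YANG does for unions."""
    for name, accepts in members:
        if accepts(value):
            return name
    raise ValueError(f"no member type accepts {value!r}")


CURRENT_MEMBERS: List[Member] = [
    ("instance-identifier", is_instance_identifier),
    ("yang:object-identifier", lambda v: OID_RE.fullmatch(v) is not None),
    ("string", lambda v: True),
]
# Hypothetical extension: uuid must precede string, which accepts anything.
WITH_UUID: List[Member] = (
    CURRENT_MEMBERS[:2]
    + [("yang:uuid", lambda v: UUID_RE.fullmatch(v) is not None)]
    + CURRENT_MEMBERS[2:]
)
```

With the current members, a value such as 123e4567-e89b-12d3-a456-426614174000 resolves to string. With WITH_UUID it resolves to yang:uuid, so UUID-valued resources become distinguishable from free-form strings.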

12. Regarding resource-match:
"
     typedef resource-match {
       type union {
         type yang:xpath1.0;
         type yang:object-identifier;
         type string;
       }
"
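To illustrate what this union can express, here is a sketch of applying a resource-match value to a resource. It assumes two matching rules that the quoted typedef does not itself state. First, an object-identifier match is an arc-wise prefix of the resource's object identifier. Second, a string match is a regular expression that must match the whole resource string. The XPath 1.0 member is left out because evaluating it needs an XPath engine and the datastore.

```python
import re


def oid_prefix_match(pattern: str, resource: str) -> bool:
    """Arc-wise prefix test: 1.3.6.1.2 matches 1.3.6.1.2.45 but not 1.3.6.1.20."""
    p, r = pattern.split("."), resource.split(".")
    return len(p) <= len(r) and r[: len(p)] == p


def string_match(pattern: str, resource: str) -> bool:
    """Assumed semantics: the regular expression must match the entire string."""
    return re.fullmatch(pattern, resource) is not None
```

Under these assumed rules, a match selects resources by OID subtree or by naming pattern. Neither rule looks at a resource type unless that type is encoded in the identifier itself.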
I am wondering whether the resource type or object type needs to be considered to allow finer-grained matching. I look forward to your answers to these questions. Thanks very much.

-Qin