Re: [CCAMP] Second review of draft-ietf-ccamp-alarm-module-01

Qin Wu <> Tue, 14 August 2018 12:32 UTC

From: Qin Wu <>
To: stefan vallin <>
Date: Tue, 14 Aug 2018 12:32:29 +0000
List-Id: Discussion list for the CCAMP working group <>

Hi, Stefan:
Thanks for the quick response; please see my follow-up comments below.
From: stefan vallin []
Sent: 10 August 2018 22:38
To: Qin Wu
Subject: Re: Second review of draft-ietf-ccamp-alarm-module-01


On 9 Aug 2018, at 04:54, Qin Wu <<>> wrote:

Thanks for your update in v-(02).
Why not have a generic model applicable to both the controller and the device? I see this model as an alarm monitoring framework. Also, the draft says in the introduction:
   The purpose is to define a standardised alarm interface for network
   devices that can be easily integrated into management applications.
   The model is also applicable as a northbound alarm interface in the
   management applications.

Yes :) and it works. But I appreciate you coming back to the topic. I have not been clear on the details
The most important thing is how to handle the reference to the alarming resource within a device when the alarm model is used in the controller.
Then the resource must somehow also include the reference to the device.

I am working on a detailed clear answer to this, stay tuned...

[Qin]: One way to handle the reference to the alarming resource is to add alarm-name or alarm-serial-no as a field of the alarm list.
Alarm-name or alarm-serial-no can then be seen as an alias for the 3-tuple (resource, alarm-type-id, alarm-type-qualifier).

In addition, I believe you have not yet addressed my follow-up comments posted at:
which are not specific to controller support. I would appreciate your response to those comments.
The 4 issues are highlighted below:

1.  Alarm-type-id supports union of identity and string

I know that defining alarm-type-id as an identity makes it more extensible, but it wastes more space than using an enum.

I am wondering why alarm-type-id is not defined as a uint32, or as a string with an embedded format such as groupid-alarmid (e.g., "2310-36700394"); this would make it easier to manage millions of alarm types.

Defining alarm-type-id as an identity seems to waste a lot of space, and it is hard to handle millions of alarm types at design time, since enumerating each of them requires a human to enter all of the alarm types in a YANG file.
A) A flat enum does not work globally across enterprises and organisations; see the ITU failure with probable cause.
B) Millions of alarm types??? No, that will not happen.
[Qin]: That's the reality we are facing. (:-
C) uint32: that is meaningless for operators.
[Qin]: That's why we should have both alarm-name and alarm-serial-no; alarm-name provides meaning for operators.
D) string: that will result in surprises for operators. Developers will introduce strings in their code that suddenly show up in the NOC.
[Qin]: The alarm-type-qualifier is in essence a string qualifier, so do you believe that introducing alarm-type-qualifier will result in surprises for operators as well?
E) I do not get your last comment, "require human to enter all alarm types in yang file".
     You have to design which alarm types your system has; that should not come as a surprise to the operator.
[Qin]: Entering 2 million alarm types in a YANG file is challenging for a human.
There are several benefits of hierarchical identities for alarm types:
- Alarm types can be parsed from YANG modules
- You can reason about “abstract” alarm types
- Extensibility, enterprises and organisations can extend previous identities

2.  Alarm-name or alarm-serial-no field support for alarm and alarm inventory

Suppose we have alarm-name or alarm-serial-no. I believe it is easier to identify each alarm instance by one field than by the 3-tuple (resource, alarm-type-id, alarm-type-qualifier).

Most importantly, this will simplify operation and management.
I think that
(GigabitEthernet0/15, link-alarm, "")

tells more than:

[Qin]: The limitation of the 3-tuple is that when the same alarm identified by (GigabitEthernet0/15, link-alarm, "")
is raised again, (GigabitEthernet0/15, link-alarm, "") cannot be used to distinguish the first raised alarm from the second raised alarm.
Introducing an unsigned integer alarm-serial-no and a string alarm-name would solve this issue.

[Qin]: If you correlate each alarm instance with an alarm-name or alarm-serial-no, it will be easier to look up the alarm instance by alarm-name or alarm-serial-no than by the 3-tuple (resource, alarm-type-id, alarm-type-qualifier).

3.  Alarm notification category support

Do we rely on the 'is-cleared' and 'status-change' fields to tell whether the same notification is reporting a newly raised alarm, a cleared alarm, or a change of the text?

It is not clear to me how we know that a notification is for a newly raised alarm, since we don't have a raised field.
A) You have your stateful alarm list in your controller
B) You get a notification:
       leaf perceived-severity {
         type severity-with-clear;
This tells you the (new) severity state:
- So if this is clear, the alarm is cleared.
- If you do not have an entry for the key, it is a new alarm.
[Qin]: I don't understand this; can you provide an example to explain it?

- Well, if the only thing that differs from your entry is the text, the text has changed...
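
A minimal Python sketch of the matching described in A) and B), assuming the controller keeps its alarm list in a dictionary keyed by the tuple (resource, alarm-type-id, alarm-type-qualifier). The names AlarmEntry and classify and the returned labels are hypothetical and not part of the alarm module. Treating a non-cleared notification for an already cleared entry as a new raise is also an assumption.

```python
from dataclasses import dataclass

# Key tuple from the thread: (resource, alarm-type-id, alarm-type-qualifier).
AlarmKey = tuple[str, str, str]


@dataclass
class AlarmEntry:
    """Hypothetical client-side copy of one alarm list entry."""
    perceived_severity: str
    alarm_text: str


def classify(alarm_list: dict[AlarmKey, AlarmEntry],
             key: AlarmKey, severity: str, text: str) -> str:
    """Classify a notification against the stateful alarm list, then store it."""
    previous = alarm_list.get(key)
    if severity == "cleared":
        kind = "cleared"
    elif previous is None:
        kind = "raised"            # no entry for the key: a new alarm
    elif previous.perceived_severity == "cleared":
        kind = "raised"            # assumption: a cleared entry is raised again
    elif previous.perceived_severity != severity:
        kind = "severity-changed"
    elif previous.alarm_text != text:
        kind = "text-changed"      # only the text differs from the entry
    else:
        kind = "unchanged"
    alarm_list[key] = AlarmEntry(severity, text)
    return kind
```

With an empty alarm list, a first "major" notification for (GigabitEthernet0/15, link-alarm, "") is classified as raised; the same key with only a new text is a text change, and a later "cleared" severity is a clear.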

4.  Consistency between alarm list construct and alarm notification construct
Why can't the alarm notification be used to report the time when this alarm entry was created, rather than just the time when the alarm status changed?
The first entry in the status-change list represents the first state change for the alarm, "creation".
Note however that there is no absolute truth here over time.
Alarm entries might be deleted for house-keeping or admin purposes. What is creation?

[Qin]: I am confused. The time-created leaf within alarm-list and the time leaf within alarm-notification are defined separately as follows:

         leaf time-created {
             type yang:date-and-time;
             mandatory true;
             description
               "The time-stamp when this alarm entry was created. This
                represents the first time the alarm appeared

       leaf time {
         type yang:date-and-time;
         mandatory true;
         description
           "The time the status of the alarm changed.  The value
            represents the time the real alarm state change appeared
            in the resource and not when it was added to the
            alarm list. The /alarm-list/alarm/last-changed MUST be
            set to the same value.";
       }
Could you clarify the difference between them? My impression is that time-created is for a newly raised alarm, while the time leaf applies both to a newly raised alarm and to an alarm with a severity change. It is not clear to me whether the time leaf also applies to a cleared alarm. Would it help to add a category parameter to the alarm notification to distinguish these cases explicitly:

leaf category {
  type enumeration {
    enum raised {
      description
        "Alarm raised in case of fault.";
    }
    enum cleared {
      description
        "Alarm cleared in case of recovery.";
    }
    enum Change {
      description
        "Changing the text and/or severity of an existing alarm.";
    }
  }
}

Make sense?

Why can't the alarm notification be used to indicate whether the alarm is cleared or not?
See above
[Qin]: So you should add an is-cleared parameter to the alarm notification to indicate this, right?
In the current alarm notification, there is no is-cleared parameter.

To address this, the proposal is to make the alarm list construct and the alarm notification construct consistent. Does that make sense?
[Qin]: For consistency, I think the time-created leaf within alarm-list and the time leaf within alarm-notification should be aligned.
In addition, an is-cleared parameter should be added to the alarm notification.

Best regards!

From: stefan vallin []
Sent: 9 August 2018 1:36
To: Qin Wu
Subject: Re: Second review of draft-ietf-ccamp-alarm-module-01

Sorry for slow response!
Thanks again for your comments.
The larger the scope, the more complexity.
I think it is important to prove the model in the scope of an NE/device first, and then extend it with requirements for the controller/mid-level manager in a later revision or a separate augmenting module.
I am also convinced, based on implementation experience, that the current model works as a base for the controller. We had some more leaves in the controller than in the device.

So in summary, I would like to progress this to an RFC targeting the NE scope in a first step before adding more features targeting the controller.
Br Stefan

On 23 Jul 2018, at 11:39, Qin Wu <<>> wrote:

Are you saying that the controller model should be different from the device model, or that the model on the southbound interface of the controller should be different from the model used on the northbound interface of the network device?
Or that the model used on the northbound interface of the controller should be different from the one used on the northbound interface of the network device?
Why not have one generic model that can be applied to both southbound and northbound interfaces?

From: stefan vallin []
Sent: 23 July 2018 2:37
To: Qin Wu;<>
Subject: Re: Second review of draft-ietf-ccamp-alarm-module-01

Hi again!
Addition to #8
You could augment with a device leaf in your mgmt app.

The module scope is within one device primarily
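
One way to picture the device addition, sketched in Python rather than YANG: the management application prefixes the device-scoped key tuple with a device name, so that identical resources on different devices do not collide. ControllerAlarmKey, controller_key and the device names are hypothetical and only illustrate the idea.

```python
from typing import NamedTuple


class ControllerAlarmKey(NamedTuple):
    """Hypothetical controller-side key: the device plus the device-scoped tuple."""
    device: str
    resource: str
    alarm_type_id: str
    alarm_type_qualifier: str


def controller_key(device: str,
                   device_key: tuple[str, str, str]) -> ControllerAlarmKey:
    """Prefix a device-scoped alarm key with the reporting device."""
    return ControllerAlarmKey(device, *device_key)


# The same interface name and alarm type on two devices stay distinct.
device_key = ("GigabitEthernet0/15", "link-alarm", "")
alarms = {
    controller_key("router-a", device_key): "major",
    controller_key("router-b", device_key): "minor",
}
```

The device-level tuple is left unchanged, so the semantics of the key inside one device are the same as in the module.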

Br Stefan

On 22 Jul 2018, at 20:17, stefan vallin <<>> wrote:
Hi Qin!
Thanks for your review and comments, see inline below:

On 21 Jul 2018, at 14:16, Qin Wu <<>> wrote:

Hi, Stefan:
Before the next version of the alarm model comes out, I have the following suggestions and comments:
1.       UUID support for the type of resource under alarm list
Last time you said:
Good point, will consider adding it in the next revision.
However, there is a danger here in that developers might take the easy way out and throw UUIDs at operators. As an operator in a NOC it is hard to know what to do with a UUID.
In many cases UUIDs are a sign of using the alarms as a log/debug thing for developers.

typedef resource {
        type union {
          type instance-identifier {
            require-instance false;
          }
          type yang:object-identifier;
          type string;
        }
}
However, in our implementation we do allow operators in a NOC to use UUIDs to correlate resource objects in the alarm-inventory.
We have added UUID to the upcoming version:
  typedef resource {
    type union {
      type instance-identifier {
        require-instance false;
      }
      type yang:object-identifier;
      type yang:uuid;
      type string;
    }
  }

Resource-match is also updated to handle UUIDs.

2.       Dependency between root-cause-resource, impacted-resource, related-alarm
Under the alarm list, there are three dependent parameters: root-cause-resource, impacted-resource, and related-alarm.
It is still not clear to me how root-cause-resource and impacted-resource are used together with the resource parameter under related-alarm. Why are root-cause-resource and impacted-resource not part of related-alarm?
If the answer is no, then for the root-cause-resource leaf-list I am wondering why we do not add an is-root-cause parameter to indicate that a specific alarm in the alarm list is a root-cause alarm. Only when is-root-cause is set to true would root-cause-resource be provided. Does this make sense?
In our practice, we usually design one root-cause alarm and several derived alarms; the derived alarms use a leafref to point to the root-cause alarm. I am wondering whether we assume that each alarm in the alarm list is a root-cause alarm and that the related-alarm entries are derived alarms. If the answer is no, I think we should add one new parameter under the related-alarm list to reference the root-cause alarm.
We have updated the text in the RFC document on this topic:
3.6.  Root Cause, Impacted Resources and Related Alarms

   The general principle of this alarm module is to limit the amount of
   alarms.  The alarm has two leaf-lists to identify possible impacted
   resources and possible root-cause resources.  The system should not
   represent individual alarms for the possible root-cause resources and
   impacted resources.  These serves as hints only.  It is up to the
   client application to use this information to present the overall

   A system should always strive to identify the resource that can be
   acted upon as the "resource" leaf.  The "impacted-resource" leaf-list
   shall be used to identify any side-effects of the alarm.  The
   impacted resources can not be acted upon to fix the problem.  An
   example of this kind of alarm might be a disc full problem which
   impacts a number of databases.

   In some occasions the system might not be capable of detecting the
   root cause, the resource that can be acted upon.  The instrumentation
   in this case only monitors the side-effect and needs to represent an
   alarm that indicates a situation that needs acting upon.  The
   instrumentation still might identify possible candidates for the
   root-cause resource.  In this case the "root-cause-resource" leaf-
   list can be used to indicate the candidate root-cause resources.  An
   example of this kind of alarm might be an active test tool that
   detects an SLA violation on a VPN connection and identifies the
   devices along the chain as candidate root causes.

   The alarm module also supports a way to associate different alarms to
   each other with the "related-alarm" list.  This list enables the
   server to inform the client that certain alarms are related to other

   Note well that this module does not prescribe any dependencies or
   preference between the above alarm correlation mechanisms.  Different
   systems have different capabilities and the above described
   mechanisms are available to support the instrumentation features.

3.       Consolidate tuple corresponding to a single alarm instance into pair
This YANG alarm module uses the tuple (resource, alarm type identifier, alarm type qualifier) to identify a single alarm instance. I am wondering whether the tuple can be reduced to the pair (resource, alarm-type identifier), with the alarm-type identifier supporting a union of identity and string. The reason is that deriving a set of identities from the base identity for alarm-type-identifier is not sufficient when alarm types can be classified at fine granularity into hundreds of types.

No, that will not work; read the text in the RFC document. The alarm type identifier is static (design time); the qualifier is run time and a refinement of the alarm-type identifier.
See the updated text in the upcoming version of the RFC:
3.2.  Alarm Type

   This document defines an alarm type with an alarm type id and an
   alarm type qualifier.

   The alarm type id is modeled as a YANG identity.  With YANG
   identities, new alarm types can be defined in a distributed fashion.
   YANG identities are hierarchical, which means that an hierarchy of
   alarm types can be defined.

   Standards and vendors should define their own alarm type identities
   based on this definition.
   The use of YANG identities means that all possible alarms are
   identified at design time.  This explicit declaration of alarm types
   makes it easier to allow for alarm qualification reviews and
   preparation of alarm actions and documentation.

   There are occasions where the alarm types are not known at design
   time.  For example, a system with digital inputs that allows users to
   connects detectors (e.g., smoke detector) to the inputs.  In this
   case it is a configuration action that says that certain connectors
   are fire alarms for example.  A potential drawback of this is that
   there is a big risk that alarm operators will receive alarm types as
   a surprise, they do not know how to resolve the problem since a
   defined alarm procedure does not necessarily exist.  To avoid this
   risk the system MUST publish all possible alarm types in the alarm
   inventory, see Section 4.2.

   In order to allow for dynamic addition of alarm types the alarm
   module also allows for further qualification of the identity based
   alarm type using a string.

   A vendor or standard can then define their own alarm-type hierarchy.
   The example below shows a hierarchy based on X.733 event types:

     import ietf-alarms {
       prefix al;
     }
     identity vendor-alarms {
       base al:alarm-type;
     }
     identity communications-alarm {
       base vendor-alarms;
     }
     identity link-alarm {
       base communications-alarm;
     }

   Alarm types can be abstract.  An abstract alarm type is used as a
   base for defining hierarchical alarm types.  Concrete alarm types are
   used for alarm states and appear in the alarm inventory.  There are
   two kinds of concrete alarm types:

   1.  The last subordinate identity in the "alarm-type-id" hierarchy is
       concrete, for example: "alarm-identity.environmental-
       alarm.smoke".  In this example "alarm-identity" and
       "environmental-alarm" are abstract YANG identities, whereas
       "smoke" is a concrete YANG identity.

Vallin & Bjorklund      Expires January 11, 2019                [Page 6]
Internet-Draft              YANG Alarm Module                  July 2018

   2.  The YANG identity hierarchy is abstract and the concrete alarm
       type is defined by the dynamic alarm qualifier string, for
       example: "alarm-identity.environmental-alarm.external-detector"
       with alarm-type-qualifier "smoke".

   For example:

     // Alternative 1: concrete alarm type identity
     import ietf-alarms {
       prefix al;
     }
     identity environmental-alarm {
       base al:alarm-type;
       description "Abstract alarm type";
     }
     identity smoke {
       base environmental-alarm;
       description "Concrete alarm type";
     }

     // Alternative 2: concrete alarm type qualifier
     import ietf-alarms {
       prefix al;
     }
     identity environmental-alarm {
       base al:alarm-type;
       description "Abstract alarm type";
     }
     identity external-detector {
       base environmental-alarm;
       description
         "Abstract alarm type, a run-time configuration
          procedure sets the type of alarm detected. This will
          be reported in the alarm-type-qualifier.";
     }

   A server SHOULD strive to minimize the number of dynamically defined
   alarm types.

4.       Semantics difference between description under alarm-inventory and alarm-text under alarm list
See the description and alarm-text definitions below:
description: A description of the possible alarm.  It SHOULD include information on possible underlying root causes and corrective actions.
alarm-text: The string used to inform operators about the alarm. This MUST contain enough information for an operator to be able to understand the problem and how to resolve it.  If this string contains structure, this format should be clearly documented for programs to be able to parse that information.
   I do not see any semantic difference between description and alarm-text, so why not replace one with the other? Alternatively, we could break description/alarm-text down further into root-cause and corrective-actions. I believe these are the key pieces of information we want to convey through description/alarm-text.
The alarm description (alarm-text) is dynamic/run-time and conveys relevant information for the specific alarm state change.
The description in the inventory is static and cannot convey dynamic state change information.

5.       Alarm arrive time support
Under operator-state-change, we have a time parameter representing the timestamp of the operator action on the alarm. I am wondering whether we need to add alarm-arrive-time to represent the time when the alarm arrives at the management system.
It would be useful information for alarm management.
The alarm has a leaf representing the real time the state change appeared:
    +--ro alarm* [resource alarm-type-id alarm-type-qualifier]
       +--ro last-changed               yang:date-and-time
       +--ro status-change* [time]
          +--ro time                    yang:date-and-time
This should represent the time it really happened. Not the time the notification arrived at the management system. If you need that, that is something you can add in your mgmt system.
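
As a rough illustration of keeping the arrival time in the management system instead of in the module, here is a small Python sketch. ReceivedNotification, received_at and delay are hypothetical names; only the event time corresponds to the notification's time leaf.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta, timezone


def _utc_now() -> datetime:
    """Current time in UTC, used as the local arrival time."""
    return datetime.now(timezone.utc)


@dataclass
class ReceivedNotification:
    """Hypothetical management-system record wrapping one alarm notification."""
    event_time: datetime  # the notification's 'time' leaf (when it really happened)
    received_at: datetime = field(default_factory=_utc_now)  # set locally on arrival

    def delay(self) -> timedelta:
        """Time between the state change in the resource and its arrival here."""
        return self.received_at - self.event_time
```

The two timestamps stay separate, so the alarm list still reflects when the state change happened while the management system can report transport delay on its own.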

6.       Alarm-name field support for alarm and alarm inventory
In the current model, each alarm under the alarm list is uniquely identified by a three-leaf key (resource, alarm type identifier, alarm type qualifier). Would it be more desirable to define a single-leaf key, e.g., add an alarm name or alarm-no to uniquely identify each alarm? That would simplify alarm management from the management system's perspective. Make sense?
A string no…
This is a fundamental design principle in the alarm module. The key, the tuple, carries semantic information, so there is no doubt about how to match notifications to the alarm state.
3GPP Alarm IRP, for example, introduced a confusing single alarmId key, which created paradoxes:
if you have different alarmIds for the same alarmtype and resource, what does it mean?

7.       Reason-id support for alarm list and alarm inventory
In the current model, is the root-cause resource the reason each alarm is generated? If not, I propose adding a reason-id for each alarm under the alarm list and alarm inventory.
See answer to #2

8.       Alarm generating device or location support for alarm list and alarm inventory
In the current model, it seems the resource type can potentially indicate the device or location where the alarm is generated, but not explicitly. I am wondering why we do not add two parameters, alarm-generating-device and alarm-generating-location, to explicitly indicate the device or location where the alarm is generated. That would simplify alarm management. Make sense?

I guess you are considering a management application and not the device?
The resource is a leafier which could/should include the device in your model in your management application.

9.       Alarm notification category support
In the current model, alarm notification is defined as follows:
This notification is used to report a state change for an alarm. The same notification is used for reporting a newly raised alarm, a cleared alarm or changing the text and/or
severity of an existing alarm.

However, it is not clear how to distinguish an alarm notification for a newly raised alarm from an alarm notification for a cleared alarm. Would it be more sensible to add alarm notification category support, something like the following:
leaf category {
  type enumeration {
    enum fault {
      description
        "Alarm raised.";
    }
    enum recovery {
      description
        "Alarm cleared.";
    }
    enum Change {
      description
        "Alarm changed.";
    }
  }
}
Not needed, this is obvious when you map the notification towards the key tuple.

10.   Consistency between alarm list construct and alarm notification construct
We see that the difference between the alarm list construct and the alarm notification construct is the operator action defined under the alarm notification construct and the operator state change under the alarm list construct.
As specified in RFC7950,
An action MUST NOT be defined within an rpc, another action, or a
I am not sure an action is allowed within the alarm-notification construct; if it is not, I would propose removing the operator action from the alarm notification construct.
In addition, the operator parameter under operator-state-change could be removed or consolidated into the set-operator-state action.
I do not understand
The action is not defined in the notification.

11.   Additionalinfo support for alarm list
I think we should allow vendor specific extension to be added as part of alarm list, the vendor specific extension can be defined in TLV format.
The alarm module does not restrict any vendor additions, better to use augmentation.

12.   Alarm-no support for set-operator-state
If we believe set-operator-state is a useful action under the alarm list, I am wondering whether we can add alarm-no or alarm-name to identify each alarm under set-operator-state. This would help a lot for alarm ack operations based on the alarm number.
See above

13.   Is-acked for alarm list
Since we have the is-cleared parameter under the alarm list to indicate the current clearance state of the alarm, why not add an is-acked parameter under the alarm list to indicate the current acked state of the alarm? Make sense?
You can get that from the operator-state-change list.
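
A short Python sketch of deriving an acked flag from the operator-state-change list, assuming the current operator state is the most recent entry and that the acknowledged state is named "ack". OperatorStateChange and is_acked are hypothetical client-side names, not part of the module.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class OperatorStateChange:
    """Hypothetical client-side copy of one operator-state-change entry."""
    time: datetime
    state: str     # e.g. "ack"; the state names are assumed here
    operator: str


def is_acked(changes: list[OperatorStateChange]) -> bool:
    """Return True when the most recent operator state change is an ack."""
    if not changes:
        return False
    latest = max(changes, key=lambda change: change.time)
    return latest.state == "ack"
```

A client that prefers "acked at any point" semantics could instead check whether any entry has the ack state; which reading is intended is not settled in this thread.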

Br Stefan