Re: [CCAMP] Second review of draft-ietf-ccamp-alarm-module-01

stefan vallin <stefan@wallan.se> Wed, 08 August 2018 17:36 UTC

From: stefan vallin <stefan@wallan.se>
Message-Id: <734639AA-E2B4-493A-81D6-2F80D4192883@wallan.se>
Date: Wed, 08 Aug 2018 19:36:01 +0200
In-Reply-To: <B8F9A780D330094D99AF023C5877DABA9AF74602@nkgeml513-mbs.china.huawei.com>
Cc: "ccamp@ietf.org" <ccamp@ietf.org>
To: Qin Wu <bill.wu@huawei.com>
References: <B8F9A780D330094D99AF023C5877DABA9AF5BDE8@nkgeml513-mbx.china.huawei.com> <E597E310-27B8-4091-89BB-F510CE1AC3C0@wallan.se> <50582C88-3BC2-450F-B761-E61310AABFB4@wallan.se> <B8F9A780D330094D99AF023C5877DABA9AF74602@nkgeml513-mbs.china.huawei.com>
Archived-At: <https://mailarchive.ietf.org/arch/msg/ccamp/859fKdqCzF37wzaKJvQDepraNYk>
Subject: Re: [CCAMP] Second review of draft-ietf-ccamp-alarm-module-01
Precedence: list

Hi!
Sorry for the slow response!
Thanks again for your comments.
The larger the scope, the greater the complexity.
I think it is important to prove the model within the scope of an NE/device first, and then extend it with requirements for the controller/mid-level manager in a later revision or in a separate augmenting module.
Based on implementation experience, I am also convinced that the current model works as a base for the controller. We had some more leafs in the controller than in the device.
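
A separate augmenting module for the controller could look roughly like the sketch below. The module name, namespace, prefix and the "device" leaf are illustrative assumptions, not part of the alarm module, and the augment path assumes the alarm list is rooted at /alarms/alarm-list as in the draft.

```yang
module example-controller-alarms {
  yang-version 1.1;
  namespace "urn:example:controller-alarms";
  prefix exca;

  import ietf-alarms {
    prefix al;
  }

  // Hypothetical controller-only addition; the device-scoped
  // alarm module itself stays unchanged.
  augment "/al:alarms/al:alarm-list/al:alarm" {
    description
      "Adds controller-level data to each alarm entry.";
    leaf device {
      type string;
      description
        "Name of the managed device that reported the alarm.";
    }
  }
}
```

Since the alarm list is read-only state data, the augmented leaf is read-only as well.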

So in summary, I would like to progress this to an RFC targeting the NE scope in a first step before adding more features targeting the controller.
Br Stefan



> On 23 Jul 2018, at 11:39, Qin Wu <bill.wu@huawei.com> wrote:
> 
> Are you saying the controller model should be different from the device model, or that the model on the southbound interface of the controller should be different from the model used on the northbound interface of the network device?
> Or that the model used on the northbound interface of the controller should be different from the one used on the northbound interface of the network device?
> Why not have one generic model which can be applied to both southbound and northbound interfaces?
>  
> -Qin
> From: stefan vallin [mailto:stefan@wallan.se]
> Sent: 23 July 2018 2:37
> To: Qin Wu; ccamp@ietf.org
> Subject: Re: Second review of draft-ietf-ccamp-alarm-module-01
>  
> Hi again!
> Addition to #8
> You could augment with a device leaf in your mgmt app.
>  
> The module's scope is primarily a single device.
>  
> Br stefan
> 
> Best regards, stefan
> +46(0)705233262
> 
> On 22 July 2018, at 20:17, stefan vallin <stefan@wallan.se> wrote:
> 
> Hi Qin!
> Thanks for your review and comments, see inline below:
> 
> 
> On 21 Jul 2018, at 14:16, Qin Wu <bill.wu@huawei.com> wrote:
>  
> Hi, Stefan:
> Before the next version of the alarm model comes out, I would like to offer the following suggestions and comments:
> 1.       UUID support for the type of resource under alarm list
> Last time you said:
> “
> Good point, will consider adding it in the next revision.
> However, there is a danger here: developers might take the easy way out and throw UUIDs at operators. For an operator in a NOC it is hard to know what to do with a UUID.
> In many cases UUIDs are a sign that alarms are being used as a log/debug facility for developers.
>  
> typedef resource {
>   type union {
>     type instance-identifier {
>       require-instance false;
>     }
>     type yang:object-identifier;
>     type string;
>   }
> }
> ”
> However, in our implementation we do allow operators in a NOC to use UUIDs to correlate resource objects in the alarm inventory, don't we?
> We have added UUID to the upcoming version:
>   typedef resource {
>     type union {
>       type instance-identifier {
>         require-instance false;
>       }
>       type yang:object-identifier;
>       type yang:uuid;
>       type string;
>     }
>   }
>  
> Resource-match is also updated to handle UUIDs.
>  
>  
>  
> 
> 
>  
> 2.       Dependency between root-cause-resource, impacted-resource, related-alarm
> Under the alarm list there are three dependent parameters: root-cause-resource, impacted-resource and related-alarm.
> It is still not clear to me how root-cause-resource and impacted-resource are used together with the resource parameter under related-alarm, or why root-cause-resource and impacted-resource are not part of related-alarm.
> If the answer is no, then for the root-cause-resource leaf-list I am wondering why not add an is-root-cause parameter to indicate that a specific alarm in the alarm list is a root-cause alarm. Only when is-root-cause is set to true would root-cause-resource be provided. Does this make sense?
> In our practice, we usually design one root-cause alarm and several derived alarms; each derived alarm uses a leafref to point to the root-cause alarm. I am wondering whether we assume that each alarm in the alarm list is a root-cause alarm and that the related-alarm entries are derived alarms. If the answer is no, I think we should add one new parameter under the related-alarm list to reference the root-cause alarm.
> We have updated the text in the RFC document on this topic:
> 3.6.  Root Cause, Impacted Resources and Related Alarms
>  
>    The general principle of this alarm module is to limit the amount of
>    alarms.  The alarm has two leaf-lists to identify possible impacted
>    resources and possible root-cause resources.  The system should not
>    represent individual alarms for the possible root-cause resources and
>    impacted resources.  These serve as hints only.  It is up to the
>    client application to use this information to present the overall
>    status.
>  
>    A system should always strive to identify the resource that can be
>    acted upon as the "resource" leaf.  The "impacted-resource" leaf-list
>    shall be used to identify any side-effects of the alarm.  The
>    impacted resources can not be acted upon to fix the problem.  An
>    example of this kind of alarm might be a disc full problem which
>    impacts a number of databases.
>  
>    In some occasions the system might not be capable of detecting the
>    root cause, the resource that can be acted upon.  The instrumentation
>    in this case only monitors the side-effect and needs to represent an
>    alarm that indicates a situation that needs acting upon.  The
>    instrumentation still might identify possible candidates for the
>    root-cause resource.  In this case the "root-cause-resource" leaf-
>    list can be used to indicate the candidate root-cause resources.  An
>    example of this kind of alarm might be an active test tool that
>    detects an SLA violation on a VPN connection and identifies the
>    devices along the chain as candidate root causes.
>  
>    The alarm module also supports a way to associate different alarms to
>    each other with the "related-alarm" list.  This list enables the
>    server to inform the client that certain alarms are related to other
>    alarms.
>  
>    Note well that this module does not prescribe any dependencies or
>    preference between the above alarm correlation mechanisms.  Different
>    systems have different capabilities and the above described
>    mechanisms are available to support the instrumentation features.
> 
> 
>  
> 3.       Consolidate tuple corresponding to a single alarm instance into pair
> This YANG alarm module uses the tuple (resource, alarm type identifier, alarm type qualifier) to identify a single alarm instance. I am wondering whether the tuple can be reduced to (resource, alarm-type identifier), allowing the alarm-type identifier to be a union of identity and string. The reason is that deriving identities from the alarm-type-identifier base identity is not sufficient when alarm types can be classified at fine granularity into hundreds of types.
>  
> No, that will not work; read the text in the RFC document. The alarm type identifier is static and defined at design time, whereas the qualifier is set at run time and is a refinement of the alarm type identifier.
> See updated text in the upcoming version of the RFC:
> 3.2.  Alarm Type
>  
>    This document defines an alarm type with an alarm type id and an
>    alarm type qualifier.
>  
>    The alarm type id is modeled as a YANG identity.  With YANG
>    identities, new alarm types can be defined in a distributed fashion.
>    YANG identities are hierarchical, which means that an hierarchy of
>    alarm types can be defined.
>  
>    Standards and vendors should define their own alarm type identities
>    based on this definition.
>    The use of YANG identities means that all possible alarms are
>    identified at design time.  This explicit declaration of alarm types
>    makes it easier to allow for alarm qualification reviews and
>    preparation of alarm actions and documentation.
>  
>    There are occasions where the alarm types are not known at design
>    time.  For example, a system with digital inputs that allows users to
>    connect detectors (e.g., smoke detector) to the inputs.  In this
>    case it is a configuration action that says that certain connectors
>    are fire alarms for example.  A potential drawback of this is that
>    there is a big risk that alarm operators will receive alarm types as
>    a surprise, they do not know how to resolve the problem since a
>    defined alarm procedure does not necessarily exist.  To avoid this
>    risk the system MUST publish all possible alarm types in the alarm
>    inventory, see Section 4.2.
>  
>    In order to allow for dynamic addition of alarm types the alarm
>    module also allows for further qualification of the identity based
>    alarm type using a string.
>  
>    A vendor or standard can then define their own alarm-type hierarchy.
>    The example below shows a hierarchy based on X.733 event types:
>  
>      import ietf-alarms {
>        prefix al;
>      }
>      identity vendor-alarms {
>        base al:alarm-type;
>      }
>      identity communications-alarm {
>        base vendor-alarms;
>      }
>      identity link-alarm {
>        base communications-alarm;
>      }
>  
>    Alarm types can be abstract.  An abstract alarm type is used as a
>    base for defining hierarchical alarm types.  Concrete alarm types are
>    used for alarm states and appear in the alarm inventory.  There are
>    two kinds of concrete alarm types:
>  
>    1.  The last subordinate identity in the "alarm-type-id" hierarchy is
>        concrete, for example: "alarm-identity.environmental-
>        alarm.smoke".  In this example "alarm-identity" and
>        "environmental-alarm" are abstract YANG identities, whereas
>        "smoke" is a concrete YANG identity.
>  
>  
>  
>  
>  
> Vallin & Bjorklund      Expires January 11, 2019                [Page 6]
> Internet-Draft              YANG Alarm Module                  July 2018
>  
>  
>    2.  The YANG identity hierarchy is abstract and the concrete alarm
>        type is defined by the dynamic alarm qualifier string, for
>        example: "alarm-identity.environmental-alarm.external-detector"
>        with alarm-type-qualifier "smoke".
>  
>    For example:
>  
>      // Alternative 1: concrete alarm type identity
>      import ietf-alarms {
>        prefix al;
>      }
>      identity environmental-alarm {
>        base al:alarm-type;
>        description "Abstract alarm type";
>      }
>      identity smoke {
>        base environmental-alarm;
>        description "Concrete alarm type";
>      }
>  
>      // Alternative 2: concrete alarm type qualifier
>      import ietf-alarms {
>        prefix al;
>      }
>      identity environmental-alarm {
>        base al:alarm-type;
>        description "Abstract alarm type";
>      }
>      identity external-detector {
>        base environmental-alarm;
>        description
>          "Abstract alarm type, a run-time configuration
>           procedure sets the type of alarm detected. This will
>           be reported in the alarm-type-qualifier.";
>      }
>  
>    A server SHOULD strive to minimize the number of dynamically defined
>    alarm types.
>  
> 
> 
>  
> 4.       Semantic difference between description under alarm-inventory and alarm-text under alarm list
> See description definition and alarm-text definition as follows:
> “
> description: A description of the possible alarm.  It SHOULD include information on possible underlying root causes and corrective actions.
> alarm-text: The string used to inform operators about the alarm. This MUST contain enough information for an operator to be able to understand the problem and how to resolve it.  If this string contains structure, this format should be clearly documented for programs to be able to parse that information.
>    ”
>    I am not sure there is any semantic difference between description and alarm-text; why not replace one with the other? Or we could further break down description/alarm-text into root-cause and corrective-actions. I believe those are the key pieces of information we want to convey through description/alarm-text.
> The alarm-text in the alarm list is dynamic (run-time) and conveys relevant information for the specific alarm state change.
> The description in the inventory is static and cannot convey dynamic state change information.
> 
> 
>  
> 5.       Alarm arrival time support
> Under operator-state-change we have a time parameter representing the timestamp of the operator action on the alarm. I am wondering whether we need to add alarm-arrive-time to represent the time when the alarm arrives at the management system.
> It is useful information for alarm management.
> The alarm has a leaf representing the real time the state change appeared:
>     +--ro alarm* [resource alarm-type-id alarm-type-qualifier]
>           ...
>        +--ro last-changed               yang:date-and-time
>        +--ro status-change* [time]
>           +--ro time                    yang:date-and-time
> This should represent the time it really happened. Not the time the notification arrived at the management system. If you need that, that is something you can add in your mgmt system.
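>
> A minimal sketch of such a management-system addition is shown below. The module name, namespace, prefix and the "received-time" leaf are hypothetical and not defined by the alarm module, and the augment path assumes the alarm list is rooted at /alarms/alarm-list as in the draft.
>
> ```yang
> module example-mgmt-alarm-arrival {
>   yang-version 1.1;
>   namespace "urn:example:mgmt-alarm-arrival";
>   prefix exarr;
>
>   import ietf-alarms {
>     prefix al;
>   }
>   import ietf-yang-types {
>     prefix yang;
>   }
>
>   // Hypothetical leaf kept by the management system only;
>   // the device keeps reporting the real event time in "time".
>   augment "/al:alarms/al:alarm-list/al:alarm/al:status-change" {
>     leaf received-time {
>       type yang:date-and-time;
>       description
>         "Time at which the management system received the
>          notification for this status change.";
>     }
>   }
> }
> ```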
>  
>  
> 6.       Alarm-name field support for alarm and alarm inventory
> In the current model, each alarm in the alarm list is uniquely identified by a three-leaf key (resource, alarm type identifier, alarm type qualifier). Would it be more desirable to define a single-leaf key, e.g., add alarm-name or alarm-no to uniquely identify each alarm? That would simplify alarm management from the management system perspective. Make sense?
> A string no…
> This is a fundamental design principle in the alarm module. The key, the tuple, carries semantic information, so there is no doubt about how to match notifications to the alarm state.
> The 3GPP Alarm IRP, for example, introduced a confusing single key, alarmId, which created paradoxes:
> if you have different alarmIds for the same alarm type and resource, what does it mean?
> 
> 
>  
> 7.       Reason-id support for alarm list and alarm inventory
> In the current model, is the root-cause resource the reason each alarm is generated? If not, I propose adding a reason-id for each alarm in the alarm list and alarm inventory.
> See answer to #2
>  
> 8.       Alarm generating device or location support for alarm list and alarm inventory
> In the current model, it seems the resource type can potentially indicate the device or location where the alarm is generated, but not explicitly. I am wondering why not add two parameters, alarm-generating-device and alarm-generating-location, to explicitly indicate the device or location where the alarm is generated. That would simplify alarm management. Make sense?
>  
> I guess you are considering a management application and not the device? 
> The resource is a leaf which could/should include the device in the model of your management application.
> 
> 
> 9.       Alarm notification category support
> In the current model, alarm notification is defined as follows:
> “
> This notification is used to report a state change for an alarm. The same notification is used for reporting a newly raised alarm, a cleared alarm or changing the text and/or
> severity of an existing alarm.
>  
> ”
> However, it is not clear how to distinguish an alarm notification for a newly raised alarm from an alarm notification for a cleared alarm. Would it be more sensible to add alarm notification category support, something like the following:
> “
> leaf category {
>   type enumeration {
>     enum fault {
>       description
>         "Alarm raised.";
>     }
>     enum recovery {
>       description
>         "Alarm cleared.";
>     }
>     enum Change {
>       description
>         "Alarm changed.";
>     }
>   }
> }
> ”
> Not needed; this is obvious when you map the notification to the key tuple.
> 
> 
> 10.   Consistency between alarm list construct and alarm notification construct
> The difference we see between the alarm list construct and the alarm notification construct is the operator action defined under the alarm notification construct versus the operator state change under the alarm list construct.
> As specified in RFC 7950:
> “
> An action MUST NOT be defined within an rpc, another action, or a
>    notification
> ”
> I am not sure an action can be allowed within the alarm-notification construct; in that case, I would propose removing the operator action from the alarm notification construct.
> In addition, the operator parameter under operator-state-change could be removed or consolidated into the set-operator-state action.
> I do not understand
> The action is not defined in the notification.
> 
> 
>  
> 11.   Additionalinfo support for alarm list
> I think we should allow vendor-specific extensions to be added as part of the alarm list; the vendor-specific extensions can be defined in TLV format.
> The alarm module does not restrict any vendor additions; it is better to use augmentation.
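>
> As a rough illustration of the augmentation approach, a vendor module might add typed leafs instead of a TLV blob. Everything named here (module, namespace, prefix, container and leafs) is a hypothetical example, and the augment path assumes the alarm list is rooted at /alarms/alarm-list as in the draft.
>
> ```yang
> module example-vendor-alarm-ext {
>   yang-version 1.1;
>   namespace "urn:example:vendor-alarm-ext";
>   prefix exv;
>
>   import ietf-alarms {
>     prefix al;
>   }
>
>   // Hypothetical vendor-specific data attached to each alarm.
>   augment "/al:alarms/al:alarm-list/al:alarm" {
>     container vendor-info {
>       description
>         "Vendor-specific additions to the alarm entry.";
>       leaf rack-position {
>         type string;
>         description
>           "Physical rack position of the alarming unit.";
>       }
>       leaf firmware-version {
>         type string;
>         description
>           "Firmware version running when the alarm was raised.";
>       }
>     }
>   }
> }
> ```
>
> Typed leafs give clients a schema to validate against, which an opaque TLV string would not.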
> 
> 
>  
> 12.   Alarm-no support for set-operator-state
> If we believe set-operator-state is a useful action under the alarm list, I am wondering whether we can add alarm-no or alarm-name to identify each alarm under set-operator-state. This would help a lot for alarm ack operations based on each alarm number.
> See above
> 
>  
> 13.   Is-acked for alarm list
> Since we have the is-cleared parameter under the alarm list to indicate the current clearance state of the alarm, why not add an is-acked parameter under the alarm list to indicate the current acked state of the alarm? Make sense?
> You can get that from the operator-state-change list.
> 
>  
>  
> Br Stefan
