Re: [netmod] draft-ietf-ccamp-alarm-module-02

Karen Elisabeth Egede Nielsen <KEE@kamstrup.com> Tue, 02 October 2018 07:27 UTC

From: Karen Elisabeth Egede Nielsen <KEE@kamstrup.com>
To: stefan vallin <stefan@wallan.se>
CC: Martin Bjorklund <mbj@tail-f.com>, "ccamp@ietf.org" <ccamp@ietf.org>, "netmod@ietf.org" <netmod@ietf.org>
Date: Tue, 02 Oct 2018 07:27:20 +0000
Message-ID: <A81686D187412242AD51AAC709D08443342B21A8@Exchange2010.kamstrup.dk>
References: <A81686D187412242AD51AAC709D0844334296B31@Exchange2010.kamstrup.dk> <20180920.103107.750560007019896412.mbj@tail-f.com> <A81686D187412242AD51AAC709D0844334297C1E@Exchange2010.kamstrup.dk> <06297C39-B42A-492C-ABB5-3EB3C094F35D@wallan.se>
In-Reply-To: <06297C39-B42A-492C-ABB5-3EB3C094F35D@wallan.se>
Archived-At: <https://mailarchive.ietf.org/arch/msg/netmod/SHapP7Go1-mxH_xP9E6Bq114VRM>
Subject: Re: [netmod] draft-ietf-ccamp-alarm-module-02

Hi Stefan, Martin,

Thanks a lot.
Pls see inline below.

BR, Karen

<snip>

> >
> > I hope that you can accept the follow up right below:
> >
> > * Would it not be relevant in the draft to outline the relation to the
> > alarm-state in RFC8348 ?
> >
> > ** Possibly even in the substance of the document rather than in an
> > appendix - assuming that the two are seen as complementary mechanisms
> > potentially based on the same underlying alarm framework (that you
> > define in this draft)
> We can add a short description on the relationship between the Alarm
> Module and RFC8348.
> As Martin stated they serve different purposes:
> "The "alarm-state" in RFC 8348 (and EntityAlarmStatus in RFC 4268) is just
> a summary of the alarms that may be active on the specific hardware
> component.  It doesn't say anything about how alarms are reported, and it
> doesn't provide any details of the alarms; it is just a bitmask. The
> alarm-module draft specifies how alarms are reported, generically.  It also
> provides a list of all active alarms."
> 
> The mapping between the data models is outlined below.
>
> Alarm YANG (alarm list)          RFC 8348
> * resource                       corresponds to /hardware/component/
> * is-cleared                     no bit set in
>                                  /hardware/component/state/alarm-state
> * perceived-severity             corresponding bit set in
>                                  /hardware/component/state/alarm-state
> * operator-state-change/state    if the alarm is acked by the operator, it
>                                  could correspond to under-repair
> 
[Keen] I would appreciate that very much. Thanks !
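
To make the mapping quoted above concrete, here is a minimal sketch of how an alarm-state summary could be derived from the alarm list. It is only an illustration under stated assumptions: the `Alarm` class and `alarm_state` function are hypothetical, the bit names are assumed to match the alarm-state bits of RFC 8348, and treating an acked alarm as "under-repair" is the "could correspond" case from the mapping, not a rule from either document.

```python
from dataclasses import dataclass

# Severity bit names assumed to match the RFC 8348 alarm-state bitmask.
SEVERITY_BITS = {"critical", "major", "minor", "warning", "indeterminate"}


@dataclass
class Alarm:
    """Hypothetical, simplified entry of the alarm list."""
    resource: str               # here: the name of a hardware component
    perceived_severity: str     # last reported severity
    is_cleared: bool
    operator_state: str = "none"  # e.g. "none", "ack", "closed"


def alarm_state(alarms, component):
    """Return the set of alarm-state bits for one hardware component."""
    bits = set()
    for alarm in alarms:
        if alarm.resource != component or alarm.is_cleared:
            continue  # cleared alarms contribute no bit
        if alarm.perceived_severity in SEVERITY_BITS:
            bits.add(alarm.perceived_severity)
        if alarm.operator_state == "ack":
            # Assumption: an acked alarm is reported as under-repair.
            bits.add("under-repair")
    return bits
```

A component with no uncleared alarms yields an empty set, which corresponds to "no bit set" in the mapping.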
> >
> > ** In the draft you have a "closed" state of an alarm. Wouldn't it be
> > relevant, in your opinion, with this alarm framework in mind, also to have
> > the closed state in the alarm-state object of RFC8348 ?
> >
> > * The same question (should it be included in the alarm-state of RFC8348)
> > for the shelved alarms ?
> This would be an update to RFC 8348, and is out of scope for this work
[Keen] Yes. However, the IETF makes comprehensive standards.
The present gap between this work (which I support) and RFC8348 is actually also one of the reasons why I think that it is important to explicitly relate to RFC8348 in this work (draft in subject).

For our use case (IoT), I think that it will be relevant to implement a solution in line with this draft. However, I also believe that we would like to implement support for the alarm-state object of RFC8348, in that case extended, for usefulness, with the "closed" state.

> >
> > Something else:
> >
> > * Assuming that one has an alarm which has no clear (see next question
> > below) or where clear may not always come.
> >   Would an operator close of this alarm make it disappear from the active
> > alarms summary ? Can that be an implementation decision - possibly
> > depending on the alarm type, possibly configurable ?
> The general answer is no, as stated in the document, there is no automatic
> purge/deletion of an alarm on clear from the resource or close from the
> operator.
> This is by design: from an ops perspective it makes sense to be able to
> view the alarm even after it is cleared/closed.
> For example, you might want to study the root cause afterwards and take
> proactive action so that it does not appear again.
> 
[Keen] Yes that makes good sense.

> But as you say, you can make it an implementation decision, "purge on
> clear", "purge on close".
> If it is hard-coded per alarm-type, describe it in the alarm inventory.
> You can also make it configurable per alarm type by augmenting the alarm
> profile with a purge-policy: "purge on clear", "purge on close"
> >
[Keen] Thanks.
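
As a way of checking my understanding of the per-alarm-type purge policy suggested above, here is a small sketch. Everything in it is an assumption for illustration: the policy names are taken from the wording in this thread, and `Alarm` and `apply_purge_policy` are hypothetical and not part of the draft. Alarm types without a configured policy are never purged automatically, which matches the default behaviour described above.

```python
from dataclasses import dataclass

# Hypothetical policy values, named after the wording in this thread.
PURGE_ON_CLEAR = "purge-on-clear"
PURGE_ON_CLOSE = "purge-on-close"


@dataclass
class Alarm:
    """Hypothetical, simplified entry of the alarm list."""
    alarm_type: str
    is_cleared: bool = False
    operator_state: str = "none"  # e.g. "none", "ack", "closed"


def apply_purge_policy(alarm_list, policy_by_type):
    """Return the alarms that remain after applying per-type purge policies.

    Alarm types without an entry in policy_by_type are always kept.
    """
    kept = []
    for alarm in alarm_list:
        policy = policy_by_type.get(alarm.alarm_type)
        if policy == PURGE_ON_CLEAR and alarm.is_cleared:
            continue  # purge when the resource has cleared the alarm
        if policy == PURGE_ON_CLOSE and alarm.operator_state == "closed":
            continue  # purge when the operator has closed the alarm
        kept.append(alarm)
    return kept
```

With this shape, the hard-coded variant is simply a fixed `policy_by_type` table, and the configurable variant is the same table populated from the alarm profile.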

> > * RFC3877 has the following statement: "Alarms SHOULD be modelled so
> > Notifications are sent on alarm Clear."
> > I did not find this statement in the substance of the draft nor in
> > Appendix F (but it may have escaped me).
> > Is this also the mindset of this draft ?
> According to this alarm module an alarm-notification will be sent with
> perceived-severity set to cleared

[Keen] Is that stated explicitly in the draft ?
Will you associate a keyword with this statement ?

> >
> > * Is it correctly understood that the Alarm Summary and the Alarm list
> > contain the alarms which are presently in the system - i.e. which have not
> > been purged ?
> Correct
> >  * Would it be relevant for the Alarm Summary list to tell when alarms
> > were last purged due to administrative action ?
> We do not want to load the alarm module with more features at this point,
> this could be done in the management application/client.
> >
[Keen] It might be prudent to have this state in the server as well (?)

> > * Are you considering to implement support for statistics ?
> What do you mean by statistics?
> a) Statistics on alarms, or b) a performance monitoring module?
> If a, no, that is up to the management application. If b, that is a
> separate module, not within this one.

[Keen] I was referring to the first: first and foremost, statistics on received alarms broken down by severity level.

Yes it can be done in the management application, but I am not sure why it is necessarily "up to the management application" to do this ?

Would some global statistics in the server not make sense - or do you have specific reasons for placing all statistics in the management application layer ?
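
To illustrate how little state I have in mind, here is a minimal sketch of a per-severity counter that a server could keep. The class and method names are hypothetical and nothing like this is defined in the draft; it simply counts alarm notifications by the perceived-severity value they carry.

```python
from collections import Counter


class AlarmStatistics:
    """Hypothetical global counters of alarm notifications by severity."""

    def __init__(self):
        self.by_severity = Counter()

    def record(self, perceived_severity):
        # Called once per alarm notification, including "cleared".
        self.by_severity[perceived_severity] += 1

    def snapshot(self):
        # Plain dict copy, suitable for exposing as read-only state.
        return dict(self.by_severity)
```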

BR, Karen

> 
> >
> >
> > BR, Karen
> >
> > -----Original Message-----
> > From: Martin Bjorklund <mbj@tail-f.com>
> > Sent: 20. september 2018 10:31
> > To: Karen Elisabeth Egede Nielsen <KEE@kamstrup.com>
> > Cc: ccamp@ietf.org; stefan@wallan.se; netmod@ietf.org
> > Subject: Re: draft-ietf-ccamp-alarm-module-02
> >
> > Hi,
> >
> > Karen Elisabeth Egede Nielsen <KEE@kamstrup.com> wrote:
> >> Hi,
> >>
> >> This draft is new to me and modelling of alarm management also
> >> somewhat....
> >>
> >> Could you enlighten me on the relationship, if any, in between the
> >> alarm module of this draft and the Device/resource alarm state within
> >> RFC8348 (equivalently the EntityAlarmStatus of RFC4268) ?
> >
> > The "alarm-state" in RFC 8348 (and EntityAlarmStatus in RFC 4268) is just
> > a summary of the alarms that may be active on the specific hardware
> > component.  It doesn't say anything about how alarms are reported, and it
> > doesn't provide any details of the alarms; it is just a bitmask.
> >
> > The alarm-module draft OTOH, specifies how alarms are reported,
> > generically.  It also provides a list of all active alarms.
> >
> >> E.g.  are the two considered complementary mechanisms (modules),
> >> just different view glasses, or are they non-compatible or redundant
> >> ..?
> >
> > So if both modules are implemented (they don't have to be), the
> > information can be viewed as redundant or just different views.
> >
> >
> > /martin
> >
> >
> >
> >>
> >> Many Thanks in advance !
> >>
> >>
> >> BR, Karen Nielsen
> >>