Re: [Gen-art] Genart last call review of draft-ietf-ccamp-alarm-module-07
Alissa Cooper <alissa@cooperw.in> Mon, 08 April 2019 15:47 UTC
From: Alissa Cooper <alissa@cooperw.in>
Message-Id: <14722B00-7C42-443C-A067-1E69E51B301C@cooperw.in>
Content-Type: multipart/alternative; boundary="Apple-Mail=_95B7A1E4-3EED-4B3A-9AFA-3D8852820347"
Mime-Version: 1.0 (Mac OS X Mail 11.5 \(3445.9.1\))
Date: Mon, 08 Apr 2019 11:47:14 -0400
In-Reply-To: <549638C6-2599-4524-8AE9-436FC949DC88@wallan.se>
Cc: draft-ietf-ccamp-alarm-module.all@ietf.org, gen-art <gen-art@ietf.org>, ccamp@ietf.org, ietf <ietf@ietf.org>
To: stefan vallin <stefan@wallan.se>, Dan Romascanu <dromasca@gmail.com>
References: <155294084679.26073.4005125072161491147@ietfa.amsl.com> <956268DE-965F-40C3-A845-D236424CE07D@wallan.se> <CAFgnS4WOQL3Q=ec_8efvKtMXb8RSXyaRGx+iyMs28RnWk=HGtQ@mail.gmail.com> <549638C6-2599-4524-8AE9-436FC949DC88@wallan.se>
X-Mailer: Apple Mail (2.3445.9.1)
Archived-At: <https://mailarchive.ietf.org/arch/msg/gen-art/5VPvNcFrFz7oYHvn9QnpQQYZFE4>
Subject: Re: [Gen-art] Genart last call review of draft-ietf-ccamp-alarm-module-07
Dan, thanks for your review. Stefan, thank you for making the corresponding changes. I entered a No Objection ballot.

Alissa

> On Mar 20, 2019, at 3:51 AM, stefan vallin <stefan@wallan.se> wrote:
>
> Thanks Dan!
>
>> On 19 Mar 2019, at 19:54, Dan Romascanu <dromasca@gmail.com> wrote:
>>
>> Hi Stefan,
>>
>> Thank you for your answer and for addressing my concerns. I am comfortable with your proposals. If your AD agrees, I would include these in a revised version before submission to the approval of the IESG.
>>
>> Regards,
>>
>> Dan
>>
>> On Tue, Mar 19, 2019 at 5:11 PM stefan vallin <stefan@wallan.se> wrote:
>> Hi Dan!
>> Thanks for your review, an honour to have RFC 3877 in the loop :)
>> See inline
>> br Stefan
>>
>> > Major issues:
>> >
>> > 1. The definition of Alarm is key for the whole model. It reads like this:
>> > Alarm (the general concept): An alarm signifies an undesirable state in a
>> > resource that requires corrective action.
>> >
>> > However, RFC 3877 already defined a number of concepts including:
>> > Error
>> > A deviation of a system from normal operation.
>> >
>> > Fault
>> > Lasting error or warning condition.
>> >
>> > ....
>> >
>> > Alarm
>> > Persistent indication of a fault.
>> >
>> > I believe that there is a need to show why the model defined by RFC 3877 needs
>> > to be changed, and why the difference that RFC 3877 was making between a Fault
>> > and an Alarm is no longer needed.
>>
>> Good comment, you are right, and we need to keep the distinction between fault and alarm.
>> That distinction is used in X.733, 3GPP IRP and others. The general pattern is that “fault”
>> refers to what is really broken, and the alarm the manifestation of that underlying cause.
>> There is not a simple 1-1 relationship between a fault and an alarm
>> * 1 fault may have many alarms due to limited root cause capabilities of the system
>> * There might be no underlying fault to an alarm, consider a non-optimal QoS configuration
>> which gives bad quality in VOIP calls. Certainly a MOS alarm from the VOIP probe, but there
>> is no “fault” as such (if you do not consider a non-optimal config as a fault)
>>
>> So X.733
>> X.733 fault: The physical or algorithmic cause of a malfunction
>> 3GPP fault: a deviation of a system from normal operation, which may result in the loss of operational capabilities of the element or the loss of redundancy in case of a redundant configuration
>>
>> I suggest we add the following to terminology:
>> Fault: the underlying cause of an undesired behaviour
>>
>> If we then turn to the term “alarm”. I have added two aspects to the definition of an alarm:
>>
>> An alarm signifies an *undesirable state* in a resource that *requires corrective action*.
>>
>> Mostly based on the alarm standardization work in the process industry (see draft references).
>>
>> 1) Rather than “deviation from normal”, we say “undesirable”, subtle difference.
>> In IT environments it is easier to define what is normal, a normal load to a web server.
>> And any deviation from that normal load could be an alarm.
>> In networking, things are more dynamic, and deviation from normal might be the desired state.
>> So the definition stresses the fact that it is an undesired state, not just deviation from normal.
>>
>> 2) Adding the requirement that an alarm per definition should require an action. This is a sound
>> requirement that puts requirements on what qualifies as an alarm and limits the amounts of alarms.
>> (See for example the EEMUA, and ISA182 references in the draft). The 3GPP Alarm standard
>> also added this to their definition at the later revisions to address the alarm overload problem.
>>
>> > Also, RFC 3877 defined in Section 3 a
>> > Framework and an Architecture that was consistent with X.733. This document has
>> > no such section, and while acknowledging the need for a mapping to X.733 it
>> > states as a goal:
>> > Mapping to X.733, which is a requirement for some alarm systems. Still, keep
>> > some of the X.733 concepts out of the core model in order to make the model
>> > small and easy to understand
>> >
>> > More details about what is left out and why these are not needed would help.
>> The alarm YANG model does not *require* the X.733 parameter
>> definitions of for example probable-cause enum values. Today, most networking devices
>> and management systems do not rely on those enumerations.
>>
>> Those are defined in the X733 augmentation module in order to keep the core model as
>> small and useful as possible. X733 requirements come more often from telecom environments.
>>
>> >
>> > Minor issues:
>> >
>> > 1. Section 2 makes a statement that includes
>> > ... While IETF has not really addressed alarm management
>> >
>> > This is actually not accurate. RFC 3877 addressed Alarm Management. Maybe
>> > there is a need to revise that approach, but this should be done explicitly,
>> > not by stating that it did not exist.
>> Correct, bad wording.
>> OLD TEXT:
>> Address alarm usability requirements, see Appendix G. While IETF
>> has not really addressed alarm management, telecom standards has
>> addressed it purely from a protocol perspective. The process
>> industry has published several relevant standards addressing
>> requirements for a useful alarm interface; [EEMUA], [ISA182].
>> This alarm module defines usability requirements as well as a YANG
>> data model.
>> SUGGESTION:
>> Address alarm usability requirements, see Appendix G. While IETF
>> and telecom standards have addressed alarms mostly from a
>> protocol perspective, the process industry has published
>> several relevant standards addressing requirements for a useful
>> alarm interface; [EEMUA], [ISA182].
>> This alarm module defines usability requirements as well as a YANG
>> data model.
>>
>> >
>> > 2. Section 3.5:
>> > Closing an alarm implies that the operator considers the corrective action
>> > performed.
>> >
>> > Is this always true? The undesirable state may have been cancelled by some
>> > other event than corrective action, for example the resource is no longer used,
>> > or the time elapsed may have made the undesirable state irrelevant.
>>
>> I think it is important to keep the two perspectives in mind. An operator closing an
>> alarm is only a flag from the operations team that the alarm does not need an action.
>> It might be cleared or not cleared by the system.
>>
>> So in your first example, the alarm is probably cleared by the instrumentation,
>> correlating “the other event”.
>>
>> If the resource is no longer used a shelf should be created.
>>
>> If time has passed, depends, ….
>>
>> >
>> > 3. In section 3.5.1:
>> > Alarms are not cleared by operators, only the underlying instrumentation can
>> > clear an alarm. Operators can close alarms.
>> >
>> > So, the document makes a distinction between clearing an alarm and closing an
>> > alarm. It may be good to define the two concepts to make the distinction clear.
>>
>> Good point!
>>
>> Suggested terminology additions:
>> * Cleared alarm: a cleared alarm is an alarm where the system/server considers the
>> undesired state to be cleared. Operators can not clear alarms, clearance is managed
>> by the system. A linkUp notification can be considered a clear condition for a linkDown state.
>>
>> * Closed alarm: operators can close alarms irrespective of the alarm being cleared or not.
>> A closed alarm indicates that the alarm does not need attention, either since the corrective
>> action has been taken or that it can be ignored for other reasons.
>>
>> >
>> > 4. Appendix F.1:
>> > The alarm MIB is state oriented rather than notification oriented, an alarm
>> > is a "lasting condition", not a discrete notification reporting about a
>> > condition state change.
>> Good catch, will rephrase, the alarm MIB and the alarm YANG have a stateful view
>> of alarms, not notification-focused.
>>
>> Suggested change:
>> OLD
>> RFC 3877 defines alarm referring back to "a deviation from normal operation". This is
>> problematic, since this might not require an operator action. The alarm MIB is state
>> oriented rather than notification oriented, an alarm is a "lasting condition", not a
>> discrete notification reporting about a condition state change.
>> NEW:
>> RFC 3877 defines alarm referring back to "a deviation from normal operation". The Alarm YANG
>> model adds the requirement that it should require a corrective action and should be undesired,
>> not only a deviation from normal. The alarm MIB is state oriented in the same way as the Alarm YANG,
>> it focuses on the "lasting condition", not the individual notifications.
>>
>> >
>> > I am not sure that I understand this comment. Alarm states are defined also in
>> > this document, and Alarms as defined here are also different than ' a discrete
>> > notification reporting about a condition state change'. So, what does this
>> > comment really try to say?
>> >
>> > Nits/editorial comments:
>> >
>
> _______________________________________________
> Gen-art mailing list
> Gen-art@ietf.org
> https://www.ietf.org/mailman/listinfo/gen-art
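
Editorial illustration (not part of the original message): the terminology proposed in the thread separates what the system decides (clearing) from what an operator decides (closing), and notes that an alarm need not map to an identified fault. The following is a minimal sketch of that distinction only. The class, field, and method names (`Alarm`, `is_cleared`, `operator_closed`, `fault_id`, and so on) are hypothetical choices for this example and are not taken from the draft's YANG module.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Alarm:
    """Illustrative alarm record; all names here are assumptions."""
    resource: str
    alarm_type: str
    # Set only by the system/instrumentation, never by an operator.
    is_cleared: bool = False
    # Set only by operators; independent of is_cleared.
    operator_closed: bool = False
    # Underlying fault, if one is known. None models an alarm with
    # no identified fault (e.g. the MOS alarm example in the thread).
    fault_id: Optional[str] = None

    def clear_by_system(self) -> None:
        # The system considers the undesired state gone,
        # e.g. a linkUp observed after a linkDown.
        self.is_cleared = True

    def close_by_operator(self) -> None:
        # The operator flags that no further attention is needed,
        # whether or not the system has cleared the alarm.
        self.operator_closed = True


# One fault may surface as several alarms.
link_down = Alarm("eth0", "link-down", fault_id="fault-1")
los = Alarm("eth0", "loss-of-signal", fault_id="fault-1")
# An alarm without any underlying fault.
mos_alarm = Alarm("voip-probe", "low-mos")

link_down.clear_by_system()    # cleared, but not yet closed
mos_alarm.close_by_operator()  # closed, although not cleared
```

Keeping the two flags independent reflects the statement in the thread that operators can close alarms irrespective of whether they are cleared; how a cleared but unclosed alarm should be treated is left open here, as it is in the discussion.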