Re: [RTG-DIR] Rtgdir last call review of draft-ietf-ccamp-alarm-module-06

stefan vallin <stefan@wallan.se> Tue, 15 January 2019 21:52 UTC

From: stefan vallin <stefan@wallan.se>
In-Reply-To: <154714089885.30812.1684533748546533450@ietfa.amsl.com>
Date: Tue, 15 Jan 2019 22:52:36 +0100
Cc: rtg-dir@ietf.org, draft-ietf-ccamp-alarm-module.all@ietf.org, "CCAMP (ccamp@ietf.org)" <ccamp@ietf.org>, ietf@ietf.org
Message-Id: <55998B73-A581-4A47-8D23-B88E2607EFC8@wallan.se>
References: <154714089885.30812.1684533748546533450@ietfa.amsl.com>
To: Joel Halpern <jmh@joelhalpern.com>
Archived-At: <https://mailarchive.ietf.org/arch/msg/rtg-dir/0HcB_puOzICb74H3svYDN7778tM>

Hi Joel!
Thanks for your review, really helpful!
See inline:

> On 10 Jan 2019, at 18:21, Joel Halpern <jmh@joelhalpern.com> wrote:
> Minor Issues:
>    The first paragraph of section 3.6 (Root Cause, Impacted Resources and
>    Related Alarms) has a confused "not", a missing preposition, and a typoed
>    conjunction, making it very hard to be sure what is intended.  I believe
>    the first part of the sentence should read: "The recommendation is to have
>    a single alarm for the underlying problem and list …"
That sentence is really broken in v6; blame my fat fingers. This was also pointed out by Gert Grammel.
It should read:

 The recommendation is to have a single alarm
 for the underlying problem and list the affected resources in the
 alarm, rather than having separate alarms for each resource


> 
>    There is a larger issue about system behavior and root cause analysis that
>    I think should be discussed in this section.  Root cause analysis and
>    side-effect analysis are not simple tasks.  It is common for them to be
>    performed outside of network elements.  When such is performed outside of a
>    network element, it is unclear what the implications are.  Is it the intent
>    that network elements that can not perform root cause analysis and impacted
>    resource determination should NOT support this YANG module?  Or can /
>    should / may they support it even though they can not perform this
>    analysis?  There is a paragraph that seems to be trying to talk about this,
>    but I was left confused about what was expected.  Part of my confusion is
>    that the text treats this inability as rare, whereas in my experience for
>    network elements such inability is common.
The module does not mandate any root-cause analysis, impact analysis, or correlation capabilities.
The purpose of this section is to describe the optional leafs in the alarms that present the result of such analysis, if supported.
If the system has no such capabilities, the optional leafs are not used and this section can be ignored.
I will make that clear in this section. It would be fatal if readers avoided the module because they assumed it put requirements on correlation.
> 
>    It took me a while to realize what the text in 3.7 (and 4.1.1) about not
>    generating notification is talking about.   The problem is that with all
>    the effort to make clear that alarms are not notifications, I missed the
>    fact that an alarm being raised (or re-raised) does itself cause a
>    notification.  And that it is this re-raise notification (and other
>    severity change, clearing, etc notifications) that are suppressed by the
>    shelving.   It seems to me that there needs to be better explanation of
>    this in or before 3.7.
OK, we will improve the description.
> 
>    Reading the YANG for shelving alarms, it looks to me that while it can do
>    what is described earlier in the document, the conceptual structure is VERY
>    different.  From the YANG, to shelve a specific alarm one has to create a
>    named shelf whose conditions identify the specific alarm.  To shelve several
>    alarms that are related (for example, when the operator looks at a list and
>    selects several items to shelve) the system will likely have to create
>    multiple shelves, give each a unique name, and put the different alarm
>    identifiers in each one.   
The data model uses a leaf-list for the resource, which makes it possible to define one shelf for several resources.
However, your comments made us aware that alarm-type and alarm-qualifier are just leafs. As you point out, this may
lead to situations where you need to configure several shelves to shelve different alarms for the same reason.
We will change this so that several alarm-types/alarm-qualifiers can also be defined for one shelf.
With this change, any arbitrary group of alarms can be configured as one shelf.
Also, we will make the list ordered-by user, and add to the
description that the first matching shelf is used.
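
As a rough illustration of the proposed matching semantics (not the YANG module itself), the sketch below models a shelf as a named set of resources and alarm-type/alarm-qualifier pairs, and returns the first shelf in user-defined order that matches an alarm. The names Alarm, Shelf and find_shelf, the example alarm types, and the rule that an empty criterion matches anything are all assumptions made for this sketch.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Alarm:
    resource: str
    alarm_type: str
    qualifier: str = ""


@dataclass
class Shelf:
    name: str
    # Several resources per shelf (the leaf-list in the current model).
    resources: List[str] = field(default_factory=list)
    # Several (alarm-type, alarm-qualifier) pairs per shelf (the proposed change).
    alarm_types: List[Tuple[str, str]] = field(default_factory=list)

    def matches(self, alarm: Alarm) -> bool:
        # Assumption for this sketch: an empty criterion matches anything,
        # and the criteria are ANDed together.
        resource_ok = not self.resources or alarm.resource in self.resources
        type_ok = (not self.alarm_types
                   or (alarm.alarm_type, alarm.qualifier) in self.alarm_types)
        return resource_ok and type_ok


def find_shelf(shelves: List[Shelf], alarm: Alarm) -> Optional[Shelf]:
    # The list is ordered by the user; the first matching shelf is used.
    for shelf in shelves:
        if shelf.matches(alarm):
            return shelf
    return None


shelves = [
    Shelf("if-under-test", resources=["eth0", "eth1"]),
    Shelf("known-link-problems",
          alarm_types=[("link-alarm", "los"), ("link-alarm", "lof")]),
]
```

With this ordering, an alarm on eth0 of type link-alarm/los matches both shelves but is attributed only to "if-under-test", since that shelf comes first.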

Thanks for pointing this out.

> To unshelve alarms, one has to find the named
>    shelf which has caused the shelving.   This seems very awkward.  It seems
>    to have been designed to enable one to store the shelving reason separate
>    from the alarm itself.  It introduces the odd effect that if the shelves
>    are used with conditions that can match more than one thing, then one could
>    have several shelves shelving the same alarm, and an effort to unshelve
>    might well not produce the desired result. Assuming that this complexity is
>    desired by the working group, I would ask that it be explicitly called out
>    in the descriptive portions of the document.
See above: with the proposed change, it will always be one shelf that shelves a given alarm.
Finding the shelf by its name should not be awkward.
Note that it is likely that several alarms are shelved due to the same shelf configuration.
Take the straightforward case of shelving all alarms from a specific interface.
Different alarms from that interface will then be shelved, and it is straightforward to delete the shelf configuration that says, “if X/Y/Z under test”.
This is an important feature of shelving.
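
The interface example can be sketched as follows. This is only a toy model under stated assumptions: the shelf name "xyz-under-test", the alarm type names, and the shelved helper are hypothetical, and a shelf is reduced to a set of resources.

```python
# Hypothetical shelf configuration: shelf name -> resources it covers.
shelves = {"xyz-under-test": {"X/Y/Z"}}

# Alarms as (resource, alarm-type) pairs; two come from the shelved interface.
alarms = [
    ("X/Y/Z", "link-down"),
    ("X/Y/Z", "high-ber"),
    ("A/B/C", "link-down"),
]


def shelved(alarms, shelves):
    # An alarm is shelved if any configured shelf covers its resource.
    return [a for a in alarms
            if any(a[0] in resources for resources in shelves.values())]


before = shelved(alarms, shelves)   # both X/Y/Z alarms are shelved

# Unshelving is a single configuration change: delete the named shelf.
del shelves["xyz-under-test"]
after = shelved(alarms, shelves)    # nothing is shelved any more
```

Deleting the one shelf entry unshelves every alarm it covered, without touching the alarms individually.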

See above about the change to ordered-by user; this will address your issue of several shelves matching the same alarm.
Again, thanks for your review, which made us aware of this issue!
> 
> Nits:
>        In section 4.4 (overview of The Alarm List) tree showing the components
>        of the purge-alarm operation, is there any way to make clear that the
>        enumeration called alarm-status is the enumeration of filter choices
>        related to whether the alarm is cleared?  Maybe rename it
>        alarm-cleared-filter?
OK, we will consider this.

br Stefan and Martin