Re: [CCAMP] Rtgdir last call review of draft-ietf-ccamp-alarm-module-06

"Joel M. Halpern" <jmh@joelhalpern.com> Tue, 15 January 2019 22:38 UTC

To: stefan vallin <stefan@wallan.se>
Cc: rtg-dir@ietf.org, draft-ietf-ccamp-alarm-module.all@ietf.org, "CCAMP (ccamp@ietf.org)" <ccamp@ietf.org>, ietf@ietf.org
References: <154714089885.30812.1684533748546533450@ietfa.amsl.com> <55998B73-A581-4A47-8D23-B88E2607EFC8@wallan.se>
From: "Joel M. Halpern" <jmh@joelhalpern.com>
Message-ID: <68a25f22-5b92-2f7b-9104-0e7e9c580a9b@joelhalpern.com>
Date: Tue, 15 Jan 2019 17:38:20 -0500
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.14; rv:60.0) Gecko/20100101 Thunderbird/60.4.0
MIME-Version: 1.0
In-Reply-To: <55998B73-A581-4A47-8D23-B88E2607EFC8@wallan.se>
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Language: en-US
Content-Transfer-Encoding: 8bit
Archived-At: <https://mailarchive.ietf.org/arch/msg/ccamp/EW9Ri_ATGW_BB7R9B2YfOrgNnsA>
Subject: Re: [CCAMP] Rtgdir last call review of draft-ietf-ccamp-alarm-module-06

Thank you, Stefan.  Your proposed clarifications sound like they will 
address my concerns.

Yours,
Joel

On 1/15/19 4:52 PM, stefan vallin wrote:
> Hi Joel!
> Thanks for your review, really helpful!
> See inline:
> 
>> On 10 Jan 2019, at 18:21, Joel Halpern <jmh@joelhalpern.com> wrote:
>> Minor Issues:
>>     The first paragraph of section 3.6 (Root Cause, Impacted Resources and
>>     Related Alarms) has a confused "not", a missing preposition, and a typoed
>>     conjunction, making it very hard to be sure what is intended.  I believe
>>     the first part of the sentence should read: "The recommendation is to have
>>     a single alarm for the underlying problem and list …"
> That sentence is really broken in v6, my fat fat fingers. This was also pointed out by Gert Grammel.
> It should read:
> 
>   The recommendation is to have a single alarm
>   for the underlying problem and list the affected resources in the
>   alarm, rather than having separate alarms for each resource
> 
> 
>>
>>     There is a larger issue about system behavior and root cause analysis that
>>     I think should be discussed in this section.  Root cause analysis and
>>     side-effect analysis are not simple tasks.  It is common for them to be
>>     performed outside of network elements.  When such is performed outside of a
>>     network element, it is unclear what the implications are.  Is it the intent
>>     that network elements that can not perform root cause analysis and impacted
>>     resource determination should NOT support this YANG module?  Or can /
>>     should / may they support it even though they can not perform this
>>     analysis?  There is a paragraph that seems to be trying to talk about this,
>>     but I was left confused about what was expected.  Part of my confusion is
>>     that the text treats this inability as rare, whereas in my experience for
>>     network elements such inability is common.
> The module does not mandate any root-cause analysis, impact analysis, or correlation capabilities.
> The purpose of this section is to describe optional leafs in the alarms relating to presenting the result of such analysis, if supported.
> If the system has no such capabilities, the optional leafs are not used and this section can be ignored.
> I will make that clear in this section. It would be fatal if a reader decided not to use the module on the assumption that it puts requirements on correlation.
>>
>>     It took me a while to realize what the text in 3.7 (and 4.1.1) about not
>>     generating notification is talking about.   The problem is that with all
>>     the effort to make clear that alarms are not notifications, I missed the
>>     fact that an alarm being raised (or re-raised) does itself cause a
>>     notification.  And that it is this re-raise notification (and other
>>     severity change, clearing, etc. notifications) that are suppressed by the
>>     shelving.   It seems to me that there needs to be better explanation of
>>     this in or before 3.7.
> Ok, will improve description.
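
A minimal sketch of the distinction discussed above: an alarm's state keeps changing while it is shelved, and it is only the resulting notifications that are suppressed. This is an illustration, not the module's behavior as specified; the class name `AlarmList`, its fields, and the `report` method are hypothetical and do not come from the draft.

```python
from dataclasses import dataclass, field


@dataclass
class AlarmList:
    """Toy alarm list that separates alarm state from notifications.

    All names here are hypothetical illustrations, not taken from the draft.
    """
    shelved_keys: set = field(default_factory=set)      # (resource, alarm_type) pairs
    state: dict = field(default_factory=dict)            # key -> current severity
    notifications: list = field(default_factory=list)    # what a subscriber would see

    def report(self, resource: str, alarm_type: str, severity: str) -> None:
        key = (resource, alarm_type)
        changed = self.state.get(key) != severity
        # The alarm state is always recorded, shelved or not.
        self.state[key] = severity
        # Only the notification about the state change is suppressed.
        if changed and key not in self.shelved_keys:
            self.notifications.append((resource, alarm_type, severity))


alarms = AlarmList(shelved_keys={("eth0", "link-down")})
alarms.report("eth0", "link-down", "major")    # shelved: state kept, no notification
alarms.report("eth1", "link-down", "major")    # raise: notification sent
alarms.report("eth1", "link-down", "cleared")  # clear: notification sent
```

In this sketch, raise, severity-change, and clear events all go through the same path, so shelving suppresses each kind of notification without discarding the alarm state.
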
>>
>>     Reading the YANG for shelving alarms, it looks to me that while it can do
>>     what is described earlier in the document, the conceptual structure is VERY
>>     different.  From the YANG, to shelve a specific alarm one has to create a
>>     named shelf whose conditions identify the specific alarm.  To shelve several
>>     alarms that are related (for example, when the operator looks at a list and
>>     selects several items to shelve) the system will likely have to create
>>     multiple shelves, give each a unique name, and put the different alarm
>>     identifiers in each one.
> The data-model uses a leaf-list for the resource which makes it possible to define one shelf for several resources.
> However, your comments made us aware of alarm-type and alarm-qualifier just being leafs. As you point out, this may
> lead to situations where you need to configure several shelves for shelving different alarms relating to the same reason.
> We will change this so that several alarm-types/alarm-qualifiers can also be defined for one shelf.
> With this change, any arbitrary group of alarms can be configured as one shelf.
> Also, we will make the list ordered-by user, and add to the
> description that the first matching shelf is used.
> 
> Thanks for pointing this out.
> 
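
A minimal sketch of the first-match semantics proposed above, where shelves form a user-ordered list and each shelf can name several resources and several alarm types. The `Shelf` class, the `find_shelf` helper, and the rule that an empty criterion matches anything are assumptions made for illustration; they are not taken from the YANG module.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Shelf:
    """Hypothetical shelf entry; field names are illustrative only."""
    name: str
    resources: frozenset = frozenset()    # assumption: empty means "any resource"
    alarm_types: frozenset = frozenset()  # assumption: empty means "any alarm type"

    def matches(self, resource: str, alarm_type: str) -> bool:
        resource_ok = not self.resources or resource in self.resources
        type_ok = not self.alarm_types or alarm_type in self.alarm_types
        return resource_ok and type_ok


def find_shelf(shelves: list, resource: str, alarm_type: str) -> Optional[Shelf]:
    """Return the first shelf, in user-defined order, that matches the alarm."""
    for shelf in shelves:
        if shelf.matches(resource, alarm_type):
            return shelf
    return None


# Order matters: an alarm matched by both shelves belongs to the first one only.
shelves = [
    Shelf("interface-under-test", resources=frozenset({"X/Y/Z"})),
    Shelf("all-link-down", alarm_types=frozenset({"link-down"})),
]
owner = find_shelf(shelves, "X/Y/Z", "link-down")
```

Because the first match wins, each shelved alarm is attributed to exactly one shelf, which is the shelf an operator would look up by name to unshelve it.
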
>> To unshelve alarms, one has to find the named
>>     shelf which has caused the shelving.   This seems very awkward.  It seems
>>     to have been designed to enable one to store the shelving reason separate
>>     from the alarm itself.  It introduces the odd effect that if the shelves
>>     are used with conditions that can match more than one thing, then one could
>>     have several shelves shelving the same alarm, and an effort to unshelve
>>     might well not produce the desired result. Assuming that this complexity is
>>     desired by the working group, I would ask that it be explicitly called out
>>     in the descriptive portions of the document.
> See above, with the proposed change, it will always be one shelf.
> Finding the shelf with the shelf name should not be awkward.
> Note well that it is likely that there are several alarms that are shelved due to the same shelf configuration.
> Take the straight-forward shelving of all alarms from a specific interface.
> Different alarms from that interface will then be shelved and it is straight-forward to delete the shelf configuration that says, “if X/Y/Z under test”.
> This is an important feature of shelving.
> 
> See above about the change to ordered-by user; this will address your issue of several shelves addressing the same alarm.
> Again, thanks for your review making us aware of this issue!
>>
>> Nits:
>>         In section 4.4 (overview of The Alarm List) tree showing the components
>>         of the purge-alarm operation, is there any way to make clear that the
>>         enumeration called alarm-status is the enumeration of filter choices
>>         related to whether the alarm is cleared?  Maybe rename it
>>         alarm-cleared-filter?
> Ok, will consider this.
> 
> br Stefan and Martin
>>
>>
>