Re: [RTG-DIR] Rtgdir last call review of draft-ietf-ccamp-alarm-module-06

"Joel M. Halpern" <jmh@joelhalpern.com> Tue, 15 January 2019 22:38 UTC

To: stefan vallin <stefan@wallan.se>
Cc: rtg-dir@ietf.org, draft-ietf-ccamp-alarm-module.all@ietf.org, "CCAMP (ccamp@ietf.org)" <ccamp@ietf.org>, ietf@ietf.org
References: <154714089885.30812.1684533748546533450@ietfa.amsl.com> <55998B73-A581-4A47-8D23-B88E2607EFC8@wallan.se>
From: "Joel M. Halpern" <jmh@joelhalpern.com>
Message-ID: <68a25f22-5b92-2f7b-9104-0e7e9c580a9b@joelhalpern.com>
Date: Tue, 15 Jan 2019 17:38:20 -0500
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.14; rv:60.0) Gecko/20100101 Thunderbird/60.4.0
MIME-Version: 1.0
In-Reply-To: <55998B73-A581-4A47-8D23-B88E2607EFC8@wallan.se>
Content-Type: text/plain; charset="utf-8"; format="flowed"
Content-Language: en-US
Content-Transfer-Encoding: 8bit
Archived-At: <https://mailarchive.ietf.org/arch/msg/rtg-dir/TNM3AOXwkJ-l-NrflvRsMUJh-AA>
Subject: Re: [RTG-DIR] Rtgdir last call review of draft-ietf-ccamp-alarm-module-06
Precedence: list

Thank you, Stefan.  Your proposed clarifications sound like they will 
address my concerns.

Yours,
Joel

On 1/15/19 4:52 PM, stefan vallin wrote:
> Hi Joel!
> Thanks for your review, really helpful!
> See inline:
> 
>> On 10 Jan 2019, at 18:21, Joel Halpern <jmh@joelhalpern.com> wrote:
>> Minor Issues:
>>     The first paragraph of section 3.6 (Root Cause, Impacted Resources and
>>     Related Alarms) has a confused "not", a missing preposition, and a typoed
>>     conjunction, making it very hard to be sure what is intended.  I believe
>>     the first part of the sentence should read: "The recommendation is to have
>>     a single alarm for the underlying problem and list …"
> That sentence is really broken in v6, my fat fingers. This was also pointed out by Gert Grammel.
> It should read:
> 
>   The recommendation is to have a single alarm
>   for the underlying problem and list the affected resources in the
>   alarm, rather than having separate alarms for each resource
> 
> 
>>
>>     There is a larger issue about system behavior and root cause analysis that
>>     I think should be discussed in this section.  Root cause analysis and
>>     side-effect analysis are not simple tasks.  It is common for them to be
>>     performed outside of network elements.  When such is performed outside of a
>>     network element, it is unclear what the implications are.  Is it the intent
>>     that network elements that can not perform root cause analysis and impacted
>>     resource determination should NOT support this YANG module?  Or can /
>>     should / may they support it even though they can not perform this
>>     analysis?  There is a paragraph that seems to be trying to talk about this,
>>     but I was left confused about what was expected.  Part of my confusion is
>>     that the text treats this inability as rare, whereas in my experience for
>>     network elements such inability is common.
> The module does not mandate any root-cause analysis, impact analysis, or correlation capabilities.
> The purpose of this section is to describe optional leafs in the alarms relating to presenting the result of such analysis, if supported.
> If the system has no such capabilities, the optional leafs are not used and this section can be ignored.
> I will make that clear in this section. It would be fatal if the reader did not use the module assuming it put requirements on correlation.
>>
>>     It took me a while to realize what the text in 3.7 (and 4.1.1) about not
>>     generating notification is talking about.   The problem is that with all
>>     the effort to make clear that alarms are not notifications, I missed the
>>     fact that an alarm being raised (or re-raised) does itself cause a
>>     notification.  And that it is these re-raise notifications (and other
>>     severity-change, clearing, etc. notifications) that are suppressed by the
>>     shelving.   It seems to me that there needs to be better explanation of
>>     this in or before 3.7.
> Ok, will improve description.
>>
>>     Reading the YANG for shelving alarms, it looks to me that while it can do
>>     what is described earlier in the document, the conceptual structure is VERY
>>     different.  From the YANG, to shelve a specific alarm one has to create a
>>     named shelf whose conditions identify the specific alarm.  To shelve several
>>     alarms that are related (for example, when the operator looks at a list and
>>     selects several items to shelve) the system will likely have to create
>>     multiple shelves, give each a unique name, and put the different alarm
>>     identifiers in each one.
> The data-model uses a leaf-list for the resource which makes it possible to define one shelf for several resources.
> However, your comments made us aware of alarm-type and alarm-qualifier just being leafs. As you point out, this may
> lead to situations where you need to configure several shelves for shelving different alarms relating to the same reason.
> We will change this so that several alarm-types/alarm-qualifiers can also be defined for one shelf.
> With this change, any arbitrary group of alarms can be configured as one shelf.
> Also, we will make the list ordered-by user, and add to the
> description that the first matching shelf is used.
> 
> Thanks for pointing this out.
> 
>> To unshelve alarms, one has to find the named
>>     shelf which has caused the shelving.   This seems very awkward.  It seems
>>     to have been designed to enable one to store the shelving reason separate
>>     from the alarm itself.  It introduces the odd effect that if the shelves
>>     are used with conditions that can match more than one thing, then one could
>>     have several shelves shelving the same alarm, and an effort to unshelve
>>     might well not produce the desired result. Assuming that this complexity is
>>     desired by the working group, I would ask that it be explicitly called out
>>     in the descriptive portions of the document.
> See above, with the proposed change, it will always be one shelf.
> Finding the shelf with the shelf name should not be awkward.
> Note well that it is likely that there are several alarms that are shelved due to the same shelf configuration.
> Take the straight-forward shelving of all alarms from a specific interface.
> Different alarms from that interface will then be shelved and it is straight-forward to delete the shelf configuration that says, “if X/Y/Z under test”.
> This is an important feature of shelving.
> 
> See above about the change to ordered-by user; this will address your issue of several shelves addressing the same alarm.
> Again, thanks for your review making us aware of this issue!
>>
>> Nits:
>>         In section 4.4 (overview of The Alarm List), in the tree showing the components
>>         of the purge-alarm operation, is there any way to make clear that the
>>         enumeration called alarm-status is the enumeration of filter choices
>>         related to whether the alarm is cleared?  Maybe rename it
>>         alarm-cleared-filter?
> Ok, will consider this
> 
> br Stefan and Martin
>>
>>
>
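
For readers following the shelving discussion in the quoted reply, here is a minimal Python sketch of the behaviour Stefan proposes: an ordered list of named shelves, each able to list several resources and several alarm-type/alarm-qualifier pairs, with the first matching shelf being the one used. This is not the YANG module or any real implementation. The class and function names (Alarm, Shelf, find_shelf, remove_shelf) are hypothetical, resources are compared by plain string equality, and treating an empty criterion as "match anything" is an assumption made for the sketch.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Alarm:
    """Hypothetical alarm identity: resource plus alarm type and qualifier."""
    resource: str
    alarm_type: str
    qualifier: str = ""


@dataclass
class Shelf:
    """Hypothetical named shelf with multi-valued match criteria."""
    name: str
    resources: List[str] = field(default_factory=list)
    alarm_types: List[Tuple[str, str]] = field(default_factory=list)

    def matches(self, alarm: Alarm) -> bool:
        # Assumption: an empty criterion places no restriction on the alarm.
        resource_ok = not self.resources or alarm.resource in self.resources
        type_ok = (not self.alarm_types
                   or (alarm.alarm_type, alarm.qualifier) in self.alarm_types)
        return resource_ok and type_ok


def find_shelf(shelves: List[Shelf], alarm: Alarm) -> Optional[Shelf]:
    """Return the first shelf, in user-defined order, that matches the alarm."""
    for shelf in shelves:
        if shelf.matches(alarm):
            return shelf
    return None


def remove_shelf(shelves: List[Shelf], name: str) -> List[Shelf]:
    """Unshelve by deleting the named shelf; the order of the rest is kept."""
    return [s for s in shelves if s.name != name]


# List order is significant, mirroring an "ordered-by user" list.
shelves = [
    Shelf("if-under-test", resources=["X/Y/Z"]),
    Shelf("link-noise", alarm_types=[("link-down", ""), ("high-ber", "")]),
]

a1 = Alarm("X/Y/Z", "link-down")    # matches both shelves
a2 = Alarm("A/B/C", "high-ber")     # matches only "link-noise"
a3 = Alarm("A/B/C", "fan-failure")  # matches no shelf
a4 = Alarm("X/Y/Z", "los")          # matches only "if-under-test"

print(find_shelf(shelves, a1).name)  # if-under-test (first match wins)
print(find_shelf(shelves, a2).name)  # link-noise
print(find_shelf(shelves, a3))       # None

remaining = remove_shelf(shelves, "if-under-test")
print(find_shelf(remaining, a4))     # None: a4 is no longer shelved
```

In this sketch, a single shelf such as "if-under-test" covers every alarm from interface X/Y/Z, and deleting that one named entry unshelves all of them at once, which is the usage pattern described in the reply.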
