Re: [RTG-DIR] Rtgdir last call review of draft-ietf-ccamp-alarm-module-06

"Joel M. Halpern" <> Tue, 15 January 2019 22:38 UTC


Thank you Stefan.  Your proposed clarifications sound like they will 
address my concerns.


On 1/15/19 4:52 PM, stefan vallin wrote:
> Hi Joel!
> Thanks for your review, really helpful!
> See inline:
>> On 10 Jan 2019, at 18:21, Joel Halpern <> wrote:
>> Minor Issues:
>>     The first paragraph of section 3.6 (Root Cause, Impacted Resources and
>>     Related Alarms) has a confused "not", a missing preposition, and a typoed
>>     conjunction, making it very hard to be sure what is intended.  I believe
>>     the first part of the sentence should read: "The recommendation is to have
>>     a single alarm for the underlying problem and list …"
> That sentence is really broken in v6, my fat fingers. This was also pointed out by Gert Grammel.
> It should read:
>   The recommendation is to have a single alarm
>   for the underlying problem and list the affected resources in the
>   alarm, rather than having separate alarms for each resource.
>>     There is a larger issue about system behavior and root cause analysis that
>>     I think should be discussed in this section.  Root cause analysis and
>>     side-effect analysis are not simple tasks.  It is common for them to be
>>     performed outside of network elements.  When such is performed outside of a
>>     network element, it is unclear what the implications are.  Is it the intent
>>     that network elements that cannot perform root cause analysis and impacted
>>     resource determination should NOT support this YANG module?  Or can /
>>     should / may they support it even though they cannot perform this
>>     analysis?  There is a paragraph that seems to be trying to talk about this,
>>     but I was left confused about what was expected.  Part of my confusion is
>>     that the text treats this inability as rare, whereas in my experience for
>>     network elements such inability is common.
> The module does not mandate any root-cause analysis, impact analysis, or correlation capabilities.
> The purpose of this section is to describe the optional leafs in an alarm that present the results of such analysis, if supported.
> If the system has no such capabilities, the optional leafs are not used and this section can be ignored.
> I will make that clear in this section. It would be fatal if a reader decided not to use the module on the assumption that it imposes requirements on correlation.
>>     It took me a while to realize what the text in 3.7 (and 4.1.1) about not
>>     generating notifications is talking about.  The problem is that with all
>>     the effort to make clear that alarms are not notifications, I missed the
>>     fact that an alarm being raised (or re-raised) does itself cause a
>>     notification, and that it is these re-raise notifications (and other
>>     severity change, clearing, etc. notifications) that are suppressed by the
>>     shelving.  It seems to me that there needs to be better explanation of
>>     this in or before 3.7.
> OK, will improve the description.
>>     Reading the YANG for shelving alarms, it looks to me that while it can do
>>     what is described earlier in the document, the conceptual structure is VERY
>>     different.  From the YANG, to shelve a specific alarm one has to create a
>>     named shelf whose conditions identify the specific alarm.  To shelve several
>>     alarms that are related (for example, when the operator looks at a list and
>>     selects several items to shelve) the system will likely have to create
>>     multiple shelves, give each a unique name, and put the different alarm
>>     identifiers in each one.
> The data model uses a leaf-list for the resource, which makes it possible to define one shelf for several resources.
> However, your comments made us aware that alarm-type and alarm-qualifier are just leafs. As you point out, this may
> lead to situations where you need to configure several shelves to shelve different alarms that share the same reason.
> We will change this so that several alarm-types/alarm-qualifiers can also be defined for one shelf.
> With this change, any arbitrary group of alarms can be configured as one shelf.
> Also, we will make the list ordered-by user, and add to the
> description that the first matching shelf is used.
> Thanks for pointing this out.
>> To unshelve alarms, one has to find the named
>>     shelf which has caused the shelving.   This seems very awkward.  It seems
>>     to have been designed to enable one to store the shelving reason separate
>>     from the alarm itself.  It introduces the odd effect that if the shelves
>>     are used with conditions that can match more than one thing, then one could
>>     have several shelves shelving the same alarm, and an effort to unshelve
>>     might well not produce the desired result. Assuming that this complexity is
>>     desired by the working group, I would ask that it be explicitly called out
>>     in the descriptive portions of the document.
> See above: with the proposed change, an alarm will always be shelved by exactly one shelf, the first one that matches.
> Finding the shelf by its shelf name should not be awkward.
> Note well that it is likely that several alarms are shelved due to the same shelf configuration.
> Take the straightforward case of shelving all alarms from a specific interface.
> Different alarms from that interface will then be shelved, and it is straightforward to delete the shelf configuration that says, “if X/Y/Z under test”.
> This is an important feature of shelving.
> See above about the change to ordered-by user; this will address your issue of several shelves matching the same alarm.
> Again, thanks for your review making us aware of this issue!
>> Nits:
>>         In section 4.4 (overview of The Alarm List), in the tree showing the components
>>         of the purge-alarm operation, is there any way to make clear that the
>>         enumeration called alarm-status is the enumeration of filter choices
>>         related to whether the alarm is cleared?  Maybe rename it
>>         alarm-cleared-filter?
> OK, will consider this.
> br Stefan and Martin
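
The shelving semantics proposed in the quoted reply (a user-ordered list of shelves, each able to name several resources and several alarm-type/alarm-qualifier pairs, with the first matching shelf being the one used) can be illustrated with a minimal Python sketch. This is not the draft's YANG and not an implementation of it: the class names, field names, and the find_shelf helper are hypothetical, matching is done by exact string comparison, and treating an empty list as "match anything" is an assumption made only for this example.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Alarm:
    """Hypothetical alarm identity: resource plus alarm type and qualifier."""
    resource: str
    alarm_type: str
    qualifier: str = ""


@dataclass
class Shelf:
    """Hypothetical shelf: a named set of match criteria."""
    name: str
    # Assumption for this sketch: an empty list means "match any".
    resources: List[str] = field(default_factory=list)
    alarm_types: List[Tuple[str, str]] = field(default_factory=list)

    def matches(self, alarm: Alarm) -> bool:
        resource_ok = not self.resources or alarm.resource in self.resources
        type_ok = (not self.alarm_types
                   or (alarm.alarm_type, alarm.qualifier) in self.alarm_types)
        return resource_ok and type_ok


def find_shelf(shelves: List[Shelf], alarm: Alarm) -> Optional[Shelf]:
    """Return the first shelf, in configured order, that matches the alarm."""
    for shelf in shelves:
        if shelf.matches(alarm):
            return shelf
    return None


# The list order is the operator's configured order.
shelves = [
    Shelf("under-test", resources=["X/Y/Z"]),
    Shelf("link-alarms", alarm_types=[("link-down", "")]),
]

hit = find_shelf(shelves, Alarm("X/Y/Z", "link-down"))
print(hit.name if hit else None)
```

Both shelves match the alarm in the usage lines, but because the list is ordered only the first one is reported as the shelving shelf, so deleting that one named shelf is the unambiguous way to undo the operator's "X/Y/Z under test" decision.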