Re: [Dime] [dime] #46: Bad normative advice on not letting overload reports expire

Ben Campbell <ben@nostrum.com> Wed, 26 March 2014 00:37 UTC

From: Ben Campbell <ben@nostrum.com>
In-Reply-To: <4CAA0308-08B6-4F7D-9A51-7A4AAA833404@att.com>
Date: Tue, 25 Mar 2014 19:37:35 -0500
Message-Id: <B8B9348F-FE43-4242-AF73-02FA7201C6D4@nostrum.com>
References: <057.8b248d3cb5db23879c2730b80d4657d7@trac.tools.ietf.org> <B08CCDA3-4E2B-444A-AE27-9DE2D9C0B458@gmail.com> <4B803326-40A9-4E98-AC12-7DDF46BD101B@nostrum.com> <A9CA33BB78081F478946E4F34BF9AAA014D6979E@xmb-rcd-x10.cisco.com> <087A34937E64E74E848732CFF8354B9209772E9C@ESESSMB101.ericsson.se> <58574389-BAEB-49DA-A07E-B6648905C291@gmail.com> <533097D1.3090803@usdonovans.com>, <D24C5BAB-C9CD-4AA1-8F1D-AB21D25EDB01@nostrum.com> <4CAA0308-08B6-4F7D-9A51-7A4AAA833404@att.com>
To: MARTIN C DOLLY <md3135@att.com>
Archived-At: http://mailarchive.ietf.org/arch/msg/dime/C928J718clvhG6i2AKm31qvZekI
Cc: "dime@ietf.org" <dime@ietf.org>
Subject: Re: [Dime] [dime] #46: Bad normative advice on not letting overload reports expire

A reporting node has two ways of ending a reported overload condition. It can explicitly expire the report by sending an updated report with a zero expiration time, or it can let the report expire by not updating it.
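
To make the two paths concrete, here is a minimal Python sketch of how a reacting node might track reports. It is an illustration only: the class, method, and field names are hypothetical and not taken from the draft, time is passed in explicitly as seconds, and sequence numbers are assumed to simply increase.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class OverloadReport:
    sequence_number: int
    expires_at: float  # absolute time (seconds) at which the report lapses


class ReactingNode:
    """Hypothetical reacting-node state: at most one report per reporting node."""

    def __init__(self) -> None:
        self.reports: Dict[str, OverloadReport] = {}
        self.last_seq: Dict[str, int] = {}  # highest sequence number seen per origin

    def receive(self, origin: str, sequence_number: int,
                validity_duration: int, now: float) -> None:
        last = self.last_seq.get(origin)
        if last is not None and sequence_number <= last:
            return  # not an updated report; ignore it
        self.last_seq[origin] = sequence_number
        if validity_duration == 0:
            # Path 1: explicit termination via a zero validity duration.
            self.reports.pop(origin, None)
            return
        self.reports[origin] = OverloadReport(sequence_number,
                                              now + validity_duration)

    def active_report(self, origin: str, now: float) -> Optional[OverloadReport]:
        report = self.reports.get(origin)
        if report is not None and now >= report.expires_at:
            # Path 2: natural expiration; no message was needed.
            del self.reports[origin]
            return None
        return report
```

Either path leaves the reacting node in the same state: no active report for that reporting node.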

This is analogous to how SIP handles registrations and subscriptions. You can remove one by sending a new REGISTER or SUBSCRIBE with expires=0.  Or if the registration or subscription is close to ending anyway, you can let it expire.

The "consensus" position says that you SHOULD do the first. That implies you SHOULD NOT do the second. My position is that we don't need that restriction; either choice is okay as long as it happens in a timely manner. In fact, there may be times when sending an updated report does nothing but add unneeded messages.

I will go further to say that a normative preference for one or the other violates RFC 2119 guidance for normative language, in that the SHOULD is not necessary for interoperability, because both strategies work. And neither approach does damage, as long as it is timely.

Now, I agree that the reporting node SHOULD ensure that the overload report is invalidated in a timely manner. I just don't think we need to prefer one method of doing that over another.
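
As a rough sketch of what "either method, as long as it is timely" could look like on the reporting side, the Python function below picks between the two. The function name, the returned labels, and the `timely_threshold` parameter are all assumptions made for illustration; what counts as "timely" is a matter of local policy, not something the draft defines.

```python
def end_of_overload_action(remaining_validity: float,
                           timely_threshold: float) -> str:
    """Choose how a recovered reporting node ends an overload report.

    remaining_validity: seconds until the outstanding report expires.
    timely_threshold:   hypothetical policy value; the longest wait (in
                        seconds) that still counts as "timely".
    """
    if remaining_validity <= timely_threshold:
        # The report lapses soon enough; an extra message adds nothing.
        return "let-expire"
    # Otherwise, invalidate explicitly with a zero validity duration.
    return "send-zero-validity"
```

With a 5-second policy, a report with 2 seconds left is allowed to lapse, while one with 60 seconds left is ended explicitly.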


On Mar 25, 2014, at 6:51 PM, DOLLY, MARTIN C <md3135@att.com> wrote:

> Why? Sorry, but please repeat, or I go with what I believe is consensus.
> 
> Thanks
> 
> Martin Dolly
> Lead Member of Technical Staff
> Core Network & Gov't/Regulatory Standards  
> AT&T Standards and Industry Alliances
> +1-609-903-3360
> md3135@att.com
> 
>> On Mar 25, 2014, at 7:17 PM, "Ben Campbell" <ben@nostrum.com> wrote:
>> 
>> I do not agree. While this fixes a related problem of using a zero validity-duration to signal the end of an overload condition, it still implies that one SHOULD NOT let a report "just expire". As I've argued before, I believe there are times when it is just as good, if not better, to let an overload condition expire naturally.
>> 
>> Here's a quote of my argument to that effect from further down the thread:
>> 
>>> I think it's reasonable to say that a reporting node should terminate an overload condition in a timely manner. But if it's about to expire anyway, then expiration might be just as timely as an explicit report. 
>>> 
>>> And of course, the definition of "timely" is somewhat a matter of policy. For example, I can imagine a deployment that had a large number of clients using fairly short validity durations, and _never_ explicitly signaling an end to an overload condition. This adds a bit of a "slow-start" to the recovery, since different clients will expire the overload condition at different times, and the load will ramp up gradually. I don't see anything wrong with that. Of course, it wouldn't work if one chose long validity durations, or if the signaling of overload to different clients happened in close synchronization.
>> 
>> So, here's a different proposal for your first paragraph:
>> 
>>  "When a reporting node has recovered from overload, it SHOULD invalidate any existing overload reports in a timely manner. This can be achieved by sending an updated overload report (meaning the OLR contains a new sequence number) with the OC-Validity-Duration AVP value set to zero ("0"). If the overload report is about to expire naturally, the reporting node MAY choose to simply let it do so."
>> 
>> 
>>> On Mar 24, 2014, at 3:38 PM, Steve Donovan <srdonovan@usdonovans.com> wrote:
>>> 
>>> Here's some proposed wording that will hopefully let us close this issue:
>>> 
>>> Regards,
>>> 
>>> Steve
>>> 
>>> -----
>>> 
>>> Section 4.5., paragraph 3 - 
>>> 
>>> Current -02 wording:
>>> 
>>> As a general guidance for implementations it is RECOMMENDED never to
>>>  let any overload report to timeout.  Following to this rule, an
>>>  overload endpoint should explicitly signal the end of overload
>>>  condition and not rely on the expiration of the validity time of the
>>>  overload report in the reacting node.  This is achieved by sending an
>>>  updated overload report (meaning it must contain a new sequence
>>>  number) with the OC-Validity-Duration AVP value set to zero ("0").
>>> 
>>> Proposed wording:
>>> 
>>>  A reporting node SHOULD explicitly signal the end of overload
>>>  condition in a timely manner.  This is achieved by sending an
>>>  updated overload report (meaning the OLR contains a new sequence
>>>  number) with the OC-Validity-Duration AVP value set to zero ("0").
>>> 
>>> A reacting node MUST invalidate and remove an overload report that
>>> expires without an explicit overload report containing an OC-Validity-Duration
>>> value set to zero ("0").
>>> 
>>> 
>>>> On 2/11/14 4:31 PM, Jouni Korhonen wrote:
>>>> Fine with me.
>>>> 
>>>> - Jouni
>>>> 
>>>> On Feb 11, 2014, at 12:24 PM, Maria Cruz Bartolome <maria.cruz.bartolome@ericsson.com> wrote:
>>>> 
>>>> 
>>>>> Ben, Nirav,
>>>>> 
>>>>> I follow same argumentation.
>>>>> Regards
>>>>> /MCruz
>>>>> 
>>>>> -----Original Message-----
>>>>> From: DiME [mailto:dime-bounces@ietf.org] On Behalf Of Nirav Salot (nsalot)
>>>>> Sent: Tuesday, February 11, 2014 11:23
>>>>> To: Ben Campbell; Jouni Korhonen
>>>>> Cc: dime@ietf.org list; draft-docdt-dime-ovli@tools.ietf.org
>>>>> 
>>>>> Subject: Re: [Dime] [dime] #46: Bad normative advice on not letting overload reports expire
>>>>> 
>>>>> Ben,
>>>>> 
>>>>> I resonate with your thinking below.
>>>>> 
>>>>> Regards,
>>>>> Nirav.
>>>>> 
>>>>> -----Original Message-----
>>>>> From: DiME [mailto:dime-bounces@ietf.org] On Behalf Of Ben Campbell
>>>>> Sent: Monday, February 10, 2014 9:54 PM
>>>>> To: Jouni Korhonen
>>>>> Cc: dime@ietf.org list; draft-docdt-dime-ovli@tools.ietf.org
>>>>> 
>>>>> Subject: Re: [Dime] [dime] #46: Bad normative advice on not letting overload reports expire
>>>>> 
>>>>> 
>>>>> On Feb 10, 2014, at 5:16 AM, Jouni Korhonen <jouni.nospam@gmail.com> wrote:
>>>>> 
>>>>> 
>>>>>> My reasoning for explicit termination was that knowing the 
>>>>>> implementation folks they will let overload conditions expire unless advised otherwise.
>>>>>> And having unnecessary stuff hanging around waiting for a cleanup is 
>>>>>> not a good thing in general. But I am open here for other options..
>>>>> I think it's reasonable to say that a reporting node should terminate an overload condition in a timely manner. But if it's about to expire anyway, then expiration might be just as timely as an explicit report. 
>>>>> 
>>>>> And of course, the definition of "timely" is somewhat a matter of policy. For example, I can imagine a deployment that had a large number of clients using fairly short validity durations, and _never_ explicitly signaling an end to an overload condition. This adds a bit of a "slow-start" to the recovery, since different clients will expire the overload condition at different times, and the load will ramp up gradually. I don't see anything wrong with that. Of course, it wouldn't work if one chose long validity durations, or if the signaling of overload to different clients happened in close synchronization.
>>>>> 
>>>>> _______________________________________________
>>>>> DiME mailing list
>>>>> 
>>>>> DiME@ietf.org
>>>>> https://www.ietf.org/mailman/listinfo/dime
>>>>> 
>>>>> 
>>> 
>> 