Re: [Dime] AD review: draft-ietf-dime-overload-reqs version 9

Benoit Claise <bclaise@cisco.com> Sat, 27 July 2013 15:38 UTC

Message-ID: <51F3E910.2080603@cisco.com>
Date: Sat, 27 Jul 2013 17:36:48 +0200
From: Benoit Claise <bclaise@cisco.com>
To: Ben Campbell <ben@nostrum.com>
Cc: draft-ietf-dime-overload-reqs.all@tools.ietf.org, dime mailing list <dime@ietf.org>
Subject: Re: [Dime] AD review: draft-ietf-dime-overload-reqs version 9

Hi Ben,

[Not sure whether that email was actually sent. Resending.]

> Hi Benoit,
>
> Do I read correctly that this particular issue requires no further action?
I checked with the document shepherd, and you are right. No further
action is required on this issue.
Please post a revised draft with the other agreed changes, and I'll
progress the document.

Regards, Benoit
>
> A few more comments inline:
>
> On Jul 21, 2013, at 9:00 AM, Benoit Claise <bclaise@cisco.com> wrote:
>
>> On 19/07/2013 23:09, Ben Campbell wrote:
> [...]
>
>>> On Jul 16, 2013, at 8:28 AM, Eric McMurry <emcmurry@computer.org> wrote:
>>>
>>>> Ah, thanks for catching that. Ben and I had been discussing this, but I see that responding to it got lost in the shuffle. My apologies.
>>>>
>>>> The definition uses the term "resources", which could include a number of things. As for the case where insufficient bandwidth would prevent overload, I think that would only be true for a very simple topology. With multiple connections to multiple elements, agents, shared back-end resources, or any other more complex topology, bandwidth issues could indeed manifest as overload issues that meet the definition.
>>>>
>>>> I suspect that I am not fully understanding your point, though. Perhaps Ben can take a stab at it if I am not making sense.
>>>>
>>>>
>>> I think the issue may be that we never meant for "resources" to necessarily mean "local resources". For example, an agent could itself become overloaded because it's not getting responses from an upstream server. This could simply be because, as seen by downstream clients, the server appears overloaded in aggregate. (This is very close to your idea of "system" overload, I think.) But the agent could also suffer truly local overload due to queues or memory filling up, the need to retransmit requests, etc.
>>>
>>> For the non-agent case, a server might depend on a remote database. If network congestion causes responses from the database server to be lost or delayed, the Diameter server can become overloaded.
>>>
>>> Would it help if we added a note to point out that the mentioned "resources" do not necessarily have to be local to the Diameter node?
>>>
>> I was able to narrow my source of confusion down to a very specific point: what is an upstream Diameter node?
>> I took this "overload" definition:
>>     Overload occurs when an element, such as a Diameter server or agent,
>>     has insufficient resources to successfully process all of the traffic
>>     it is receiving.
>> Then I took this sentence:
>>     External resources can include upstream Diameter nodes; for example,
>>     a Diameter agent can become effectively overloaded if one or more
>>     upstream nodes are overloaded.  While overload is not the same thing
>>     as network congestion, network congestion can reduce a Diameter node's
>>     ability to process and respond to requests, thus contributing to
>>     overload.
>>
>> In my mind, I saw a picture like this:
>>
>>                                 Overloaded
>>                                 Upstream
>>     Diameter                    Diameter
>>     Node                        Node
>>         ------------------->X
>>                request
>>
>> So I was thinking: the Diameter node on the right of the drawing didn't receive the request, so according to the definition, it can't be overloaded.
> I agree that the node on the right is not overloaded in this example. But if the one on the left is an agent, the fact that transactions are failing between it and the node on the right may reduce its ability to handle inbound requests from clients.
>
>> I guess that you had a picture like this in mind.
>>                                 Overloaded
>>                                 Upstream
>>     Diameter                    Diameter
>>     Node                        Node
>>         --------------------------->
>>                request
>>         X<--------------------------   (*)
>>
>> (*) The reply never arrived because the overloaded upstream Diameter node is, well... overloaded.
> That is one possible case. A particularly bad one, even, since the node on the left is likely to start retrying requests.
>
> Another example would be when a node depends on a non-Diameter remote resource. Imagine the same picture as the previous one, but the node on the right is a database server. If there's network congestion between the Diameter node and the database server, the Diameter node may not be able to operate at normal capacity.
>
>> After checking "upstream" in RFC 6733, we should be fine.
>>
>>     Figure 7 provides an example of a message forwarded upstream by a
>>     Diameter relay.
>>
>>         +---------+ 1. Request  +---------+ 2. Request  +---------+
>>         | Access  |------------>|Diameter |------------>|Diameter |
>>         |         |             |         |             |  Home   |
>>         | Device  |<------------|  Relay  |<------------| Server  |
>>         +---------+  4. Answer  +---------+  3. Answer  +---------+
>>                    (Missing AVP)           (Missing AVP)
>>
>> My confusion. Sorry.
> No problem. It seems like RFC 6733 uses "upstream" and "downstream" differently than I am used to (same with "client" and "server").
>
> Thanks!
>
> Ben.
>
>
>