Re: [Dime] AD review: draft-ietf-dime-overload-reqs version 9

Eric McMurry <emcmurry@computer.org> Mon, 29 July 2013 05:32 UTC

From: Eric McMurry <emcmurry@computer.org>
In-Reply-To: <51F3E910.2080603@cisco.com>
Date: Mon, 29 Jul 2013 07:32:05 +0200
Message-Id: <3F64EE48-0B07-4C59-8A5B-46B66C0CE3C3@computer.org>
References: <51E5153F.3070101@cisco.com> <7FC7978E-7A8B-4874-AC96-CEFD304B15E9@computer.org> <F589A249-F4F0-40E5-BE5E-B5B6038B6E89@nostrum.com> <51EBE991.3020609@cisco.com> <6FEAA094-F6B9-43DE-B997-AD5A85D55EB9@nostrum.com> <51F3E910.2080603@cisco.com>
To: Benoit Claise <bclaise@cisco.com>
Cc: dime mailing list <dime@ietf.org>, draft-ietf-dime-overload-reqs.all@tools.ietf.org
Subject: Re: [Dime] AD review: draft-ietf-dime-overload-reqs version 9

Hi Benoit,

done:

URL:             http://www.ietf.org/internet-drafts/draft-ietf-dime-overload-reqs-10.txt
Status:          http://datatracker.ietf.org/doc/draft-ietf-dime-overload-reqs
Htmlized:        http://tools.ietf.org/html/draft-ietf-dime-overload-reqs-10
Diff:            http://www.ietf.org/rfcdiff?url2=draft-ietf-dime-overload-reqs-10

Thanks again for your help with this.

Eric


On Jul 27, 2013, at 17:36 , Benoit Claise <bclaise@cisco.com> wrote:

> Hi Ben,
> 
> [not sure if that email was actually sent. Resending]
> 
>> Hi Benoit,
>> 
>> Do I read correctly that this particular issue requires no further action?
> I checked with the document shepherd, and you are right. No further action required on this issue.
> Please post a draft with the other agreed changes, and I'll progress the document.
> 
> Regards, Benoit
>> 
>> A few more comments inline:
>> 
>> On Jul 21, 2013, at 9:00 AM, Benoit Claise <bclaise@cisco.com> wrote:
>> 
>>> On 19/07/2013 23:09, Ben Campbell wrote:
>> [...]
>> 
>>>> On Jul 16, 2013, at 8:28 AM, Eric McMurry <emcmurry@computer.org>
>>>>  wrote:
>>>> 
>>>>> Ah, thanks for catching that.  Ben and I had been discussing this, but I see that responding to it got lost in the shuffle.  My apologies.
>>>>> 
>>>>> The definition uses the term resources, which could include a number of things.  For the case where insufficient bandwidth would prevent overload, I think that would only be true for a very simple topology.  With multiple connections to multiple elements, agents, shared backend resources, or any other more complex topology, bandwidth issues could indeed manifest as overload issues that meet the definition.
>>>>> 
>>>>> I suspect that I am not understanding your point fully though.  Perhaps Ben can take a stab if I am not making sense.
>>>>> 
>>>>> 
>>>> I think the issue may be that we never meant for "resources" to necessarily mean "local resources". For example, an agent could itself become overloaded because it's not getting responses from an upstream server. This could be simply because the server appears overloaded in aggregate to downstream clients. (This is very close to your idea of "system" overload, I think.) But the agent could also suffer truly local overload due to queues or memory filling up, the need to retransmit requests, etc.
>>>> 
>>>> For the non-agent case, a server might depend on a remote database. If network congestion causes responses from the database server to be lost or slow down, the Diameter server can become overloaded.
>>>> 
>>>> Would it help if we added a note to point out that the mentioned "resources" do not necessarily have to be local to the Diameter node?
>>>> 
>>> I was able to narrow my source of confusion to a very specific point: what is an upstream Diameter node?
>>> I took this "overload" definition:
>>>    Overload occurs when an element, such as a Diameter server or agent,
>>>    has insufficient resources to successfully process all of the traffic
>>>    it is receiving.
>>> 
>>> Then I took this sentence:
>>>    External resources can include upstream Diameter nodes; for example,
>>>    a Diameter agent can become effectively overloaded if one or more
>>>    upstream nodes are overloaded.  While overload is not the same thing
>>>    as network congestion, network congestion can reduce a Diameter node's
>>>    ability to process and respond to requests, thus contributing to
>>>    overload.
>>> 
>>> In my mind, I saw a picture like this:
>>> 
>>>                            Overloaded
>>>                            Upstream
>>>                            Diameter             Diameter
>>>                            Node                 Node
>>>    -------------------->---------X
>>>            request
>>> 
>>> So I was thinking: the Diameter node (on the right, on the drawing) didn't receive the request. So according to the definition, it can't be overloaded.
>> I agree that the node on the right is not overloaded in this example. But if the one on the left is an agent, the fact that transactions are failing between it and the node on the right may reduce its ability to handle inbound requests from clients.
>> 
>>> I guess that you had a picture like this in mind.
>>>                                             Overloaded
>>>                                             Upstream
>>>    Diameter                                 Diameter
>>>    Node                                     Node
>>>            --------------------> request
>>>                                 X<----------   (*)
>>> 
>>> (*) the reply never arrived because the Overloaded Upstream Diameter Node is well ... overloaded
>> That is one possible case. A particularly bad one, even, since the node on the left is likely to start retrying requests.
>> 
>> Another example would be when a node depends on a non-Diameter remote resource. Imagine the same picture as the previous one, but the node on the right is a database server. If there's network congestion between the Diameter node and the database server, the Diameter node may not be able to operate at normal capacity.
>> 
>>> After checking "Upstream" in RFC 6733, we should be fine.
>>> 
>>>   Figure 7 provides an example of a message forwarded upstream by a
>>>    Diameter relay.
>>> 
>>>        +---------+ 1. Request  +---------+ 2. Request  +---------+
>>>        | Access  |------------>|Diameter |------------>|Diameter |
>>>        |         |             |         |             |  Home   |
>>>        | Device  |<------------|  Relay  |<------------| Server  |
>>>        +---------+  4. Answer  +---------+  3. Answer  +---------+
>>>                   (Missing AVP)           (Missing AVP)
>>> 
>>> My confusion. sorry.
>> No Problem--It seems like 6733 uses "upstream" and "downstream" differently than I am used to. (Same with "client" and "server").
>> 
>> Thanks!
>> 
>> Ben.
>> 
>> 
>> 
>
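
The agent case discussed in the thread (an agent that becomes effectively overloaded because an upstream node stops answering) can be illustrated with a toy simulation. This is only a sketch under stated assumptions: the function name, the parameters, and the fixed-size pending-transaction table are hypothetical and are not taken from the draft or from RFC 6733.

```python
from collections import deque


def simulate_agent(ticks, arrivals_per_tick, upstream_answers_per_tick,
                   pending_capacity, timeout_ticks):
    """Toy model (hypothetical): count inbound requests an agent must reject.

    The agent forwards each accepted request upstream and holds one slot in a
    bounded pending-transaction table until an answer arrives or the
    transaction times out.
    """
    pending = deque()  # tick at which each pending request was forwarded
    rejected = 0
    for now in range(ticks):
        # Answers from the upstream node complete the oldest transactions.
        for _ in range(min(upstream_answers_per_tick, len(pending))):
            pending.popleft()
        # Unanswered transactions free their slot only after timing out.
        while pending and now - pending[0] >= timeout_ticks:
            pending.popleft()
        # Each new inbound request needs a free slot in the table.
        for _ in range(arrivals_per_tick):
            if len(pending) < pending_capacity:
                pending.append(now)
            else:
                rejected += 1
    return rejected


# Same inbound load; only the upstream node's behaviour differs.
healthy = simulate_agent(100, 10, 10, pending_capacity=50, timeout_ticks=20)
stalled = simulate_agent(100, 10, 0, pending_capacity=50, timeout_ticks=20)
print("rejected with healthy upstream:", healthy)
print("rejected with stalled upstream:", stalled)
```

In this model the agent's own inbound load never changes, yet it starts rejecting requests once the upstream node stops answering, which is the sense in which the "resources" in the overload definition need not be local to the Diameter node.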