Re: [Dime] WGLC #1 for draft-ietf-dime-load-02

Maria Cruz Bartolome <maria.cruz.bartolome@ericsson.com> Mon, 18 July 2016 07:25 UTC

From: Maria Cruz Bartolome <maria.cruz.bartolome@ericsson.com>
To: "A. Jean Mahoney" <mahoney@nostrum.com>, Steve Donovan <srdonovan@usdonovans.com>, "dime@ietf.org" <dime@ietf.org>
Thread-Topic: [Dime] WGLC #1 for draft-ietf-dime-load-02
Date: Mon, 18 Jul 2016 07:25:13 +0000
Archived-At: <https://mailarchive.ietf.org/arch/msg/dime/swMm_f1Rv5lbor5pMKffRddUiqI>

Hello Jean,

Thanks for providing your view.
See MCRUZ 2> comments below
Best regards
/MCruz

-----Original Message-----
From: A. Jean Mahoney [mailto:mahoney@nostrum.com] 
Sent: Saturday, July 16, 2016 21:48
To: Maria Cruz Bartolome; Steve Donovan; dime@ietf.org
Subject: Re: [Dime] WGLC #1 for draft-ietf-dime-load-02

Hi all,

Some observations below:

<snip, and apologies for my mail client butchering the includes>

<JPG> Agree with the thought: if "Little Server" is 30% utilized
>> and "Big Server" is 50% utilized, it still makes sense to send more 
>> traffic to Big Server.  But I am not sure if that is within the scope 
>> of this document. </JPG>

SRD> I don't understand the concern.
>> The load values supplied will be input into the route selection 
>> algorithm as specified in RFC2782.  If a node isn't getting enough 
>> traffic it will change its load value to a lower value and will start 
>> getting more traffic.
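
For reference, a minimal sketch of weighted selection in the style of RFC 2782: entries with weight 0 go first, a running sum is built, and a uniform random number between 0 and the total (inclusive) picks the first entry whose running sum reaches it. The function name and the example weights are hypothetical, and how a reported load value is turned into a weight is an assumption left outside this sketch.

```python
import random

def pick_by_weight(weights, rng=random):
    """Pick one name from {name: non-negative int weight}, RFC 2782 style."""
    if not weights:
        raise ValueError("no candidates")
    # Entries with weight 0 are placed at the beginning of the list.
    ordered = sorted(weights.items(), key=lambda kv: kv[1] != 0)
    total = sum(w for _, w in ordered)
    # Uniform random integer between 0 and the sum, inclusive.
    r = rng.randint(0, total)
    running = 0
    for name, w in ordered:
        running += w
        if running >= r:
            return name

# Hypothetical weights: "b" should be picked more often than "a".
rng = random.Random(0)
counts = {"a": 0, "b": 0}
for _ in range(10000):
    counts[pick_by_weight({"a": 1, "b": 3}, rng)] += 1
```

A node that lowers its load value would, under the assumed mapping, end up with a larger weight and so a larger share of picks.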

MCRUZ> Unless the LOAD info provided is
>> in fact a value that represents the available capacity, the load 
>> balancing will not select the less loaded server. Being able to 
>> select the less loaded server is the whole purpose of this mechanism, 
>> so we need to find a way to provide a LOAD value from different 
>> servers that we are able to compare, i.e. the value provided must 
>> indicate the available capacity regardless of the static capacity of 
>> each server.

SRD> I view the goal of this a little differently.  The goal is to
> make sure that requests are delivered to nodes with available 
> capacity.  It is not strictly necessary that every request goes to the 
> least loaded node.

MCRUZ> Well, I do not agree. The whole purpose
> of providing LOAD info is to be able to choose a node with available 
> load (I agree), but among the nodes with available load we need to 
> choose the least loaded (or one of the least loaded). It does not make 
> sense, in my opinion, to simply select a node with available load, 
> when we are providing info about load. The information provided should 
> be good enough to select the least loaded node (or one close to it).

[ajm]  (With my implementer's hat on) Having clients chase the least-loaded server can go wrong, especially when lighting up a network, and a server's load goes from 0 to fully loaded really quickly. 
Depending on the design and timing, all the clients think the first server is the least loaded and they *all* pick it. Boom - the server is now maxed out. Clients should *distribute* their load across servers.
MCRUZ 2> Yes, the purpose of the draft is to provide a LOAD value that is valid to distribute load across servers.

Now, distribution cannot be completely even, but that's OK. Because we're talking about load here. Not *overload*. If you've designed your system correctly, a fully loaded server is *not* in danger of overload. 
MCRUZ 2> The issue is to be able to provide a LOAD value that allows the client to perform load distribution. If we do not take the weight into account somehow (implementation dependent), the distribution will be very far from even. It may cause significant traffic oscillations (e.g. small servers will appear lightly loaded, but if traffic is sent towards them they may reach the overload threshold very soon), and big servers will normally be underutilized. Therefore, the expected load distribution is far from being achieved.
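
A minimal sketch of the "Little Server" / "Big Server" case from earlier in the thread, assuming each server has a known static capacity and reports utilization as a fraction. The capacities (100 and 1000 requests/s), the Server class, and both helper functions are hypothetical and only for illustration; the draft does not define them.

```python
from dataclasses import dataclass

@dataclass
class Server:
    name: str
    capacity: float      # hypothetical static capacity, requests/s
    utilization: float   # fraction of capacity in use, 0.0 to 1.0

    @property
    def available(self) -> float:
        # Spare capacity in requests/s.
        return self.capacity * (1.0 - self.utilization)

def least_utilized(servers):
    """Pick by reported utilization only, ignoring static capacity."""
    return min(servers, key=lambda s: s.utilization).name

def shares_by_available_capacity(servers):
    """Fraction of new traffic per server, proportional to spare capacity."""
    total = sum(s.available for s in servers)
    if total <= 0:
        raise ValueError("no spare capacity on any server")
    return {s.name: s.available / total for s in servers}

servers = [Server("little", 100.0, 0.30), Server("big", 1000.0, 0.50)]
choice = least_utilized(servers)
shares = shares_by_available_capacity(servers)
```

With these assumed numbers, picking by utilization alone chooses the little server (70 requests/s spare), while weighting by available capacity sends roughly 88% of new traffic to the big one (500 requests/s spare).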

For instance - you've designed your system with 3 servers in a cluster that can handle one of their cluster mates going down. When "fully" loaded, each server in the cluster should never be so loaded that it cannot handle one of its mates going down and taking on half of that mate's traffic.
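
The arithmetic behind that example can be sketched as follows, assuming equal-capacity servers and a failed server's traffic being split evenly among the survivors (both are assumptions; the function names are hypothetical). With n servers each at utilization u, losing one raises each survivor to u * n / (n - 1), so u has to stay at or below (n - 1) / n.

```python
def max_safe_utilization(n_servers: int, failures: int = 1) -> float:
    """Highest per-server utilization that still absorbs `failures` losses."""
    if failures < 0 or n_servers <= failures:
        raise ValueError("need more servers than tolerated failures")
    return (n_servers - failures) / n_servers

def utilization_after_failure(u: float, n_servers: int, failures: int = 1) -> float:
    """Per-survivor utilization once the failed servers' traffic is shared out."""
    if failures < 0 or n_servers <= failures:
        raise ValueError("need more servers than tolerated failures")
    return u * n_servers / (n_servers - failures)

limit = max_safe_utilization(3)                 # 3-server cluster, 1 failure
after = utilization_after_failure(limit, 3)     # survivors after the failure
```

For the 3-server cluster, "fully loaded" under these assumptions means about two thirds of each server's capacity, which leaves room for half of a failed mate's traffic.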

*Overload* should be a rare event - like a tornado wiping out a chunk of your mobile network, and everybody calling and texting everybody else to make sure that they're ok.

Thanks!

Jean (who has actually had a customer point to the aftermath of a tornado and tell her, "Your solution has to handle THAT.")

<snip>