Re: [Dime] WGLC #1 for draft-ietf-dime-load-02

"A. Jean Mahoney" <mahoney@nostrum.com> Sat, 16 July 2016 19:48 UTC

To: Maria Cruz Bartolome <maria.cruz.bartolome@ericsson.com>, Steve Donovan <srdonovan@usdonovans.com>, "dime@ietf.org" <dime@ietf.org>

Hi all,

Some observations below:

<snip, and apologies for my mail client butchering the includes>

<JPG> Agree with the thought: if "Little Server" is 30% utilized
>> and "Big Server" is 50% utilized, it still makes sense to send more
>> traffic to Big Server.  But I am not sure if that is within the
>> scope of this document. </JPG>

SRD> I don't understand the concern.
>> The load values supplied will be input into the route selection
>> algorithm as specified in RFC2782.  If a node isn't getting enough
>> traffic it will change its load value to a lower value and will
>> start getting more traffic.

MCRUZ> Unless the LOAD info provided is
>> in fact a value that represents the available capacity, the
>> load balancing will not select the less loaded server. Being able
>> to select the less loaded server is the whole purpose of this
>> mechanism, so we need to find a way to provide LOAD values from
>> different servers that we are able to compare, i.e. the value
>> provided must indicate the available capacity regardless of the
>> static capacity of each server.

SRD> I view the goal of this a little differently.  The goal is to
> make sure that requests are delivered to nodes with available
> capacity.  It is not strictly necessary that every request goes to
> the least loaded node.

MCRUZ> Well, I do not agree. The whole purpose
> of providing LOAD info is to be able to choose a node with available
> load (I agree), but among the nodes with available load we need to
> choose the least loaded (or one of the least loaded). It does not
> make sense, in my opinion, to simply select a node with available
> load, when we are providing info about load. The information provided
> should make it possible to select the least loaded node (or one close
> to it).

[ajm]  (With my implementer's hat on) Having clients chase the 
least-loaded server can go wrong, especially when lighting up a network, 
where a server's load can go from 0 to fully loaded really quickly. 
Depending on the design and timing, all the clients think the first 
server is the least loaded and they *all* pick it. Boom - the server is 
now maxed out. Clients should *distribute* their load across servers.
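
To make the failure mode concrete, here is a minimal Python sketch. 
Everything in it is an assumption for illustration: the server names, 
the 0-100 load scale, the request count, and the two selection 
functions. The weighted pick is only in the spirit of weighted 
selection; it is not the RFC 2782 algorithm and not anything the draft 
specifies.

```python
import random
from collections import Counter

def pick_least_loaded(loads):
    """Return the server with the lowest reported load (ties go to the first)."""
    return min(loads, key=loads.get)

def pick_weighted(loads, rng):
    """Pick a server at random, weighted by reported spare capacity (100 - load)."""
    servers = list(loads)
    # Hypothetical weighting: spare capacity on an assumed 0-100 load scale.
    weights = [max(100 - loads[s], 1) for s in servers]
    return rng.choices(servers, weights=weights, k=1)[0]

# Every client holds the same stale snapshot, taken just after startup.
snapshot = {"s1": 0, "s2": 0, "s3": 0}
rng = random.Random(2016)  # fixed seed so the run is repeatable

# 3000 requests routed from that one snapshot, before any load update arrives.
chasing = Counter(pick_least_loaded(snapshot) for _ in range(3000))
spreading = Counter(pick_weighted(snapshot, rng) for _ in range(3000))

print(chasing)    # every request lands on s1
print(spreading)  # requests are split roughly evenly across s1, s2, s3
```

With identical stale data, the deterministic "least loaded" rule sends 
every request to the same server, while the randomized weighted rule 
spreads them out even though it sees exactly the same snapshot.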

Now, distribution cannot be completely even, but that's OK. Because 
we're talking about load here. Not *overload*. If you've designed your 
system correctly, a fully loaded server is *not* in danger of overload. 
For instance - you've designed your system with 3 servers in a cluster 
that can handle one of their cluster mates going down. When "fully" 
loaded, each server in the cluster should still have enough headroom to 
handle one of its mates going down and to take on half of that mate's 
traffic.
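
The arithmetic behind that 3-server example can be sketched as below. 
It assumes servers of equal capacity and a failed mate's traffic being 
split evenly among the survivors; the function name and parameters are 
hypothetical, chosen for this illustration only.

```python
from fractions import Fraction

def max_normal_load(cluster_size, failures_tolerated=1):
    """Largest per-server load (as a fraction of capacity) that still lets
    the surviving servers absorb the traffic of the failed ones.

    Assumes equal-capacity servers and an even split of the failed servers'
    traffic among the survivors.
    """
    if not 0 <= failures_tolerated < cluster_size:
        raise ValueError("failures_tolerated must be in [0, cluster_size)")
    survivors = cluster_size - failures_tolerated
    # Total traffic is cluster_size * x; after the failures it must fit on
    # the survivors: cluster_size * x <= survivors, so x <= survivors / cluster_size.
    return Fraction(survivors, cluster_size)

print(max_normal_load(3))  # 2/3
```

So for the cluster above, "fully loaded" means each server running at 
about two thirds of its real capacity: if one goes down, each survivor 
picks up half of that server's two thirds and lands at exactly 100%.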

*Overload* should be a rare event - like a tornado wiping out a chunk of 
your mobile network, and everybody calling and texting everybody else to 
make sure that they're ok.

Thanks!

Jean (who has actually had a customer point to the aftermath of a 
tornado and tell her, "Your solution has to handle THAT.")

<snip>