Re: [Dime] WGLC #1 for draft-ietf-dime-load-02

"Trottin, Jean-Jacques (Nokia - FR)" <> Wed, 13 July 2016 14:34 UTC


Hi MCruz, all,

Thanks for your feedback.
I still have some difficulty understanding your proposal.

See my comments, marked <JJ3>, on your latest points.

Best regards

-----Original Message-----
From: DiME [] On behalf of Maria Cruz Bartolome
Sent: Thursday, 7 July 2016 11:10
To: Trottin, Jean-Jacques (Nokia - FR);
Subject: Re: [Dime] WGLC #1 for draft-ietf-dime-load-02

Hello JJacques, all,

See comments only to your last reflections.
Best regards

====== from previous emails (begin) =================================
> 5.
> Now
>     The goal is make it possible to use both the load values received as
>      a part of the Diameter Load mechanism and weight values received as a
>      result of a DNS SRV query.  As a result, the Diameter load value has
>      a range of 0-65535.  This value and DNS SRV weight values are then
>      used in a distribution algorithm similar to that specified in
>      [RFC2782].
> Comments:
> In order to have an efficient load balancing algorithm, it is not enough for the reacting node (for the node in charge of load balancing) to know the Load of each server, but it needs to know the load in relation to each server capacity. Unless we do so, the Load value of a server can't be compared with the Load of a Server with a different weight.
> Then, in my opinion, we need to find a way to provide a Load value that is in fact comparable with the rest of the Load values of the servers in the group.
> Reflecting a bit longer on this, I think we need then to define a group of servers in the load-balancing group, like a load-balancing context, and then, for all servers in such a group we need to provide a relative value of dynamic Load.
> <JPG> Agree with the thought- if "Little Server" is 30% utilized and 
> "Big Server" is 50% utilized, it still makes sense to send more 
> traffic to Big Server.  But I am not sure if that is within the scope 
> of this document. </JPG>
> SRD> I don't understand the concern.  The load values supplied will be
> input into the route selection algorithm as specified in RFC2782.  If 
> a node isn't getting enough traffic it will change its load value to a 
> lower value and will start getting more traffic.
> MCRUZ> Unless the LOAD info provided is in fact a value that represents the available capacity, the load balancing will not select the least loaded server. Being able to select the least loaded server is the whole purpose of this mechanism, so we need to find a way to provide a LOAD value from different servers that we are able to compare, i.e. the value provided must indicate the available capacity regardless of the static capacity of each server.
SRD> I view the goal of this a little differently.  The goal is to make
sure that requests are delivered to nodes with available capacity.  It is not strictly necessary that every request goes to the least loaded node.
MCRUZ> Well, I do not agree. The whole purpose of providing LOAD info is to be able to choose a node with available load (I agree), but among the nodes with available load we need to choose the least loaded (or one of the least loaded). It does not make sense, in my opinion, to simply select a node with available load when we are providing info about load. The information provided should make it possible to select the least loaded node (or one close to it).

> To give an example, let me use dynamic Load (say DL) in % (100% is totally loaded), which I find easier for calculation:
> - Server1: weight=1500; DL= 2%
> - Server2: weight=55000; DL= 70%
> Then, if we only use DL in the LB algorithm, Server1 seems to be clearly less loaded; however, taking into account that its weight is much smaller, it may be the other way around. In fact, if traffic is redirected to this server, it may get overloaded rapidly (due to its small capacity).
> One possible way to calculate the relative DL is  to divide it by the weight, then for this example:
> - Server1 RDL= 10000 * (2/1500) = 13.33
> - Server2 RDL= 10000 * (70/55000) = 12.73 (I multiplied by 10000 
> simply to get rid of the decimals for our discussion).
> Then, we actually find out that available load for both servers is pretty similar. In fact, in this case, a correct load balancing should select Server2 as the less loaded server instead of server1.
> My proposal is to consider this reflection in the draft, and then make a clear distinction between dynamic load (DL) and RELATIVE DL. We need to provide the RDL in the message, not DL.
SRD> This is about how the load value is calculated which is explicitly
stated as being an implementation decision.
MCRUZ> Not exactly. We need to reflect in the draft that the LOAD provided should be the relative available load, taking into account the static weight. This is the only way to provide a load value that can actually be used by a client to load-balance.
I could accept that we leave the way to do so up to implementations.
Proposal: "LOAD should be calculated in a way that reflects the available load independently of the weight of each server, in order to allow the Diameter node that performs server selection to accurately compare values from different servers, i.e. LOAD value identifies the same amount of available capacity, regardless of the server that has calculated it."

JJ2> I analysed your example a bit further, with Server1 RDL= 10000 *
JJ2> (2/1500) = 13.33 and Server2 RDL= 10000 * (70/55000) = 12.73, and
JJ2> the conclusion to select server2. This is a bit surprising, as
JJ2> server 1 is only 2% loaded. This example is rather specific, with
JJ2> the server 1 weight being 2,7% of the server 2 weight. I did another
JJ2> example with less difference in the weights:
- Server1: weight=30000; DL= 30%
- Server2: weight=60000; DL= 50%
This leads to
Server1 RDL= 10000 * (30/30000) = 10,0
Server2 RDL= 10000 * (50/60000) = 8,3
Here also, if I follow your reasoning, this would lead to selecting server 2, which increases its RDL. Again, the result is to increase the server 2 load.

Even taking an 80% load for server 2 (so a high load in practice) and 50% for server 1:
Server1 RDL= 10000 * (50/30000) = 16,7
Server2 RDL= 10000 * (80/60000) = 13,3
This still leads to selecting server 2, although the reasoning would be to increase the server 1 load. Nevertheless, if server 1 has only a 30% load, its RDL becomes 10 and it will be selected, so here it is OK.

Please check whether I am wrong somewhere, but to me RDL currently gives strange results.
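
As a minimal sketch of the arithmetic in the examples above, the RDL values can be recomputed in Python. The function names `relative_dynamic_load` and `lowest_rdl` are illustrative only and appear neither in the draft nor in this thread; the formula is simply the RDL = 10000 * (DL / weight) used in the discussion, and picking the lowest RDL is an assumption about how the proposal would be applied.

```python
def relative_dynamic_load(dl_percent, weight, scale=10000):
    """RDL as discussed in this thread: scale * DL / weight."""
    return scale * dl_percent / weight


def lowest_rdl(servers):
    """Return the name of the server with the lowest RDL.

    `servers` is a list of (name, dl_percent, weight) tuples.
    """
    return min(servers, key=lambda s: relative_dynamic_load(s[1], s[2]))[0]


# The three examples from the thread: (name, DL in %, static weight).
examples = {
    "weights 1500/55000, DL 2/70": [("Server1", 2, 1500), ("Server2", 70, 55000)],
    "weights 30000/60000, DL 30/50": [("Server1", 30, 30000), ("Server2", 50, 60000)],
    "weights 30000/60000, DL 50/80": [("Server1", 50, 30000), ("Server2", 80, 60000)],
}

for label, servers in examples.items():
    for name, dl, weight in servers:
        print(label, name, round(relative_dynamic_load(dl, weight), 2))
    print("lowest RDL:", lowest_rdl(servers))
```

In all three examples the lowest RDL belongs to Server2, which is the behaviour questioned above.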

On static weight, I agree that it can be useful, e.g. a last-hop DA can be configured with the server weights to distribute its traffic among the servers it is connected to.

My point is about the targeted load balance between the servers. Often, the objective is to have the same load among servers (even if they differ somewhat in capacity / weight), which is the way to maximize the traffic without any server entering overload. So the "DL" (as defined in the current draft) indicates whether they have the same load, and whether the objective is achieved. I do not see clearly how you define the targeted load among servers with the RDL you mentioned.

If the load received from the servers is not the same, the sending node has to send a bit more traffic to the less loaded node. As you said, one objective here is to avoid oscillations: the sending node has to evaluate the amount of traffic it will switch from the more loaded server to the less loaded one, which should be neither so high that it causes oscillations nor so low that the unbalanced situation persists. In the draft, how the sending node modifies the current traffic distribution among the servers according to the received load (DL) is left to the implementation, and I am OK with this. In my previous mail I indicated that the sending node will adjust its traffic distribution according to the updated load (DL) received from the servers and converge to the balanced situation. In this process, I agree that the weight attached to each server can be an additional useful input when available, but keeping the current load (DL) definition. <JJ2>
====== from previous emails (end) ==========================

About the examples you discussed above:
I think the results you got are valid. Bear in mind that the static weight identifies the server capability, so a server1 with weight 60000 has twice the resources of a server2 with weight 30000. Then server2 at 45% load is making use of double the resources compared with server1 at 90% load.
DL/Weight provides a value that indicates the load per "resource unit". The least loaded server is then the one that has the lowest load per "resource unit".
This can be seen in the following example:
Server1 RDL= 10000 * (90/60000) = 15
Server2 RDL= 10000 * (45/30000) = 15
Both servers are equally loaded, as long as the Big Server (Server1) carries double the load of the Small Server (Server2), which has half the resources of the Big Server.

Then, JJacques, I think we agree the Diameter node that is responsible for server selection should have the means to select the least loaded server, and the available load depends on the capacity of each node, not only on the DL that is identified at a moment in time.
Then, I think we need to include some normative text on that, although the specific means to achieve it could remain implementation specific. Proposed text is:
"LOAD should be calculated in a way that reflects the available load independently of the weight of each server, in order to allow the Diameter node that performs server selection to accurately compare values from different servers, i.e. LOAD value identifies the same amount of available capacity, regardless of the server that has calculated it.  The means to calculate the LOAD value that fulfils this requirement are implementation specific."
I think, as Steve agreed, that the example could be included in the draft as well, it is very useful to understand how the static weight determines the available load.

<JJ3> I analysed your new example above, which raises the same questions on my side.
Dividing the weights by 10000 for simplification, we have:
Server1 weight 6, so I can say that Server1 has a capacity of 6 resource units
Server2 weight 3, so a capacity of 3 resource units
Server1 load (DL)= 90% so Server1 consumed resource units= 6x90% = 5,4
Server2 load (DL)= 45% so Server2 consumed resource units= 3x45% = 1,35
Server1 remaining (available) resources= 6-5,4 = 0,6
Server2 remaining (available) resources= 3-1,35 = 1,65

So in my understanding Server2 (the smaller server) still has more available capacity than Server1, and I would conclude that some traffic should be transferred from Server1 to Server2. But you said that both servers are equally loaded, as they have the same RDL=15, and concluded that the system is well balanced, which I do not agree with.
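
The resource-unit bookkeeping above can be checked with a short sketch. It assumes, as in the text, that the weight (divided by 10000) is the capacity in resource units and that DL is the consumed fraction of that capacity; the helper names `consumed` and `remaining` are hypothetical.

```python
def consumed(weight_units, dl_percent):
    """Resource units in use: capacity times the load fraction."""
    return weight_units * dl_percent / 100


def remaining(weight_units, dl_percent):
    """Resource units still available on the server."""
    return weight_units - consumed(weight_units, dl_percent)


# (capacity in resource units, DL in %) as used in the example.
servers = {"Server1": (6, 90), "Server2": (3, 45)}

for name, (units, dl) in servers.items():
    print(name, round(consumed(units, dl), 2), round(remaining(units, dl), 2))
```

With these inputs the sketch gives 5.4 and 1.35 consumed units and 0.6 and 1.65 remaining units, so the smaller server has the larger absolute headroom.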

You also mention that we should have the "same load per resource unit", which also raises a question for me:
If Server1 has a load of 90% for its 6 resource units, this also means that each resource unit has a load of 90%.
I have difficulty understanding the definition of RDL = DL/Weight.
So I think we are not yet aligned on a common understanding of the example, and I would be grateful if you could give some more explanation.

I continued the same exercise but with my assumptions:
a) the objective (according to my previous mail) is that the two servers have the same load; this leads to DL=75% with
Server1 load (DL)= 75% so Server1 consumed resource units= 6x75% = 4,5
Server2 load (DL)= 75% so Server2 consumed resource units= 3x75% = 2,25
Server1 remaining resources= 6-4,5 = 1,5
Server2 remaining resources= 3-2,25 = 0,75
The remaining resources differ, but as a percentage of capacity they are the same for each server (25%).

b) another possible objective is that the two servers have the same remaining (available) resources; this leads to
Server1 load (DL)= 81,3% so Server1 consumed resource units= 6x81,3% = 4,88
Server2 load (DL)= 62,5% so Server2 consumed resource units= 3x62,5% = 1,88
Server1 remaining resources= 6-4,88 = 1,12
Server2 remaining resources= 3-1,88 = 1,12 so same remaining resources

Is case b) closer to your concern? The "least loaded" node would be the one with the highest remaining capacity, so traffic would be transferred to this node until both nodes have the same remaining capacity.
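
The two objectives a) and b) can be compared with a small sketch, assuming the total consumed resource units stay constant while traffic is redistributed. The function names are hypothetical, and case b) as written also assumes that the equal headroom share does not exceed the capacity of the smallest server.

```python
def equal_load_target(servers):
    """Case a): the common DL (%) reached when every server has the same load."""
    total_capacity = sum(units for units, _ in servers.values())
    total_consumed = sum(units * dl / 100 for units, dl in servers.values())
    return 100 * total_consumed / total_capacity


def equal_remaining_target(servers):
    """Case b): per-server DL (%) that leaves each server the same free units."""
    total_capacity = sum(units for units, _ in servers.values())
    total_consumed = sum(units * dl / 100 for units, dl in servers.values())
    share = (total_capacity - total_consumed) / len(servers)
    return {name: 100 * (units - share) / units
            for name, (units, _) in servers.items()}


# (capacity in resource units, current DL in %) from the example.
servers = {"Server1": (6, 90), "Server2": (3, 45)}

print("a) common DL:", round(equal_load_target(servers), 2))
for name, dl in equal_remaining_target(servers).items():
    print("b)", name, "DL:", round(dl, 2))
```

For the example this gives a common DL of 75% in case a), and about 81.25% and 62.5% in case b), in line with the rounded figures above.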

Other types of load balancing objectives may be considered, but for me load balancing objectives are operator dependent and implementation specific.

Then, to come back to weights:
- Case a): as I said, according to the load received from the servers, a sender can adjust the traffic to converge to the same or nearly the same load among servers. Knowing the server weights would improve the convergence to this objective.
- Case b) requires knowledge of the server weights, as this is needed to evaluate the remaining-resources objective.

As indicated, server weights can be configured (e.g. for DAs in front of servers) or obtained from a DNS query (as Steve mentioned), or through other means that are out of scope.

I would like us to share the same understanding before finalizing a normative text. <JJ3/>

Best regards

