Return-Path: <jean-jacques.trottin@nokia.com>
X-Original-To: dime@ietfa.amsl.com
Delivered-To: dime@ietfa.amsl.com
Received: from localhost (localhost [127.0.0.1])
 by ietfa.amsl.com (Postfix) with ESMTP id 12EBD12DC79
 for <dime@ietfa.amsl.com>; Tue, 19 Jul 2016 00:19:14 -0700 (PDT)
X-Virus-Scanned: amavisd-new at amsl.com
X-Spam-Flag: NO
X-Spam-Score: -6.902
X-Spam-Level: 
X-Spam-Status: No, score=-6.902 tagged_above=-999 required=5
 tests=[BAYES_00=-1.9, RCVD_IN_DNSWL_HI=-5, RCVD_IN_MSPIKE_H2=-0.001,
 SPF_PASS=-0.001] autolearn=ham autolearn_force=no
Received: from mail.ietf.org ([4.31.198.44])
 by localhost (ietfa.amsl.com [127.0.0.1]) (amavisd-new, port 10024)
 with ESMTP id Ia94csDRu9v7 for <dime@ietfa.amsl.com>;
 Tue, 19 Jul 2016 00:18:54 -0700 (PDT)
Received: from smtp-fr.alcatel-lucent.com (fr-hpida-esg-02.alcatel-lucent.com
 [135.245.210.21])
 (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
 (No client certificate requested)
 by ietfa.amsl.com (Postfix) with ESMTPS id 28C1612D100
 for <dime@ietf.org>; Tue, 19 Jul 2016 00:18:54 -0700 (PDT)
Received: from fr712umx4.dmz.alcatel-lucent.com (unknown [135.245.210.45])
 by Websense Email Security Gateway with ESMTPS id 43E909E0DA212;
 Tue, 19 Jul 2016 07:18:50 +0000 (GMT)
Received: from fr712usmtp2.zeu.alcatel-lucent.com
 (fr712usmtp2.zeu.alcatel-lucent.com [135.239.2.42])
 by fr712umx4.dmz.alcatel-lucent.com (GMO-o) with ESMTP id u6J7IpMJ024382
 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=OK);
 Tue, 19 Jul 2016 07:18:52 GMT
Received: from FR711WXCHHUB02.zeu.alcatel-lucent.com
 (fr711wxchhub02.zeu.alcatel-lucent.com [135.239.2.112])
 by fr712usmtp2.zeu.alcatel-lucent.com (GMO) with ESMTP id u6J7IfaU000351
 (version=TLSv1/SSLv3 cipher=AES128-SHA bits=128 verify=FAIL);
 Tue, 19 Jul 2016 09:18:51 +0200
Received: from FR712WXCHMBA12.zeu.alcatel-lucent.com ([169.254.8.32]) by
 FR711WXCHHUB02.zeu.alcatel-lucent.com ([135.239.2.112]) with mapi id
 14.03.0195.001; Tue, 19 Jul 2016 09:18:48 +0200
From: "Trottin, Jean-Jacques (Nokia - FR)" <jean-jacques.trottin@nokia.com>
To: Maria Cruz Bartolome <maria.cruz.bartolome@ericsson.com>, "dime@ietf.org"
 <dime@ietf.org>
Thread-Topic: [Dime] WGLC #1 for draft-ietf-dime-load-02
Thread-Index: AQHRtdE4tMkqClz2NEuuQj0gjsd2sp/qPcJQgAnlegCAAXp7AIAHXWgAgAOaJgCAADIrgIABPp8ggAAIp4CAB5znIIADJcoAgAnFkrCAATPWAIAG/7gg
Date: Tue, 19 Jul 2016 07:18:47 +0000
Message-ID: <E194C2E18676714DACA9C3A2516265D29D5260BE@FR712WXCHMBA12.zeu.alcatel-lucent.com>
References: <5b31616d-efa3-ac03-8f1c-bd8883a35d65@gmail.com>
 <087A34937E64E74E848732CFF8354B9219758407@ESESSMB101.ericsson.se>
 <3e2082d80d8e45caaca581c9dcc98468@CSRRDU1EXM025.corp.csra.com>
 <71796571-c370-cae8-d456-9d2dfb02544c@usdonovans.com>
 <087A34937E64E74E848732CFF8354B921975C3F4@ESESSMB101.ericsson.se>
 <71ffc339-37e0-e4fd-a16e-59da7fe23b6d@usdonovans.com>
 <087A34937E64E74E848732CFF8354B921975E5AB@ESESSMB101.ericsson.se>
 <E194C2E18676714DACA9C3A2516265D29D520AC0@FR712WXCHMBA12.zeu.alcatel-lucent.com>
 <087A34937E64E74E848732CFF8354B921975E824@ESESSMB101.ericsson.se>
 <E194C2E18676714DACA9C3A2516265D29D52151C@FR712WXCHMBA12.zeu.alcatel-lucent.com>
 <087A34937E64E74E848732CFF8354B921975F63B@ESESSMB101.ericsson.se>
 <E194C2E18676714DACA9C3A2516265D29D524165@FR712WXCHMBA12.zeu.alcatel-lucent.com>
 <087A34937E64E74E848732CFF8354B9219761871@ESESSMB101.ericsson.se>
In-Reply-To: <087A34937E64E74E848732CFF8354B9219761871@ESESSMB101.ericsson.se>
Accept-Language: fr-FR, en-US
Content-Language: fr-FR
X-MS-Has-Attach: 
X-MS-TNEF-Correlator: 
x-originating-ip: [135.239.27.40]
Content-Type: text/plain; charset="iso-8859-1"
Content-Transfer-Encoding: quoted-printable
MIME-Version: 1.0
Archived-At: <https://mailarchive.ietf.org/arch/msg/dime/emVHua-8PXrW39tCzfpBQSRY82Y>
Subject: Re: [Dime] WGLC #1 for draft-ietf-dime-load-02
X-BeenThere: dime@ietf.org
X-Mailman-Version: 2.1.17
Precedence: list
List-Id: Diameter Maintanence and Extentions Working Group <dime.ietf.org>
List-Unsubscribe: <https://www.ietf.org/mailman/options/dime>,
 <mailto:dime-request@ietf.org?subject=unsubscribe>
List-Archive: <https://mailarchive.ietf.org/arch/browse/dime/>
List-Post: <mailto:dime@ietf.org>
List-Help: <mailto:dime-request@ietf.org?subject=help>
List-Subscribe: <https://www.ietf.org/mailman/listinfo/dime>,
 <mailto:dime-request@ietf.org?subject=subscribe>
X-List-Received-Date: Tue, 19 Jul 2016 07:19:14 -0000

Hi MCruz,

I have continued to investigate your inputs. I am sorry, but I still have issues with your analysis and proposal.

We have to be careful about the definitions we put in the draft. In the current v2 draft we have:
- a definition in section 2 indicating the "relative capacity" of a node.
- the Load-Value AVP (section 7.3) conveying "relative load information" with a 0-65535 value range, value 0 corresponding to a fully loaded node and 65535 to a node having no load.

The Dynamic Load (say DL) that you defined in % (100% is totally loaded) in a previous mail, for easier calculations, is equivalent to the load defined above, despite some difference in how the values are defined, so it is in line with the current draft.
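
As an illustration of that equivalence, here is a minimal sketch that converts between a DL in percent and a Load-Value in the 0-65535 range. The linear mapping and the function names are assumptions made for this example only; the draft leaves the calculation of the load value to implementations.

```python
MAX_LOAD_VALUE = 65535  # upper bound of the Load-Value range (node with no load)


def dl_percent_to_load_value(dl_percent: float) -> int:
    """Map DL in percent (100 = fully loaded) to a Load-Value (0 = fully loaded).

    Assumption: a simple linear mapping, chosen only for illustration.
    """
    if not 0.0 <= dl_percent <= 100.0:
        raise ValueError("DL must be between 0 and 100")
    return round(MAX_LOAD_VALUE * (1.0 - dl_percent / 100.0))


def load_value_to_dl_percent(load_value: int) -> float:
    """Inverse of the mapping above: Load-Value back to DL in percent."""
    if not 0 <= load_value <= MAX_LOAD_VALUE:
        raise ValueError("Load-Value must be between 0 and 65535")
    return 100.0 * (1.0 - load_value / MAX_LOAD_VALUE)


if __name__ == "__main__":
    for dl in (0.0, 40.0, 100.0):
        print(dl, "->", dl_percent_to_load_value(dl))
```

With this mapping the two scales carry the same information, only with the direction reversed.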

Then, you want to consider servers with different capacities (and I agree that the Load control mechanism covers this case), so you proposed to introduce a "LOAD value that identifies the same amount of available capacity, regardless of the server that has calculated it". For this you introduced RDL = DL/Weight, with the intent of having a load per "resource unit". But for me this RDL is not correctly defined and does not work (as shown in my previous mail): if a server comprising several resource units has a global DL of 40%, then each resource unit also has, on average, a 40% load. I do not think you can divide a DL by a weight.
So I do not yet see how you evaluate this new Load value. It is important that the server sending this information and the traffic sender give the same meaning to this new Load value. You need to make some proposals and give examples of this calculation, and of how it fulfils the load balancing objective.

So I still stay with the current Load definition of the v2 draft plus weights (e.g. obtained from DNS), which allow each sender to evolve its traffic distribution towards a load balancing objective. In a previous mail I even described two possible objectives for the sender (according to operator policy):
- one ensuring that each server has the same proportion of available capacity relative to its size (in effect, converging to the same DL).
- one ensuring that each server has the same available capacity independently of its size (your use case).

Note that when the overall traffic increases, both objectives above will use up the capacity of all servers before overload is entered, which can also be an objective of the Load Control mechanism.

Best regards

JJacques


-----Original Message-----
From: DiME [mailto:dime-bounces@ietf.org] On Behalf Of Maria Cruz Bartolome
Sent: Thursday, 14 July 2016 10:45
To: Trottin, Jean-Jacques (Nokia - FR); dime@ietf.org
Subject: Re: [Dime] WGLC #1 for draft-ietf-dime-load-02

Hello JJacques,

Thanks for your analysis.
I checked this through and I agree that the RDL calculation I proposed may not select the least loaded server in all cases; your examples are very good for understanding that.
However, the underlying concern remains perfectly valid. I think we need to provide not the DL as such, but a value that takes into account the different servers' weights, in order to be able to compare the amount of free resources, that is, the capability of different servers to deal with traffic.
When we have servers with different weights (which is the most common scenario), both may be 50% loaded (DL), and this value is not useful for the client to perform load balancing. If one server has 4 times more resources than the other, obviously that server has more free resources and will be able to deal with more traffic. Therefore, my concern is that DL does not provide enough information for the client to perform load balancing, which is the ultimate expectation of this mechanism. We need a "relative" DL that takes the servers' weights into account.

Then, I would say that my proposal below remains valid.
We need to reflect in the draft that the LOAD provided should be the relative available load, taking into account the static weight. This is the only way to provide a load value that a client can actually use to LOAD-balance.
I agree that we leave the way to do so up to implementations.
Proposal: "LOAD should be calculated in a way that reflects the available load independently of the weight of each server, in order to allow the Diameter node that performs server selection to accurately compare values from different servers, i.e. LOAD value identifies the same amount of available capacity, regardless of the server that has calculated it."

Let me know if you could agree with these conclusions, or if you have a different view.
Thanks
/MCruz


-----Original Message-----
From: DiME [mailto:dime-bounces@ietf.org] On Behalf Of Trottin, Jean-Jacques (Nokia - FR)
Sent: Wednesday, 13 July 2016 16:34
To: dime@ietf.org
Subject: Re: [Dime] WGLC #1 for draft-ietf-dime-load-02

Hi MCruz, all,

Thanks for your feedback.
I still have some difficulty understanding.

See my comments, marked <JJ3>, on your last reflections.

Best regards
JJacques

-----Original Message-----
From: DiME [mailto:dime-bounces@ietf.org] On Behalf Of Maria Cruz Bartolome
Sent: Thursday, 7 July 2016 11:10
To: Trottin, Jean-Jacques (Nokia - FR); dime@ietf.org
Subject: Re: [Dime] WGLC #1 for draft-ietf-dime-load-02

Hello JJacques, all,

See my comments, on your last reflections only.
Best regards
/MCruz


====== from previous emails (begin) =================================
> 5.
> Now
>     The goal is make it possible to use both the load values received as
>      a part of the Diameter Load mechanism and weight values received as a
>      result of a DNS SRV query.  As a result, the Diameter load value has
>      a range of 0-65535.  This value and DNS SRV weight values are then
>      used in a distribution algorithm similar to that specified in
>      [RFC2782].
>
> Comments:
> In order to have an efficient load balancing algorithm, it is not enough for the reacting node (the node in charge of load balancing) to know the Load of each server; it needs to know the load in relation to each server's capacity. Unless we do so, the Load value of a server can't be compared with the Load of a server with a different weight.
> Then, in my opinion, we need to find a way to provide a Load value that is in fact comparable with the Load values of the other servers in the group.
> Reflecting a bit longer on this, I think we then need to define a group of servers as the load-balancing group, like a load-balancing context, and then, for all servers in such a group, we need to provide a relative value of dynamic Load.
>
> <JPG> Agree with the thought: if "Little Server" is 30% utilized and "Big Server" is 50% utilized, it still makes sense to send more traffic to Big Server. But I am not sure if that is within the scope of this document. </JPG>
> SRD> I don't understand the concern. The load values supplied will be input into the route selection algorithm as specified in RFC2782. If a node isn't getting enough traffic it will change its load value to a lower value and will start getting more traffic.
> MCRUZ> Unless the LOAD info provided is in fact a value that represents the available capacity, the load balancing will not select the least loaded server. Being able to select the least loaded server is the whole purpose of this mechanism, so we need to find a way to provide LOAD values from different servers that we are able to compare, i.e. the value provided must indicate the available capacity regardless of the static capacity of each server.
SRD> I view the goal of this a little differently. The goal is to make sure that requests are delivered to nodes with available capacity. It is not strictly necessary that every request goes to the least loaded node.
MCRUZ> Well, I do not agree. The whole purpose of providing LOAD info is to be able to choose a node with available load (I agree), but among the nodes with available load we need to choose the least loaded (or one of the least loaded). It does not make sense, in my opinion, to simply select a node with available load when we are providing info about load. The information provided should make it possible to select the least loaded node (or one close to it).


> To give an example, let me use dynamic Load (say DL) in % (100% is totally loaded), which I found easier for calculation:
> - Server1: weight=1500; DL= 2%
> - Server2: weight=55000; DL= 70%
> Then, if we only use DL in the LB algorithm, Server 1 obviously seems to be clearly less loaded. However, taking into account that its weight is much smaller, it may be the other way around. In fact, if traffic is redirected to this server, it may get overloaded rapidly (due to its small capacity).
> One possible way to calculate the relative DL is to divide it by the weight. Then, for this example:
> - Server1 RDL= 10000 * (2/1500) = 13.33
> - Server2 RDL= 10000 * (70/55000) = 12.73 (I multiplied by 10000 simply to get rid of the decimals for our discussion).
> Then, we actually find out that the available load for both servers is pretty similar. In fact, in this case, a correct load balancing should select Server2 as the less loaded server instead of Server1.
> My proposal is to consider this reflection in the draft, and then make a clear distinction between dynamic load (DL) and RELATIVE DL. We need to provide the RDL in the message, not DL.
SRD> This is about how the load value is calculated which is explicitly
stated as being an implementation decision.
MCRUZ> Not exactly. We need to reflect in the draft that the LOAD provided should be the relative available load, taking into account the static weight. This is the only way to provide a load value that a client can actually use to LOAD-balance.
I could accept that we leave the way to do so up to implementations.
Proposal: "LOAD should be calculated in a way that reflects the available load independently of the weight of each server, in order to allow the Diameter node that performs server selection to accurately compare values from different servers, i.e. LOAD value identifies the same amount of available capacity, regardless of the server that has calculated it."

JJ2> I analysed your example a bit more, with Server1 RDL= 10000 * (2/1500) = 13.33 and Server2 RDL= 10000 * (70/55000) = 12.73, and the conclusion to select server 2. This is a bit surprising as server 1 is only 2% loaded. This example is rather specific, with the server 1 weight being 2.7% of the server 2 weight. I did another example with less difference in the weights:
- Server1: weight=30000; DL= 30%
- Server2: weight=60000; DL= 50%
This leads to
Server1 RDL= 10000 * (30/30000) = 10.0
Server2 RDL= 10000 * (50/60000) = 8.3
Here also, if I follow your reasoning, this would lead to selecting server 2, which increases its RDL. Again the result is to increase the server 2 load.

Even taking an 80% load for server 2 (so a high load in practice) and 50% for server 1:
Server1 RDL= 10000 * (50/30000) = 16.7
Server2 RDL= 10000 * (80/60000) = 13.3
This still leads to selecting server 2, although the sensible choice would be to increase the server 1 load. Nevertheless, if server 1 has only a 30% load, its RDL becomes 10 and it will be selected, so here it is OK.

Please check whether I am wrong somewhere, but for me RDL can currently give strange outputs.
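
The RDL figures discussed above can be reproduced with a short sketch. It assumes, as the thread reads the proposal, that RDL = scale * DL / weight and that the server with the lowest RDL is selected; the function names are hypothetical.

```python
def rdl(dl_percent, weight, scale=10000):
    """Relative DL as proposed in the thread: DL divided by weight, scaled."""
    return scale * dl_percent / weight


def pick_by_lowest_rdl(servers):
    """servers maps a name to (weight, dl_percent); returns the name with lowest RDL."""
    return min(servers, key=lambda name: rdl(servers[name][1], servers[name][0]))


# The weight/DL pairs used in the examples above.
examples = [
    {"Server1": (1500, 2), "Server2": (55000, 70)},
    {"Server1": (30000, 30), "Server2": (60000, 50)},
    {"Server1": (30000, 50), "Server2": (60000, 80)},
    {"Server1": (30000, 30), "Server2": (60000, 80)},
]

if __name__ == "__main__":
    for servers in examples:
        values = {name: round(rdl(dl, w), 2) for name, (w, dl) in servers.items()}
        print(values, "->", pick_by_lowest_rdl(servers))
```

Under these assumptions the rule selects Server2 in the first three cases, including the one where Server2 is already 80% loaded, and Server1 only in the last case.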

About static weight: I agree that static weight can be useful, e.g. a last-hop DA can be configured with the server weights to distribute its traffic among the servers it is connected to.

My point is about the targeted load balance between the servers. Often, the objective is to have the same load on all servers (even if they differ somewhat in capacity / weight), which is the way to maximize the traffic without any server entering overload. So the "DL" (as defined in the current draft) indicates whether they have the same load, and whether the objective is achieved. I do not really see how you define the targeted load among servers with the RDL you mentioned.

If the load received from the servers is not the same, the sending node has to send a bit more traffic to the less loaded node. For this, as you said, one objective is to avoid oscillations, so the sending node has to evaluate the amount of traffic it will switch from the more loaded server to the less loaded server: not so high that it causes oscillations, and not so low that the unbalanced situation persists. In the draft, how the sending node modifies the current traffic distribution among the servers according to the received load (DL) is left to implementation, and I am OK with this. In my previous mail I indicated that the sending node will adjust its traffic distribution according to the updated load (DL) received from the servers and converge to the balanced situation. In this process, I agree that the weight attached to each server can be an additional useful input when available, but keeping the current load (DL) definition. <JJ2>
====== from previous emails (end) ==================================
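
One possible reading of the damped adjustment described in the JJ2 comment is sketched here. The draft leaves this to the sending node's implementation, so the update rule, the `gain` damping factor and the function name are all assumptions made for illustration.

```python
def rebalance(shares, dl_percent, gain=0.5):
    """Return new traffic shares, nudged toward the less loaded servers.

    shares: dict name -> fraction of traffic currently sent (expected to sum to 1)
    dl_percent: dict name -> last reported DL in percent (100 = fully loaded)
    gain: hypothetical damping factor in (0, 1]; smaller values mean smaller steps
    """
    if not 0.0 < gain <= 1.0:
        raise ValueError("gain must be in (0, 1]")
    mean_dl = sum(dl_percent.values()) / len(dl_percent)
    # Servers below the mean load get a factor above 1, servers above it below 1.
    raw = {
        name: share * (1.0 + gain * (mean_dl - dl_percent[name]) / 100.0)
        for name, share in shares.items()
    }
    total = sum(raw.values())
    # Renormalise so that the shares again sum to 1.
    return {name: value / total for name, value in raw.items()}


if __name__ == "__main__":
    shares = {"Server1": 0.5, "Server2": 0.5}
    reported = {"Server1": 80.0, "Server2": 40.0}
    print(rebalance(shares, reported))
```

A small `gain` moves only a little traffic per load report, which limits oscillation at the cost of slower convergence; when all reported loads are equal the shares are left unchanged.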

About the examples you discussed above.
I think the results you got are valid. Take into account that the static weight identifies the server capability, so a server1 with weight 60000 has double the resources of a server2 with weight 30000. Then, server2 at 45% load is making use of double the resources of server1 at 90% load.
DL/Weight provides a value that indicates the load per "resource unit". Then, the least loaded server is the one that has the least load per "resource unit".
This can be seen in the following example:
Server1 RDL= 10000 * (45/30000) = 15
Server2 RDL= 10000 * (90/60000) = 15
Both servers are equally loaded, as long as the Big Server (Server2) is loaded twice as much as the Small Server (Server1), which has half the resources of the Big Server.

Then, JJacques, I think we agree that the Diameter node responsible for server selection should have the means to select the least loaded server, and that the available load depends on the capacity of each node, not only on the DL identified at a given moment in time.
Then, I think we need to include some normative text on that, although the specific means to achieve it could remain implementation specific. Proposed text is:
"LOAD should be calculated in a way that reflects the available load independently of the weight of each server, in order to allow the Diameter node that performs server selection to accurately compare values from different servers, i.e. LOAD value identifies the same amount of available capacity, regardless of the server that has calculated it. The means to calculate the LOAD value that fulfils this requirement are implementation specific."
I think, as Steve agreed, that the example could be included in the draft as well; it is very useful for understanding how the static weight determines the available load.

<JJ3> I analysed your new example above, which raises the same questions on my side:
Scaling the weights down by a factor of 10000 for simplification, we have
Server1 weight 6, so I can say that Server1 has a capacity of 6 resource units
Server2 weight 3, so a capacity of 3 resource units.
Server1 load (DL)= 90% so Server 1 consumed resource units= 6x90% = 5.4
Server2 load (DL)= 45% so Server 2 consumed resource units= 3x45% = 1.35
Then
Server1 remaining (available) resources= 6-5.4 = 0.6
Server2 remaining (available) resources= 3-1.35 = 1.65

So in my understanding server 2 (the smaller server) still has more available capacity than server 1, so I would conclude that some traffic should be transferred from server 1 to server 2. But you said that both servers are equally loaded, as they have the same RDL=15, and concluded that the system is well balanced, which I do not agree with.
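
The resource-unit arithmetic in this comment can be checked with a minimal sketch. It assumes that a weight of N means N resource units and that consumed units equal weight times DL; the function names are hypothetical.

```python
def consumed_units(weight, dl_percent):
    """Resource units in use, assuming weight counts units and DL is in percent."""
    return weight * dl_percent / 100.0


def remaining_units(weight, dl_percent):
    """Resource units still available on the server."""
    return weight - consumed_units(weight, dl_percent)


# name -> (weight in resource units, DL in percent), as in the example above
servers = {"Server1": (6, 90), "Server2": (3, 45)}

if __name__ == "__main__":
    for name, (weight, dl) in servers.items():
        print(
            name,
            "DL/weight =", dl / weight,
            "consumed =", round(consumed_units(weight, dl), 2),
            "remaining =", round(remaining_units(weight, dl), 2),
        )
```

Both servers have the same DL/weight ratio, yet under these assumptions the smaller server is left with more free resource units (1.65 against 0.6), which is the discrepancy the comment points out.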

You also mention that we should have the "same load per resource unit", which also raises a question for me:
If Server1 has a load of 90% for its 6 resource units, this also means that each resource unit has a load of 90%.
I have difficulty understanding the definition RDL = DL/Weight. So I think we are not yet aligned on a common understanding of the example; I would be grateful if you could give some more explanation.

I continued the same exercise but with my assumptions:
a) the objective (according to my previous mail) is that the two servers have the same load; this leads to DL=75% with
Server1 load (DL)= 75% so Server 1 consumed resource units= 6x75% = 4.5
Server2 load (DL)= 75% so Server 2 consumed resource units= 3x75% = 2.25
Then
Server1 remaining resources= 6-4.5 = 1.5
Server2 remaining resources= 3-2.25 = 0.75
but the % of remaining resources compared to its capacity is the same for each server

b) another possible objective is that the two servers have the same remaining (available) resources; this leads to
Server1 load (DL)= 81.3% so Server 1 consumed resource units= 6x81.3% = 4.88
Server2 load (DL)= 62.4% so Server 2 consumed resource units= 3x62.4% = 1.87
Then
Server1 remaining resources= 6-4.88 = 1.12
Server2 remaining resources= 3-1.87 = 1.13 so same remaining resources

Is this case b) closer to your concern? The "least loaded" node would be the one having the highest remaining capacity, so traffic would be transferred to this node until both nodes have the same remaining capacity.
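
The target loads for objectives a) and b) can be derived from the weights and the total consumed resources. This is a sketch under the same resource-unit assumption as the example (weights 6 and 3, total consumption 6.75 units); the function names are hypothetical.

```python
def equal_load_target(weights, total_consumed):
    """Objective a): every server converges to the same DL (in percent)."""
    dl = 100.0 * total_consumed / sum(weights.values())
    return {name: dl for name in weights}


def equal_remaining_target(weights, total_consumed):
    """Objective b): every server keeps the same number of free resource units."""
    remaining = (sum(weights.values()) - total_consumed) / len(weights)
    if any(weight < remaining for weight in weights.values()):
        raise ValueError("a server is smaller than the common remaining capacity")
    return {name: 100.0 * (w - remaining) / w for name, w in weights.items()}


weights = {"Server1": 6, "Server2": 3}
total_consumed = 6.75  # 5.4 + 1.35 resource units, as in the example

if __name__ == "__main__":
    print("a)", equal_load_target(weights, total_consumed))
    print("b)", equal_remaining_target(weights, total_consumed))
```

For these inputs objective a) gives 75% on both servers and objective b) gives 81.25% and 62.5%, close to the rounded figures quoted in case b). Objective b) cannot be evaluated without the weights, whereas objective a) only needs the reported DL values.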

Other types of load balancing objectives may be considered, but load balancing objectives are, for me, operator dependent and implementation specific.

Then, to come back to weights:
- Case a): as I said, based on the load received from the servers, a sender can adjust the traffic to converge to the same or nearly the same load among servers. Knowing the server weights would improve the convergence to this objective.
- Case b) requires knowledge of the server weights, as this is needed to evaluate the remaining resources objective.

As indicated, server weights can be configured (e.g. for DAs in front of servers) or obtained from a DNS query (as Steve mentioned), or through other means that are out of scope.

I would like us to share the same understanding before finalizing a normative text. <JJ3/>

Best regards

JJacques


_______________________________________________
DiME mailing list
DiME@ietf.org
https://www.ietf.org/mailman/listinfo/dime


