Re: [Dime] WGLC #1 for draft-ietf-dime-load-02

Hi MCruz

I analysed your RDL proposal; please see my JJ2> comment below your last comments in 5.

Best regards

JJacques 

-----Original Message-----
From: DiME [mailto:dime-bounces@ietf.org] On Behalf Of Maria Cruz Bartolome
Sent: Thursday, 30 June 2016 14:50
To: Trottin, Jean-Jacques (Nokia - FR); Steve Donovan; dime@ietf.org
Subject: Re: [Dime] WGLC #1 for draft-ietf-dime-load-02

Hello JJ,

Nice to hear from you again.
See comments below
/MCruz

-----Original Message-----
From: Trottin, Jean-Jacques (Nokia - FR) [mailto:jean-jacques.trottin@nokia.com]
Sent: Thursday, 30 June 2016 13:00
To: Maria Cruz Bartolome; Steve Donovan; dime@ietf.org
Cc: Uveges, Balint (Nokia - HU/Budapest); Wiehe, Ulrich (Nokia - DE/Munich)
Subject: RE: [Dime] WGLC #1 for draft-ietf-dime-load-02

Dear all

Regarding the discussion of draft-ietf-dime-load-02: I was in line with the new 02 version Steve distributed some time ago. I have now reviewed Maria Cruz's comments and Steve's responses.

Overall I remain in line with Steve's comments below. My main comment is on 5., about server capacity. The other updates have, for me, no protocol impact; they are mainly wording enhancements and are worthwhile.
Please see my few comments inline (marked JJ>).

I would also like to take this opportunity to say that, due to a change in my role, my colleague Balint Uveges will follow the DIME load/overload work in the IETF.

Best regards

JJacques

-----Original Message-----
From: DiME [mailto:dime-bounces@ietf.org] On Behalf Of Maria Cruz Bartolome
Sent: Wednesday, 29 June 2016 19:19
To: Steve Donovan; dime@ietf.org
Subject: Re: [Dime] WGLC #1 for draft-ietf-dime-load-02

Hello Steve,
Thanks for the responses, see some more comments below.
Best regards
/MCruz

On 6/27/16 2:18 AM, Maria Cruz Bartolome wrote:
>
>> 4.1
>> Now:
>>      None of this prevents a Diameter node from deciding to reduce the
>>      offered load based on load information.
>>
>> Proposed
>>     (remove)
>>
>> Reasoning:
>> This sentence is not properly linked to previous paragraph and it is 
>> covered by previous paragraph already
>>
>> <JPG> OK with this, though not sure it is necessary to delete.</JPG>
> SRD> This sentence adds emphasis to the point that a similar result can happen between load and overload, leading into the next sentence outlining the fundamental difference between the two.  I don't see the harm in leaving it, even if what it says is implied by the previous paragraph.
> MCRUZ> My problem with the sentence is that it is not straightforward what "none of this" refers to. The reader will look above to check what it refers to, and it seems to be the whole paragraph, i.e. the differences between load and overload. But this sentence refers again to something that is mentioned above. So I think the sentence, as it is, is misleading and makes for uneasy reading.
SRD> How about: "A Diameter node can, however, decide to reduce offered
load based on load information."
MCRUZ> Fine

>> 5.
>> Now
>>      The second big difference between DOIC and Load is visibility of the
>>      DOIC or Load information within a Diameter network.  DOIC information
>>      is sent end-to-end resulting in the ability of all nodes in the path
>>      of the answer message that carries the OC-OLR AVP to act on the
>>      information.  The DOIC overload reports must remain in the message
>>      all the way from the reporting node to the node that is the target
>>      for the answer message.
>>
>>      For the Load mechanism there are two types of load reports.
>>
>>      The first is the load of the endpoint sending the answer message.
>>      This load report is carried end-to-end to enable any nodes that make
>>      server selection decisions to use the load status of the sending
>>      endpoint as part of  the server selection decision.
>>
>>      The second type of load report is a peer report.  This report is used
>>      by Diameter nodes as part of the logic to select the next hop
>>      Diameter node and, as such, do not have significance beyond the peer
>>      node.  These load reports are removed by the first supporting
>>      Diameter node to receive the report.
>>
>> Proposed:
>>      The second big difference between DOIC and Load is visibility of the
>>      DOIC or Load information within a Diameter network.  DOIC information
>>      is sent end-to-end resulting in the ability of all nodes in the path
>>      of the answer message that carries the OC-OLR AVP to act on the
>>      information, *although only one node can actually consume the report*.
>>      The DOIC overload reports must remain in the message
>>      all the way from the reporting node to the node that is the target
>>      for the answer message.
> SRD> How about "although only one node actually reacts to the report",
> changing consume to react.
> MCRUZ> I think "consume" is better since it implies that from then on the report is removed.
SRD> How about consume and react?  "although only one node actually
consumes and reacts to the report"
MCRUZ> Fine
JJ> This would be the only place (here, and I think also in the DOIC RFC) where the word "consume" is used, which raises the question of what "consume" means; in particular, it does not imply removing the report. "React" was OK for me, but I have no objection to "consume and react".

>
>>      *However,* for the Load mechanism there are two types of load reports *and only the
>>       first one is transmitted end-to-end*.
> SRD> This is covered in the following paragraphs.
> MCRUZ> Yes, but I think we need an introduction for the analysis below, in order to understand we are going to compare. Trying to ease reading.
SRD> Okay.
...
>
> 5.
> Now
>     The goal is to make it possible to use both the load values received as
>      a part of the Diameter Load mechanism and weight values received as a
>      result of a DNS SRV query.  As a result, the Diameter load value has
>      a range of 0-65535.  This value and DNS SRV weight values are then
>      used in a distribution algorithm similar to that specified in
>      [RFC2782].
>
> Comments:
> In order to have an efficient load balancing algorithm, it is not enough for the reacting node (the node in charge of load balancing) to know the Load of each server; it needs to know the load in relation to each server's capacity. Unless we do so, the Load value of a server can't be compared with the Load of a server with a different weight.
> So, in my opinion, we need to find a way to provide a Load value that is in fact comparable with the Load values of the other servers in the group.
> Reflecting a bit longer on this, I think we then need to define a group of servers in the load-balancing group, like a load-balancing context, and for all servers in such a group we need to provide a relative value of dynamic Load.
>
> <JPG> Agree with the thought- if "Little Server" is 30% utilized and 
> "Big Server" is 50% utilized, it still makes sense to send more 
> traffic to Big Server.  But I am not sure if that is within the scope 
> of this document. </JPG>
> SRD> I don't understand the concern.  The load values supplied will be
> input into the route selection algorithm as specified in RFC2782.  If 
> a node isn't getting enough traffic it will change its load value to a 
> lower value and will start getting more traffic.
> MCRUZ> Unless the LOAD info provided is in fact a value that represents the available capacity, the load balancing will not select the least loaded server. Being able to select the least loaded server is the whole purpose of this mechanism, so we need to find a way to provide LOAD values from different servers that can be compared, i.e. the value provided must indicate the available capacity regardless of the static capacity of each server.
SRD> I view the goal of this a little differently.  The goal is to make
sure that requests are delivered to nodes with available capacity.  It is not strictly necessary that every request goes to the least loaded node.
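
For reference, the RFC 2782 style of selection that the draft points to can be sketched roughly as follows. This is only an illustration, not text from the draft: the function name select_by_weight, the (name, weight) tuples and the injectable rng parameter are assumptions made for the example. It follows the RFC 2782 weight procedure for entries of equal priority: zero-weight entries are placed first, a running sum of weights is built, and a uniform random number between 0 and the total picks the entry. Under the draft-02 semantics quoted in 7.3, where a higher Load-Value means a less loaded node, the received value could plausibly be fed in as the weight.

```python
import random


def select_by_weight(candidates, rng=random):
    """Pick one candidate using the RFC 2782 weight procedure.

    candidates: list of (name, weight) tuples, integer weights >= 0,
                all assumed to share the same priority.
    rng:        any object with randint(a, b), inclusive on both ends
                (hypothetical hook so the choice can be made deterministic).
    """
    if not candidates:
        raise ValueError("no candidates to select from")

    # RFC 2782: entries with weight 0 go to the beginning of the list.
    ordered = sorted(candidates, key=lambda c: c[1] != 0)

    total = sum(weight for _, weight in ordered)
    pick = rng.randint(0, total)  # uniform in [0, total], inclusive

    # Select the first entry whose running sum is >= the random number.
    running = 0
    for name, weight in ordered:
        running += weight
        if running >= pick:
            return name


if __name__ == "__main__":
    # Hypothetical Load-Values used directly as weights.
    servers = [("server-a", 60000), ("server-b", 20000)]
    print(select_by_weight(servers))
```

Because the pick is random, requests are spread roughly in proportion to the weights rather than always going to the single least loaded node, which is the behaviour the two positions above disagree about.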
MCRUZ> Well, I do not agree. The whole purpose of providing LOAD info is to be able to choose a node with available load (I agree), but among the nodes with available load we need to choose the least loaded (or one of the least loaded). It does not make sense, in my opinion, to simply select a node with available load when we are providing info about load. The information provided should make it possible to select the least loaded node (or one close to it).

> To give an example, let me express dynamic Load (say DL) in % (100% is totally loaded), which I find easier for calculation:
> - Server1: weight=1500; DL= 2%
> - Server2: weight=55000; DL= 70%
> If we only use DL in the LB algorithm, Server1 clearly seems to be less loaded; however, taking into account that its weight is much smaller, it may be the other way around. In fact, if traffic is redirected to this server, it may become overloaded rapidly (due to its small capacity).
> One possible way to calculate the relative DL is to divide it by the weight; for this example:
> - Server1 RDL= 10000 * (2/1500) = 13.33
> - Server2 RDL= 10000 * (70/55000) = 12.73 (I multiplied by 10000 simply to get rid of the decimals for our discussion).
> Then we actually find that the available load for both servers is pretty similar. In fact, in this case, correct load balancing should select Server2 as the less loaded server instead of Server1.
> My proposal is to reflect this in the draft, and to make a clear distinction between dynamic load (DL) and RELATIVE DL. We need to provide the RDL in the message, not DL.
SRD> This is about how the load value is calculated which is explicitly
stated as being an implementation decision.
MCRUZ> Not exactly. We need to reflect in the draft that the LOAD provided should be the relative available load, taking into account the static weight. This is the only way to provide a load value that a client can actually use to load-balance.
I could accept that we leave the way to do so up to implementations.
Proposal: "LOAD should be calculated in a way that reflects the available load independently of the weight of each server, in order to allow the Diameter node that performs server selection to accurately compare values from different servers, i.e. a LOAD value identifies the same amount of available capacity, regardless of the server that has calculated it."

JJ2> I analysed your example a bit more, with Server1 RDL= 10000 * (2/1500) = 13.33 and Server2 RDL= 10000 * (70/55000) = 12.73, and the conclusion to select Server2. This is a bit surprising, as Server1 is only 2% loaded. This example is rather specific, with the Server1 weight being 2.7% of the Server2 weight. I tried another example with less difference in the weights:
- Server1: weight=30000; DL= 30%
- Server2: weight=60000; DL= 50%
This leads to
Server1 RDL= 10000 * (30/30000) = 10.0
Server2 RDL= 10000 * (50/60000) = 8.3
Here also, if I follow your reasoning, this would lead to selecting Server2, which increases its RDL. Again the result is to increase the Server2 load.

Even taking an 80% load for Server2 (a high load in practice) and 50% for Server1:
Server1 RDL= 10000 * (50/30000) = 16.7
Server2 RDL= 10000 * (80/60000) = 13.3
This still leads to selecting Server2, although the reasoning would be to increase the Server1 load.
Nevertheless, if Server1 has only 30% load, its RDL becomes 10 and it will be selected, so here it is OK.

Please check whether I am wrong somewhere, but currently RDL, for me, can give strange outputs.
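
The arithmetic in the examples above can be re-checked with a few lines of Python. This is a minimal sketch, not part of either proposal: the helper names rdl and lowest_rdl are made up for the illustration, and the rule "the server with the lowest RDL is selected" is an assumption that reflects how the examples in this thread are read, not something the draft specifies.

```python
def rdl(dl_percent, weight, scale=10000):
    """RDL as used in this thread: scale * DL / weight."""
    return scale * dl_percent / weight


def lowest_rdl(servers):
    """Return the name of the server with the lowest RDL.

    servers: list of (name, weight, dl_percent) tuples.
    Assumption: the lowest RDL is treated as "least loaded".
    """
    return min(servers, key=lambda s: rdl(s[2], s[1]))[0]


# Scenarios taken from the mails above: (name, weight, DL in %).
scenarios = {
    "MCruz example": [("Server1", 1500, 2), ("Server2", 55000, 70)],
    "JJ2 30%/50%": [("Server1", 30000, 30), ("Server2", 60000, 50)],
    "JJ2 50%/80%": [("Server1", 30000, 50), ("Server2", 60000, 80)],
    "JJ2 30%/80%": [("Server1", 30000, 30), ("Server2", 60000, 80)],
}

for label, servers in scenarios.items():
    values = ", ".join(f"{n} RDL={rdl(dl, w):.2f}" for n, w, dl in servers)
    print(f"{label}: {values} -> lowest RDL: {lowest_rdl(servers)}")
```

With these inputs the computed figures agree with the ones quoted in the mails (allowing for rounding), and the lowest RDL falls on Server2 in the first three scenarios and on Server1 in the last one.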

Regarding static weight, I agree that it can be useful, e.g. a last-hop DA can be configured with the server weights to distribute its traffic among the servers it is connected to.

My point is about the targeted load balance between the servers. Often, the objective is to have the same load among servers (even if they differ somewhat in capacity / weight), which is the way to maximize the traffic without any server entering overload. So the "DL" (as defined in the current draft) indicates whether they have the same load, and whether the objective is achieved. I do not clearly see how you would define the targeted load among servers with the RDL you mentioned.

If the load values received from the servers are not the same, the sending node has to send a bit more traffic to the less loaded node. Here, as you said, one objective is to avoid oscillations: the sending node has to evaluate the amount of traffic it will switch from the more loaded server to the less loaded server, with the switched traffic being not so high as to cause oscillations and not so low as to maintain the unbalanced situation. In the draft, how to modify the current traffic distribution among the servers according to the received load (DL) is left to the implementation of the sending node, and I am OK with this. In my previous mail I indicated that the sending node will adjust its traffic distribution according to the updated load (DL) received from the servers and converge to the balanced situation. In this process, I agree that the weight attached to each server can be an additional useful input when available, but keeping the current load (DL) definition. <JJ2>

JJ> I think we can keep only the relative load information proposed in the draft, even when servers have different capacities. If a small-capacity node sees its incoming traffic increasing, it will quickly react by sending a higher load value, which, when received by a sending node, will be higher than the values received from other nodes with higher capacity. The sending node will then reduce the traffic towards this node to ensure load balancing and will continue to adjust according to the load values received. This seems to be a simple feedback loop ensuring proper load balancing.
MCRUZ> This does not ensure proper load balancing, because the client does not have information it can compare, so it does not know which server is less loaded, not even approximately. Obviously, when a server gets more loaded it will report that, but this will cause oscillations, which could even be critical for a server. For example, if one server has a very small weight compared to the rest, it may be selected as the destination of requests but would easily become loaded, and again, it needs to react.
Moreover, the servers in the pool with higher capacity will normally be underutilized. In general, resources are not used efficiently (bigger servers tend to be underutilized), the load fluctuates a lot (especially for the small servers), and some servers may be overloaded by peaks of traffic (small servers).
So my proposal is to include a normative sentence, as above, although the specific way to do it may be operator specific. On top of that, the example I provided is useful for understanding the situation and I think it should be in the draft as well.

Do you think it is not sufficient? Adding a capacity value (weight, group of servers, ...) increases the complexity, given that this capacity may vary over time (e.g. a partial failure), so it may need to be dynamically updated.
More sophisticated behaviors can be introduced (implementation specific) without impacting the protocol and AVPs specified in the draft. If justified requirements lead to new AVPs, this could be part of a future evolution, out of the scope of the present draft.

>
>
>> 5.
>> Now
>>      The load report includes the relative load of the sending node.  This
>>      relative load is specified in a manner consistent with that defined
>>      for DNS SRV [RFC2782].
>>
>> Proposed:
>>      The load report includes a value to identify the load of the sending node,
>>     specified in a manner consistent with that defined
>>      for DNS SRV [RFC2782].
>>
>> <JPG> Agree. </JPG>
> SRD> I don't understand the need for this change.
> MCRUZ> Using "relative" is misleading unless we clarify "relative to what".
SRD> Okay.  How about a small change to: "The load report includes a
value **indicating** the load of the sending node..."
MCRUZ> Fine

JJ> "relative" is also used in the proposed updated definition of "load" in section 2, which is consistent with the definition of the Load-Value AVP in 7.3., so for me "relative" is not misleading.
MCRUZ> I made the same comment on that section. I think Steve's proposal is fine and more accurate.

>>
...
>> 6.1.1
>> Now:
>>      The method for determining the load value included in the load report
>>      is an implementation decision.
>>
>> Comments:
>> In line with my comment above, I agree it should be implementation specific, but we need to provide some guidance so that the value provided can be used to achieve successful load balancing.
> SRD> See my comment above about DNS SRV algorithm.
> MCRUZ> This is related to my comment above to 5, but to the part related to a way to provide a LOAD value that represents the available capacity of a server, taking into account its static capacity.
SRD> Okay, I'll propose some text, based on your example, in the next
version of the draft.  This would be a non normative example of how someone might compute the load value.
MCRUZ> Including the example is fine. Although above I proposed some normative text as well that I think we need to consider.

JJ> See my comments above on 5. The sender adjusts its traffic according to the evolution of the received load values.
MCRUZ> See my comments above. This causes a bunch of problems.

...
>
>> 7.3
>> Now:
>>      The Load-Value AVP is specified in a manner similar to the weight
>>      value in DNS SRV ([RFC2782]).
>>
>>      The Load-Value has a range of 0-65535.
>>
>>      A higher value indicates a lower load on the sending node.  A lower
>>      value indicates that the sending node is heavily loaded.
>>
>>         Stated another way, a node that has zero load would have a load
>>         value of 65535.  A node that is 100% loaded would have a load
>>         value of 0.
>>
>> Comments:
>> I think it could be easier to use a %. It is more straightforward to figure out what it means.
> SRD> Percentage can be mapped to the range 0-65535 if that is the
> internal implementation decision.  The goal here is to be consistent 
> with RFC2782.
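
To make the mapping mentioned above concrete, one possible linear conversion between a percentage and the 0-65535 Load-Value range could look like this. It is a sketch under stated assumptions: the draft leaves the computation to the implementation, the linear formula and the function names percent_to_load_value and load_value_to_percent are made up for the illustration, and the direction follows the draft-02 text quoted in 7.3 (65535 for zero load, 0 for 100% load).

```python
import math

LOAD_VALUE_MAX = 65535  # upper end of the Load-Value range quoted in 7.3


def percent_to_load_value(percent_loaded):
    """Map 0..100 (% loaded) to 65535..0, rounding down to an integer."""
    if not 0 <= percent_loaded <= 100:
        raise ValueError("percent_loaded must be between 0 and 100")
    return math.floor(LOAD_VALUE_MAX * (100 - percent_loaded) / 100)


def load_value_to_percent(load_value):
    """Inverse mapping: 65535 -> 0.0 (% loaded), 0 -> 100.0."""
    if not 0 <= load_value <= LOAD_VALUE_MAX:
        raise ValueError("load_value must be between 0 and 65535")
    return 100 * (LOAD_VALUE_MAX - load_value) / LOAD_VALUE_MAX


if __name__ == "__main__":
    for pct in (0, 25, 50, 100):
        print(pct, "% loaded ->", percent_to_load_value(pct))
```

Rounding down is an arbitrary choice here; an implementation could equally round to nearest, since the exact value for 50% (32767.5) is not an integer.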
> MCRUZ> Why do we need to keep consistency with that RFC? I think it is clearer to use a percentage; it is more straightforward to identify the available load we refer to.
SRD> This was discussed and agreed to early in the process of writing this mechanism.  There are a few reasons. First, it's an algorithm that has already been specified and implemented.  Second, it allows someone who has already implemented the DNS SRV algorithm to reuse it.
Third, while RFC6733 doesn't directly address load balancing/distribution, it does reference use of DNS SRV for handling dynamic connections.  It is not unreasonable to expect that there are implementations that would use the DNS SRV value for nodes that don't support load, along with load values received.
MCRUZ> I do not remember a discussion about this, sorry. I had the impression it was incorporated without much discussion.
However, I do not see that it helps with reusing DNS SRV. Does an implementation benefit from anything previously implemented for DNS SRV when deciding what load value to include in the AVP? I think the server will calculate the LOAD, and then it needs to reflect a value from totally available to totally loaded. It is more straightforward and more intuitive to simply use 0-100, as you agreed below.
I did not consider the comment before, sorry, but I think it can now be easily changed, which would simplify all implementations and the interpretation of the LOAD value, don't you think?

> E.g. 50% loaded, using SRV, is 32767.5; 25% is 49151.25; and so on.
> In the mechanism we are defining we do not need to keep using a complex value like this one, when we can simply use 0 to 100%: 0 (totally available), 100 (totally loaded).
> In fact, this is in line with the definition in the doc:
SRD> I really don't want to revisit this decision this late in the game.  While not as intuitive to a casual reader of the specification as a percentage value might be, using the DNS SRV value works.
...
JJ> A preliminary version of the draft started with %, then Steve proposed to use the same definition as SRV, which didn't raise comments. I am OK to remain with Steve's proposal.
MCRUZ> There is no reason to keep it unless we think it has some advantages, which, as I explained above, I do not think it has. Let me know if you see any advantages.

_______________________________________________
DiME mailing list
DiME@ietf.org
https://www.ietf.org/mailman/listinfo/dime
