Re: [Dime] Diameter load control inputs

Hi Ben

Thanks for your feedback.

Regarding your high-level observations:

1) About the figure you described with a connection between S1 and S2:

  -------S1
/        |
C1       |
\        |
  -------S2
You say that S1 or S2 can proxy a request. To me, this is roughly equivalent to S1 having a co-located proxy DA, and is therefore a particular case of Figure 5.3. So I suggest we keep using Figure 5.3 to cover both co-located and non-co-located DAs, unless you would like the above use case to be handled separately.

I agree that the path C1>S1 (or C1>A1>S1 in Figure 5.3) is more efficient than C1>S1>S2 (or C1>A1>A2>S2 in Figure 5.3), and that the "cross-connect" use cases in other figures also have some paths with more hops than others. The question you raise, on which the group will have to decide, is: do we focus only on load balancing between the shortest paths (with longer paths used only in overload or failure cases, as you indicated), or is it relevant to also consider longer paths for load balancing? The second choice may make the load balancing information significantly more complex. But we first have to assess the requirement.
Two further observations:

1.1) In Figure 5.3: if S1 is more loaded than S2, then for realm-routed requests C1, to achieve load balancing, needs one piece of load information for the realm via A1 (reflecting the S1 load) and another via A2 (reflecting the S2 load).

1.2) Another drawing:

C1---A1------S1
    |
    |
C2---A2------S2

If S1 is more loaded than S2, A1 may send some of the traffic coming from C1 to S2 via A2, i.e. over the longer path C1>A1>A2>S2 instead of C1>A1>S1. This means that A1 needs load information about both S1 and S2 to do load balancing. Is this case within our scope?

2) About this figure:
   --A1--S1...Sn
 /
C
 \
   --A2--Sn+1...Sm

My drawings 5.5 to 5.7 appear to be a bit confusing; in fact, they should be corrected along the lines of this one:

      /---S1
   --A1---...
  /  |\---Sn
 /   |
C    |
 \   |/--Sn+1
   --A2--...
      \--Sm
So the upstream servers are directly connected to a DA.

3) Please let me know how you would like to insert the analysis into the new version of your draft. I think we could have a dedicated section on topology use cases.

Best regards

JJacques

From: Ben Campbell [mailto:ben@nostrum.com]
Sent: Wednesday, January 7, 2015 01:01
To: TROTTIN, JEAN-JACQUES (JEAN-JACQUES)
Cc: dime@ietf.org
Subject: Re: [Dime] Diameter load control inputs

Hi JJacques,

To answer your last question first: I think this is a useful analysis, and I would be willing to combine it into a new version of my previous draft. We can talk about the details of that offline.

Second, some high-level observations:

First, from a load-balancing perspective, Figure 5.3 is functionally equivalent to the following:

  -------S1
/        |
C1       |
\        |
  -------S2
Where S1 and S2 can proxy requests to the other according to some policy (e.g. load balancing). I don’t mean to suggest you would deploy it that way, but it makes another observation more obvious, which is that C1->S1 or C1->S2 are more efficient than C1->S1->S2 or C1->S2->S1.

My initial reaction is to think that we can generalize this to the various use cases that have cross-connected agents. That is, when all else is equal, shorter paths between client and server are better than longer ones. This seems to suggest that the cross-path connections should be reserved for failure or overload cases. This probably has implications for whether the load metric sent by an agent should somehow aggregate the load metrics of its servers. (Or at least take them as an input.)
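As a rough illustration of the "shortest path unless failed or overloaded" idea, here is a minimal Python sketch, applied to the 1.2 drawing where A1 can reach S1 directly or S2 via A2. The `Route` type, the 0-100 load scale and `OVERLOAD_THRESHOLD` are hypothetical assumptions for illustration only; nothing here is defined by Diameter or by the draft.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical: load is a 0-100 value and 90 marks a server as overloaded.
# Neither the scale nor the threshold comes from any Diameter specification.
OVERLOAD_THRESHOLD = 90


@dataclass
class Route:
    path: str               # e.g. "A1>S1" (direct) or "A1>A2>S2" (cross-path)
    hops: int               # hops from the selecting node to the server
    load: int               # load reported for the server at the end of the path
    available: bool = True  # False if the route has failed


def choose_route(routes: List[Route]) -> Optional[Route]:
    """Prefer the shortest usable routes; use a longer route only when every
    shorter one has failed or is overloaded."""
    usable = [r for r in routes if r.available and r.load < OVERLOAD_THRESHOLD]
    if not usable:
        return None
    shortest = min(r.hops for r in usable)
    candidates = [r for r in usable if r.hops == shortest]
    # Among equally short routes, pick the least loaded one.
    return min(candidates, key=lambda r: r.load)


print(choose_route([Route("A1>S1", 1, 50), Route("A1>A2>S2", 2, 10)]).path)  # A1>S1
print(choose_route([Route("A1>S1", 1, 95), Route("A1>A2>S2", 2, 40)]).path)  # A1>A2>S2
```

Note that in the first call the longer path is not used even though S2 is far less loaded; that is exactly the trade-off being discussed.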

Second, your figures 5.5-5.7 suggest that we may have cases where the number of servers to the right of an agent may not be known by the client. Here’s a simplified view:

   --A1--S1...Sn
 /
C
 \
   --A2--Sn+1...Sm

In this case, the load metrics for the servers are of little use to C, since in general C will not know if its knowledge encompasses all possible destinations. (A related question is whether the _agents_ actually know the load metrics for all the possible destinations. I'm not sure they always do, e.g. if they are using dynamic discovery.)

On Jan 4, 2015, at 11:38 AM, TROTTIN, JEAN-JACQUES (JEAN-JACQUES) <jean-jacques.trottin@alcatel-lucent.com> wrote:

Dear all

First, I wish you and your families a happy new year, both personally and professionally.

Below, I address our new topic: Diameter load control.
I am in favor of continuing the work that Ben started in his “draft-campbell-dime-load-considerations-00” document, by adding and investigating various considerations.

Below are several additional sections or paragraphs that I think could be inserted into Ben’s draft. I have added titles and numbering (#L1, #L2 …) that can be reused in separate email threads for discussion.

I have included topology use cases. For Diameter overload, Steve gave several presentations on the various use cases to consider. I think it would also be good to do the same for Diameter load, and to insert such use cases into Ben’s draft. You may consider this list somewhat long, but these use cases highlight a number of points on which we will have to take decisions (e.g. whether to cover them, or to check later whether the solution assumptions apply well). They are more about requirements than about the solution itself, as I do not address the exact content of the load information or the way it will be transferred.

#L1   Topology use cases (can be inserted in section 5)

          ------S1
        /
      C1
        \
          ------S2
Figure 5.1

In this simple case (no DA), e.g. for a small network, the servers are peers of the client. The client selects the server according to the load information it receives from the servers.
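One way a client could turn received load information into a server choice is sketched below in Python. It assumes, hypothetically, that each server reports a load value from 0 to 100, and it picks a server at random weighted by spare capacity (100 - load). The function name and the weighting rule are illustrative assumptions, not a proposed standard behavior.

```python
import random
from typing import Dict, Optional


def select_server(loads: Dict[str, int], rng: Optional[random.Random] = None) -> str:
    """Pick a server at random, weighted by spare capacity (100 - load).

    `loads` maps each server to a hypothetical 0-100 load value; a server
    reporting load 100 gets weight 0 and is never chosen.
    """
    if rng is None:
        rng = random.Random()
    servers = list(loads)
    weights = [max(0, 100 - loads[s]) for s in servers]
    if sum(weights) == 0:
        raise RuntimeError("all servers report full load")
    return rng.choices(servers, weights=weights, k=1)[0]


# S1 reports 80% load and S2 20%: the weights are 20 and 80, so S2 should
# receive roughly 80% of new requests.
rng = random.Random(1)
picks = [select_server({"S1": 80, "S2": 20}, rng) for _ in range(1000)]
print(picks.count("S2") / len(picks))
```

Weighted random selection, rather than always sending to the least loaded server, avoids every client piling onto the same server between two load updates.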

          ------S1
        /
C1----A
        \
          ------S2
Figure 5.2

The agent selects the server according to the load information it receives from the servers. Here, the client does not need to receive load information, as it does not do load balancing. It remains to be checked whether load information would be useful to the client for other purposes (e.g. anticipating a future overload of the server).

   ---A1------S1
/    |
C1   |
\    |
   ---A2------S2

Figure 5.3

When sending a realm-routed request, C1 does not even know which server will handle the request, let alone which DA S1 and S2 are connected to. So the load information on the servers is not directly useful to C1. But C1 has to select whether to send the request to A1 or A2. It can do so on the basis of the load information of A1 and A2. But is that sufficient?
If S1 has a high load and S2 a low one, improving the load balancing would mean that either:
a) C1 sends more traffic to A2, which forwards it to S2; this means that C1 has received load information from A2 that includes not only the A2 load but also the S2 load, or
b) C1, having no information on server load, sends traffic equally to A1 and A2. Then, if A1 receives from A2 load information that includes the load of S2, A1 may decide to route more traffic to S2 via A2. This is less optimal than the first case, as it adds a hop to the path (C1-A1-A2-S2).

An initial question is whether subcases a) and b) are relevant and within scope.

When sending a host-routed request, C1 can send it via A1 or A2, as routing at C1 is based on the realm. Should we consider whether the load control information received by C1 could give a hint to select A1 rather than A2 when S1 is the destination host, so as to avoid an additional hop? Or is that out of scope?

         -----S3
       /
   ---A1------S1
/    |
C1   |
\    |
   ---A2------S2
       \
         ---- S4
Figure 5.4

This diagram is an extension of the previous one, with additional servers connected to A1 and A2, and may be more common than Figure 5.3.
Compared to the previous diagram:
- for subcase a), for realm-routed requests, C1's choice between A1 and A2 can rely on the combined (or aggregated) load of S1 and S3 compared to the aggregated load of S2 and S4.
- for subcase b), similarly, A1 would compare the aggregated load of S2 and S4 received from A2 with the aggregated load of S1 and S3.

This use case leads us to consider whether the load of a set of nodes (aggregated load) would be useful, or whether it can be avoided.
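For subcase a), the comparison could look like the Python sketch below. It uses a plain mean as the aggregation, which assumes (hypothetically) that all servers have equal capacity; the load values and the function name are illustrative only. The same comparison applies to subcase b), with A1 computing it over its own servers and the figures received from A2.

```python
from statistics import mean
from typing import Dict


def pick_agent(loads_behind: Dict[str, Dict[str, float]]) -> str:
    """Choose the agent whose servers have the lowest aggregated load.

    The aggregation here is an unweighted mean, i.e. it assumes all servers
    have the same capacity.
    """
    aggregated = {agent: mean(servers.values()) for agent, servers in loads_behind.items()}
    return min(aggregated, key=aggregated.get)


# A1 fronts S1 and S3 (mean 70), A2 fronts S2 and S4 (mean 40).
print(pick_agent({"A1": {"S1": 80, "S3": 60}, "A2": {"S2": 30, "S4": 50}}))  # A2
```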

   ---A1---S1, S3 …
/    | \/
C1   | /\
\    |/  \
   ---A2---S2, S4 …

Figure 5.5

In this diagram, all servers are connected to the same set of DAs, which multiplies the number of connections between servers and their DAs; otherwise, we come back to the Figure 5.4 case or a hybrid between Figures 5.4 and 5.5.
When sending a request, C1 may select A1 or A2 according to load information related only to the A1 and A2 loads. This selection is independent of the server load, unless we take into account, e.g., the load of the A1/S1 connection and the load of the A2/S1 connection, which may differ.

A1 (or A2) then selects the server according to load information received from these servers, except for host-routed requests.

   ---A1---A3----S1, S3 …
/    | \/ |
C1   | /\ |
\    |/  \|
   ---A2---A4----S2, S4 …

Figure 5.6

This diagram introduces “two layers” of DAs between clients and servers.
Compared to Figures 5.3 and 5.4, C1, when sending a request, may select A1 or A2 according to load information related only to the A1 and A2 loads. Then A1 (or A2), when doing further routing, is in a situation similar to that of C1 in Figures 5.3 or 5.4.

   ---A1---A3---S1, S3 …
/    | \/ | \/
C1   | /\ | /\
\    |/  \|/  \
   ---A2---A4---S2, S4 …

Figure 5.7
As in Figure 5.5, all servers are connected to a common set of DAs (here A3, A4), which multiplies the number of connections between servers and their DAs; otherwise, we come back to the Figure 5.6 case or a hybrid between Figures 5.6 and 5.7.
When sending a request, C1 may select A1 or A2 according to load information related only to the A1 and A2 loads. This selection is independent of the server load.
Then A1 or A2, for further routing, are in a situation similar to that of C1 in Figure 5.5.
This case offers high flexibility, as load sharing can be done in different places, but it may also introduce more complexity in selecting the right route.

___________________________

#L2    Partitioned servers (can be added in section 5)

This is the case of realm-routed requests where only one or a few servers in the realm can handle a given request, so no load balancing to the other servers of the realm can be achieved. An intermediate proxy DA selects the eligible server(s) for the request; if there are several, the DA may balance load between them, and if there is only one, the DA may act as it does for host-routed requests.
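The DA behavior described above can be sketched in Python as follows. The partition table `PARTITIONS`, the partition keys and the rule of treating unknown loads as full are hypothetical assumptions made for illustration; how a DA actually maps a request to its eligible servers is deployment-specific.

```python
from typing import Dict, List, Tuple

# Hypothetical partition table: which servers can serve which subscriber ranges.
PARTITIONS: Dict[str, List[str]] = {
    "range-A": ["S1", "S2"],
    "range-B": ["S3"],
}


def select_for_partition(partition: str, loads: Dict[str, int]) -> Tuple[str, str]:
    """Return (server, mode) for a realm-routed request in a partitioned realm."""
    eligible = PARTITIONS.get(partition, [])
    if not eligible:
        raise LookupError(f"no server handles partition {partition!r}")
    if len(eligible) == 1:
        # Single eligible server: no balancing possible, act as for host routing.
        return eligible[0], "host-routed"
    # Several eligible servers: balance only among them (least loaded wins here;
    # a server with no reported load is treated as fully loaded).
    return min(eligible, key=lambda s: loads.get(s, 100)), "load-balanced"


loads = {"S1": 70, "S2": 30, "S3": 90}
print(select_for_partition("range-A", loads))  # ('S2', 'load-balanced')
print(select_for_partition("range-B", loads))  # ('S3', 'host-routed')
```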

___________________________
#L3    Active and standby nodes (can be inserted in section 5)

In the above topology use cases, it is assumed that load sharing can be used between the different possible connections shown in the figures. If nodes are in an active/standby configuration, only the active connections in the figures need to be considered. Switching from the active to the standby node, including the load handling aspect, is another topic.

__________________________________

#L4… Some considerations from the topology use cases (can be inserted in section 5)

- There is a large variety of use cases, with various places where load balancing actions can be taken. This may impact the content of the load information transferred downstream. We should also avoid adding too much complexity, especially to the content of the load information transferred downstream; node behavior can be more sophisticated, but normally remains outside the scope of standardization.

- Clients receiving load information need not be aware of the DA and server topologies; they nevertheless need relevant load information criteria to select (when needed) the right next node to send a request to. This leads us to consider whether some aggregation of load information should be performed before sending it to the clients, so as to make things simpler for the clients.

_______________________________

#L6   Load information scopes (can be added in section 6)

Load information can apply to various scopes, each of which should be considered for retention or not:

Load of a node: this is the most usual to consider. A node can be an end point (server) or an agent.

Load of an application: the load of a server can vary according to the application (e.g. due to differences in the resources used). So the possibility of having load information per application should not be precluded at this stage of the analysis.

Load of a set of nodes: it may be useful to transfer the load of a set of nodes downstream; use cases should identify whether this is the case. This requires aggregating the loads of the various nodes of the set to derive the load of the set. This can be done by an intermediate node receiving the load information of the various nodes. It can be a weighted average of the loads; the weight is another piece of information to be considered (e.g. the capacity of the servers, which may differ).
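One possible form of this weighted average, taking each node's capacity $c_i$ as its weight (an assumption; the choice of weight is left open above) and $L_i$ as its load in percent, is shown below with a worked example: S1 with a hypothetical capacity of 1000 at 80% load and S3 with a capacity of 3000 at 20% load.

```latex
L_{\text{set}} = \frac{\sum_{i=1}^{n} c_i \, L_i}{\sum_{i=1}^{n} c_i},
\qquad \text{e.g.} \quad
L_{\{S1,S3\}} = \frac{1000 \cdot 80 + 3000 \cdot 20}{1000 + 3000}
             = \frac{140000}{4000} = 35\,\%
```

An unweighted mean of the same two loads would give 50%, overstating how loaded the set really is, which is why the weight matters when server capacities differ.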

Load of an agent and load of a server.
Different paths via different DAs may exist between the node doing the server selection and the server. The load information on the server would be the same; but then, to which peer should the message be sent? This would depend on the load of the peer (and even on the load behind the peer).
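A minimal Python sketch of such a peer choice is shown below. It assumes, hypothetically, that each peer's own load and a "load behind the peer" figure (e.g. the A1/S1 versus A2/S1 connection load mentioned for Figure 5.5) are both available as 0-100 values, and it scores each path by its most loaded element. This bottleneck rule is only one illustrative choice.

```python
from typing import Dict, Tuple


def choose_peer(paths: Dict[str, Tuple[int, int]]) -> str:
    """Pick the peer through which to send a request to a given server.

    `paths` maps each peer (e.g. "A1", "A2") to a hypothetical pair
    (peer_load, load_behind_peer). Each path is scored by its most loaded
    element, i.e. its bottleneck.
    """
    return min(paths, key=lambda peer: max(paths[peer]))


# Via A1 the bottleneck is A1 itself (70); via A2 it is the path behind A2 (50).
print(choose_peer({"A1": (70, 40), "A2": (30, 50)}))  # A2
```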

________________________________

#L7   Addition and removal of a server (can be added in section 6)

When a server is added, it will start by advertising zero or low load; it is then up to the downstream nodes (e.g. DAs) handling load balancing between servers to ensure a smooth increase of traffic to this server, so as to avoid immediately sending it too much traffic, or even putting it into overload. It is questionable whether simple load information (a load percentage) is sufficient for this.

When removing a server in a controlled way (e.g. for maintenance, i.e. outside a failure case), one objective can be to progressively reduce the traffic to this server, rerouting it to other servers. Simple load information (a load percentage) would not be sufficient for this. We should consider whether this use case falls within the scope of this load control mechanism.
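The two cases above can be illustrated with the Python sketch below of a selection weight that a DA might compute locally. Everything in it is a hypothetical assumption: the 0-100 load scale, the linear ramp, the `WARMUP_SECONDS` and `DRAIN_SECONDS` periods, and especially the `drain_started_at` field, which stands for the extra "being removed" information that a load percentage alone cannot convey.

```python
from dataclasses import dataclass
from typing import Optional

WARMUP_SECONDS = 300.0  # hypothetical ramp-up period for a newly added server
DRAIN_SECONDS = 300.0   # hypothetical drain period for a server being removed


@dataclass
class ServerState:
    load: int                                 # advertised load, 0-100
    added_at: float                           # time the server was added (seconds)
    drain_started_at: Optional[float] = None  # set when a controlled removal starts


def effective_weight(s: ServerState, now: float) -> float:
    """Selection weight: spare capacity, scaled down while a new server warms
    up and while a departing server drains."""
    weight = float(100 - s.load)
    # Ramp-up: scale linearly from 0 to 1 over the warm-up period.
    weight *= min(max((now - s.added_at) / WARMUP_SECONDS, 0.0), 1.0)
    # Drain: scale linearly from 1 down to 0 over the drain period.
    if s.drain_started_at is not None:
        weight *= 1.0 - min(max((now - s.drain_started_at) / DRAIN_SECONDS, 0.0), 1.0)
    return weight


new = ServerState(load=20, added_at=0.0)
print(effective_weight(new, 150.0))  # 40.0: halfway through warm-up
leaving = ServerState(load=20, added_at=0.0, drain_started_at=1000.0)
print(effective_weight(leaving, 1150.0))  # 40.0: halfway through the drain
```

With this approach, the advertised load stays a plain percentage and the ramp-up is purely local DA behavior, whereas the drain case needs the additional signal.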

__________________________

#L8   Implementation and standardization (can be added in section 1)

Through the use cases and architectural considerations described in this document, the focus is on defining the relevant load information to be transferred to the nodes that will use it. Given the variety of use cases, possible topologies, places where server selection is done, etc., flexibility should be left to implementations regarding node behavior, both in computing the load information and in using it. Guidance or examples may also be given, but without normative force.

_____________________________

#L9…  Redirect agent (can be added at the end of section 3)

A Diameter node (e.g. a client) can use a redirect agent to obtain destination host addresses. The redirect agent may return several destination host addresses, from which the Diameter node has to select one; for this, it can rely on the load information for these hosts.
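A minimal Python sketch of that selection is shown below. The host names, the 0-100 load values and the `default_load` assigned to hosts for which no load information has been received are all hypothetical; how to treat hosts with unknown load is an open design choice.

```python
from typing import Dict, List


def pick_redirect_host(hosts: List[str], known_loads: Dict[str, int],
                       default_load: int = 50) -> str:
    """Choose one of the hosts returned by a redirect agent, preferring the
    least loaded; hosts with no load information are assumed to be at
    `default_load`."""
    if not hosts:
        raise ValueError("redirect agent returned no hosts")
    return min(hosts, key=lambda h: known_loads.get(h, default_load))


hosts = ["srv1.example.com", "srv2.example.com", "srv3.example.com"]
print(pick_redirect_host(hosts, {"srv1.example.com": 70, "srv3.example.com": 20}))
# srv3.example.com
```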
___________________________

So please let me know your comments.

Ben, would you be OK with complementing your draft with these inputs? We can also include question marks or editor’s notes, but I think it would be good to have a document on which to base our discussion of these architectural considerations.

Best regards

JJacques

_______________________________________________
DiME mailing list
DiME@ietf.org
https://www.ietf.org/mailman/listinfo/dime