[Dime] Diameter load control inputs

"TROTTIN, JEAN-JACQUES (JEAN-JACQUES)" <jean-jacques.trottin@alcatel-lucent.com> Sun, 04 January 2015 17:38 UTC

From: "TROTTIN, JEAN-JACQUES (JEAN-JACQUES)" <jean-jacques.trottin@alcatel-lucent.com>
To: "dime@ietf.org" <dime@ietf.org>
Date: Sun, 04 Jan 2015 17:38:12 +0000
Message-ID: <E194C2E18676714DACA9C3A2516265D2026F4908@FR712WXCHMBA12.zeu.alcatel-lucent.com>
Archived-At: http://mailarchive.ietf.org/arch/msg/dime/IaELPJ21UNWgC-WxAoqk0LR-svY

Dear all

First, I wish you and your families a happy new year, in your professional lives as well.

Below I address our new topic, Diameter load control.
I am in favor of continuing the work that Ben started in his "draft-campbell-dime-load-considerations-00" document, by adding and investigating further considerations.

Below I give several additional sections or paragraphs that I think could be inserted in Ben's draft. I have added titles and numbering (#L1, #L2 ...) so that they can be reused in separate email threads for discussion.

I have included topology use cases. For Diameter overload, Steve gave several presentations on the various use cases to consider. I think it would be good to do the same for Diameter load, and to insert such use cases in Ben's draft. You may find this list somewhat long, but these use cases highlight a number of points on which we will have to take decisions (e.g. whether to cover them or not, or to check later whether the solution assumptions apply well). They are more about requirements than about the solution itself, as I do not address the exact content of load information or the way it will be transferred.

#L1 Topology use cases (can be inserted in section 5)

     ------S1
    /
  C1
    \
     ------S2
Figure 5.1

In this simple case (no DA), e.g. for a small network, the servers are peers of the client. The client selects the server according to the load information it receives from the servers.
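
As a minimal sketch of the selection in Figure 5.1, assuming each server reports its load as a percentage between 0 and 100 (the exact content of load information is not defined here, and the function name is hypothetical):

```python
def select_server(loads):
    """Return the name of the least-loaded server.

    `loads` maps a server name to its last reported load (0-100).
    The percentage format is an assumption made for illustration.
    """
    if not loads:
        raise ValueError("no server load information available")
    # Ties are broken by server name so the choice is deterministic.
    return min(sorted(loads), key=lambda name: loads[name])


# Hypothetical values for the two servers of Figure 5.1.
print(select_server({"S1": 70, "S2": 35}))
```

A real client would more likely spread traffic in proportion to the free capacity than send everything to the least-loaded server; the strict minimum is used here only to keep the example short.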

        ------S1
       /
 C1----A
       \
        ------S2
Figure 5.2

The agent selects the server according to the load information it receives from the servers. Here, the client does not need to receive load information, as it does not do load balancing. It remains to be checked whether load information would be useful to the client for other purposes (e.g. anticipating a future overload of the server).

     ---A1------S1
    /    |
  C1     |
    \    |
     ---A2------S2

Figure 5.3

When C1 sends a realm routed request, it does not know which server will handle the request, let alone which DA S1 and S2 are connected to. So the load information of the servers is not useful to C1. But C1 has to select whether to send the request to A1 or to A2. It can do so on the basis of the load information of A1 and A2. But is that sufficient?
If S1 has a high load and S2 a low one, improving the load balancing would mean that:
a) C1 sends more traffic to A2, which forwards it to S2. This implies that the load information C1 received from A2 includes not only the A2 load but also the S2 load;
b) or C1, having no information on server load, sends traffic equally to A1 and A2. Then A1, if it receives from A2 load information that includes the load of S2, may decide to route more traffic to S2 via A2. This is less optimal than the first case, as it adds a hop to the path (C1-A1-A2-S2).

An initial question is whether these subcases a) and b) are relevant and in scope.

When C1 sends a host routed request, it can do so via A1 or A2, as routing at C1 is based on the realm. Should we consider whether the load control information received by C1 could give some hint to select A1 rather than A2 when S1 is the destination host, so as to avoid an additional hop? Or is this out of scope?

          -----S3
         /
     ---A1------S1
    /    |
  C1     |
    \    |
     ---A2------S2
         \
          ---- S4
Figure 5.4

This diagram extends the previous one with additional servers connected to A1 and A2, and could be more common than figure 5.3.
Compared to the previous diagram:
- for subcase a), for a realm routed request, the choice of C1 between A1 and A2 can rely on the combined (or aggregated) load of S1 and S3 compared to the aggregated load of S2 and S4.
- for subcase b), similarly, A1 would compare the aggregated load of S2 and S4 received from A2 to the aggregated load of S1 and S3.

This use case may lead us to consider whether the load of a set of nodes (aggregated load) would be useful or whether it can be avoided.
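
To make subcase a) of Figure 5.4 concrete, here is a rough sketch of a client choosing between A1 and A2 from an aggregated load. It assumes load percentages (0-100) and a plain average as the aggregation, and it ignores the load of the agents themselves; all of these are assumptions for illustration, not proposals.

```python
def aggregated_load(server_loads):
    """Plain average of the loads (0-100) of the servers behind one agent."""
    return sum(server_loads) / len(server_loads)


def select_agent(servers_behind_agent):
    """Pick the agent whose servers have the lowest aggregated load.

    `servers_behind_agent` maps an agent name to the list of loads
    of the servers reachable through it without an extra hop.
    """
    return min(
        sorted(servers_behind_agent),
        key=lambda agent: aggregated_load(servers_behind_agent[agent]),
    )


# Figure 5.4 with hypothetical values:
# S1=80 and S3=60 behind A1, S2=20 and S4=40 behind A2.
topology = {"A1": [80, 60], "A2": [20, 40]}
print(select_agent(topology))
```

In practice the aggregation would be done by A1 and A2 before advertising a single value to C1, so that C1 does not need to know the topology behind them.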

     ---A1---S1, S3 ...
    /    | \/
  C1     | /\
    \    |/  \
     ---A2---S2, S4 ...

Figure 5.5

In this diagram, all servers are connected to the same set of DAs, which multiplies the number of connections between servers and their DAs; otherwise, we come back to the figure 5.4 case or a hybrid case between figures 5.4 and 5.5.
When sending a request, C1 may select A1 or A2 according to load information related only to the A1 load and the A2 load. This selection is independent of the server load, unless we take into account e.g. the load of the A1/S1 connection and the load of the A2/S1 connection, which may differ.

A1 (or A2) then selects the server according to load information received from these servers, except for host routed requests.
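
The second step, at the agent, could look like the sketch below. It assumes load percentages per server and a hypothetical `destination_host` argument standing for the Destination-Host of a host routed request.

```python
def route_at_agent(server_loads, destination_host=None):
    """Choose the server for a request arriving at A1 or A2 (Figure 5.5).

    A host routed request names its server in `destination_host`, so no
    load balancing applies. A realm routed request goes to the server
    with the lowest reported load.
    """
    if destination_host is not None:
        if destination_host not in server_loads:
            raise LookupError(f"unknown destination host: {destination_host}")
        return destination_host
    # Ties are broken by server name so the choice is deterministic.
    return min(sorted(server_loads), key=lambda name: server_loads[name])


# Hypothetical loads for the servers of Figure 5.5.
loads = {"S1": 90, "S2": 10, "S3": 40}
print(route_at_agent(loads))                         # realm routed
print(route_at_agent(loads, destination_host="S1"))  # host routed
```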

     ---A1---A3----S1, S3 ...
    /    | \/ |
  C1     | /\ |
    \    |/  \|
     ---A2---A4----S2, S4 ...

Figure 5.6

This diagram introduces "two layers" of DAs between clients and servers.
Compared to figure 5.3 or 5.4, when sending a request C1 may select A1 or A2 according to load information related only to the A1 load and the A2 load. Then A1 (or A2), when doing further routing, is in a similar situation to C1 in figures 5.3 or 5.4.

     ---A1---A3---S1, S3 ...
    /    | \/ | \/
  C1     | /\ | /\
    \    |/  \|/  \
     ---A2---A4---S2, S4 ...

Figure 5.7
As in figure 5.5, all servers are connected to a common set of DAs (here A3, A4), which multiplies the number of connections between servers and their DAs; otherwise, we come back to the figure 5.6 case or a hybrid case between figures 5.6 and 5.7.
When sending a request, C1 may select A1 or A2 according to load information related only to the A1 load and the A2 load. This selection is independent of the server load.
Then A1 or A2, for further routing, is in a similar situation to C1 in figure 5.5.
This case offers high flexibility, as load sharing can be done in different places, but may also introduce higher complexity in selecting the right routing.

___________________________

#L2 Partitioned servers (can be added in section 5)

This is the case of a realm routed request where only one or some of the servers in the realm can handle the request, so no load balancing towards the other servers of the realm can be achieved. An intermediate proxy DA will select the possible server(s) for this request. If there are several such servers, the DA may do some load balancing between them; if there is only one, the DA may act as with host routed requests.
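
A minimal sketch of this DA behavior, assuming load percentages and a hypothetical `eligible` set naming the servers able to handle the request (how the DA learns this set is not addressed here):

```python
def select_partitioned(server_loads, eligible):
    """Load balance only among the servers able to handle the request.

    `eligible` is the set of servers that can serve this request, for
    example the ones holding the relevant user data.
    """
    candidates = {s: load for s, load in server_loads.items() if s in eligible}
    if not candidates:
        raise LookupError("no eligible server for this request")
    # With a single candidate this degenerates to the host routed case.
    return min(sorted(candidates), key=lambda name: candidates[name])


# Hypothetical realm with three servers; S1 cannot handle this request.
loads = {"S1": 10, "S2": 80, "S3": 50}
print(select_partitioned(loads, {"S2", "S3"}))
```

Note that S1, although the least loaded in the realm, is never selected because it is not in the eligible set.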

___________________________
#L3 Active and stand by nodes (can be inserted in section 5)

In the above topology use cases, it is assumed that load sharing can be used between the different possible connections shown in the figures. If nodes are in an active and stand by configuration, the figures to consider are those with only the active connections. The switch from the active to the stand by node, and how load is handled during it, is another topic.

__________________________________

#L4... Some considerations from the topology use cases (can be inserted in section 5)

- There is a large variety of use cases, with various places where load balancing actions can be taken. This may impact the content of the load information transferred downstream. We should also avoid adding too much complexity, especially to the load information content transferred downstream; node behavior can be more sophisticated, but normally remains outside the standardization scope.

- Clients receiving load information do not have to be aware of the DA and server topologies; they nevertheless need relevant load information to select (when needed) the right next node to which to send a request. This may lead us to consider whether some aggregation of load information should be performed before sending it to the clients, so as to make it simpler for them.

_______________________________

#L6 Load information scopes (can be added in section 6)

Load information can apply to various scopes, and we should consider which of them to retain:

Load of a node: this is the most usual to consider. A node can be an end point (server) or an agent.

Load of an application: the load of a server can vary according to the applications (e.g. due to differences in the resources used). So the possibility of having load information per application should not be precluded at this level of analysis.

Load of a set of nodes: it may be of interest to transfer downstream the load of a set of nodes. Use cases should identify such an interest. This leads to aggregating the loads of the various nodes of the set to produce the load of the set. This can be done by an intermediate node receiving the load information of the various nodes. It can be a weighted average of the loads; the weight is another piece of information to be considered (e.g. the capacity of the servers, which may differ).
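
As an illustration of such a weighted average, the sketch below weights each load by the capacity of the node. The (load, capacity) representation and the use of requests per second as capacity are assumptions made for the example.

```python
def weighted_set_load(nodes):
    """Capacity-weighted average load of a set of nodes.

    `nodes` is a list of (load, capacity) pairs: load is a percentage
    (0-100) and capacity is any relative weight, e.g. requests per second.
    """
    total_capacity = sum(capacity for _, capacity in nodes)
    if total_capacity <= 0:
        raise ValueError("total capacity must be positive")
    return sum(load * capacity for load, capacity in nodes) / total_capacity


# Hypothetical values: a large server at 80% and a small one at 20%.
print(weighted_set_load([(80, 3000), (20, 1000)]))
```

With these values the result is 65, whereas a plain average of the two percentages would give 50 and would understate how busy the set really is.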

Load of an agent and load of a server:
Different paths via different DAs may exist between the node doing the server selection and the server. The load information of the server would be the same on each path; but then to which peer should the message be sent? This would depend on the load of the peer (and even on the load behind the peer).

________________________________

#L7 Addition and removal of a server (can be added in section 6)

When a server is added, it will start by advertising no load or a small load. It is then up to the downstream nodes (e.g. DAs) handling load balancing between servers to ensure a smooth increase of the traffic to this server, so as to avoid immediately sending it too much traffic and even putting it into overload. It may be questioned whether simple load information (a load percentage) is sufficient for this.
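
One possible way for a downstream node to obtain this smooth increase is a ramp-up factor applied on top of the advertised load. The sketch below assumes a linear ramp over a hypothetical `ramp_seconds` period; neither the ramp shape nor the parameter comes from any draft, and this is local node behavior rather than something to standardize.

```python
def ramp_up_weight(advertised_load, seconds_since_added, ramp_seconds=300.0):
    """Routing weight for a recently added server.

    The base weight is the free capacity (100 - load). It is scaled by a
    factor growing linearly from 0 to 1 over `ramp_seconds`, so a new
    server advertising 0% load is not flooded immediately.
    """
    base = max(0.0, 100.0 - advertised_load)
    ramp = min(1.0, max(0.0, seconds_since_added / ramp_seconds))
    return base * ramp


# A new server advertising 0% load, halfway through the ramp period.
print(ramp_up_weight(0, 150))
```

The downstream node needs to know when the server was added, which a load percentage alone does not tell it.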

When removing a server in a controlled way (e.g. for maintenance, so outside a failure case), an objective can be to progressively reduce the traffic to this server by routing the traffic to other servers. Simple load information (a load percentage) would not be sufficient. We should consider whether this use case is within the scope of this load control mechanism.
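
To illustrate why a load percentage alone does not work here: as traffic to the server decreases, its reported load decreases too, which would attract traffic back. The sketch below adds a purely hypothetical `drain_fraction` input (0.0 for normal service, 1.0 for fully drained) next to the load; it is only meant to show the kind of extra information that would be needed.

```python
def routing_weight(load, drain_fraction=0.0):
    """Relative share of traffic for one server.

    `load` is the advertised load (0-100). `drain_fraction` is a
    hypothetical extra input saying how far the removal has progressed.
    """
    if not 0.0 <= drain_fraction <= 1.0:
        raise ValueError("drain_fraction must be between 0.0 and 1.0")
    return max(0.0, 100.0 - load) * (1.0 - drain_fraction)


# A server at 15% load that is being emptied for maintenance.
print(routing_weight(15))        # load only: looks very attractive
print(routing_weight(15, 0.75))  # with the drain indication
print(routing_weight(15, 1.0))   # fully drained: no new traffic
```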

__________________________

#L8 Implementation and standardization (can be added in section 1)

Through the use cases and the architectural considerations described in this document, the focus is to define the relevant load information to be transferred to the nodes that will use it. Then, given the variety of use cases, the possible topologies, the place where the server selection is done, etc., flexibility should be left to implementations regarding node behavior, both in producing the load information and in using it. Guidance or examples may also be given to help, but without normative status.

_____________________________

#L9... Redirect agent (can be added at the end of section 3)

A Diameter node (e.g. a client) can use a redirect agent to get destination host addresses. The redirect agent may return several destination host addresses, of which the Diameter node has to select one; for this it can rely on the load information of these hosts.
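
A small sketch of this selection, covering the case where the node has no load information yet for some of the returned hosts. The host names and the `default_load` fallback are assumptions for illustration.

```python
def select_redirect_host(candidate_hosts, known_loads, default_load=50):
    """Pick one of the hosts returned by the redirect agent.

    Hosts with no known load are given `default_load`, an arbitrary
    assumption; a real node might instead pick among them at random.
    """
    if not candidate_hosts:
        raise ValueError("redirect agent returned no host")
    # min() keeps the first host in case of equal loads, which preserves
    # the order in which the redirect agent listed the hosts.
    return min(
        candidate_hosts,
        key=lambda host: known_loads.get(host, default_load),
    )


# Hypothetical hosts; no load has been received yet from s2.example.com.
hosts = ["s1.example.com", "s2.example.com", "s3.example.com"]
loads = {"s1.example.com": 80, "s3.example.com": 60}
print(select_redirect_host(hosts, loads))
```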
___________________________

So please let me know your comments.

Ben, would you be OK with complementing your draft with these inputs? We can also add question marks or editor's notes, but I think it would be good to have a document on which we can discuss these architectural considerations.

Best regards

JJacques
