Re: [Dime] WGLC #1 for draft-ietf-dime-load-02

"Gunn, Janet P" <Janet.Gunn@csra.com> Tue, 21 June 2016 16:16 UTC

From: "Gunn, Janet P" <Janet.Gunn@csra.com>
To: Maria Cruz Bartolome <maria.cruz.bartolome@ericsson.com>, "jouni.nospam@gmail.com" <jouni.nospam@gmail.com>, "dime@ietf.org" <dime@ietf.org>
Thread-Topic: [Dime] WGLC #1 for draft-ietf-dime-load-02
Thread-Index: AQHRtdFbxNTUAf5iuEyzblc0BoJml5/yf3oAgAG3a2A=
Date: Tue, 21 Jun 2016 16:16:00 +0000
Message-ID: <3e2082d80d8e45caaca581c9dcc98468@CSRRDU1EXM025.corp.csra.com>
References: <5b31616d-efa3-ac03-8f1c-bd8883a35d65@gmail.com> <087A34937E64E74E848732CFF8354B9219758407@ESESSMB101.ericsson.se>
In-Reply-To: <087A34937E64E74E848732CFF8354B9219758407@ESESSMB101.ericsson.se>
Archived-At: <https://mailarchive.ietf.org/arch/msg/dime/agMSZp7nQgiISYHg-8J57FuVshY>
Subject: Re: [Dime] WGLC #1 for draft-ietf-dime-load-02

Comments in line <JPG>

-----Original Message-----
From: DiME [mailto:dime-bounces@ietf.org] On Behalf Of Maria Cruz Bartolome
Sent: Monday, June 20, 2016 5:14 AM
To: jouni.nospam@gmail.com; dime@ietf.org
Subject: Re: [Dime] WGLC #1 for draft-ietf-dime-load-02

Hello all,

I would like to raise some questions, propose some changes and point out some typos. They are grouped into separate sections below to ease reading.
Best regards
/MCruz


===========  SOME QUESTIONS ===========:

Appendix A.  Topology Scenarios
Does it really make sense to keep an appendix that states:
   "Nothing in this section should be construed to mean that a given
   scenario is in scope for this effort, or even a good idea."

I think we need to keep only the scenarios that are "in scope of this effort", which I understand to mean "suitable for load conveyance as stated in this draft".
If some of them are not considered suitable for any reason, I presume they should be removed.
<JPG> Or note them as (counter)examples of scenarios that are NOT suitable. </JPG>

A.10.  Addition and removal of Nodes
Shouldn't this part of the annex be in the regular body of the draft?


=========== PROPOSED CHANGES ===========:

Abstract:

Now:
   This document defines a mechanism for *sharing*  of Diameter load
   information.
Proposed:
   This document defines a mechanism for *conveying* Diameter load
   information.

Reasoning:
"Sharing" may be a bit misleading.

<JPG> Agree. Conveying is better. </JPG>

1. Introduction:
Now:
  In particular, DOIC does not fulfill Req 24, which requires a
   mechanism where Diameter nodes can indicate their *current load* , even
   if they are not currently overloaded.  DOIC also does not fulfill Req
   23, which requires that *nodes that divert traffic*  away from
   overloaded nodes be provided with sufficient information to select
   targets that are most likely to have sufficient capacity.

Proposal:
I think we need to include the exact requirement text from RFC7068, since the description you use does not keep the exact meaning.
E.g. *current load* should be replaced by *load levels*, and *nodes that divert traffic* is in fact *nodes with traffic diversion capability*.
Better still, just list the requirements. Adding an interpretation is fine if one is needed, but it is important to keep the original text:
REQ 23: The solution MUST provide sufficient information to enable a load-balancing node to divert messages that are rejected or otherwise throttled by an overloaded upstream node to other upstream nodes that are the most likely to have sufficient capacity to process them.
REQ 24: The solution MUST provide a mechanism for indicating load levels, even when not in an overload condition, to assist nodes in making decisions to prevent overload conditions from occurring.

<JPG> Agree. It would make sense to have a section, or even an appendix, which lists the requirements and notes which are/are not met. </JPG>

1. Introduction

Now:
  There are several other requirements in [RFC7068] that mention both
   overload and load information that are only partially fulfilled by
   DOIC.
  [....]
   This document defines a mechanism that addresses the load-related
   requirements from RFC 7068.

Proposal
We need to list the requirements we refer to. They are not listed anywhere, right?
I think we refer to the following requirements:

REQ 1: The solution MUST provide a communication method for Diameter nodes to exchange load and overload information.
REQ 2: The solution MUST allow Diameter nodes to support overload control regardless of which Diameter applications they support. Diameter clients and agents must be able to use the received load and overload information to support graceful behavior during an overload condition. Graceful behavior under overload conditions is best described by REQ 3.
REQ 12: When a single network node fails, goes into overload, or suffers from reduced processing capacity, the solution MUST make it possible to limit the impact of the affected node on other nodes in the network. This helps to prevent a small-scale failure from becoming a widespread outage.
REQ 34: The solution SHOULD provide a method for exchanging overload and load information between elements that are connected by intermediaries that do not support the solution.

<JPG> Agree. See above comment. </JPG>

2. Terminology and abbreviations

Now:
Load
      The *relative  capacity of a Diameter node*.  A low load level
      indicates that the Diameter node is under utilized.  A high load
      level indicates that the node is closer to being fully utilized.

Proposed:
Load
      The *Diameter message processing capacity of a node*.  A low load level
      indicates that the Diameter node is under utilized.  A high load
      level indicates that the node is closer to being fully utilized.

Reasoning:
I think using "relative" is misleading.

<JPG> I do not like either. "Capacity" is what the node can do.

"Available capacity" is actually HIGH when there is a low load level, and LOW when there is a high load level.

If you want to avoid "Utilization", which implies an explicit calculation, you could say "the relative usage of the Diameter message processing capacity". </JPG>

4.1
Now:
   Second, Overload information, in the form of a DOIC Overload Report
   (OLR) [RFC7683] indicates an explicit request for action on the part
   of the reacting node.  That is, the OLR requests that the reacting
   node reduce the offered load -- the actual traffic sent to the
   reporting node after overload abatement and routing decisions are
   made -- by an indicated amount *or to an indicated level *.

Proposed:
   Second, Overload information, in the form of a DOIC Overload Report
   (OLR) [RFC7683] indicates an explicit request for action on the part
   of the reacting node.  That is, the OLR requests that the reacting
   node reduce the offered load -- the actual traffic sent to the
   reporting node after overload abatement and routing decisions are
   made -- by an indicated amount *(by default, or other optional abatement algorithms).*

  - Or remove everything after "amount".

<JPG> RFC7683 is clear that the Overload Report may be used to trigger EITHER a loss-based algorithm or a different (e.g. rate-based) algorithm. So the summary here should not be restricted to a loss-based description. Perhaps "-- by an indicated amount (by default), or as prescribed by the selected abatement algorithm." </JPG>

4.1
Now:
   None of this prevents a Diameter node from deciding to reduce the
   offered load based on load information.   .

Proposed
  (remove)

Reasoning:
This sentence is not properly linked to the previous paragraph, and it is already covered by that paragraph.

<JPG> OK with this, though not sure it is necessary to delete.</JPG>

4.2
Now:
   Req 24 discusses how Diameter load information might be used when no
   overload condition currently exists.  Diameter nodes can use the load
   information to make decisions to try to avoid overload conditions in
   the first place.  Normal load-balancing falls into this category.  A
   node might also take other proactive steps to reduce offered load
   based on load information, so that the loaded node never goes into
   overload in the first place.

Proposed:
   Req 24 discusses how Diameter load information might be used when no
   overload condition currently exists.  Diameter nodes can use the load
   information to make decisions to try to avoid overload conditions in
   the first place.  Normal load-balancing falls into this category, but
   the diameter node can  take other proactive steps as well.

<JPG> Agree </JPG>

4.2
Now
   If the loaded nodes are Diameter servers (or clients in the case of
   server-to-client transactions), both of these uses are most
   effectively accomplished  by a Diameter node that performs server
   selection.

Proposed:
   If the loaded nodes are Diameter servers (or clients in the case of
   server-to-client transactions), both of these *load information* uses *should
   be*  accomplished  by a Diameter node that performs server
   selection.

Reasoning:
  Traffic can only be diverted by a node that performs server selection, right?

<JPG> Agree in principle, but I think that "... both of these uses of load information should be ..." reads better than "... both of these load information uses should be ...". </JPG>

5.
Now
   The second big difference between DOIC and Load is visibility of the
   DOIC or Load information within a Diameter network.  DOIC information
   is sent end-to-end resulting in the ability of all nodes in the path
   of the answer message that carries the OC-OLR AVP to act on the
   information.  The DOIC overload reports much remain in the message
   all the way from the reporting node to the node that is the target
   for the answer message.

   For the Load mechanism there are two types of load reports.

   The first is the load of the endpoint sending the answer message.
   This load report is carried end-to-end to enable any nodes that make
   server selection decisions to use the load status of the sending
   endpoint as part of  the server selection decision.

   The second type of load report is a peer report.  This report is used
   by Diameter nodes as part of the logic to select the next hop
   Diameter node and, as such, do not have significance beyond the peer
   node.  These load reports are removed by the first supporting
   Diameter node to receive the report.

Proposed:
   The second big difference between DOIC and Load is visibility of the
   DOIC or Load information within a Diameter network.  DOIC information
   is sent end-to-end resulting in the ability of all nodes in the path
   of the answer message that carries the OC-OLR AVP to act on the
   information, *although only one node can actually consume the report*.  The DOIC overload reports much remain in the message
   all the way from the reporting node to the node that is the target
   for the answer message.

   *However,* for the Load mechanism there are two types of load reports *and only the
    first one is transmitted end-to-end*.

   The first is the load of the endpoint sending the answer message.
   This load report is carried end-to-end to enable any nodes that make
   server selection decisions to use the load status of the sending
   endpoint as part of  the server selection decision. *More than one node may make use of the load information received*

   The second type of load report is a peer report.  This report is used
   by Diameter nodes as part of the logic to select the next hop
   Diameter node and, as such, do not have significance beyond the peer
   node.  These load reports are removed by the first supporting
   Diameter node to receive the report.

<JPG> Slightly different comment. I think the phrase "The DOIC overload reports much remain in the message..." is a typo and should be "The DOIC overload reports must (or MUST?) remain in the message..." </JPG>

5.
Now
  The goal is make it possible to use both the load values received as
   a part of the Diameter Load mechanism and weight values received as a
   result of a DNS SRV query.  As a result, the Diameter load value has
   a range of 0-65535.  This value and DNS SRV weight values are then
   used in a distribution algorithm similar to that specified in
   [RFC2782].

Comments:
For load balancing to be efficient, it is not enough for the reacting node (the node in charge of load balancing) to know the Load of each server; it also needs to know that load in relation to each server's capacity. Otherwise, the Load value of one server can't be compared with the Load value of a server with a different weight.
So, in my opinion, we need to find a way to provide a Load value that is actually comparable with the Load values of the other servers in the group.
Reflecting a bit longer on this, I think we need to define the group of servers that are balanced together, like a load-balancing context, and then provide a relative dynamic Load value for every server in that group.

<JPG> Agree with the thought: if "Little Server" is 30% utilized and "Big Server" is 50% utilized, it still makes sense to send more traffic to Big Server. But I am not sure if that is within the scope of this document. </JPG>
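
To make the Little Server / Big Server point concrete, here is a minimal sketch of why a utilization figure alone is not comparable across servers. The capacities (100 and 1000 messages per second) and the function names are assumptions made up for this illustration; neither the draft nor this thread defines them.

```python
def spare_capacity(capacity, utilization_pct):
    """Unused capacity, in the same (hypothetical) units as capacity."""
    return capacity * (100 - utilization_pct) / 100


def traffic_shares(servers):
    """Split new traffic in proportion to each server's spare capacity.

    servers maps a name to (capacity, utilization_pct). Both numbers are
    illustrative assumptions, not values defined by the draft.
    """
    spare = {name: spare_capacity(cap, pct) for name, (cap, pct) in servers.items()}
    total = sum(spare.values())
    if total == 0:
        return {name: 0.0 for name in spare}
    return {name: s / total for name, s in spare.items()}


# Little Server is 30% utilized, Big Server is 50% utilized.
servers = {"little": (100, 30), "big": (1000, 50)}
shares = traffic_shares(servers)
```

With these assumed capacities, Big Server has far more spare capacity than Little Server even though its utilization is higher, so it would receive the larger share.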


5.
Now
   The load report includes the relative load of the sending node.  This
   relative load is specified in a manner consistent with that defined
   for DNS SRV [RFC2782].

Proposed:
   The load report includes a value to identify the load of the sending node,
  specified in a manner consistent with that defined
   for DNS SRV [RFC2782].

<JPG> Agree. </JPG>

5.
Now:
The distribution algorithm used by Diameter nodes supporting the
   Diameter Load mechanism is an implementation decision but it needs to
   result in similar behavior as the algorithm specified in [RFC2782].

Proposed:
The distribution algorithm used by Diameter nodes supporting the
   Diameter Load mechanism is an implementation decision but it needs to
   result in similar behavior as the algorithm *described
   for the use of weigth values in* [RFC2782].

<JPG> Agree in principle. NIT: replace "similar behavior as" with "similar behavior to", and replace "weigth" with "weight". </JPG> (End of my comments)
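
For reference, the weight-based selection described in RFC 2782 can be sketched roughly as follows. This is only an illustration of the kind of behavior the text asks implementations to approximate: the function name `pick_weighted` and the `(name, weight)` tuples are hypothetical, and using the Load-Value directly as the weight is an assumption based on the quoted draft text saying that a higher value indicates a lower load.

```python
import random


def pick_weighted(candidates, rng=random):
    """Pick one candidate name using RFC 2782 style weighted selection.

    candidates is a list of (name, weight) pairs with weight in 0..65535.
    rng is anything with a randint(a, b) method (inclusive bounds).
    """
    if not candidates:
        raise ValueError("no candidates")
    # Entries with weight 0 go first, as RFC 2782 describes.
    ordered = sorted(candidates, key=lambda c: c[1] != 0)
    total = sum(weight for _, weight in ordered)
    # Uniform random number between 0 and the sum, inclusive.
    threshold = rng.randint(0, total)
    running = 0
    for name, weight in ordered:
        running += weight
        # First entry whose running sum reaches the random number wins.
        if running >= threshold:
            return name
    raise AssertionError("unreachable: last running sum equals total")
```

A node with a larger weight is selected proportionally more often, while a zero-weight node still has a small chance of being picked.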


5.1
Now:
  If Agent A4 supports the Load mechanism then it will verify that the
   load information received is valid.  For a HOST load report this is
   achieved by matching the identity included in the load information
   with the identity of the host node from which the answer message was
   received.

Comments:
A4 behaviour should be defined generically. In the example, we know S[n] is a peer of A4, but in the general case A4 will not know this when it receives a HOST report.
So, for any agent, the HOST load report is valid as long as that agent is responsible for server selection, as explained for A1 below:
A1's actions depend on whether A1 is
   responsible for doing server selection.  If A1 is not doing server
   selection then A1 ignores the HOST load report.  If A1 is responsible
   for doing server selection then it stores the load information for
   S[n] in its routing information for the handling of subsequent
   request messages.  In both cases A1 leaves the HOST report in the
   message
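
The agent behaviour quoted in this thread (HOST reports stay in the message and are stored only by nodes doing server selection; PEER reports are consumed and removed by the first supporting node, which then adds its own) could be sketched as below. The dictionary form of a report, the two load tables and all names are hypothetical, and validation of the report's identity is deliberately left out.

```python
def relay_answer(reports, does_server_selection, host_loads, peer_loads,
                 own_id, own_load_value):
    """Process the load reports in an answer and return those to forward.

    reports is a list of dicts with keys "type" ("HOST" or "PEER"),
    "source_id" and "value". This in-memory form is an assumption of the
    sketch, not the AVP encoding.
    """
    forwarded = []
    for report in reports:
        if report["type"] == "HOST":
            # Only a node doing server selection stores the endpoint's load.
            if does_server_selection:
                host_loads[report["source_id"]] = report["value"]
            # In both cases the HOST report stays in the message.
            forwarded.append(report)
        elif report["type"] == "PEER":
            # A PEER report matters only to the receiving peer: keep the
            # value for next-hop selection and strip it from the message.
            peer_loads[report["source_id"]] = report["value"]
    # Add this node's own PEER report for the next node on the path.
    forwarded.append({"type": "PEER", "source_id": own_id, "value": own_load_value})
    return forwarded
```

The same function covers messages with no load reports, with only a PEER report, with only a HOST report, and with both.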

6.1.1
Now:
   The method for determining the load value included in the load report
   is an implementation decision.

Comments:
In line with the comment above, I agree it should be implementation specific, but we need to provide some guidance so that the value provided can be used to achieve successful load balancing.


6.2
Now:
   If the Diameter node is responsible for doing server selection then
   it SHOULD save the load value included in the Value AVP included in
   the Load AVP of type HOST in its routing information.

Proposed:
   If the Diameter node is responsible for doing server selection then
   it SHOULD save the load value included in the Value AVP included in
   the Load AVP of type HOST.

Reasoning:
It is a bit misleading to state that it should be stored "in its routing information". It has to be used for server selection, regardless of "how" and "where" it is stored.

7.3
Now:
   The Load-Value AVP (AVP code TBD3) is of type Unsigned64.  It is used
   to convey relative load information about the sender of the load
   report.

Comments:
*Relative load*
This seems to refer to what I commented on earlier, the "relative dynamic load"; in that comment the load is relative to the weight.
But as the draft stands now, I think the term is misleading, since it is not clear what the load is relative to.


7.3
Now:
   The Load-Value AVP is specified in a manner similar to the weight
   value in DNS SRV ([RFC2782]).

   The Load-Value has a range of 0-65535.

   A higher value indicates a lower load on the sending node.  A lower
   value indicates that the sending node is heavily loaded.

      Stated another way, a node that has zero load would have a load
      value of 65535.  A node that is 100% loaded would have a load
      value of 0.

Comments:
I think it would be easier to use a percentage. It is more straightforward to figure out what it means.
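
As an illustration of how the two representations relate, the quoted text (65535 for zero load, 0 for a 100% loaded node) implies a simple linear mapping. The sketch below assumes that mapping is linear between the two endpoints, which the draft does not state, and the function names are hypothetical.

```python
MAX_LOAD_VALUE = 65535  # upper end of the Load-Value range quoted above


def load_value_to_percent(value):
    """Convert a Load-Value (higher means less loaded) to a load percentage."""
    if not 0 <= value <= MAX_LOAD_VALUE:
        raise ValueError("Load-Value out of range")
    return 100.0 * (MAX_LOAD_VALUE - value) / MAX_LOAD_VALUE


def percent_to_load_value(percent):
    """Convert a load percentage (0 to 100) to the nearest Load-Value."""
    if not 0 <= percent <= 100:
        raise ValueError("percentage out of range")
    return round(MAX_LOAD_VALUE * (100 - percent) / 100)
```

A percentage loses some resolution compared with the 0-65535 range, but the direction of the scale (higher number means more load) is arguably easier to read.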



=========== TYPOS========:

2. Terminology and abbreviations

   Routing Information

      Routing Information - Routing information referred to in this
      document can include the Routing and Peer tables defined in RFC
      6733.  It can also include other implementation specific tables
      used to store load information.  This document does not define the
      structure of such tables.

Remove *Routing information* duplicated sentence.

4.1
At any given time that load *maybe*  effectively
   zero
*May be*

5.1
Because the load report is *an* HOST load report, A4 leaves the load
   report in the message it relays.

5.1
   A1 then calculates its own load information and inserts load
   information AVPs of type PEER in the message before sending the
   message to *A1*

  *A1* should be C

6.1.1
      For instance, if the only consumer of the load reports is the
     * endpoints peer* then the endpoint can choose to only include a load
      report when the load of the endpoint has changed by a meaningful
      percentage.  If there are consumers of the endpoint load report
      other *thaen* the *endpoints peer* (this will be the case if other
      nodes are responsible for server selection) then the endpoint
      might choose to include load reports in all answer messages as a
      way of ensuring that all nodes doing server selection get accurate
      load information.

     *endpoint's peer*

6.2
A Diameter node MUST be prepared to process load reports of type HOST
   *and* of type PEER

6.2
      Note that the node needs to be able to handle messages with no
      load reports, messages with just a PEER load report, messages with
      just *an* HOST load report and messages with both types of load
      reports.



-----Original Message-----
From: DiME [mailto:dime-bounces@ietf.org] On Behalf Of Jouni Korhonen
Sent: Tuesday, May 24, 2016 17:30
To: dime@ietf.org
Subject: [Dime] WGLC #1 for draft-ietf-dime-load-02

Folks,

This email starts the WGLC #1 for draft-ietf-dime-load-02. Please, review the document, post your comments to the mailing list and also insert them into the Issue Tracker with your proposed resolution.

WGLC starts: 5/24/2016
        ends: 6/7/2016 EOB PDT

- Jouni & Lionel

_______________________________________________
DiME mailing list
DiME@ietf.org
https://www.ietf.org/mailman/listinfo/dime

