Re: [Dime] Discussion on draft-ietf-dime-load-02

Alan DeKok <aland@deployingradius.com> Thu, 21 July 2016 14:37 UTC

From: Alan DeKok <aland@deployingradius.com>
In-Reply-To: <248b648144a14fd4ad5bff65cdc40886@CSRRDU1EXM025.corp.csra.com>
Date: Thu, 21 Jul 2016 16:37:23 +0200
Content-Transfer-Encoding: quoted-printable
Message-Id: <08CCC2E7-96FB-4433-A940-B22D22DCA09A@deployingradius.com>
References: <248b648144a14fd4ad5bff65cdc40886@CSRRDU1EXM025.corp.csra.com>
To: "Gunn, Janet P" <Janet.Gunn@csra.com>
X-Mailer: Apple Mail (2.3124)
Archived-At: <https://mailarchive.ietf.org/arch/msg/dime/gIMk9p74okR_yhceDpqHbuSZUqo>
Cc: "dime@ietf.org" <dime@ietf.org>
Subject: Re: [Dime] Discussion on draft-ietf-dime-load-02

On Jul 21, 2016, at 4:08 PM, Gunn, Janet P <Janet.Gunn@csra.com> wrote:
> 
> I do not think I am going to be able to participate remotely in Friday's DIME meeting, but I do want to make a high-level comment on the discussion about draft-ietf-dime-load-02.
>  
> A lot of the current discussion seems to be focusing on the best load balancing ALGORITHM, and the right way to calculate the load VALUE.

  I don't see much in the way of a load-balancing algorithm in the draft.  Just a discussion that the systems can use the load values to do load-balancing, and a small discussion on active/active vs active/passive.

  Which part refers to specific algorithms?

> I think that this ID needs to focus on the means for SHARING load information, without pre-supposing the way in which the "load value" will be used, or how it is calculated.

  The systems should use "load value" to do load balancing.  That necessitates some discussion of how that value is used.
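  As a minimal sketch of one such use (not something the draft specifies): assume each peer reports a value on the 0..65535 scale where a larger value means more spare capacity, and the sender splits traffic in proportion to those values.  The function name and the proportional rule are hypothetical.

```python
def traffic_shares(values: dict) -> dict:
    """Split traffic across peers in proportion to their reported values.

    Assumption: a larger reported value means more spare capacity.
    If every peer reports 0, no peer is given a share.
    """
    total = sum(values.values())
    if total == 0:
        return {peer: 0.0 for peer in values}
    return {peer: value / total for peer, value in values.items()}


if __name__ == "__main__":
    print(traffic_shares({"a": 30000, "b": 10000}))
```

  A peer reporting 0 receives no new traffic under this rule, which is also what makes a gradual drain possible.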

> I suspect that most of the environments in which load balancing will be deployed will be "walled gardens", so there is not an overwhelming need for nodes in different environments to use the load value in the same way, or even use the same load balancing approach.

  I suspect that there are very few stable load-balancing algorithms.  As such, I think everyone will end up implementing pretty much the same thing.

  I do have other concerns with the graph.  Using inverted numbers for "load" is confusing.  Perhaps the draft should explain that the numbers refer to *capacity*, measured on an arbitrary scale from 0 to 65535.

  Section A.10 is also confusing:

   When removing a node in a controlled way (e.g. for maintenance
   purpose, so outside a failure case), it might be appropriate to
   progressively reduce the traffic to this node by routing traffic to
   other nodes.  Simple load information (load percentage) would be not
   sufficient.

  Why?  The load numbers of 0 (loaded) to 65535 (unloaded) are just a load percentage expressed as a 16-bit fixed-width number.  Why does a percentage make any difference?
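  To make the equivalence concrete, here is a minimal sketch of the conversion in both directions.  It assumes the reading above, where 0 means fully loaded and 65535 means unloaded; the function names are hypothetical and nothing here comes from the draft.

```python
MAX_VALUE = 65535  # top of the 16-bit scale; read here as "unloaded"


def value_to_load_percent(value: int) -> float:
    """Convert a reported 16-bit value to a load percentage (0.0 to 100.0)."""
    if not 0 <= value <= MAX_VALUE:
        raise ValueError("value must be in 0..65535")
    return 100.0 * (MAX_VALUE - value) / MAX_VALUE


def load_percent_to_value(percent: float) -> int:
    """Convert a load percentage (0.0 to 100.0) to the nearest 16-bit value."""
    if not 0.0 <= percent <= 100.0:
        raise ValueError("percent must be in 0..100")
    return round(MAX_VALUE * (100.0 - percent) / 100.0)


if __name__ == "__main__":
    print(load_percent_to_value(25.0), value_to_load_percent(65535))
```

  The 16-bit scale only adds resolution; the information carried is the same as a percentage.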

  Further:

   The handling of the node removal is implementation
   specific but it can rely on the evolution of the load information
   received from the node to be removed

  I suspect the simple way to implement node removal is for the node to gradually report load values that step down to 0.  Once the reported value is zero and the node has no active traffic, it can be safely removed.  The draft should suggest this as a safe way to remove nodes.
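  A minimal sketch of that drain procedure, assuming the node lowers its reported value in even steps and tracks its own count of active sessions.  The function names, the step count, and the session counter are all hypothetical.

```python
def drain_schedule(start: int, steps: int) -> list:
    """Return the values to report, stepping evenly from `start` down to 0."""
    if steps < 1:
        raise ValueError("steps must be at least 1")
    if not 0 <= start <= 65535:
        raise ValueError("start must be in 0..65535")
    return [start * (steps - i) // steps for i in range(1, steps + 1)]


def safe_to_remove(reported_value: int, active_sessions: int) -> bool:
    """A node can be removed once it reports 0 and has no active traffic."""
    return reported_value == 0 and active_sessions == 0


if __name__ == "__main__":
    for value in drain_schedule(65535, 4):
        print(value)
```

  How quickly to step down is a local policy choice; the point is only that the existing value is enough to signal the drain.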

  Also, A.9 says:

   The previous scenarios assume that traffic can be load balanced among
   all peers that are eligible to handle a request.  That is, the peers
   operate in an "active-active" configuration.  In an "active-standby"
   configuration, traffic would be load-balanced among active peers.
   Requests would only be sent to peers in a "standby" state if the
   active peers became unavailable.  For example, requests might be
   diverted to a stand-by peer if one or more active peers becomes
   overloaded.

  If it's possible for the nodes to signal the active/passive state in the protocol, it would help a lot.  It would mean that the difference between active/active and active/passive is just signalling, and not configuration.

  i.e. a node may signal load 65535, but also want to be in the "passive" state.  Perhaps some way of having a node's load refer to another node's load would be good, e.g. "load on system B is 0, unless the load on system A is also 0, in which case load on B is 65535".

  I'm not sure how to implement that, tho.
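  Leaving the signalling question aside, the receiver-side rule could be sketched as follows.  This assumes the receiver already knows which peer a standby node refers to; the function and parameter names are hypothetical, and the draft defines no such mechanism.

```python
from typing import Optional


def effective_value(own_value: int, referenced_value: Optional[int] = None) -> int:
    """Value a receiver would act on for a node that may reference another node.

    With no reference, the node's own reported value is used as-is.
    With a reference, the node counts as 0 until the referenced node
    itself reports 0, at which point its own value takes effect.
    """
    if referenced_value is None:
        return own_value
    return own_value if referenced_value == 0 else 0


if __name__ == "__main__":
    # System B reports 65535 but references system A.
    print(effective_value(65535, referenced_value=40000))
    print(effective_value(65535, referenced_value=0))
```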

  Alan DeKok.