[Dime] DOIC Requirements Analysis

Ben Campbell <ben@nostrum.com> Tue, 21 October 2014 22:10 UTC

Archived-At: http://mailarchive.ietf.org/arch/msg/dime/4mj-RJ45ieibX-Km-CiVPm0alng

Hi,
Benoit requested that the DOIC draft include a requirements analysis against the requirements from RFC 7068. I’ve made an attempt at that, below.
I realize not everyone will agree on my analysis, and there is considerable room for discussion. However, given Benoit’s request, I think it’s important to get these in as soon as possible; if not in 04, then soon afterwards (i.e. before WGLC, and preferably before Honolulu.)
I also created a branch called “Reqs-Analysis” in the GitHub repository that contains these in XML form.
Thanks!
Ben.
----------------------------------------------
Appendix D.  Requirements Analysis

   This section analyzes the mechanism described in this document
   against the set of requirements detailed in [RFC7068].

D.1.  General

   REQ 1:  The solution MUST provide a communication method for Diameter
           nodes to exchange load and overload information.

           *Partially Compliant*. The mechanism uses new AVPs
           piggybacked on existing Diameter messages to exchange
           overload information.  It does not currently support "load"
           information.  Indication of "Load" information has been left
           for a future extension.



   REQ 2:  The solution MUST allow Diameter nodes to support overload
           control regardless of which Diameter applications they
           support.  Diameter clients and agents must be able to use the
           received load and overload information to support graceful
           behavior during an overload condition.  Graceful behavior
           under overload conditions is best described by REQ 3.

           *Compliant*. The DOIC AVPs can be used in any application
           that allows the extension of AVPs.



   REQ 3:  The solution MUST limit the impact of overload on the overall
           useful throughput of a Diameter server, even when the
           incoming load on the network is far in excess of its
           capacity.  The overall useful throughput under load is the
           ultimate measure of the value of a solution.

           *Compliant*. DOIC provides information that nodes can use to
           reduce the impact of overload.



   REQ 4:  Diameter allows requests to be sent from either side of a
           connection, and either side of a connection may have need to
           provide its overload status.  The solution MUST allow each
           side of a connection to independently inform the other of its
           overload status.

           *Compliant*. DOIC AVPs can be included regardless of
           transaction "direction".



   REQ 5:  Diameter allows nodes to determine their peers via dynamic
           discovery or manual configuration.  The solution MUST work
           consistently without regard to how peers are determined.

           *Compliant*. DOIC contains no assumptions about how peers are
           discovered.  [Note: This may require further study]



   REQ 6:  The solution designers SHOULD seek to minimize the amount of
           new configuration required in order to work.  For example, it
           is better to allow peers to advertise or negotiate support
           for the solution, rather than to require that this knowledge
           to be configured at each node.

           *Partially Compliant*. Most DOIC parameters are advertised
           using the DOIC capability announcement mechanism.  However,
           there are some situations where configuration is required.
           For example, without configuration, a DOIC node cannot detect
           that a peer may not support DOIC when nodes on the other side
           of that non-supporting peer do support DOIC.



D.2.  Performance

   REQ 7:  The solution and any associated default algorithm(s) MUST
           ensure that the system remains stable.  At some point after
           an overload condition has ended, the solution MUST enable
           capacity to stabilize and become equal to what it would be in
           the absence of an overload condition.  Note that this also
           requires that the solution MUST allow nodes to shed load
           without introducing non-converging oscillations during or
           after an overload condition.

           *Compliant*. The specification offers guidance that
           implementations should apply hysteresis when recovering from
           overload, and avoid sudden ramp ups in offered load when
           recovering.
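
           As a rough illustration of the hysteresis and gradual
           ramp-up described above, the following Python sketch tracks
           the reduction a reporting node might advertise.  The class
           name, the thresholds, and the step size are all hypothetical
           and are not taken from the DOIC specification.

```python
class OverloadState:
    """Tracks an advertised reduction using separate enter/exit thresholds."""

    def __init__(self, enter_at=0.90, exit_at=0.70, max_step_down=10):
        # Hypothetical tuning knobs; none of these values come from DOIC.
        self.enter_at = enter_at            # utilization that triggers overload
        self.exit_at = exit_at              # utilization that allows recovery
        self.max_step_down = max_step_down  # largest relaxation per update, in percent
        self.reduction = 0                  # currently advertised reduction, in percent

    @property
    def overloaded(self):
        return self.reduction > 0

    def update(self, utilization, wanted_reduction):
        """Return the reduction percentage to advertise for this sample."""
        if utilization >= self.enter_at:
            target = wanted_reduction   # at or above the upper threshold
        elif utilization <= self.exit_at:
            target = 0                  # at or below the lower threshold
        else:
            target = self.reduction     # inside the band: hold steady

        if target >= self.reduction:
            self.reduction = target     # tighten immediately
        else:
            # Relax in bounded steps so offered load ramps up gradually.
            self.reduction = max(target, self.reduction - self.max_step_down)
        return self.reduction
```

           Because the exit threshold sits below the enter threshold, a
           utilization sample between the two never changes the
           advertised value, which is what keeps the state from
           oscillating.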



   REQ 8:  Supporting nodes MUST be able to distinguish current overload
           information from stale information.

           *Partially Compliant*. DOIC overload reports are "soft
           state"; that is, they expire after an indicated period.  DOIC
           nodes may also send reports that end existing overload
           conditions.  DOIC requires reporting nodes to ensure that all
           relevant reacting nodes receive overload reports.

           However, since DOIC does not allow reporting nodes to send
           OLRs in watchdog messages, if an overload condition results
           in zero offered load, the reporting node cannot update the
           condition until the expiration of the original OLR.
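
           The soft-state behavior can be sketched from the reacting
           node's side as follows.  This is only an illustration: the
           class and field names are hypothetical, and the sketch
           assumes that each report carries a sequence number and a
           validity period in seconds, and that a validity of zero ends
           the condition.

```python
from dataclasses import dataclass


@dataclass
class StoredReport:
    sequence: int
    reduction_percent: int
    expires_at: float  # absolute time after which the report is stale


class ReportCache:
    """Reacting-node view of overload reports, keyed by a caller-chosen scope."""

    def __init__(self):
        self._reports = {}

    def on_report(self, key, sequence, reduction_percent, validity_seconds, now):
        current = self._reports.get(key)
        if current is not None and sequence <= current.sequence:
            return  # assumption: only a higher sequence number replaces state
        # A zero validity yields a report that is already expired, which
        # ends the condition while still remembering the sequence number.
        self._reports[key] = StoredReport(
            sequence, reduction_percent, now + validity_seconds
        )

    def active_reduction(self, key, now):
        report = self._reports.get(key)
        if report is None or now >= report.expires_at:
            return 0  # no report, or the stored report has gone stale
        return report.reduction_percent
```

           Note that the cache can only learn about a change when a new
           report arrives; with zero offered load there are no answers
           to carry one, so the stored report simply runs until it
           expires.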



   REQ 9:  The solution MUST function across fully loaded as well as
           quiescent transport connections.  This is partially derived
           from the requirement for stability in REQ 7.

           *Not Compliant*. DOIC does not allow OLRs to be sent over
           quiescent transport connections.  This is due to the fact
           that OLRs cannot be sent outside of the application to which
           they apply.



   REQ 10: Consumers of overload information MUST be able to determine
           when the overload condition improves or ends.

           *Partially Compliant*. (See response to previous two
           requirements.)



   REQ 11: The solution MUST be able to operate in networks of different
           sizes.

           *Compliant*. DOIC makes no assumptions about the size of the
           network.  DOIC can operate purely between clients and
           servers, or across agents.



   REQ 12: When a single network node fails, goes into overload, or
           suffers from reduced processing capacity, the solution MUST
           make it possible to limit the impact of the affected node on
           other nodes in the network.  This helps to prevent a small-
           scale failure from becoming a widespread outage.

           *Partially Compliant*. DOIC allows overload reports for an
           entire realm, where abated traffic will not be redirected
           towards another server.  But in situations where nodes choose
           to divert traffic to other nodes, DOIC offers no way of
           knowing whether the new recipients can handle the traffic if
           they have not already indicated overload.  This may be
           mitigated with the use of a future "load" extension, or with
           the use of proprietary dynamic load-balancing mechanisms.



   REQ 13: The solution MUST NOT introduce substantial additional work
           for a node in an overloaded state.  For example, a
           requirement for an overloaded node to send overload
           information every time it received a new request would
           introduce substantial work.

           *Not Compliant*. DOIC does in fact encourage an overloaded
           node to send an OLR in every response.  The working group
           believes that other mechanisms to ensure that every relevant
           node receives an OLR would create even more work.  [Note:
           This needs discussion.]



   REQ 14: Some scenarios that result in overload involve a rapid
           increase of traffic with little time between normal levels
           and levels that induce overload.  The solution SHOULD provide
           for rapid feedback when traffic levels increase.

           *Compliant*. The piggyback mechanism allows OLRs to be sent
           at the same rate as application traffic.



   REQ 15: The solution MUST NOT interfere with the congestion control
           mechanisms of underlying transport protocols.  For example, a
           solution that opened additional TCP connections when the
           network is congested would reduce the effectiveness of the
           underlying congestion control mechanisms.

           *Compliant*. DOIC does not require or recommend changes in
           the handling of transport protocols or connections.



D.3.  Heterogeneous Support for Solution

   REQ 16: The solution is likely to be deployed incrementally.  The
           solution MUST support a mixed environment where some, but not
           all, nodes implement it.

           *Partially Compliant*. DOIC works with most mixed-deployment
           scenarios.  However, it cannot work across a non-supporting
           proxy that modifies Origin-Host AVPs in answer messages.
           DOIC will have limited impact in networks where the nodes
           that perform server selections do not support the mechanism.



   REQ 17: In a mixed environment with nodes that support the solution
           and nodes that do not, the solution MUST NOT result in
           materially less useful throughput during overload as would
           have resulted if the solution were not present.  It SHOULD
           result in less severe overload in this environment.

           *Compliant*. In most mixed-support deployments, DOIC will
           offer at least some value, and will not make things worse.



   REQ 18: In a mixed environment of nodes that support the solution and
           nodes that do not, the solution MUST NOT preclude elements
           that support overload control from treating elements that do
           not support overload control in an equitable fashion relative
           to those that do.  Users and operators of nodes that do not
           support the solution MUST NOT unfairly benefit from the
           solution.  The solution specification SHOULD provide guidance
           to implementors for dealing with elements not supporting
           overload control.

           *Compliant*. DOIC provides mechanisms to abate load from
           non-supporting sources.  Furthermore, it recommends that
           reporting nodes remain able to apply whatever protections
           they would ordinarily apply if DOIC were not in use.



   REQ 19: It MUST be possible to use the solution between nodes in
           different realms and in different administrative domains.

           *Partially Compliant*. DOIC allows sending OLRs across
           administrative domains, and potentially to nodes in other
           realms.  However, an OLR cannot indicate overload for realms
           other than the one in the Origin-Realm AVP of the containing
           answer.



   REQ 20: Any explicit overload indication MUST be clearly
           distinguishable from other errors reported via Diameter.

           *Compliant*. DOIC sends explicit overload indication in
           overload reports.  It does not depend on error result codes.
           [Note: I don't think the reuse of too-busy and
           unable-to-comply for throttled requests impacts this
           requirement.  Do others agree?]



   REQ 21: In cases where a network node fails, is so overloaded that it
           cannot process messages, or cannot communicate due to a
           network failure, it may not be able to provide explicit
           indications of the nature of the failure or its levels of
           overload.  The solution MUST result in at least as much
           useful throughput as would have resulted if the solution were
           not in place.

           *Compliant*. DOIC overload reports have the primary effect of
           suppressing message retries in overload conditions.  DOIC
           recommends that messages never be silently dropped if at all
           possible.



D.4.  Granular Control

   REQ 22: The solution MUST provide a way for a node to throttle the
           amount of traffic it receives from a peer node.  This
           throttling SHOULD be graded so that it can be applied
           gradually as offered load increases.  Overload is not a
           binary state; there may be degrees of overload.

           *Compliant*. The "loss" algorithm expresses a percentage
           reduction.
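
           One possible way for a reacting node to apply a requested
           percentage reduction is sketched below.  It is a minimal,
           deterministic illustration only: the class name is
           hypothetical, and the choice of which requests to abate is
           an assumption of this sketch rather than something the
           "loss" algorithm dictates.

```python
class LossThrottle:
    """Abates a fixed percentage of requests using a running credit counter."""

    def __init__(self, reduction_percent):
        if not 0 <= reduction_percent <= 100:
            raise ValueError("reduction_percent must be between 0 and 100")
        self.reduction_percent = reduction_percent
        self._credit = 0

    def should_abate(self):
        """Return True if the next request should be throttled or diverted."""
        self._credit += self.reduction_percent
        if self._credit >= 100:
            self._credit -= 100
            return True
        return False
```

           Spreading the abated requests evenly, rather than picking
           them at random, keeps the reduction exact over every 100
           requests.  A real implementation would likely also weigh
           request priority, as discussed under REQ 25 and REQ 26.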



   REQ 23: The solution MUST provide sufficient information to enable a
           load-balancing node to divert messages that are rejected or
           otherwise throttled by an overloaded upstream node to other
           upstream nodes that are the most likely to have sufficient
           capacity to process them.

           *Not Compliant*. DOIC provides no built-in mechanism to
           determine the best place to divert messages that would
           otherwise be throttled.  This can be accomplished with a
           future "load" extension, or with proprietary load balancing
           mechanisms.



   REQ 24: The solution MUST provide a mechanism for indicating load
           levels, even when not in an overload condition, to assist
           nodes in making decisions to prevent overload conditions from
           occurring.

           *Not Compliant*. "Load" information has been left for a
           future extension.



D.5.  Priority and Policy

   REQ 25: The base specification for the solution SHOULD offer general
           guidance on which message types might be desirable to send or
           process over others during times of overload, based on
           application-specific considerations.  For example, it may be
           more beneficial to process messages for existing sessions
           ahead of new sessions.  Some networks may have a requirement
           to give priority to requests associated with emergency
           sessions.  Any normative or otherwise detailed definition of
           the relative priorities of message types during an overload
           condition will be the responsibility of the application
           specification.

           *Compliant*. The specification offers guidance on how
           requests might be prioritized for different types of
           applications.



   REQ 26: The solution MUST NOT prevent a node from prioritizing
           requests based on any local policy, so that certain requests
           are given preferential treatment, given additional
           retransmission, not throttled, or processed ahead of others.

           *Compliant*. Nothing in the specification prevents
           application-specific, implementation-specific, or local
           policies.



D.6.  Security

   REQ 27: The solution MUST NOT provide new vulnerabilities to
           malicious attack or increase the severity of any existing
           vulnerabilities.  This includes vulnerabilities to DoS and
           DDoS attacks as well as replay and man-in-the-middle attacks.
           Note that the Diameter base specification [RFC6733] lacks
           end-to-end security and this must be considered (see the
           Security Considerations in [RFC7068]).  Note that this
           requirement was expressed at a high level so as to not
           preclude any particular solution.  It is expected that the
           solution will address this in more detail.

           *Unknown*. [Needs further analysis.]



   REQ 28: The solution MUST NOT depend on being deployed in
           environments where all Diameter nodes are completely trusted.
           It SHOULD operate as effectively as possible in environments
           where other nodes are malicious; this includes preventing
           malicious nodes from obtaining more than a fair share of
           service.  Note that this does not imply any responsibility on
           the solution to detect, or take countermeasures against,
           malicious nodes.

           *Partially Compliant*. Since all Diameter security is
           currently at the transport layer, nodes must trust immediate
           peers to enforce trust policies.  However, there are
           situations where a DOIC node cannot determine if an immediate
           peer supports DOIC.  The authors recommend an expert security
           review.



   REQ 29: It MUST be possible for a supporting node to make
           authorization decisions about what information will be sent
           to peer nodes based on the identity of those nodes.  This
           allows a domain administrator who considers the load of their
           nodes to be sensitive information to restrict access to that
           information.  Of course, in such cases, there is no
           expectation that the solution itself will help prevent
           overload from that peer node.

           *Partially Compliant*. (See response to previous
           requirement.)



   REQ 30: The solution MUST NOT interfere with any Diameter-compliant
           method that a node may use to protect itself from overload
           from non-supporting nodes or from denial-of-service attacks.

           *Compliant*. The specification recommends that any such
           protection mechanism needed without DOIC should continue to
           be employed with DOIC.



D.7.  Flexibility and Extensibility

   REQ 31: There are multiple situations where a Diameter node may be
           overloaded for some purposes but not others.  For example,
           this can happen to an agent or server that supports multiple
           applications, or when a server depends on multiple external
           resources, some of which may become overloaded while others
           are fully available.  The solution MUST allow Diameter nodes
           to indicate overload with sufficient granularity to allow
           clients to take action based on the overloaded resources
           without unreasonably forcing available capacity to go unused.
           The solution MUST support specification of overload
           information with granularities of at least "Diameter node",
           "realm", and "Diameter application" and MUST allow
           extensibility for others to be added in the future.

           *Partially Compliant*. All DOIC overload reports are scoped
           to the specific application and realm.  Inside that scope,
           overload can be reported at the specific server or whole
           realm scope.  As currently specified, DOIC cannot indicate
           local overload for an agent.  At the time of this writing,
           the DIME working group has plans to work on an agent-overload
           extension.

           DOIC allows new "scopes" through the use of extended report
           types.



   REQ 32: The solution MUST provide a method for extending the
           information communicated and the algorithms used for overload
           control.

           *Compliant*. DOIC allows new report types and abatement
           algorithms to be created.  These may be indicated using the
           OC-Supported-Features AVP.
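
           The effect of advertising algorithms can be sketched as a
           simple selection step.  The function below is an
           illustration under stated assumptions: the string labels,
           the "rate" extension, and the function itself are
           hypothetical and do not reflect the actual encoding of the
           OC-Supported-Features AVP.

```python
def select_algorithm(advertised, preferred_order):
    """Pick the first locally preferred algorithm that the peer advertised.

    `advertised` is the set of algorithm labels a peer announced;
    `preferred_order` lists the local node's choices, best first.
    Returns None when there is no overlap, in which case the peer would
    be handled as if it did not support the mechanism.
    """
    for algorithm in preferred_order:
        if algorithm in advertised:
            return algorithm
    return None


# "loss" is the mandatory-to-implement default (REQ 33); "rate" stands in
# for a hypothetical future extension.
LOCAL_PREFERENCE = ["rate", "loss"]
```

           Because every supporting node implements "loss", two
           supporting nodes always share at least one algorithm, and
           newer algorithms can be introduced without breaking older
           peers.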



   REQ 33: The solution MUST provide a default algorithm that is
           mandatory to implement.

           *Compliant*. The "loss" algorithm is mandatory to implement.



   REQ 34: The solution SHOULD provide a method for exchanging overload
           and load information between elements that are connected by
           intermediaries that do not support the solution.

           *Partially Compliant*. DOIC information can traverse
           non-supporting agents, as long as those agents do not modify
           certain AVPs (e.g., Origin-Host).