[Dime] draft-ietf-dime-doic-rate-control-08

Ben Campbell <ben@nostrum.com> Wed, 16 May 2018 05:31 UTC

Return-Path: <ben@nostrum.com>
X-Original-To: dime@ietfa.amsl.com
Delivered-To: dime@ietfa.amsl.com
Received: from localhost (localhost [127.0.0.1]) by ietfa.amsl.com (Postfix) with ESMTP id B2FD2128954; Tue, 15 May 2018 22:31:24 -0700 (PDT)
X-Virus-Scanned: amavisd-new at amsl.com
X-Spam-Flag: NO
X-Spam-Score: 0.02
X-Spam-Level:
X-Spam-Status: No, score=0.02 tagged_above=-999 required=5 tests=[T_SPF_HELO_PERMERROR=0.01, T_SPF_PERMERROR=0.01] autolearn=ham autolearn_force=no
Received: from mail.ietf.org ([4.31.198.44]) by localhost (ietfa.amsl.com [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id EFLBSl5zjPg5; Tue, 15 May 2018 22:31:22 -0700 (PDT)
Received: from nostrum.com (raven-v6.nostrum.com [IPv6:2001:470:d:1130::1]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by ietfa.amsl.com (Postfix) with ESMTPS id 9741A124234; Tue, 15 May 2018 22:31:19 -0700 (PDT)
Received: from [10.0.1.94] (cpe-66-25-7-22.tx.res.rr.com [66.25.7.22]) (authenticated bits=0) by nostrum.com (8.15.2/8.15.2) with ESMTPSA id w4G5VIeS098513 (version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128 verify=NO); Wed, 16 May 2018 00:31:19 -0500 (CDT) (envelope-from ben@nostrum.com)
X-Authentication-Warning: raven.nostrum.com: Host cpe-66-25-7-22.tx.res.rr.com [66.25.7.22] claimed to be [10.0.1.94]
From: Ben Campbell <ben@nostrum.com>
Content-Type: multipart/signed; boundary="Apple-Mail=_A0F71940-3FF5-45D2-B465-5E038B0A45A7"; protocol="application/pgp-signature"; micalg="pgp-sha512"
Mime-Version: 1.0 (Mac OS X Mail 11.3 \(3445.6.18\))
Message-Id: <8FB01050-7B63-4A1B-B50A-974D0FA448C4@nostrum.com>
Date: Wed, 16 May 2018 00:31:17 -0500
Cc: dime@ietf.org
To: draft-ietf-dime-doic-rate-control.all@ietf.org
X-Mailer: Apple Mail (2.3445.6.18)
Archived-At: <https://mailarchive.ietf.org/arch/msg/dime/9boUSFFKA27hCrkPvqiaYBKUYPU>
Subject: [Dime] draft-ietf-dime-doic-rate-control-08
X-BeenThere: dime@ietf.org
X-Mailman-Version: 2.1.22
Precedence: list
List-Id: Diameter Maintanence and Extentions Working Group <dime.ietf.org>
List-Unsubscribe: <https://www.ietf.org/mailman/options/dime>, <mailto:dime-request@ietf.org?subject=unsubscribe>
List-Archive: <https://mailarchive.ietf.org/arch/browse/dime/>
List-Post: <mailto:dime@ietf.org>
List-Help: <mailto:dime-request@ietf.org?subject=help>
List-Subscribe: <https://www.ietf.org/mailman/listinfo/dime>, <mailto:dime-request@ietf.org?subject=subscribe>
X-List-Received-Date: Wed, 16 May 2018 05:31:25 -0000

Hi,

This is my AD Evaluation of draft-ietf-dime-doic-rate-control-08.

Thank you for doing the work on this. I think the document is on the right track, but needs a little work before it will be ready for IETF last call.

Thanks!

Ben.

————————————————————

Substantive Comments:

General:
The document seems inconsistent about whether rate limits are only reported during overload conditions, or in advance of overload conditions.

I’d like to see the need to allocate the rate limit across all potential sources of traffic given some more emphasis. (Maybe a sub-section of its own?)

§1:
- “ While this can effectively decrease the load handled by the
   server, it does not directly address cases where the rate of arrival
   of service requests increases quickly."

I think it fails to address cases where the load changes rapidly in either direction, right? At least, the following text seems to say that.

§3: Does the need for future report types to consider the rate algorithm have IANA implications?

§5.1: The first paragraph indicates state should be kept for every reacting node to which it sends an OLR. But the 5th paragraph can be interpreted to say it sends an OLR to every reacting node with which it has negotiated use of the rate algorithm. (see general comment).

§5.4: The first paragraph seems to suggest the reacting node keeps OCS for every server that has indicated support for the rate algorithm, not just nodes that have sent OLRs. Is that the intent?

§5.6, first paragraph: The MAY seems week here. I know and agree that we don’t want to force a particular application. But don’t we need to say that if an implementation uses a different algorithm, it MUST have the same behavior as the algorithm in section 7?

§7.2, third and 4th paragraphs: I don’t understand what this is trying to say. Please elaborate.
-6th paragraph: “  may receive requests at a rate below its target maximum Diameter  request rate while others above that target rate.  But the resulting request rate presented to the overloaded reporting node will converge towards the target Diameter request rate.”

Why do we expect traffic to converge to the rate limit? It seems like that won't happen if some reporting nodes are not sending at full capacity, unless work can be shifted from the high-rate sources to the slow-rate ones.

§7.3.1: paragraph starting with “ In situations where reacting nodes are configured with some knowledge”

that requires knowledge of other traffic sources, not just knowledge of the reporting node.

The example code says to transmit a message if (Xp <= TAU). But the text said the limit was “T+TAU).

§9: I think the security considerations need more thought. What are the security considerations specific to the rate algorithm? If there aren’t any, then please describe the rational behind that. But I suspect there are, for example, can this be used for a DoS? Can it be used to help _mitigate_ a DoS? Could one reacting node cause others to be traffic starved?

Editorial Comments:

General: IDNits returns several issues. Some of those may be errors on its part, but I’m pretty sure some of them are real. Please resolve these.

Requirements: There are instances of lower case “must” and “should”. Please use the new boilerplate from RFC 8174.

§1
- “protect the stability” seems awkward. Maybe “ensure the stability”?
- Also s/ “subjected with” / “subjected to”.

- Please cite the definitions for “reporting node” and “reacting node”. I know they are defined later, but these are somewhat non-intuitive concepts and people will likely stumble over the terms when they are used before they are defined.
- Please expand DOIC and SOC on first mention in the body. (Even if they were expanded in the abstract.)

§2:
 - Definitions of “Diameter Node” and “Diameter Endpoint”: Please use proper citations rather than just referring to the RFC in text. For example: “Diameter Node: A Diameter client, server, or agent.  [RFC6733]”

§4,
- first paragraph:
— “This extension defines”: I think this should say “This document defines…”
— Please consider active voice for the last sentence.

- 2nd paragraph: The first sentence seems awkward. Consider something to the effect of “Since all nodes that support DOIC are required to support the loss algorithm…”

- 3rd paragraph: This paragraph seems to belong as part of the previous paragraph.

- 4th paragraph: “ AVP in the sent to the DOIC reacting nodes”: Missing word(s)?

-5th paragraph: “A reporting node MAY select…” : Is that a new permission, or a statement of fact?

§5.1, third paragraph: The text is not clear whether this means OCS should be maintained per supported application, etc, or that it should maintain state when the rate algorithm on a per supported application, etc, basis.
- 4th paragraph: s/overoload/overload

§5.3: 2nd paragraph: This seems like a redundant restatement of the first paragraph.
- third paragraph: The first sentence is convoluted; can it be broken into simpler sentences?

§6.1.1, definition of " OLR_RATE_ALGORITHM”: Two periods at end of sentence.


§7.1, 2nd paragraph: “ signal one another support for rate-based overload
   control”: This seems awkward; are there missing words?

§7.2, last two paragraphs: The MUSTs do not seem necessary. 2119 keywords should be used when there is some sort of choice or room for error. You don’t need them to define the basic operation of the protocol.

§7.3.1: I found the text hard to follow. It would help to declare all the identifiers and initialization up front, and to present things in more of a stepwise fashion.

- T is effectively a time interval, right? It would help to say that, especially later when you subtract a different time interval from it.

- paragraph 9: Should “admit” be “emit”?

- the example code has several mentions of SIP requests.

§7.3.2: “ Request candidates for reduction, requests not subject to reduction (except under extenuating circumstances when there aren’t any messages in the first category that can be reduced).”: That seems like an awkward way to say that the second category is the set of requests that is only subject to reduction if there are no messages left in the first category.

- “ This can be generalized to n priorities using n thresholds for n>2 in the obvious way.”: I suggest you refrain from calling it “obvious".

§7.3.3: Paragraph starting with “ Then (only) if the arrival is admitted, increase the bucket by an amount…”: I think you increase the bucket _count_, right?