Re: [Dime] New version of DOIC rate draft

Ben Campbell <ben@nostrum.com> Sat, 15 September 2018 23:56 UTC

Cc: "dime@ietf.org" <dime@ietf.org>
To: draft-ietf-dime-doic-rate-control.all@ietf.org
Archived-At: <https://mailarchive.ietf.org/arch/msg/dime/HXKwHDgXevY5TpTHf9ywRoxMCz4>

Hi, thanks for posting the update; I think it is making progress. However, I still have some comments I would like to address before IETF LC:

Thanks!

Ben.

Substantive Comments:

- General: I still think more discussion is needed about allocating rate to multiple input sources. I get that the actual allocation is a matter of local policy, but there are still implications that need discussion. I’m not sure I got my concern across in previous discussion, so here’s another attempt:

The issue I think needs elaboration is how the offered load varies with the number of sources times the (average) rate per source. That is, if the number of sources changes, the reporting node may need to change the rate limits assigned to each existing source.

As a hypothetical example, let’s assume a reporting node wants to limit its entire offered load to 1000 tps. Further assume it has 10 active reacting nodes (all supporting the rate algorithm). Local policy is to allocate the rate limit equally across sources. So it sends an OLR to each of those clients to give it a rate limit of 100 tps. Now, if another 10 reacting nodes become active, it needs to reallocate the load across all 20, giving each a limit of 50 tps. Then, if some of those reacting nodes go off-line, or simply reduce their activity beneath the limit for an extended period of time, the reporting node may need to increase the allocation to the remaining nodes.

This is a fairly fundamental difference between rate and load; rate uses absolute numbers while load uses percentages.
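The reallocation in the hypothetical example can be sketched in a few lines of Python. This is only an illustration of the equal-split local policy described above; the function name `per_source_limit` and its parameters are made up for this sketch and do not come from the draft.

```python
def per_source_limit(total_tps: float, active_sources: int) -> float:
    """Equal-split policy (an assumed local policy, not draft text):
    divide the reporting node's total rate budget across active sources."""
    if active_sources <= 0:
        raise ValueError("need at least one active source")
    return total_tps / active_sources


# The absolute per-source limit has to be recomputed, and a new OLR sent,
# every time the number of active reacting nodes changes.
for sources in (10, 20, 15):
    print(sources, "sources ->", per_source_limit(1000, sources), "tps each")
```

With a percentage-based reduction, by contrast, the same value could be sent to every source regardless of how many sources there are.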

§5.1:

- The text is still not clear when a reporting node must send the first OLR. I understand that the choice of when to set a rate limit is local policy, but some of the text in this section suggests to me that you expect a reporting node to send an OLR immediately when it selects the rate algorithm for a specific reacting node.

For example, paragraph 6 says:

"A reporting node that has selected the rate overload abatement algorithm MUST indicate the rate requested to be applied by DOIC reacting nodes in the OC-Maximum-Rate AVP included in the OC-OLR AVP."

This needs to talk about _when_ it must do that. Without some comment, it seems like it means “when it selects the algorithm” (i.e. “when you indicate support”). That seems to conflict with the idea of this being local policy.

- new 5th paragraph: Why SHOULD instead of MUST? Is there a situation where you have an OCS but no allocated rate? (e.g. you’ve selected the rate algorithm for the reacting node, but have not sent an OLR?)

§5.4, paragraph 1:

Discussion indicated that the intent of this paragraph was that the reacting node keeps OCS for each server that indicated support for the rate algorithm. But I don’t see text that says when the reacting node needs to actually create the state entry. I think the answer is “immediately when a reporting node indicates support”, right?

§5.6, first two paragraphs:

I still think the text talking about using different algorithms needs to say something normative about the characteristics of those algorithms. Janet’s comments indicated the normative text is in §7.3.1. But that’s part of the algorithm that they MAY use, so it would not be constraining against other algorithm choices. Perhaps the 2nd paragraph should be normative?

§7.2: “But the resulting request rate presented to the overloaded reporting node will converge towards the target Diameter request rate.”

Wasn’t there discussion to change this to “... the target Diameter request rate or a lower rate”?

§7.3: “In situations where reacting nodes are configured with some knowledge about the reporting node (e.g., operator pre-provisioning), it can be beneficial to choose a value of TAU based on how many reacting nodes will be sending requests to the reporting node.”

I previously commented that this requires knowledge of other traffic sources, not just the reporting node. I did not see a response.

Editorial Comments:

[I note a number of editorial comments that fell into the “made changes unless indicated otherwise” category did not seem to get changed. I included those here again.]

- There are still things reported by IDNits that need checking. Some are obviously noise, but some appear to be real. (Line length, references in abstract, and references to RFC 5226)

- Was there a reason not to use the RFC 8174 boilerplate in the “Requirements” section? (I thought you had intended to do so.)

§1:
- first paragraph: There were some editorial fixes that I thought were agreed that did not appear in the new version:
s/“protect the stability”/“ensure the stability”
s/“subjected with”/“subjected to”
(new comment): In the new last sentence, is there a reason for the all-caps? That’s normally reserved for normative keywords.

§4:
- first paragraph: Please consider active voice in the last sentence.

§5.1:
- New 5th paragraph:
s/entery/entry

§7.1, 2nd paragraph: “signal one another support for rate-based overload control”: This seems awkward; are there missing words? Perhaps there should be something like “their” or “that they” between “another” and “support”?

§7.2, last two paragraphs: The MUSTs do not seem necessary. 2119 keywords should be used when there is some sort of choice or room for error. You don’t need them to define the basic operation of the protocol.

§7.3.1: I found the text hard to follow. It would help to declare all the identifiers and initialization up front, and to present things in more of a stepwise fashion.

- T is effectively a time interval, right? It would help to say that, especially later when you subtract a different time interval from it.

- paragraph 9: Should “admit” be “emit”?

- the example code has several mentions of SIP requests.

§7.3.2:

- “ Request candidates for reduction, requests not subject to reduction (except under extenuating circumstances when there aren’t any messages in the first category that can be reduced).”:

That seems like an awkward way to say that the second category is the set of requests that is only subject to reduction if there are no messages left in the first category.

- “ This can be generalized to n priorities using n thresholds for n>2 in the obvious way.”: I suggest you refrain from calling it “obvious”.

§7.3.3: Paragraph starting with “ Then (only) if the arrival is admitted, increase the bucket by an amount…”: I think you increase the bucket _count_, right?
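To make the points about T being a time interval and about the bucket _count_ concrete, here is a minimal leaky-bucket sketch. It is an assumption-laden illustration, not the draft's text: it assumes T = 1/R for a target rate R, a tolerance TAU in seconds, and a bucket count that is itself measured in seconds. The class and attribute names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class LeakyBucket:
    rate: float               # target rate R in requests/second (assumed > 0)
    tau: float                # tolerance TAU, in seconds
    count: float = 0.0        # bucket count, in seconds (not a message count)
    last_admit: float = 0.0   # arrival time of the last admitted request

    @property
    def interval(self) -> float:
        """T: the time interval between requests at the target rate."""
        return 1.0 / self.rate

    def admit(self, now: float) -> bool:
        # Drain the bucket by the time elapsed since the last admitted request.
        drained = self.count - (now - self.last_admit)
        if drained > self.tau:
            return False      # non-conforming arrival: state is left unchanged
        # Only on admission is the bucket count increased by T.
        self.count = max(0.0, drained) + self.interval
        self.last_admit = now
        return True


bucket = LeakyBucket(rate=10.0, tau=0.0)
print(bucket.admit(0.0), bucket.admit(0.05), bucket.admit(0.1))
```

Declaring every identifier up front, with its unit, is what makes it visible that T, TAU, and the bucket count are all time quantities that can be added to and subtracted from one another.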


> On Sep 10, 2018, at 3:44 PM, Steve Donovan <srdonovan@usdonovans.com> wrote:
> 
> I've posted a new version of the rate draft.
> 
> I've attached the diff file.
> 
> Regards,
> 
> Steve
> <Diff  draft-ietf-dime-doic-rate-control-08.txt - draft-ietf-dime-doic-rate-control-09.txt.html>