Re: [Dime] draft-ietf-dime-doic-rate-control-08

Steve Donovan <srdonovan@usdonovans.com> Wed, 13 June 2018 17:30 UTC

Return-Path: <srdonovan@usdonovans.com>
X-Original-To: dime@ietfa.amsl.com
Delivered-To: dime@ietfa.amsl.com
Received: from localhost (localhost [127.0.0.1]) by ietfa.amsl.com (Postfix) with ESMTP id C0C4F130F18 for <dime@ietfa.amsl.com>; Wed, 13 Jun 2018 10:30:43 -0700 (PDT)
X-Virus-Scanned: amavisd-new at amsl.com
X-Spam-Flag: NO
X-Spam-Score: -1.121
X-Spam-Level:
X-Spam-Status: No, score=-1.121 tagged_above=-999 required=5 tests=[BAYES_00=-1.9, RCVD_IN_DNSWL_NONE=-0.0001, SPF_NEUTRAL=0.779] autolearn=no autolearn_force=no
Received: from mail.ietf.org ([4.31.198.44]) by localhost (ietfa.amsl.com [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id m3i7tBqYT8wB for <dime@ietfa.amsl.com>; Wed, 13 Jun 2018 10:30:41 -0700 (PDT)
Received: from biz131.inmotionhosting.com (biz131.inmotionhosting.com [173.247.247.114]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ietfa.amsl.com (Postfix) with ESMTPS id 5D53C130E6C for <dime@ietf.org>; Wed, 13 Jun 2018 10:30:41 -0700 (PDT)
Received: from [38.98.37.142] (port=34776 helo=Steves-MacBook-Air.local) by biz131.inmotionhosting.com with esmtpsa (TLSv1.2:ECDHE-RSA-AES128-GCM-SHA256:128) (Exim 4.89_1) (envelope-from <srdonovan@usdonovans.com>) id 1fT9bB-00AN4P-TH; Wed, 13 Jun 2018 10:30:41 -0700
To: dime@ietf.org, ecnoel@research.att.com
References: <8FB01050-7B63-4A1B-B50A-974D0FA448C4@nostrum.com> <fd1b638dfc8f48b5b46b105ac40e5124@CSRRDU1EXM025.corp.csra.com>
From: Steve Donovan <srdonovan@usdonovans.com>
Message-ID: <06dc2172-4b2a-c0c4-e411-8928acd13a1b@usdonovans.com>
Date: Wed, 13 Jun 2018 18:30:12 +0100
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.13; rv:52.0) Gecko/20100101 Thunderbird/52.8.0
MIME-Version: 1.0
In-Reply-To: <fd1b638dfc8f48b5b46b105ac40e5124@CSRRDU1EXM025.corp.csra.com>
Content-Type: text/plain; charset="utf-8"; format="flowed"
Content-Transfer-Encoding: 8bit
Content-Language: en-US
X-OutGoing-Spam-Status: No, score=0.7
X-AntiAbuse: This header was added to track abuse, please include it with any abuse report
X-AntiAbuse: Primary Hostname - biz131.inmotionhosting.com
X-AntiAbuse: Original Domain - ietf.org
X-AntiAbuse: Originator/Caller UID/GID - [47 12] / [47 12]
X-AntiAbuse: Sender Address Domain - usdonovans.com
X-Get-Message-Sender-Via: biz131.inmotionhosting.com: authenticated_id: srdonovan@usdonovans.com
X-Authenticated-Sender: biz131.inmotionhosting.com: srdonovan@usdonovans.com
X-Source:
X-Source-Args:
X-Source-Dir:
Archived-At: <https://mailarchive.ietf.org/arch/msg/dime/L9ExKl3gPdEWwtiTkeJogjjnzdQ>
Subject: Re: [Dime] draft-ietf-dime-doic-rate-control-08
X-BeenThere: dime@ietf.org
X-Mailman-Version: 2.1.26
Precedence: list
List-Id: Diameter Maintanence and Extentions Working Group <dime.ietf.org>
List-Unsubscribe: <https://www.ietf.org/mailman/options/dime>, <mailto:dime-request@ietf.org?subject=unsubscribe>
List-Archive: <https://mailarchive.ietf.org/arch/browse/dime/>
List-Post: <mailto:dime@ietf.org>
List-Help: <mailto:dime-request@ietf.org?subject=help>
List-Subscribe: <https://www.ietf.org/mailman/listinfo/dime>, <mailto:dime-request@ietf.org?subject=subscribe>
X-List-Received-Date: Wed, 13 Jun 2018 17:30:44 -0000

See my comments inline.

Steve

On 5/25/18 4:17 PM, Gunn, Janet P (CNV) wrote:
> Not an author, but I have a strong interest in this ID.  Comments in line.
> Janet
>
> -----Original Message-----
> From: DiME <dime-bounces@ietf.org> On Behalf Of Ben Campbell
> Sent: Wednesday, May 16, 2018 1:31 AM
> To: draft-ietf-dime-doic-rate-control.all@ietf.org
> Cc: dime@ietf.org
> Subject: [Dime] draft-ietf-dime-doic-rate-control-08
>
> Substantive Comments:
>
> General:
> The document seems inconsistent about whether rate limits are only reported during overload conditions, or in advance of overload conditions.
>
> <JPG> I think that would be "local policy" of the serving (reporting) node, and independent of the protocol used to communicate it.  I think most cases would be reactive, but I can see situations where it could be proactive.<JPG>
SRD> I agree with Janet, when the report is sent is very much local 
policy.  There is no reason to attempt to prevent proactive use of the 
rate mechanism.
>
> I’d like to see the need to allocate the rate limit across all potential sources of traffic given some more emphasis. (Maybe a sub-section of its own?)
>
> <JPG> I agree, but again I see that as "local policy" of the serving (reporting) node. In particular, there may be reacting nodes that do not support the rate abatement algorithm.<JPG>
SRD> Again, I agree with Janet.  This is local policy and there may well 
be a mix of rate and loss if not all nodes support rate.  I don't think 
it is appropriate to say that rate should be preferred over loss.  But 
maybe I'm missing your meaning on "allocate the rate limit".
>
>
> §1:
> - “ While this can effectively decrease the load handled by the
>     server, it does not directly address cases where the rate of arrival
>     of service requests increases quickly."
>
> I think it fails to address cases where the load changes rapidly in either direction, right? At least, the following text seems to say that.
>
> <JPG> I agree.  When there are rapid fluctuations in the offered load, the "loss" algorithm errs both in  throttling TOO MUCH when there is a dip in offered load, and throttling NOT ENOUGH when there is a spike in offered load.<JPG>
SRD> The text in section 1 talks about this already.  Is there a 
specific change being suggested?
>
>
>
> §3: Does the need for future report types to consider the rate algorithm have IANA implications?
SRD> Are you suggesting that the IANA section indicate that all new 
report types MUST indicate whether or not the rate algorithm can be used 
with that report type?  I can make that change to the IANA section if it 
would be appropriate.
>
> §5.1: The first paragraph indicates state should be kept for every reacting node to which it sends an OLR. But the 5th paragraph can be interpreted to say it sends an OLR to every reacting node with which it has negotiated use of the rate algorithm. (see general comment).
SRD> I'm missing something.  The first paragraph says the reporting node 
maintains state any time it sends a rate overload report.  The fifth 
paragraph is just saying that the report must include the rate information.
>
> §5.4: The first paragraph seems to suggest the reacting node keeps OCS for every server that has indicated support for the rate algorithm, not just nodes that have sent OLRs. Is that the intent?
SRD>  Yes, that is the intent.  This allows the reacting node to make 
sure that the machinery needed to respond to a rate request is in place 
prior to receiving an OLR.
>
> §5.6, first paragraph: The MAY seems week here. I know and agree that we don’t want to force a particular application. But don’t we need to say that if an implementation uses a different algorithm, it MUST have the same behavior as the algorithm in section 7?
>
> <JPG> I think it MUST "limit the message rate to the OC-Maximum-Rate AVP value in units of messages per second" (as stated in 7.3.1).  The algorithm described in the rest of 7.3.1 and 7.3.2 is somewhat more sophisticated, allowing for a smoothing factor (TAU) and prioritization.  I do not think we need  to say that the selected algorithm MUST have those features.<JPG>
SRD> Again, I agree with Janet.
>
> §7.2, third and 4th paragraphs: I don’t understand what this is trying to say. Please elaborate.
>
> <JPG>3rd para - Just as a "for instance"- if the reacting node has 50/second low priority messages and 50/second high priority messages that it want to send, and has a rate limit of 75/second, it will send 25/second low priority messages and 50 /second high priority messages.  The limit of 75/second applies to the combined stream of high and low priority messages, even though only the low priority messages are being abated.<JPG >
>
> <JPG> 4th para - in the same example, it could be that the high priority messages typically require more processing resources (cpu, etc) than the low priority messages (or vice versa).  So cutting the rate to 75/sec may NOT produce the expected reduction in resource usage.<JPG>
SRD> Thanks Janet, I couldn't have explained it better.
>
>
> -6th paragraph: “  may receive requests at a rate below its target maximum Diameter  request rate while others above that target rate.  But the resulting request rate presented to the overloaded reporting node will converge towards the target Diameter request rate.”
>
> Why do we expect traffic to converge to the rate limit? It seems like that won't happen if some reporting nodes are not sending at full capacity, unless work can be shifted from the high-rate sources to the slow-rate ones.
>
> <JPG> Probably would be better to say that it "will converge  toward a rate at or below the target Diameter request rate.”<JPG>
SRD> I'm okay with making Janet's suggested change.
>
> §7.3.1: paragraph starting with “ In situations where reacting nodes are configured with some knowledge”
>
> that requires knowledge of other traffic sources, not just knowledge of the reporting node.
>
> The example code says to transmit a message if (Xp <= TAU). But the text said the limit was “T+TAU).
>
> <JPG> I think it is supposed to be "T+TAU"<JPG>
SRD> I'd like to get Eric's opinion on this.  This section was copied 
from the SOC RFC so if it is in error here than it is in error there as 
well.
>
> §9: I think the security considerations need more thought. What are the security considerations specific to the rate algorithm? If there aren’t any, then please describe the rational behind that. But I suspect there are, for example, can this be used for a DoS? Can it be used to help _mitigate_ a DoS? Could one reacting node cause others to be traffic starved?
>
> <JPG>It is possible that a reacting node that does not support overload control could starve the nodes that do support overload control, but this is also true of the loss based version<JPG>
SRD> I'm not convinced that there are security scenarios that are new or 
different for rate versus those documented in the existing DOIC 
specifications.
>
> Editorial Comments:
>
> General: IDNits returns several issues. Some of those may be errors on its part, but I’m pretty sure some of them are real. Please resolve these.
SRD> I'll look at those next time I try to submit but I've not gotten 
IDNits errors in the past.

SRD> I've made the suggested changes below unless indicated otherwise.
>
> Requirements: There are instances of lower case “must” and “should”. Please use the new boilerplate from RFC 8174.
>
> §1
> - “protect the stability” seems awkward. Maybe “ensure the stability”?
> - Also s/ “subjected with” / “subjected to”..
>
> - Please cite the definitions for “reporting node” and “reacting node”. I know they are defined later, but these are somewhat non-intuitive concepts and people will likely stumble over the terms when they are used before they are defined.
> - Please expand DOIC and SOC on first mention in the body. (Even if they were expanded in the abstract.)
>
> §2:
>   - Definitions of “Diameter Node” and “Diameter Endpoint”: Please use proper citations rather than just referring to the RFC in text. For example: “Diameter Node: A Diameter client, server, or agent.  [RFC6733]”
>
> §4,
> - first paragraph:
> — “This extension defines”: I think this should say “This document defines…”
> — Please consider active voice for the last sentence.
>
> - 2nd paragraph: The first sentence seems awkward. Consider something to the effect of “Since all nodes that support DOIC are required to support the loss algorithm…”
>
> - 3rd paragraph: This paragraph seems to belong as part of the previous paragraph.
SRD> I think it is a new paragraph.
>
> - 4th paragraph: “ AVP in the sent to the DOIC reacting nodes”: Missing word(s)?
SRD> Yes indeed, added "message sent".
>
> -5th paragraph: “A reporting node MAY select…” : Is that a new permission, or a statement of fact?
SRD> It is a statement of fact.  Changed may to can.
>
> §5.1, third paragraph: The text is not clear whether this means OCS should be maintained per supported application, etc, or that it should maintain state when the rate algorithm on a per supported application, etc, basis.
SRD> I don't understand the point being made here.
> - 4th paragraph: s/overoload/overload
>
> §5.3: 2nd paragraph: This seems like a redundant restatement of the first paragraph.
SRD> The first paragraph indicates that the reporting node must indicate 
the rate abatement algorithm in the OCS.  The second paragraph indicates 
that the reporting node must include the rate value in the OCS.
> - third paragraph: The first sentence is convoluted; can it be broken into simpler sentences?
SRD> I shorted it from:

    When selecting the rate algorithm in the response to a request that
    contained an OC-Supporting-Features AVP with an OC-Feature-Vector AVP
    indicating support for the rate feature, a reporting node...

to

When selecting the rate algorithm a reporting node...
>
> §6.1.1, definition of " OLR_RATE_ALGORITHM”: Two periods at end of sentence.
>
SRD> I am hesitant to change any of the test in section 7 given that it 
is taken from the SOC specification.  I would prefer that Eric comment 
on these proposed changes before including them in the next version.
> §7.1, 2nd paragraph: “ signal one another support for rate-based overload
>     control”: This seems awkward; are there missing words?
>
> §7.2, last two paragraphs: The MUSTs do not seem necessary. 2119 keywords should be used when there is some sort of choice or room for error. You don’t need them to define the basic operation of the protocol.
>
> §7.3.1: I found the text hard to follow. It would help to declare all the identifiers and initialization up front, and to present things in more of a stepwise fashion.
>
> - T is effectively a time interval, right? It would help to say that, especially later when you subtract a different time interval from it.
>
> - paragraph 9: Should “admit” be “emit”?
>
> - the example code has several mentions of SIP requests.
>
> §7.3.2: “ Request candidates for reduction, requests not subject to reduction (except under extenuating circumstances when there aren’t any messages in the first category that can be reduced).”: That seems like an awkward way to say that the second category is the set of requests that is only subject to reduction if there are no messages left in the first category.
>
> <JPG> Yes, that is what it means.<JPG>
>
> - “ This can be generalized to n priorities using n thresholds for n>2 in the obvious way.”: I suggest you refrain from calling it “obvious".
>
> §7.3.3: Paragraph starting with “ Then (only) if the arrival is admitted, increase the bucket by an amount…”: I think you increase the bucket _count_, right?
>
>
>
>
>
>
>
>
>
>
>
>
>
> This electronic message transmission contains information from CSRA that may be attorney-client privileged, proprietary or confidential. The information in this message is intended only for use by the individual(s) to whom it is addressed. If you believe you have received this message in error, please contact me immediately and be aware that any use, disclosure, copying or distribution of the contents of this message is strictly prohibited. NOTE: Regardless of content, this email shall not operate to bind CSRA to any order or other contract unless pursuant to explicit written agreement or government initiative expressly permitting the use of email for such purpose.
> _______________________________________________
> DiME mailing list
> DiME@ietf.org
> https://www.ietf.org/mailman/listinfo/dime