Re: [Dime] draft-ietf-dime-doic-rate-control-08

"NOEL, ERIC C" <en5192@att.com> Tue, 11 September 2018 18:22 UTC

From: "NOEL, ERIC C" <en5192@att.com>
To: Steve Donovan <srdonovan@usdonovans.com>, "dime@ietf.org" <dime@ietf.org>
Thread-Topic: [Dime] draft-ietf-dime-doic-rate-control-08
Thread-Index: AQHUSUWjR15/agoh/EyBO1EhrVAQvqTrSfmg
Date: Tue, 11 Sep 2018 16:42:10 +0000
Message-ID: <432544DCDB78E046B9E22D0EE8F4190326288428@MISOUT7MSGUSRDC.ITServices.sbc.com>
References: <2C4A8F89-9FB1-483F-B160-52822F71531F@nostrum.com> <b65b92d8-e1ba-195d-0662-95cb3492e4eb@usdonovans.com> <432544DCDB78E046B9E22D0EE8F419032625DFDB@MISOUT7MSGUSRDC.ITServices.sbc.com> <a503a984-3f82-5872-0c60-68102efd47e2@usdonovans.com>
In-Reply-To: <a503a984-3f82-5872-0c60-68102efd47e2@usdonovans.com>
Archived-At: <https://mailarchive.ietf.org/arch/msg/dime/PotgtaEElyMYpiY2I_uEIewx6HE>
Subject: Re: [Dime] draft-ietf-dime-doic-rate-control-08

Hi Steve,

All looks good. Many thanks for following through,

Eric Noel
AT&T Labs, Inc.
Rethink Possible

Optimization, Reliability and Customer Analytics
200 South Laurel Avenue, D5-3C38
Middletown, NJ 07748
P: 732.420.4174
ecnoel@att.com

From: Steve Donovan [mailto:srdonovan@usdonovans.com]
Sent: Monday, September 10, 2018 4:33 PM
To: NOEL, ERIC C <en5192@att.com>; dime@ietf.org
Subject: Re: [Dime] draft-ietf-dime-doic-rate-control-08

Eric,

Thanks for the comments.  See my comments inline.

I will be submitting a new version of the document shortly.  Hopefully I have captured all of the suggested changes.

Steve

On 8/8/18 5:05 PM, NOEL, ERIC C wrote:
>
> Hi Steve,
>
>
>
> It seems my mail system has been moving your emails to spam.
>
>
>
> I just noticed the message below while cleaning my spam folder;
> apologies for the lack of responses.
>
>
>
> Please see below.
>
>
>
>>>> General: The document seems inconsistent about whether rate
>>>> limits are only reported during overload conditions, or in
>>>> advance of overload conditions.
>>>>
>>>> <JPG> I think that would be "local policy" of the serving
>>>> (reporting) node, and independent of the protocol used to
>>>> communicate it.  I think most cases would be reactive, but I
>>>> can see situations where it could be proactive.<JPG>
>>> SRD> I agree with Janet, when the report is sent is very much
>>> local policy.  There is no reason to attempt to prevent proactive
>>> use of the rate mechanism.
>>
>> That’s fine with me, but a sentence or two to that effect would be
>> helpful. (On re-reading, I see where section 1 talks about
>> “approaching overload or overloaded”, but I assume neither of those
>> conditions are necessary?)

srd> I've added the following in section 5.5:

          Note: It is also possible for the reporting node to send overload
          reports with the rate algorithm indicated when the reporting node is
          not in an overloaded state.  This could be a strategy to proactively
          avoid entering into an overloaded state.  Whether to do so is
          up to local policy.
>>
>>>>
>>>> I’d like to see the need to allocate the rate limit across all
>>>> potential sources of traffic given some more emphasis. (Maybe a
>>>> sub-section of its own?)
>>>>
>>>> <JPG> I agree, but again I see that as "local policy" of the
>>>> serving (reporting) node. In particular, there may be reacting
>>>> nodes that do not support the rate abatement algorithm.<JPG>
>>> SRD> Again, I agree with Janet.  This is local policy and there
>>> may well be a mix of rate and loss if not all nodes support rate.
>>> I don't think it is appropriate to say that rate should be
>>> preferred over loss.  But maybe I'm missing your meaning on
>>> "allocate the rate limit”.
>>
>> Okay, but you still have to allocate the rate limit across all
>> sources that you apply the rate algorithm to, right? That is, the
>> total offered rate will be something like (average rate per source)
>> * (number of sources). Or am I misunderstanding something?
>
>
>
> <EN> Correct, the reporting node maintains an overload control state
> for each reacting node.
>
SRD> I've added the following to section 5.1:

          The rate OCS entry SHOULD include the rate allocated to each reacting node.
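
A minimal sketch of what per-reacting-node allocation could look like, assuming the reporting node splits one total rate in proportion to locally configured weights. The names `RateOcsEntry` and `allocate_rate`, the weights, and the 300 requests/second total are all hypothetical; how the rate is actually divided is local policy and is not specified in this thread.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class RateOcsEntry:
    """Hypothetical per-reacting-node overload control state (OCS)."""
    reacting_node: str
    allocated_rate: float  # requests per second granted to this node


def allocate_rate(total_rate: float, weights: dict[str, float]) -> list[RateOcsEntry]:
    """Split total_rate across reacting nodes in proportion to their weights."""
    weight_sum = sum(weights.values())
    if weight_sum <= 0:
        raise ValueError("at least one reacting node needs a positive weight")
    return [
        RateOcsEntry(node, total_rate * weight / weight_sum)
        for node, weight in weights.items()
    ]


# Assumed example: three reacting nodes sharing 300 requests/second.
entries = allocate_rate(300.0, {"client-a": 1.0, "client-b": 1.0, "client-c": 2.0})
total_allocated = sum(entry.allocated_rate for entry in entries)
for entry in entries:
    print(f"{entry.reacting_node}: {entry.allocated_rate:.0f}/s")
```

The per-node rates sum to the total, which matches the "(average rate per source) * (number of sources)" reading above.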

>
>
>
>
>>>> §1: - “ While this can effectively decrease the load handled by
>>>> the server, it does not directly address cases where the rate
>>>> of arrival of service requests increases quickly."
>>>>
>>>> I think it fails to address cases where the load changes
>>>> rapidly in either direction, right? At least, the following
>>>> text seems to say that.
>>>>
>>>> <JPG> I agree.  When there are rapid fluctuations in the
>>>> offered load, the "loss" algorithm errs both in  throttling TOO
>>>> MUCH when there is a dip in offered load, and throttling NOT
>>>> ENOUGH when there is a spike in offered load.<JPG>
>>> SRD> The text in section 1 talks about this already.  Is there a
>>> specific change being suggested?
>>
>> Section 1 talks about rapidly increasing load. Did I miss mention
>> of rapidly _decreasing_ load?
>
>
>
> <EN> Suggested edit (new text between underscores; "changes"
> replaces "increases"):
>
> While this can effectively decrease the load handled by the
> server, it does not directly address cases where the rate of arrival
> of service requests _changes_ quickly.  _For instance,_ if the
> service requests that result in Diameter transactions increase
> quickly then the loss algorithm cannot guarantee the load presented
> to the server remains below a specific rate level.  _The loss
> algorithm can be slow to protect the stability of reporting nodes
> when subjected to rapidly changing loads. The "loss" algorithm errs
> both in throttling TOO MUCH when there is a dip in offered load, and
> throttling NOT ENOUGH when there is a spike in offered load._
>
>
SRD> I've incorporated Eric's suggested wording.
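
The "too much on a dip, not enough on a spike" point can be illustrated with a small calculation. This is only a sketch: the target of 100 requests/second, the 50% reduction, and the three offered loads are assumed numbers, and the two helper functions are simplified models of the loss and rate behaviours, not the algorithms from the draft.

```python
def loss_admitted(offered: float, reduction_pct: float) -> float:
    """Load reaching the server when a fixed percentage of requests is dropped."""
    return offered * (1.0 - reduction_pct / 100.0)


def rate_admitted(offered: float, max_rate: float) -> float:
    """Load reaching the server when traffic is capped at max_rate."""
    return min(offered, max_rate)


TARGET = 100.0        # hypothetical safe load, requests per second
REDUCTION_PCT = 50.0  # loss setting chosen while offered load was 200/s

results = {
    offered: (loss_admitted(offered, REDUCTION_PCT), rate_admitted(offered, TARGET))
    for offered in (200.0, 120.0, 400.0)  # steady state, dip, spike
}
for offered, (loss, rate) in results.items():
    print(f"offered={offered:.0f}/s  loss admits {loss:.0f}/s  rate admits {rate:.0f}/s")
```

With these numbers the loss setting admits only 60/s during the dip, although 100/s would be safe, and 200/s during the spike, twice the target, while the rate cap holds at 100/s in both cases.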
>
>
>
>>>> §3: Does the need for future report types to consider the rate
>>>> algorithm have IANA implications?
>>> SRD> Are you suggesting that the IANA section indicate that all
>>> new report types MUST indicate whether or not the rate algorithm
>>> can be used with that report type?  I can make that change to the
>>> IANA section if it would be appropriate.
>>>>
>>
>> I’m not suggesting, I’m asking :-)  But my point was more along
>> whether the IANA registry for report types should include a field
>> about supporting the rate algorithm.
>
> <EN> Not sure how to address that comment.

SRD> I've added a section to the IANA considerations:

8.2.   New DOIC Report Types

          All DOIC report types defined in the future MUST
          indicate whether or not the rate algorithm can be used with that
          report type.

>
>
>>>> §5.1: The first paragraph indicates state should be kept for
>>>> every reacting node to which it sends an OLR. But the 5th
>>>> paragraph can be interpreted to say it sends an OLR to every
>>>> reacting node with which it has negotiated use of the rate
>>>> algorithm. (see general comment).
>>> SRD> I'm missing something.  The first paragraph says the
>>> reporting node maintains state any time it sends a rate overload
>>> report.  The fifth paragraph is just saying that the report must
>>> include the rate information.
>>
>> Actually, my question relates to my question on 5.4. The first
>> paragraph says to keep state for every reporting node to which it
>> sends an OLR. The 5th paragraph implies that it sends an OLR when
>> it selects the rate algorithm for a reacting node. (If you are
>> going to include the rate info, you have to have a report to
>> include it in.)
>>
>> So it comes down to clarity about what events happen “when the
>> reporting node selects the algorithm for a reacting node” vs “when
>> the reporting node sends an OLR”.
>>
>>>>
>>>> §5.4: The first paragraph seems to suggest the reacting node
>>>> keeps OCS for every server that has indicated support for the
>>>> rate algorithm, not just nodes that have sent OLRs. Is that the
>>>> intent?
>>> SRD>  Yes, that is the intent.  This allows the reacting node to
>>> make sure that the machinery needed to respond to a rate request
>>> is in place prior to receiving an OLR.
>>
>> See above.
>
> <EN> Agreed. I do not know what ‘see above’ refers to.

SRD> I don't see the need for a change to the document.  If there is still confusion then it can be addressed in the next version of the document.

>
>>>> §5.6, first paragraph: The MAY seems weak here. I know and
>>>> agree that we don’t want to force a particular application. But
>>>> don’t we need to say that if an implementation uses a different
>>>> algorithm, it MUST have the same behavior as the algorithm in
>>>> section 7?
>>>>
>>>> <JPG> I think it MUST "limit the message rate to the
>>>> OC-Maximum-Rate AVP value in units of messages per second" (as
>>>> stated in 7.3.1).  The algorithm described in the rest of 7.3.1
>>>> and 7.3.2 is somewhat more sophisticated, allowing for a
>>>> smoothing factor (TAU) and prioritization.  I do not think we
>>>> need  to say that the selected algorithm MUST have those
>>>> features.<JPG>
>>> SRD> Again, I agree with Janet.
>>
>> Okay.
>
> <EN> Agreed.
>
>
>>>> §7.2, third and 4th paragraphs: I don’t understand what this is
>>>> trying to say. Please elaborate.
>>>>
>>>> <JPG>3rd para - Just as a "for instance"- if the reacting node
>>>> has 50/second low priority messages and 50/second high priority
>>>> messages that it wants to send, and has a rate limit of
>>>> 75/second, it will send 25/second low priority messages and 50
>>>> /second high priority messages.  The limit of 75/second applies
>>>> to the combined stream of high and low priority messages, even
>>>> though only the low priority messages are being abated.<JPG >
>>>>
>>>> <JPG> 4th para - in the same example, it could be that the high
>>>> priority messages typically require more processing resources
>>>> (cpu, etc) than the low priority messages (or vice versa).  So
>>>> cutting the rate to 75/sec may NOT produce the expected
>>>> reduction in resource usage.<JPG>
>>> SRD> Thanks Janet, I couldn't have explained it better.
>>
>> Janet’s explanation is good, but I don’t get that from the text, at
>> least in the case of the third paragraph.
>>
>> In the 4th paragraph, I don’t understand how the reporting node
>> would “take into account the workload” in a useful way. That seems
>> to suggest the reporting node can predict the impact of workload
>> based decisions made by the reacting node, which seems unlikely
>> unless there is some out-of-band agreement.
>>
>> Is this really saying anything more than “The reacting nodes will
>> decide which messages to send and which to drop, and the result may
>> not be predictable by the reporting node”?
>
> <EN>  Computing the maximum rate per reacting node, modulated by its
> measured workload (message type frequency), improves throttling
> when reacting nodes prioritize messages.
>
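
Janet's 50/50 example with a 75/second limit can be worked through in a few lines. This is a sketch under stated assumptions: `abate_by_priority` is a hypothetical helper that fills the limit from the highest priority down, and the per-message processing costs (2 units for high priority, 1 unit for low) are invented to illustrate the 4th-paragraph point.

```python
def abate_by_priority(demand: list[float], max_rate: float) -> list[float]:
    """Admit demand from highest priority (index 0) down until max_rate is used up."""
    admitted = []
    remaining = max_rate
    for wanted in demand:
        sent = min(wanted, remaining)
        admitted.append(sent)
        remaining -= sent
    return admitted


# Assumed processing cost per message (arbitrary units).
HIGH_COST, LOW_COST = 2.0, 1.0

high, low = abate_by_priority([50.0, 50.0], 75.0)

work_before = 50.0 * HIGH_COST + 50.0 * LOW_COST
work_after = high * HIGH_COST + low * LOW_COST
rate_cut = 1.0 - (high + low) / 100.0
work_cut = 1.0 - work_after / work_before

print(f"sent: high={high:.0f}/s low={low:.0f}/s")
print(f"message rate reduced by {rate_cut:.0%}, workload reduced by {work_cut:.0%}")
```

Under these assumed costs, a 25% cut in message rate yields only about a 17% cut in workload, because all of the abatement falls on the cheaper low-priority messages.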
>>>> -6th paragraph: “  may receive requests at a rate below its
>>>> target maximum Diameter  request rate while others above that
>>>> target rate.  But the resulting request rate presented to the
>>>> overloaded reporting node will converge towards the target
>>>> Diameter request rate.”
>>>>
>>>> Why do we expect traffic to converge to the rate limit? It
>>>> seems like that won't happen if some reporting nodes are not
>>>> sending at full capacity, unless work can be shifted from the
>>>> high-rate sources to the slow-rate ones.
>>>>
>>>> <JPG> Probably would be better to say that it "will converge
>>>> toward a rate at or below the target Diameter request
>>>> rate.”<JPG>
>>> SRD> I'm okay with making Janet's suggested change.
>>
>> WFM.
>
> <EN>  Agreed.
>
>>>> §7.3.1: paragraph starting with “ In situations where reacting
>>>> nodes are configured with some knowledge”
>>>>
>>>> that requires knowledge of other traffic sources, not just
>>>> knowledge of the reporting node.
>>>>
>>>> The example code says to transmit a message if (Xp <= TAU). But
>>>> the text said the limit was “T+TAU”.
>>>>
>>>> <JPG> I think it is supposed to be "T+TAU"<JPG>
>>> SRD> I'd like to get Eric's opinion on this.  This section was
>>> copied from the SOC RFC so if it is in error here then it is in
>>> error there as well.
>>
>> I agree with getting Eric’s opinion :-)
>
> <EN>  I went back to the IETF document: Xp <= TAU is correct
> (continuous-state leaky bucket).
>
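
A sketch of a continuous-state leaky bucket may help show how the `Xp <= TAU` test and a `T+TAU` limit can both hold. `T`, `TAU` and `Xp` are the names used in the thread; the counter `X`, the last-admission time `LCT`, the arrival time `ta`, the update rule `X = max(0, Xp) + T`, and the initial value `X = 0` are assumptions made for this illustration and should be checked against the draft text.

```python
class LeakyBucket:
    """Continuous-state leaky bucket (sketch; see assumptions in the lead-in)."""

    def __init__(self, max_rate: float, tau: float, first_arrival: float = 0.0):
        self.T = 1.0 / max_rate   # target interval between admitted requests
        self.TAU = tau            # tolerance (burst allowance), in seconds
        self.X = 0.0              # bucket counter; starting at 0 is an assumption
        self.LCT = first_arrival  # time of the last admitted request

    def admit(self, ta: float) -> bool:
        """Return True if a request arriving at time ta may be sent."""
        Xp = self.X - (ta - self.LCT)  # drain the bucket since the last admission
        if Xp <= self.TAU:
            self.X = max(0.0, Xp) + self.T
            self.LCT = ta
            return True
        return False  # rejected: X and LCT are left unchanged


bucket = LeakyBucket(max_rate=4.0, tau=0.25)  # T = 0.25 s
decisions = []
peak = 0.0
for ta in (0.0, 0.0, 0.0, 0.125, 1.0):
    decisions.append(bucket.admit(ta))
    peak = max(peak, bucket.X)
print(decisions, peak)
```

In this sketch a request is admitted only while `Xp <= TAU`, and each admission adds `T`, so the counter itself never rises above `T + TAU`. That is one way to read the two statements as consistent.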
>>>> §9: I think the security considerations need more thought. What
>>>> are the security considerations specific to the rate algorithm?
>>>> If there aren’t any, then please describe the rationale behind
>>>> that. But I suspect there are, for example, can this be used
>>>> for a DoS? Can it be used to help _mitigate_ a DoS? Could one
>>>> reacting node cause others to be traffic starved?
>>>>
>>>> <JPG>It is possible that a reacting node that does not support
>>>> overload control could starve the nodes that do support
>>>> overload control, but this is also true of the loss based
>>>> version<JPG>
>>> SRD> I'm not convinced that there are security scenarios that are
>>> new or different for rate versus those documented in the existing
>>> DOIC specifications.
>>
>> This draft adds a new feature that isn’t in the base mechanism. I
>> gather your point is to say that the new feature shares the same
>> security considerations as the base, and adds no new ones. Right
>> now, section 9 only states the former.
>
> <EN>  Computation
>
>>>> Editorial Comments:
>>>>
>>>> General: IDNits returns several issues. Some of those may be
>>>> errors on its part, but I’m pretty sure some of them are real.
>>>> Please resolve these.
>>> SRD> I'll look at those next time I try to submit but I've not
>>> gotten IDNits errors in the past.
>>>
>>> SRD> I've made the suggested changes below unless indicated
>>> otherwise.
>>>>
>>
>> … and I’ve deleted sections that seem resolved.
>>
>> […]
>
> <EN>  I think this comment is resolved.
>
>>>> §5.1, third paragraph: The text is not clear whether this means
>>>> OCS should be maintained per supported application, etc, or
>>>> that it should maintain state when the rate algorithm on a per
>>>> supported application, etc, basis.
>>> SRD> I don't understand the point being made here.
>>
>> Probably because I failed to make it.
>>
>> I _think_ that the reporting node keeps state for each reporting
>> node for which it selects rate, and further needs to subdivide that
>> state by application and report type. But then it doesn’t need to
>> keep state for a reporting node for which it doesn’t select rate,
>> even though that reporting node might have an application in common
>> with the first reporting node?
>>
>> […]
>
> <EN>  The reporting node maintains state for each reacting node that
> executes the rate algorithm.
>
>>>> §6.1.1, definition of " OLR_RATE_ALGORITHM”: Two periods at end
>>>> of sentence.
>>>>
>>> SRD> I am hesitant to change any of the text in section 7 given
>>> that it is taken from the SOC specification.  I would prefer that
>>> Eric comment on these proposed changes before including them in
>>> the next version.
>>
>> Makes sense.
>
> <EN>  Typo, please remove extra period
>
>>>> §7.1, 2nd paragraph: “ signal one another support for
>>>> rate-based overload control”: This seems awkward; are there
>>>> missing words?
>
> <EN>  Looks OK to me, but I am not a native English speaker.
>
>>>> §7.2, last two paragraphs: The MUSTs do not seem necessary.
>>>> 2119 keywords should be used when there is some sort of choice
>>>> or room for error. You don’t need them to define the basic
>>>> operation of the protocol.
>
> <EN>  OK with me
>
>>>> §7.3.1: I found the text hard to follow. It would help to
>>>> declare all the identifiers and initialization up front, and to
>>>> present things in more of a stepwise fashion.
>>>>
>>>> - T is effectively a time interval, right? It would help to say
>>>> that, especially later when you subtract a different time
>>>> interval from it.
>>>>
>>>> - paragraph 9: Should “admit” be “emit”?
>>>>
>>>> - the example code has several mentions of SIP requests.
>
> <EN>   T = 1/[OC-Maximum-Rate] is the target inter-Diameter-request
> interval, per 7.3.1 paragraph 1. The usage of “admit” is correct. SIP
> should be replaced with Diameter.
>
>>>> §7.3.2: “ Request candidates for reduction, requests not
>>>> subject to reduction (except under extenuating circumstances
>>>> when there aren’t any messages in the first category that can
>>>> be reduced).”: That seems like an awkward way to say that the
>>>> second category is the set of requests that is only subject to
>>>> reduction if there are no messages left in the first category.
>>>>
>>>> <JPG> Yes, that is what it means.<JPG>
>>>>
>>>> - “ This can be generalized to n priorities using n thresholds
>>>> for n>2 in the obvious way.”: I suggest you refrain from
>>>> calling it “obvious".
>
> <EN>    Agreed
>
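
The generalization to n priorities with n thresholds might look like the following sketch. It assumes that each priority class has its own tolerance and that higher classes get larger tolerances, so they are still admitted when the bucket is too full for lower classes. The class name `PriorityLeakyBucket`, the `thresholds` list, and the update rule are hypothetical and are not taken from the draft.

```python
class PriorityLeakyBucket:
    """Leaky bucket with one admission threshold per priority class (sketch)."""

    def __init__(self, max_rate: float, thresholds: list[float]):
        if thresholds != sorted(thresholds):
            raise ValueError("thresholds must not decrease with priority")
        self.T = 1.0 / max_rate       # target interval between admitted requests
        self.thresholds = thresholds  # index 0 = lowest priority class
        self.X = 0.0                  # bucket counter; starting at 0 is an assumption
        self.LCT = 0.0                # time of the last admitted request

    def admit(self, ta: float, priority: int) -> bool:
        """Admit a request of the given class if the drained counter is within its threshold."""
        Xp = self.X - (ta - self.LCT)
        if Xp <= self.thresholds[priority]:
            self.X = max(0.0, Xp) + self.T
            self.LCT = ta
            return True
        return False


LOW, HIGH = 0, 1
bucket = PriorityLeakyBucket(max_rate=4.0, thresholds=[0.25, 0.75])
burst = [LOW, LOW, LOW, HIGH, HIGH, HIGH]  # six requests arriving at time 0
decisions = [bucket.admit(0.0, priority) for priority in burst]
print(decisions)
```

In this burst the third low-priority request is rejected while high-priority requests continue to be admitted until their larger threshold is reached, which corresponds to the second category only being reduced once the first cannot absorb the reduction.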
>>>> §7.3.3: Paragraph starting with “ Then (only) if the arrival is
>>>> admitted, increase the bucket by an amount…”: I think you
>>>> increase the bucket _count_, right?
>
>
>
> <EN>   Correct
>
>
>
> Thanks,
>
>
>
> Eric Noel
>
> *AT&T Labs, Inc.* /Rethink Possible/
>
>
>
> Optimization, Reliability and Customer Analytics
>
> 200 South Laurel Avenue, D5-3C38 Middletown, NJ 07748 P:
> 732.420.4174
>
> ecnoel@att.com
>
>
>
> *From:*Steve Donovan [mailto:srdonovan@usdonovans.com] *Sent:*
> Monday, August 06, 2018 12:42 PM *To:* NOEL, ERIC C <en5192@att.com>
> *Subject:* Fwd: Re: [Dime] draft-ietf-dime-doic-rate-control-08
>
>
>
> Eric,
>
> If not I will make the necessary changes.  I would prefer that you
> handle the comments on the actual rate algorithm but, if you are not
> able then I will make the necessary changes.
>
> Regards,
>
> Steve
>
>
>
> -------- Forwarded Message --------
>
> *Subject: * Re: [Dime] draft-ietf-dime-doic-rate-control-08
>
> *Date: * Tue, 26 Jun 2018 16:08:56 -0500
>
> *From: * Ben Campbell <ben@nostrum.com>
>
> *To: * Steve Donovan <srdonovan@usdonovans.com>
>
> *CC: * ecnoel@research.att.com, dime@ietf.org
>
>
>
> <bump>
>
>> On Jun 14, 2018, at 4:38 PM, Ben Campbell <ben@nostrum.com> wrote:
>>
>> Hi,
>>
>> See my responses inline. I removed sections that seem resolved.
>>
>> Thanks,
>>
>> Ben.
>>
>>> On Jun 13, 2018, at 12:30 PM, Steve Donovan
>>> <srdonovan@usdonovans.com> wrote:
>>>
>>> See my comments inline.
>>>
>>> Steve
>>>
>>> On 5/25/18 4:17 PM, Gunn, Janet P (CNV) wrote:
>>>> Not an author, but I have a strong interest in this ID.
>>>> Comments in line. Janet
>>>>
>>>> -----Original Message-----
>>>> From: DiME <dime-bounces@ietf.org> On Behalf Of Ben Campbell
>>>> Sent: Wednesday, May 16, 2018 1:31 AM
>>>> To: draft-ietf-dime-doic-rate-control.all@ietf.org
>>>> Cc: dime@ietf.org
>>>> Subject: [Dime] draft-ietf-dime-doic-rate-control-08
>>>>
>>>> Substantive Comments:
>>>>
>>>> General: The document seems inconsistent about whether rate
>>>> limits are only reported during overload conditions, or in
>>>> advance of overload conditions.
>>>>
>>>> <JPG> I think that would be "local policy" of the serving
>>>> (reporting) node, and independent of the protocol used to
>>>> communicate it.  I think most cases would be reactive, but I
>>>> can see situations where it could be proactive.<JPG>
>>> SRD> I agree with Janet, when the report is sent is very much
>>> local policy.  There is no reason to attempt to prevent proactive
>>> use of the rate mechanism.
>>
>> That’s fine with me, but a sentence or two to that effect would be
>> helpful. (On re-reading, I see where section 1 talks about
>> “approaching overload or overloaded”, but I assume neither of those
>> conditions are necessary?)
>>
>>>>
>>>> I’d like to see the need to allocate the rate limit across all
>>>> potential sources of traffic given some more emphasis. (Maybe a
>>>> sub-section of its own?)
>>>>
>>>> <JPG> I agree, but again I see that as "local policy" of the
>>>> serving (reporting) node. In particular, there may be reacting
>>>> nodes that do not support the rate abatement algorithm.<JPG>
>>> SRD> Again, I agree with Janet.  This is local policy and there
>>> may well be a mix of rate and loss if not all nodes support rate.
>>> I don't think it is appropriate to say that rate should be
>>> preferred over loss.  But maybe I'm missing your meaning on
>>> "allocate the rate limit”.
>>
>> Okay, but you still have to allocate the rate limit across all
>> sources that you apply the rate algorithm to, right? That is, the
>> total offered rate will be something like (average rate per source)
>> * (number of sources). Or am I misunderstanding something?
>>
>>
>>>>
>>>>
>>>> §1: - “ While this can effectively decrease the load handled by
>>>> the server, it does not directly address cases where the rate
>>>> of arrival of service requests increases quickly."
>>>>
>>>> I think it fails to address cases where the load changes
>>>> rapidly in either direction, right? At least, the following
>>>> text seems to say that.
>>>>
>>>> <JPG> I agree.  When there are rapid fluctuations in the
>>>> offered load, the "loss" algorithm errs both in  throttling TOO
>>>> MUCH when there is a dip in offered load, and throttling NOT
>>>> ENOUGH when there is a spike in offered load.<JPG>
>>> SRD> The text in section 1 talks about this already.  Is there a
>>> specific change being suggested?
>>
>> Section 1 talks about rapidly increasing load. Did I miss mention
>> of rapidly _decreasing_ load?
>>
>>>>
>>>>
>>>>
>>>> §3: Does the need for future report types to consider the rate
>>>> algorithm have IANA implications?
>>> SRD> Are you suggesting that the IANA section indicate that all
>>> new report types MUST indicate whether or not the rate algorithm
>>> can be used with that report type?  I can make that change to the
>>> IANA section if it would be appropriate.
>>>>
>>
>> I’m not suggesting, I’m asking :-)  But my point was more along
>> whether the IANA registry for report types should include a field
>> about supporting the rate algorithm.
>>
>>
>>>> §5.1: The first paragraph indicates state should be kept for
>>>> every reacting node to which it sends an OLR. But the 5th
>>>> paragraph can be interpreted to say it sends an OLR to every
>>>> reacting node with which it has negotiated use of the rate
>>>> algorithm. (see general comment).
>>> SRD> I'm missing something.  The first paragraph says the
>>> reporting node maintains state any time it sends a rate overload
>>> report.  The fifth paragraph is just saying that the report must
>>> include the rate information.
>>
>> Actually, my question relates to my question on 5.4. The first
>> paragraph says to keep state for every reporting node to which it
>> sends an OLR. The 5th paragraph implies that it sends an OLR when
>> it selects the rate algorithm for a reacting node. (If you are
>> going to include the rate info, you have to have a report to
>> include it in.)
>>
>> So it comes down to clarity about what events happen “when the
>> reporting node selects the algorithm for a reacting node” vs “when
>> the reporting node sends an OLR”.
>>
>>>>
>>>> §5.4: The first paragraph seems to suggest the reacting node
>>>> keeps OCS for every server that has indicated support for the
>>>> rate algorithm, not just nodes that have sent OLRs. Is that the
>>>> intent?
>>> SRD>  Yes, that is the intent.  This allows the reacting node to
>>> make sure that the machinery needed to respond to a rate request
>>> is in place prior to receiving an OLR.
>>
>> See above.
>>
>>
>>>>
>>>> §5.6, first paragraph: The MAY seems weak here. I know and
>>>> agree that we don’t want to force a particular application. But
>>>> don’t we need to say that if an implementation uses a different
>>>> algorithm, it MUST have the same behavior as the algorithm in
>>>> section 7?
>>>>
>>>> <JPG> I think it MUST "limit the message rate to the
>>>> OC-Maximum-Rate AVP value in units of messages per second" (as
>>>> stated in 7.3.1).  The algorithm described in the rest of 7.3.1
>>>> and 7.3.2 is somewhat more sophisticated, allowing for a
>>>> smoothing factor (TAU) and prioritization.  I do not think we
>>>> need  to say that the selected algorithm MUST have those
>>>> features.<JPG>
>>> SRD> Again, I agree with Janet.
>>
>> Okay.
>>
>>
>>>>
>>>> §7.2, third and 4th paragraphs: I don’t understand what this is
>>>> trying to say. Please elaborate.
>>>>
>>>> <JPG>3rd para - Just as a "for instance"- if the reacting node
>>>> has 50/second low priority messages and 50/second high priority
>>>> messages that it wants to send, and has a rate limit of
>>>> 75/second, it will send 25/second low priority messages and 50
>>>> /second high priority messages.  The limit of 75/second applies
>>>> to the combined stream of high and low priority messages, even
>>>> though only the low priority messages are being abated.<JPG >
>>>>
>>>> <JPG> 4th para - in the same example, it could be that the high
>>>> priority messages typically require more processing resources
>>>> (cpu, etc) than the low priority messages (or vice versa).  So
>>>> cutting the rate to 75/sec may NOT produce the expected
>>>> reduction in resource usage.<JPG>
>>> SRD> Thanks Janet, I couldn't have explained it better.
>>
>> Janet’s explanation is good, but I don’t get that from the text, at
>> least in the case of the third paragraph.
>>
>> In the 4th paragraph, I don’t understand how the reporting node
>> would “take into account the workload” in a useful way. That seems
>> to suggest the reporting node can predict the impact of workload
>> based decisions made by the reacting node, which seems unlikely
>> unless there is some out-of-band agreement.
>>
>> Is this really saying anything more than “The reacting nodes will
>> decide which messages to send and which to drop, and the result may
>> not be predictable by the reporting node”?
>>
>>
>>
>>>>
>>>>
>>>> -6th paragraph: “  may receive requests at a rate below its
>>>> target maximum Diameter  request rate while others above that
>>>> target rate.  But the resulting request rate presented to the
>>>> overloaded reporting node will converge towards the target
>>>> Diameter request rate.”
>>>>
>>>> Why do we expect traffic to converge to the rate limit? It
>>>> seems like that won't happen if some reporting nodes are not
>>>> sending at full capacity, unless work can be shifted from the
>>>> high-rate sources to the slow-rate ones.
>>>>
>>>> <JPG> Probably would be better to say that it "will converge
>>>> toward a rate at or below the target Diameter request
>>>> rate.”<JPG>
>>> SRD> I'm okay with making Janet's suggested change.
>>
>> WFM.
>>
>>
>>>>
>>>> §7.3.1: paragraph starting with “ In situations where reacting
>>>> nodes are configured with some knowledge”
>>>>
>>>> that requires knowledge of other traffic sources, not just
>>>> knowledge of the reporting node.
>>>>
>>>> The example code says to transmit a message if (Xp <= TAU). But
>>>> the text said the limit was “T+TAU”.
>>>>
>>>> <JPG> I think it is supposed to be "T+TAU"<JPG>
>>> SRD> I'd like to get Eric's opinion on this.  This section was
>>> copied from the SOC RFC so if it is in error here then it is in
>>> error there as well.
>>
>> I agree with getting Eric’s opinion :-)
>>
>>
>>>>
>>>> §9: I think the security considerations need more thought. What
>>>> are the security considerations specific to the rate algorithm?
>>>> If there aren’t any, then please describe the rationale behind
>>>> that. But I suspect there are, for example, can this be used
>>>> for a DoS? Can it be used to help _mitigate_ a DoS? Could one
>>>> reacting node cause others to be traffic starved?
>>>>
>>>> <JPG>It is possible that a reacting node that does not support
>>>> overload control could starve the nodes that do support
>>>> overload control, but this is also true of the loss based
>>>> version<JPG>
>>> SRD> I'm not convinced that there are security scenarios that are
>>> new or different for rate versus those documented in the existing
>>> DOIC specifications.
>>
>> This draft adds a new feature that isn’t in the base mechanism. I
>> gather your point is to say that the new feature shares the same
>> security considerations as the base, and adds no new ones. Right
>> now, section 9 only states the former.
>>
>>>>
>>>> Editorial Comments:
>>>>
>>>> General: IDNits returns several issues. Some of those may be
>>>> errors on its part, but I’m pretty sure some of them are real.
>>>> Please resolve these.
>>> SRD> I'll look at those next time I try to submit but I've not
>>> gotten IDNits errors in the past.
>>>
>>> SRD> I've made the suggested changes below unless indicated
>>> otherwise.
>>>>
>>
>> … and I’ve deleted sections that seem resolved.
>>
>> […]
>>
>>
>>>>
>>>> §5.1, third paragraph: The text is not clear whether this means
>>>> OCS should be maintained per supported application, etc, or
>>>> that it should maintain state when the rate algorithm on a per
>>>> supported application, etc, basis.
>>> SRD> I don't understand the point being made here.
>>
>> Probably because I failed to make it.
>>
>> I _think_ that the reporting node keeps state for each reporting
>> node for which it selects rate, and further needs to subdivide that
>> state by application and report type. But then it doesn’t need to
>> keep state for a reporting node for which it doesn’t select rate,
>> even though that reporting node might have an application in common
>> with the first reporting node?
>>
>> […]
>>
>>
>>>>
>>>> §6.1.1, definition of “OLR_RATE_ALGORITHM”: Two periods at end
>>>> of sentence.
>>>>
>>> SRD> I am hesitant to change any of the text in section 7 given
>>> that it is taken from the SOC specification.  I would prefer that
>>> Eric comment on these proposed changes before including them in
>>> the next version.
>>
>> Makes sense.
>>
>>>> §7.1, 2nd paragraph: “ signal one another support for
>>>> rate-based overload control”: This seems awkward; are there
>>>> missing words?
>>>>
>>>> §7.2, last two paragraphs: The MUSTs do not seem necessary.
>>>> 2119 keywords should be used when there is some sort of choice
>>>> or room for error. You don’t need them to define the basic
>>>> operation of the protocol.
>>>>
>>>> §7.3.1: I found the text hard to follow. It would help to
>>>> declare all the identifiers and initialization up front, and to
>>>> present things in more of a stepwise fashion.
>>>>
>>>> - T is effectively a time interval, right? It would help to say
>>>> that, especially later when you subtract a different time
>>>> interval from it.
>>>>
>>>> - paragraph 9: Should “admit” be “emit”?
>>>>
>>>> - the example code has several mentions of SIP requests.
>>>>
>>>> §7.3.2: “ Request candidates for reduction, requests not
>>>> subject to reduction (except under extenuating circumstances
>>>> when there aren’t any messages in the first category that can
>>>> be reduced).”: That seems like an awkward way to say that the
>>>> second category is the set of requests that is only subject to
>>>> reduction if there are no messages left in the first category.
>>>>
>>>> <JPG> Yes, that is what it means.<JPG>
>>>>
>>>> - “This can be generalized to n priorities using n thresholds
>>>> for n>2 in the obvious way.”: I suggest you refrain from
>>>> calling it “obvious”.
>>>>
>>>> §7.3.3: Paragraph starting with “ Then (only) if the arrival is
>>>> admitted, increase the bucket by an amount…”: I think you
>>>> increase the bucket _count_, right?
>>>>
>>
>>
>>
>
>