From: Ben Campbell <ben@nostrum.com>
Message-Id: <2C4A8F89-9FB1-483F-B160-52822F71531F@nostrum.com>
Content-Type: multipart/signed;
 boundary="Apple-Mail=_4EAE7243-9599-4BF7-824F-20D54D2BD053";
 protocol="application/pgp-signature"; micalg=pgp-sha512
Mime-Version: 1.0 (Mac OS X Mail 11.4 \(3445.8.2\))
Date: Tue, 26 Jun 2018 16:08:56 -0500
In-Reply-To: <8B34DBC2-3A66-43CB-A9B2-4CB51C36E062@nostrum.com>
Cc: ecnoel@research.att.com, dime@ietf.org
To: Steve Donovan <srdonovan@usdonovans.com>
References: <8FB01050-7B63-4A1B-B50A-974D0FA448C4@nostrum.com>
 <fd1b638dfc8f48b5b46b105ac40e5124@CSRRDU1EXM025.corp.csra.com>
 <06dc2172-4b2a-c0c4-e411-8928acd13a1b@usdonovans.com>
 <8B34DBC2-3A66-43CB-A9B2-4CB51C36E062@nostrum.com>
Archived-At: <https://mailarchive.ietf.org/arch/msg/dime/tvrNQIA-CGx9YgZHRQRMkkjryuM>
Subject: Re: [Dime] draft-ietf-dime-doic-rate-control-08
Precedence: list


--Apple-Mail=_4EAE7243-9599-4BF7-824F-20D54D2BD053
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain;
	charset=utf-8

<bump>

> On Jun 14, 2018, at 4:38 PM, Ben Campbell <ben@nostrum.com> wrote:
>=20
> Hi,
>=20
> See my responses inline. I removed sections that seem resolved.
>=20
> Thanks,
>=20
> Ben.
>=20
>> On Jun 13, 2018, at 12:30 PM, Steve Donovan =
<srdonovan@usdonovans.com> wrote:
>>=20
>> See my comments inline.
>>=20
>> Steve
>>=20
>> On 5/25/18 4:17 PM, Gunn, Janet P (CNV) wrote:
>>> Not an author, but I have a strong interest in this ID.  Comments =
inline.
>>> Janet
>>>=20
>>> -----Original Message-----
>>> From: DiME <dime-bounces@ietf.org> On Behalf Of Ben Campbell
>>> Sent: Wednesday, May 16, 2018 1:31 AM
>>> To: draft-ietf-dime-doic-rate-control.all@ietf.org
>>> Cc: dime@ietf.org
>>> Subject: [Dime] draft-ietf-dime-doic-rate-control-08
>>>=20
>>> Substantive Comments:
>>>=20
>>> General:
>>> The document seems inconsistent about whether rate limits are only =
reported during overload conditions, or in advance of overload =
conditions.
>>>=20
>>> <JPG> I think that would be "local policy" of the serving =
(reporting) node, and independent of the protocol used to communicate =
it.  I think most cases would be reactive, but I can see situations =
where it could be proactive.<JPG>
>> SRD> I agree with Janet: when the report is sent is very much local =
policy.  There is no reason to attempt to prevent proactive use of the =
rate mechanism.
>=20
> That=E2=80=99s fine with me, but a sentence or two to that effect =
would be helpful. (On re-reading, I see where section 1 talks about =
=E2=80=9Capproaching overload or overloaded=E2=80=9D, but I assume =
neither of those conditions is necessary?)
>=20
>>>=20
>>> I=E2=80=99d like to see the need to allocate the rate limit across =
all potential sources of traffic given some more emphasis. (Maybe a =
sub-section of its own?)
>>>=20
>>> <JPG> I agree, but again I see that as "local policy" of the serving =
(reporting) node. In particular, there may be reacting nodes that do not =
support the rate abatement algorithm.<JPG>
>> SRD> Again, I agree with Janet.  This is local policy and there may =
well be a mix of rate and loss if not all nodes support rate.  I don't =
think it is appropriate to say that rate should be preferred over loss.  =
But maybe I'm missing your meaning on "allocate the rate limit=E2=80=9D.
>=20
> Okay, but you still have to allocate the rate limit across all sources =
that you apply the rate algorithm to, right? That is, the total offered =
rate will be something like (average rate per source) * (number of =
sources). Or am I misunderstanding something?
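
A minimal sketch of the allocation described above, assuming the
reporting node simply splits one total rate evenly across the sources it
applies the rate algorithm to. The even split and the function name are
assumptions for illustration; the thread treats the real split as local
policy.

```python
def allocate_rate(total_rate, source_ids):
    """Split total_rate evenly; return {source_id: per-source rate}."""
    if not source_ids:
        return {}
    share = total_rate / len(source_ids)
    return {sid: share for sid in source_ids}


# Hypothetical numbers: 300 requests/second across three sources.
limits = allocate_rate(300.0, ["a", "b", "c"])
print(limits)
```

Each source gets 100.0 here, so (rate per source) * (number of sources)
adds back up to the 300.0 total.
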
>=20
>=20
>>>=20
>>>=20
>>> =C2=A71:
>>> - =E2=80=9C While this can effectively decrease the load handled by =
the
>>> server, it does not directly address cases where the rate of arrival
>>> of service requests increases quickly."
>>>=20
>>> I think it fails to address cases where the load changes rapidly in =
either direction, right? At least, the following text seems to say that.
>>>=20
>>> <JPG> I agree.  When there are rapid fluctuations in the offered =
load, the "loss" algorithm errs both in  throttling TOO MUCH when there =
is a dip in offered load, and throttling NOT ENOUGH when there is a =
spike in offered load.<JPG>
>> SRD> The text in section 1 talks about this already.  Is there a =
specific change being suggested?
>=20
> Section 1 talks about rapidly increasing load. Did I miss mention of =
rapidly _decreasing_ load?
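
Janet's point about the loss algorithm erring in both directions can be
shown with a little arithmetic. This is only a sketch: the function
names and the scenario (a server that can handle 100 requests/second and
asked for 50% loss while offered 200) are assumptions, not taken from
the draft.

```python
def admitted_under_loss(offered, loss_fraction):
    """Loss abatement: drop a fixed fraction of whatever is offered."""
    return offered * (1.0 - loss_fraction)


def admitted_under_rate(offered, max_rate):
    """Rate abatement: never send more than max_rate."""
    return min(offered, max_rate)


# Offered load at the time of the report, then a spike, then a dip.
for offered in (200.0, 400.0, 120.0):
    loss = admitted_under_loss(offered, 0.5)
    rate = admitted_under_rate(offered, 100.0)
    print(offered, loss, rate)
```

With the spike to 400 the loss fraction lets 200 through, twice what the
server asked for. With the dip to 120 it lets only 60 through although
100 would have been acceptable. The rate cap yields 100 in both cases.
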
>=20
>>>=20
>>>=20
>>>=20
>>> =C2=A73: Does the need for future report types to consider the rate =
algorithm have IANA implications?
>> SRD> Are you suggesting that the IANA section indicate that all new =
report types MUST indicate whether or not the rate algorithm can be used =
with that report type?  I can make that change to the IANA section if it =
would be appropriate.
>>>=20
>=20
> I=E2=80=99m not suggesting, I=E2=80=99m asking :-)  But my point was =
more about whether the IANA registry for report types should include a =
field about supporting the rate algorithm.
>=20
>=20
>>> =C2=A75.1: The first paragraph indicates state should be kept for =
every reacting node to which it sends an OLR. But the 5th paragraph can =
be interpreted to say it sends an OLR to every reacting node with which =
it has negotiated use of the rate algorithm. (see general comment).
>> SRD> I'm missing something.  The first paragraph says the reporting =
node maintains state any time it sends a rate overload report.  The =
fifth paragraph is just saying that the report must include the rate =
information.
>=20
> Actually, my question relates to my question on 5.4. The first =
paragraph says to keep state for every reacting node to which it sends =
an OLR. The 5th paragraph implies that it sends an OLR when it selects =
the rate algorithm for a reacting node. (If you are going to include the =
rate info, you have to have a report to include it in.)
>=20
> So it comes down to clarity about what events happen =E2=80=9Cwhen the =
reporting node selects the algorithm for a reacting node=E2=80=9D vs =
=E2=80=9Cwhen the reporting node sends an OLR=E2=80=9D.
>=20
>>>=20
>>> =C2=A75.4: The first paragraph seems to suggest the reacting node =
keeps OCS for every server that has indicated support for the rate =
algorithm, not just nodes that have sent OLRs. Is that the intent?
>> SRD>  Yes, that is the intent.  This allows the reacting node to make =
sure that the machinery needed to respond to a rate request is in place =
prior to receiving an OLR.
>=20
> See above.
>=20
>=20
>>>=20
>>> =C2=A75.6, first paragraph: The MAY seems weak here. I know and =
agree that we don=E2=80=99t want to force a particular application. But =
don=E2=80=99t we need to say that if an implementation uses a different =
algorithm, it MUST have the same behavior as the algorithm in section 7?
>>>=20
>>> <JPG> I think it MUST "limit the message rate to the OC-Maximum-Rate =
AVP value in units of messages per second" (as stated in 7.3.1).  The =
algorithm described in the rest of 7.3.1 and 7.3.2 is somewhat more =
sophisticated, allowing for a smoothing factor (TAU) and prioritization. =
 I do not think we need  to say that the selected algorithm MUST have =
those features.<JPG>
>> SRD> Again, I agree with Janet.
>=20
> Okay.
>=20
>=20
>>>=20
>>> =C2=A77.2, third and 4th paragraphs: I don=E2=80=99t understand what =
this is trying to say. Please elaborate.
>>>=20
>>> <JPG>3rd para - Just as a "for instance"- if the reacting node has =
50/second low priority messages and 50/second high priority messages =
that it wants to send, and has a rate limit of 75/second, it will send =
25/second low priority messages and 50/second high priority messages.  =
The limit of 75/second applies to the combined stream of high and low =
priority messages, even though only the low priority messages are being =
abated.<JPG >
>>>=20
>>> <JPG> 4th para - in the same example, it could be that the high =
priority messages typically require more processing resources (cpu, etc) =
than the low priority messages (or vice versa).  So cutting the rate to =
75/sec may NOT produce the expected reduction in resource usage.<JPG>
>> SRD> Thanks Janet, I couldn't have explained it better.
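
Janet's third-paragraph example can be written out as a short sketch. It
assumes strict priority (high priority traffic is served first and only
low priority traffic is abated), which matches her numbers but is not
stated in the draft text. The function name is made up.

```python
def split_by_priority(high_offered, low_offered, max_rate):
    """Return (high_sent, low_sent); high priority is served first."""
    high_sent = min(high_offered, max_rate)
    low_sent = min(low_offered, max_rate - high_sent)
    return high_sent, low_sent


# 50/second high, 50/second low, rate limit of 75/second.
print(split_by_priority(50, 50, 75))
```

This prints (50, 25): the 75/second limit applies to the combined
stream, even though only the low priority messages are reduced.
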
>=20
> Janet=E2=80=99s explanation is good, but I don=E2=80=99t get that from =
the text, at least in the case of the third paragraph.
>=20
> In the 4th paragraph, I don=E2=80=99t understand how the reporting =
node would =E2=80=9Ctake into account the workload=E2=80=9D in a useful =
way. That seems to suggest the reporting node can predict the impact of =
workload based decisions made by the reacting node, which seems unlikely =
unless there is some out-of-band agreement.
>=20
> Is this really saying anything more than =E2=80=9CThe reacting nodes =
will decide which messages to send and which to drop, and the result may =
not be predictable by the reporting node=E2=80=9D?
>=20
>=20
>=20
>>>=20
>>>=20
>>> -6th paragraph: =E2=80=9C  may receive requests at a rate below its =
target maximum Diameter  request rate while others above that target =
rate.  But the resulting request rate presented to the overloaded =
reporting node will converge towards the target Diameter request =
rate.=E2=80=9D
>>>=20
>>> Why do we expect traffic to converge to the rate limit? It seems =
like that won't happen if some reacting nodes are not sending at full =
capacity, unless work can be shifted from the high-rate sources to the =
slow-rate ones.
>>>=20
>>> <JPG> Probably would be better to say that it "will converge  toward =
a rate at or below the target Diameter request rate.=E2=80=9D<JPG>
>> SRD> I'm okay with making Janet's suggested change.
>=20
> WFM.
>=20
>=20
>>>=20
>>> =C2=A77.3.1: paragraph starting with =E2=80=9C In situations where =
reacting nodes are configured with some knowledge=E2=80=9D
>>>=20
>>> that requires knowledge of other traffic sources, not just knowledge =
of the reporting node.
>>>=20
>>> The example code says to transmit a message if (Xp <=3D TAU). But =
the text said the limit was =E2=80=9CT+TAU=E2=80=9D.
>>>=20
>>> <JPG> I think it is supposed to be "T+TAU"<JPG>
>> SRD> I'd like to get Eric's opinion on this.  This section was copied =
from the SOC RFC so if it is in error here then it is in error there as =
well.
>=20
> I agree with getting Eric=E2=80=99s opinion :-)
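
Pending Eric's input, here is one reading under which the code and the
text would not conflict. It is an assumption modeled on a generic
continuous-state leaky bucket, not a quote from the draft or the SOC
RFC: if Xp is compared against TAU before the bucket is increased by T,
then the fill can never exceed T+TAU. The class and attribute names are
made up for illustration.

```python
class LeakyBucket:
    """Generic continuous-state leaky bucket (illustrative only)."""

    def __init__(self, rate_per_sec, tau):
        self.t = 1.0 / rate_per_sec  # T: nominal spacing between sends
        self.tau = tau               # TAU: tolerance, in seconds
        self.x = 0.0                 # X: bucket fill, in seconds
        self.lct = 0.0               # time of the last admitted send

    def admit(self, now):
        # Drain the bucket for the time elapsed since the last send.
        xp = max(0.0, self.x - (now - self.lct))
        if xp <= self.tau:
            # Admitted: the fill becomes at most TAU + T.
            self.x = xp + self.t
            self.lct = now
            return True
        return False


bucket = LeakyBucket(4, 0.5)  # T is 0.25 seconds, TAU is 0.5 seconds
print([bucket.admit(0.0) for _ in range(4)])
print(bucket.x)
```

A burst of four arrivals at time 0 gives three admits and one reject,
and the fill ends at 0.75, which is T+TAU for these values.
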
>=20
>=20
>>>=20
>>> =C2=A79: I think the security considerations need more thought. What =
are the security considerations specific to the rate algorithm? If there =
aren=E2=80=99t any, then please describe the rationale behind that. But I =
suspect there are. For example, can this be used for a DoS? Can it be =
used to help _mitigate_ a DoS? Could one reacting node cause others to =
be traffic starved?
>>>=20
>>> <JPG>It is possible that a reacting node that does not support =
overload control could starve the nodes that do support overload =
control, but this is also true of the loss based version<JPG>
>> SRD> I'm not convinced that there are security scenarios that are new =
or different for rate versus those documented in the existing DOIC =
specifications.
>=20
> This draft adds a new feature that isn=E2=80=99t in the base =
mechanism. I gather your point is to say that the new feature shares the =
same security considerations as the base, and adds no new ones. Right =
now, section 9 only states the former.
>=20
>>>=20
>>> Editorial Comments:
>>>=20
>>> General: IDNits returns several issues. Some of those may be errors =
on its part, but I=E2=80=99m pretty sure some of them are real. Please =
resolve these.
>> SRD> I'll look at those next time I try to submit but I've not gotten =
IDNits errors in the past.
>>=20
>> SRD> I've made the suggested changes below unless indicated =
otherwise.
>>>=20
>=20
> =E2=80=A6 and I=E2=80=99ve deleted sections that seem resolved.
>=20
> [=E2=80=A6]
>=20
>=20
>>>=20
>>> =C2=A75.1, third paragraph: The text is not clear whether this means =
OCS should be maintained per supported application, etc, or that it =
should maintain state when the rate algorithm on a per supported =
application, etc, basis.
>> SRD> I don't understand the point being made here.
>=20
> Probably because I failed to make it.
>=20
> I _think_ that the reporting node keeps state for each reacting node =
for which it selects rate, and further needs to subdivide that state by =
application and report type. But it doesn=E2=80=99t need to keep state =
for a reacting node for which it doesn=E2=80=99t select rate, even =
though that reacting node might have an application in common with the =
first reacting node?
>=20
> [=E2=80=A6]
>=20
>=20
>>>=20
>>> =C2=A76.1.1, definition of " OLR_RATE_ALGORITHM=E2=80=9D: Two =
periods at end of sentence.
>>>=20
>> SRD> I am hesitant to change any of the text in section 7 given that =
it is taken from the SOC specification.  I would prefer that Eric =
comment on these proposed changes before including them in the next =
version.
>=20
> Makes sense.
>=20
>>> =C2=A77.1, 2nd paragraph: =E2=80=9C signal one another support for =
rate-based overload
>>> control=E2=80=9D: This seems awkward; are there missing words?
>>>=20
>>> =C2=A77.2, last two paragraphs: The MUSTs do not seem necessary. =
2119 keywords should be used when there is some sort of choice or room =
for error. You don=E2=80=99t need them to define the basic operation of =
the protocol.
>>>=20
>>> =C2=A77.3.1: I found the text hard to follow. It would help to =
declare all the identifiers and initialization up front, and to present =
things in more of a stepwise fashion.
>>>=20
>>> - T is effectively a time interval, right? It would help to say =
that, especially later when you subtract a different time interval from =
it.
>>>=20
>>> - paragraph 9: Should =E2=80=9Cadmit=E2=80=9D be =E2=80=9Cemit=E2=80=9D=
?
>>>=20
>>> - the example code has several mentions of SIP requests.
>>>=20
>>> =C2=A77.3.2: =E2=80=9C Request candidates for reduction, requests =
not subject to reduction (except under extenuating circumstances when =
there aren=E2=80=99t any messages in the first category that can be =
reduced).=E2=80=9D: That seems like an awkward way to say that the =
second category is the set of requests that is only subject to reduction =
if there are no messages left in the first category.
>>>=20
>>> <JPG> Yes, that is what it means.<JPG>
>>>=20
>>> - =E2=80=9C This can be generalized to n priorities using n =
thresholds for n>2 in the obvious way.=E2=80=9D: I suggest you refrain =
from calling it =E2=80=9Cobvious".
>>>=20
>>> =C2=A77.3.3: Paragraph starting with =E2=80=9C Then (only) if the =
arrival is admitted, increase the bucket by an amount=E2=80=A6=E2=80=9D: =
I think you increase the bucket _count_, right?
>>>=20
>>>=20
>>> _______________________________________________
>>> DiME mailing list
>>> DiME@ietf.org
>>> https://www.ietf.org/mailman/listinfo/dime
>>=20
>=20
>=20
>=20


--Apple-Mail=_4EAE7243-9599-4BF7-824F-20D54D2BD053
Content-Transfer-Encoding: 7bit
Content-Disposition: attachment;
	filename=signature.asc
Content-Type: application/pgp-signature;
	name=signature.asc
Content-Description: Message signed with OpenPGP

-----BEGIN PGP SIGNATURE-----
Comment: GPGTools - https://gpgtools.org

iQIzBAEBCgAdFiEExW9rpd7ez4DexOFOgFZKbJXz1A0FAlsyq2gACgkQgFZKbJXz
1A2amxAAwdKDEFIYXxTB5lMMAN/Buyo+jt+LleEOm/BOzO7uEjLqW1gVFNnSHvv+
w5miXpuxAkGEpnIFswfDw8ggUeHKgcEhcEwfvhh//53n8jORc9QVPciDoqH2JYH0
+xCnvEvg8xg3mhkikrlXQT5m4qIHYKRYoQ+TpkVOi31cXBH8bl2qRebTrmmQTlgH
8hXGK3MfbX84kdTRP0GdgjKyA3yLQvUiogvsvZ3uTQmSZ5HEHEFK7Wf02ZzbP8B4
VPrAozYo5AiuDVj0wvqVV/AANo2Q3kn4ZZ3mg33rNkOG2jQOUzLym1oesb5/3dOH
VmvQaMkogpR6YdeR7gFyTe3DesKZ+GzhUqmSEDjxg/rwZuCg2aW71glR7nOt55Pr
ECwoWos8vCcNHKIj6kI5J59euiXDQzO+UBhFpqETN9ZEa1vYQlPptTJPL6LopYHy
UDMd/byU2J49CT4iijPppQCz+nBGlR43KyjQ1KJUL8Z359Zx/MTROvZ/a8SQ1Rik
df+kBtHPgoSz7DcNWK5S9BHnT8AEsuP7ofPg6tJhk2/VMweB2lktR5yYau4UvnBo
B5vkKIKUw7BZHiJJ19ydwzC4qy11bYMYP71QtGrTNiBmNvIoMNAz0clw5rQ8yjzu
NvjB2jocpq0pGYYx5ZcJpmBLAOC+SE70pzPpL9yjIodpBvGfi/0=
=RDpX
-----END PGP SIGNATURE-----

--Apple-Mail=_4EAE7243-9599-4BF7-824F-20D54D2BD053--