Re: [Dime] Benjamin Kaduk's No Objection on draft-ietf-dime-doic-rate-control-10: (with COMMENT)

Benjamin,

Thanks for the review and comments.  Please see my responses below.

Steve

On 1/23/19 1:03 PM, Benjamin Kaduk wrote:
> Benjamin Kaduk has entered the following ballot position for
> draft-ietf-dime-doic-rate-control-10: No Objection
>
> When responding, please keep the subject line intact and reply to all
> email addresses included in the To and CC lines. (Feel free to cut this
> introductory paragraph, however.)
>
>
> Please refer to https://www.ietf.org/iesg/statement/discuss-criteria.html
> for more information about IESG DISCUSS and COMMENT positions.
>
>
> The document, along with other ballot positions, can be found here:
> https://datatracker.ietf.org/doc/draft-ietf-dime-doic-rate-control/
>
>
>
> ----------------------------------------------------------------------
> COMMENT:
> ----------------------------------------------------------------------
>
> Thanks for this well-written document!  My comments are all essentially
> editorial in nature.
>
> One comment of a general note regards the usage of the word "indicate" --
> usually when I read "indicate" I expect that to be part of some
> protocol message or other formal data structure, but IIUC the OCS is
> entirely a local matter, so "indicating" something in the OCS could be
> equally well said as "storing" or "noting" or similar.  (I do see other
> similar usage of "indicate" in RFC 7683, so it's unclear that there are
> really any grounds for changing the usage in this document.)
SRD> I would prefer not to make changes unless there is strong feedback
that the word "indicate" should not be used in this fashion.
>
> Section 4
>
> nit: Saying that nodes MUST indicate support for *both* loss and rate seems
> to duplicate the requirement from RFC 7683 and would potentially complicate
> future updates.  The descriptive note about "nodes supporting the rate
> feature will support both" seems a better way to phrase things.
SRD> An update has already been made based on Ben's comments.  Let me
know if you think the following is acceptable:

        DOIC reacting nodes supporting the rate feature MUST indicate
        support for both the loss and rate algorithms in the
        OC-Feature-Vector AVP and MAY indicate support for other
        algorithms.
>
> Section 5.1
>
> Is keeping track of how much a reacting node is actually sending considered
> to not be part of the OCS (as opposed to the allocated rate, which is part
> of the OCS as noted here)?
SRD> There has been no discussion around keeping track of the actual
rate in the OCS.  It is likely that this rate will be tracked by the
rate algorithm, but that is very dynamic state information.  There is
little value in storing it in the OCS, which is used to build the
overload reports sent to the reacting node.
>
> Section 6.2
>
>    This extension does not define new overload report types.  The
>    existing report types of host and realm defined in [RFC7683] apply to
>    the rate control algorithm.  The peer report type defined in
>    [I-D.ietf-dime-agent-overload] also applies to the rate control
>    algorithm.
>
> side note: I'm curious how the directionality is such that the report type
> applies to the algorithm, as opposed to the other way around.
SRD> I'm not sure I understand the question.  There are three report
types defined -- realm, host and peer.  This is just saying that both
loss and rate apply to all three report types.  Can you clarify the
question?
>
> Section 7.1
>
>    Upon receiving the overload report with a target maximum Diameter
>    request rate, each reacting node applies abatement treatment for new
>    Diameter requests towards the reporting node.
>
> (nit?) My (hasty) reading of 7683 is that "abatement treatment" means
> either diversion or throttling, and that traffic processed normally is not
> considered to receive "abatement treatment".  If that reading is correct,
> then this text is suggesting that no new requests receive normal treatment
> after the reception of an OLR with a target rate, which does not seem quite
> right.
SRD> RFC7683 talks about overload abatement as follows:

   Reacting nodes perform overload abatement according to an agreed-upon
   abatement algorithm.  An abatement algorithm defines the meaning of
   some of the parameters of an OLR and the procedures required for
   overload abatement.  An overload abatement algorithm separates
   Diameter requests into two sets.  The first set contains the requests
   that are to undergo overload abatement treatment of either throttling
   or diversion.  The second set contains the requests that are to be
   given normal routing treatment.  This document specifies a single
   "must-support" algorithm, namely, the "loss" algorithm (Section 6).
   Future specifications may introduce new algorithms.

I can see the confusion with using "abatement treatment" in the
paragraph you reference.  I propose changing "abatement treatment" to
"overload abatement".
> Section 7.2
>
>    Note that the value of OC-Maximum-Rate AVP (in request messages per
>    second) for the rate algorithm provides an upper bound on the traffic
>    sent by the reacting node to the reporting node.
>
> I see that this is not using normative language, and that the following
> paragraph does clarify the caveats, but "upper bound" usually is read as
> "strict upper bound", and there are several ways in which this bound could
> (at least temporarily) not be strict.  Perhaps "loose upper bound" is
> better phrasing.
SRD> Agreed.
>
> Section 7.3.1
>
> Perhaps note explicitly that "//" denotes comments?
SRD> Done.
>
>    In determining whether or not to transmit a specific message, the
>    reacting node can use any algorithm that limits the message rate to
>    the OC-Maximum-Rate AVP value in units of messages per second.  For
>    ease of discussion, we define T = 1/[OC-Maximum-Rate] as the target
>    inter-Diameter request interval.  It may be strictly deterministic,
>    or it may be probabilistic.  It may, or may not, have a tolerance
>
> nit: The intervening sentence defining 'T' seems to change the binding of
> "It" away from "the algorithm".
>    Note that when the OC-Maximum-Rate value is 0 with a non-zero OC-
>    Validity-Duration, then the reacting node should apply abatement
>    treatment to 100% of Diameter requests destined to the overloaded
>    reporting node.  However, when the OC-Validity-Duration value is 0,
>    the reacting node should stop applying abatement treatment.
>
> nit: this paragraph seems like it would be better placed elsewhere, as its
> content is independent of any particular throttling algorithm.
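
As an illustration of the two paragraphs quoted above, here is a minimal
Python sketch of how a reacting node might derive T from the
OC-Maximum-Rate and OC-Validity-Duration values.  The function name and
the return conventions (None for "no abatement", infinity for "abate
everything") are assumptions made for this example; they are not taken
from the draft.

```python
from typing import Optional


def target_interval(max_rate: int, validity_duration: int) -> Optional[float]:
    """Return the target inter-request interval T in seconds.

    Hypothetical helper, not from the draft:
      - None         -> the report has expired, stop abatement
      - float("inf") -> rate of 0, every request gets abatement treatment
      - otherwise    -> T = 1 / OC-Maximum-Rate
    """
    if validity_duration == 0:
        # A validity duration of 0 ends the overload condition.
        return None
    if max_rate == 0:
        # No request may be sent while the report is valid.
        return float("inf")
    return 1.0 / max_rate
```

Checking the validity duration first keeps the "stop abatement" case
independent of whatever rate value happens to be carried in the report.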
>
>    Reporting nodes with a very large number of reacting nodes, each with
>    a relatively small arrival rate, will generally benefit from a
>    smaller value for TAU in order to limit queuing (and hence response
>    times) at the reporting node when subjected to a sudden surge of
>    traffic from all reacting nodes.  Conversely, a reporting node with a
>    relatively small number of reacting nodes, each with proportionally
>    larger arrival rate, will benefit from a larger value of TAU.
>
> Am I correct in assuming that "larger" and "smaller" values of TAU here are
> to be measured with respect to T (i.e., as a ratio)?  This may be worth
> stating more explicitly.
SRD> I'm reluctant to change the above, as it is consistent with the
wording in the SIP Overload specification.
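
To make the role of TAU relative to T concrete, here is a minimal
leaky-bucket style sketch in Python.  It is only an illustration under
stated assumptions: the class and attribute names are hypothetical, and
the logic is a generic leaky bucket with a tolerance, not a copy of the
algorithm text in the draft or in the SIP Overload specification.

```python
class RateThrottle:
    """Leaky-bucket throttle with tolerance (illustrative sketch only)."""

    def __init__(self, max_rate: float, tau: float) -> None:
        self.t = 1.0 / max_rate  # T: target inter-request interval, seconds
        self.tau = tau           # TAU: tolerance, seconds
        self.level = 0.0         # current bucket level, seconds
        self.last = 0.0          # time of the last admitted request

    def admit(self, now: float) -> bool:
        """Return True if a request arriving at time `now` may be sent."""
        # Drain the bucket by the time elapsed since the last admission.
        drained = self.level - (now - self.last)
        if drained > self.tau:
            # Bucket is above the tolerance: throttle or divert.
            return False
        # Admit the request and add one interval T to the bucket.
        self.level = max(0.0, drained) + self.t
        self.last = now
        return True
```

With max_rate = 2 (so T = 0.5 s), a burst of requests arriving at the
same instant is cut to one admitted request when TAU is 0 and to three
when TAU is 1.0 s (2T).  In this sketch the burst size depends on TAU/T,
which is one way to read "larger" and "smaller" TAU as a ratio to T.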
>
> Section 8.3
>
> Do you want to add this requirement as a "Note" on the IANA registry
> itself?
SRD> I don't understand the subtlety of the question.  Do you have
suggested wording or can you explain what a "Note" on the IANA registry is?
>
> Section 9
>
> Other than what Mirja has already noted, I only have one minor remark.
>
> It seems that an attacker that can set up reacting nodes has a slightly
> different way to disrupt legitimate traffic when "rate" is used vs. "loss",
> but the details of any attack depend on implementation behavior at the
> reporting node (e.g., whether it divides its total capacity evenly amongst
> reacting nodes or uses a more complicated allocation scheme).  And since an
> attacker that can set up new reacting nodes is almost certainly able to
> send traffic from those nodes, in practice there is no substantial
> difference, so the decision to ignore this difference and just refer to
> the 7683 security considerations seems justified.
>
>