[Pce] Benjamin Kaduk's Discuss on draft-ietf-pce-stateful-pce-auto-bandwidth-11: (with DISCUSS and COMMENT)

Benjamin Kaduk via Datatracker <noreply@ietf.org> Tue, 17 September 2019 19:34 UTC

Return-Path: <noreply@ietf.org>
X-Original-To: pce@ietf.org
Delivered-To: pce@ietfa.amsl.com
Received: from ietfa.amsl.com (localhost [IPv6:::1]) by ietfa.amsl.com (Postfix) with ESMTP id 66E761209AF; Tue, 17 Sep 2019 12:34:34 -0700 (PDT)
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: 7bit
From: Benjamin Kaduk via Datatracker <noreply@ietf.org>
To: The IESG <iesg@ietf.org>
Cc: draft-ietf-pce-stateful-pce-auto-bandwidth@ietf.org, Adrian Farrel <adrian@olddog.co.uk>, pce-chairs@ietf.org, adrian@olddog.co.uk, pce@ietf.org
X-Test-IDTracker: no
X-IETF-IDTracker: 6.101.0
Auto-Submitted: auto-generated
Precedence: bulk
Reply-To: Benjamin Kaduk <kaduk@mit.edu>
Message-ID: <156874887441.17499.7855633347557404477.idtracker@ietfa.amsl.com>
Date: Tue, 17 Sep 2019 12:34:34 -0700
Archived-At: <https://mailarchive.ietf.org/arch/msg/pce/VO-c5JHCwtQ-GIuEDA3K6Ku4Kgk>
Subject: [Pce] Benjamin Kaduk's Discuss on draft-ietf-pce-stateful-pce-auto-bandwidth-11: (with DISCUSS and COMMENT)
X-BeenThere: pce@ietf.org
X-Mailman-Version: 2.1.29
List-Id: Path Computation Element <pce.ietf.org>
List-Unsubscribe: <https://www.ietf.org/mailman/options/pce>, <mailto:pce-request@ietf.org?subject=unsubscribe>
List-Archive: <https://mailarchive.ietf.org/arch/browse/pce/>
List-Post: <mailto:pce@ietf.org>
List-Help: <mailto:pce-request@ietf.org?subject=help>
List-Subscribe: <https://www.ietf.org/mailman/listinfo/pce>, <mailto:pce-request@ietf.org?subject=subscribe>
X-List-Received-Date: Tue, 17 Sep 2019 19:34:35 -0000

Benjamin Kaduk has entered the following ballot position for
draft-ietf-pce-stateful-pce-auto-bandwidth-11: Discuss

When responding, please keep the subject line intact and reply to all
email addresses included in the To and CC lines. (Feel free to cut this
introductory paragraph, however.)


Please refer to https://www.ietf.org/iesg/statement/discuss-criteria.html
for more information about IESG DISCUSS and COMMENT positions.


The document, along with other ballot positions, can be found here:
https://datatracker.ietf.org/doc/draft-ietf-pce-stateful-pce-auto-bandwidth/



----------------------------------------------------------------------
DISCUSS:
----------------------------------------------------------------------

This document seems generally useful, and thanks for it.  I think
there's a couple places where it's ambiguous or unclear, and hopefully
we can tighten up the language without needing extensive changes.

There may be an internal inconsistency in Section 2.3's definitions
relating the Overflow-Count and the Overflow-Threshold: the description
of Overflow-Count reads as if it is a reported quantity, but the
description of Overflow-Threshold implies that Overflow-Count is a
configured parameter.  (Similarly for the Underflow- variants.)
Specifically, the sentence "[t]his value indicates how many times
consecutively, the percentage or absolute difference between the current
MaxAvgBw and the current bandwidth reservation of the LSP is greater
than or equal to the Overflow-Threshold value" uses the descriptive
verb "is" as opposed to a conditional such as "needs to be [...] in
order to [...]".

Section 5.2.3.2 says:

   If the percentage difference between the current MaxAvgBw and the
   current bandwidth reservation is greater than or less than or equal
   to the threshold percentage, the LSP bandwidth is adjusted to the
   current bandwidth demand (MaxAvgBw) (as long as the difference in the
   bandwidth is at least or above the Minimum-Threshold).

As written ("greater than or less than or equal to"), this is always
true, which cannot be what was intended.  Probably we should reword to
just "greater than" and talk about the percentage absolute difference,
i.e., 100 * abs(MaxAvgBw - CurrentRes) / CurrentRes .


----------------------------------------------------------------------
COMMENT:
----------------------------------------------------------------------

As a general note, I'm not sure the phrase "calculated bandwidth to be
adjusted" is a great fit for how it's currently being used.
Specifically, the "calculation" in question seems to just be taking the
maximum of a sliding window of per-timeslice average bandwidth values,
and this seems like a sufficiently small amount of computation that I'd
be comfortable describing it as a "measurement", for "measured bandwidth
as input to path calculation".  That is, this doesn't seem like it comes
close to the amount of computation involved in actually computing paths,
so I keep misreading what information is being sent where.

Section 1

   Over time, based on the varying traffic pattern, an LSP established
   with a certain bandwidth may require to adjust the bandwidth reserved
   in the network dynamically.  The head-end Label Switch Router (LSR)
   monitors the actual bandwidth demand of the established LSP and
   periodically computes new bandwidth.  The head-end LSR adjusts the
   bandwidth reservation of the LSP based on the computed bandwidth
   automatically.  This feature is commonly referred to as Auto-
   Bandwidth.  The Auto-Bandwidth feature is described in detail in
   Section 4 of this document.

>From just this text, I can't tell if this is describing the mechanisms of
this document or an existing Auto-Bandwidth feature that is related to
the mechanisms of this document.

   In the model considered in this document, the PCC (head-end of the
   LSP) collects the traffic rate samples flowing through the LSP and
   calculates the new adjusted bandwidth.  The PCC reports the
   calculated bandwidth to be adjusted to the PCE.  This is similar to
   the Passive stateful PCE model, while the Passive stateful PCE uses
   path request/reply mechanism, the Active stateful PCE uses
   report/update mechanism.  [...]

nit: I think the punctuation in this last sentence needs some tweaking,
as there looks to be a comma splice at present.
(It's also unclear how much we need to talk about Passive stateful PCE,
since just two paragraphs ago we say we focus on Active stateful PCE in
this document.)

   This document defines the PCEP extensions needed to support Auto-
   Bandwidth feature in a Active stateful PCE model where the LSP
   bandwidth to be adjusted is calculated on the PCC (head-end of the
   LSP). The use of PCE to calculate the bandwidth to be adjusted is out
   of scope of this document.

Maybe I'm just confused about active vs. passive in general, but I
thought that active stateful PCE involved the PCE "making the decisions"
with the PCC just providing input data.  But here we're saying that the
PCC is making the bandwidth decision, even though the rest of the LSP
computation remains on the PCE?  Or is "calculate the bandwidth to be
adjusted" referring to "evaluate the expression max_{time periods}(avg.
bandwidth per time period)" (as opposed to "compute the bandwidth
adjustment to make)?

Section 2.3

Are we assuming any relationship between the Up-Adjustment-Threshold and
the Overflow-Threshold?

      traffic demand.  If the percentage or absolute difference between
      the current MaxAvgBw and the current bandwidth reservation of the
      LSP is greater than or equal to the threshold value, the overflow
      condition is set to be met.  The LSP bandwidth is adjusted to the

nit: I think s/set to be met/said to be met/ ?  (this appears twice)

   Minimum-Threshold:  The increase or decrease of the LSP bandwidth
      should be at least or above the minimum-threshold represented as
      an absolute bandwidth value before the bandwidth adjustment for
      the LSP is made.  This threshold can be seen as a suppression
      threshold that is used along with a percentage threshold to avoid
      unnecessary auto-bandwidth adjustments and re-signaling of the LSP
      at low bandwidth values.

I can't tell from this text if this threshold is a (LSP) bandwidth
value/measurement or a bandwidth offset between configured and measured
values.  (The subsequent protocol element definitions are more clear
that it's the latter.)

Section 3

The column headings in Table 1 refer to who initiates the LSP in
question, right?

   The PCEP speaker supporting this document must have a mechanism to
   advertise the automatic bandwidth adjustment capability for both PCC-
   Initiated and PCE-Initiated LSPs.

Just checking on "must" vs. "MUST" here.

   o  It is required to identify and inform the PCC, which LSPs are
      enabled with Auto-Bandwidth feature.  Not all LSPs in some
      deployments would like their bandwidth to be dependent on the
      real-time bandwidth usage but be constant as set by the operator.

nit: I suggest NEW:

%  o  It is required to identify and inform the PCC which LSPs have
%     enabled the Auto-Bandwidth feature.  Not all LSPs in some
%     deployments would like their bandwidth to be dependent on the
%     real-time bandwidth usage; for some LSPs leaving the bandwidth
%     constant as set by the operator is preferred.

Section 4.2

   When the Auto-Bandwidth feature is enabled, the measured traffic rate
   is periodically sampled at each Sample-Interval (which can be
   configured by an operator and the default value as 5 minutes) by the
   PCC, when the PCC is the head-end node of the LSP.  The traffic rate

Do we cover what happens in the case when the PCC is not the head-end
node of the LSP?

Section 5.1

   o  The PCEP speaker that does not recognize the extensions defined in
      this document sends the PCErr message with error-type 2
      (capability not supported) as per Section 6.9 in [RFC5440].

This refers to when the actual new messages (e.g.,
AUTO-BANDWIDTH-ATTRIBUTES TLV) are being used, not merely advertising
the Auto-Bandwidth Capability TLV in the OPEN object, right?  That is,
this is descriptive language of what the core protocol spec mandates as
error handling in the presence of unrecognized input?

Section 5.1.1

nit: in the previous section we captialized this as "Auto-Bandwidth
Capability", but here (including the section heading) it is
"AUTO-BANDWIDTH-CAPABILITY".  Should they be consistent?

Section 5.2

   Future specification can define additional sub-TLVs.

I see that we see later that unrecognized sub-TLVs are ignored, but
would there be a case where we want to use a bit in the
AUTO-BANDWIDTH-CAPABILITY flags to negotiate support for a sub-TLV
(i.e., to confirm that the peer will understand it, instead of just
sending it and not knowing if it is understood)?

Section 5.2.1

It sounds like we should ignore values larger than 7 days as well
(rather than erroring out)?

Section 5.2.2, 5.2.3

I'd like to see a little bit more detail specified of the interaction
between (regular) Adjustment-{Interval,Threshold} and
Down-Adjustment-{Interval,Threshold}.  We do have this text about (e.g.)
"[t]he Adjustment-Interval sub-TLV specifies the time interval for both
upward and downward trend", which conveys the general intent, but I'm
not sure I have all the details about there being one default value,
that applies to both, and changes to the (regular) parameter continue to
apply to both, until an explicit Down- variant is seen, at which point
the control of the two parameters diverge permanently and they must
always be separately controlled.

Section 5.2.3.1

   o  Adjustment-Threshold: The absolute Adjustment-Threshold bandwidth
      value, encoded in IEEE floating point format (see

nit: I'd suggest that this is a bandwidth *difference* value (or offset),
pedantically speaking.  (The following paragraph is quite clear on the
actual usage, though, so this is nit-level.)  This comment applies to
basically all the threshold fields in the document, but I'll just make
it once, here.

Section 5.2.3.2

   o  Reserved: SHOULD be set to zero on transmission and MUST be
      ignored on receipt.

I think the formulation of "MUST be set to zero on transmission and
SHOULD be ignored on receipt" is more common.

   o  Percentage: The Adjustment-Threshold value (7 bits), encoded in
      percentage (an integer from 1 to 100).  The value 0 is considered
      to be invalid.  The default value is 5 percent.

I assume that the question of whether to allow sub-percentage
granularity came up, and don't want to revisit the WG's decision.  But
please confirm that it was discussed.

Also, are values over 100 also "invalid"?  What should the recipient do
upon receipt of an invalid value?

   If the percentage difference between the current MaxAvgBw and the
   current bandwidth reservation is greater than or less than or equal
   to the threshold percentage, the LSP bandwidth is adjusted to the
   current bandwidth demand (MaxAvgBw) (as long as the difference in the
   bandwidth is at least or above the Minimum-Threshold).

I'd suggest rewording to make the two criteria equal peers (e.g., "If X
and Y") rather than relegating the minimum-threshold to a parenthetical
that's easy to ignore.

Section 5.2.3.3

It would probably be good to reiterate here (and at other similar
locations) that this "is used to decide" only when there's a need for
different behavior in the upward and downard directions.

Also, nit-level, I think "overrides" is more appropriate than
"overwrites", since the non-Down- variant remains active for its
separate role.

Section 5.2.4.2

      per second.  The default maximum-bandwidth value is not set.

I guess effectively the default is FLOAT_MAX, inherited from RFC 5440.

Section 5.2.5.1

   o  Overflow-Threshold: The absolute Overflow-Threshold bandwidth
      value, encoded in IEEE floating point format (see
      [IEEE.754.1985]), expressed in bytes per second.  Refer to Section
      3.1.2 of [RFC3471] for a table of commonly used values.  If the
      increase of the current MaxAvgBw from the current bandwidth
      reservation is greater than or equal to the threshold value, the
      overflow condition is met.

nit: I'd suggest to s/increase/difference/, since (IIUC) the operation
here is "observed bandwidth remains above the (reservation plus)
threshold for <count> times", not "observed bandwidth increases in
successive reporting intervals".  This assumes that the
Adjustment-Interval is larger than <count> times the reporting interval,
of course.

Section 5.2.5.3

   o  Underflow-Threshold: The absolute Underflow-Threshold bandwidth
      value, encoded in IEEE floating point format (see
      [IEEE.754.1985]), expressed in bytes per second.  Refer to Section
      3.1.2 of [RFC3471] for a table of commonly used values.  If the
      decrease of the current MaxAvgBw from the current bandwidth
      reservation is greater than or equal to the threshold value, the
      underflow condition is met.

As above, I suggest s/decrease/difference/ (perhaps with an additional
indication that the magnitude of the difference is to be used, not the
signed difference).

Section 5.7

I'm not sure I understand why there's a need for an
auto-bandwidth-specific "overload" notification, as opposed to having
such messages suppressed as part of a more general overload condition.
But that doesn't mean there's not a need, of course, as I'm not very
familiar with this area!

Section 6.6

   An implementation MAY allow a limit to be placed on the rate of auto-
   bandwidth related messages sent by a PCEP speaker and received by a
   peer.  An implementation MAY also allow sending a notification when a
   PCEP speaker is overwhelmed or the rate of messages reach a
   threshold.

I don't want to revisit a WG consensus outcome, but I could see "SHOULD"
for "allow sending a notification", noting that it is "SHOULD allow
sending" and not "SHOULD send".

Section 7

   This document defines AUTO-BANDWIDTH-CAPABILITY TLV and AUTO-
   BANDWIDTH-ATTRIBUTES sub-TLVs which do not add any new security
   concerns beyond those already discussed in [RFC8231] and [RFC8281]

I'd suggest to qualify this as "do not add any substantial new security
concerns".  We did just in the previous section talk about how this
functionality could increase computational and operational load, which,
taken to an extreme, can affect availability and could be considered a
security consideration.

We could also perhaps give guidance about setting the min and max
bandwidth sub-TLVs to constrain automatic adjustment to be within a
known range and thus bound the extent of second-order effects on the
rest of the MPLS-TE domain.

Section 8.2

It might be good to say within what registry the sub-registry being
created is to be located.

Also, the last paragraph could perhaps be rephrased as "the initial
contents of the registry are empty, with all bits unassigned".

Section 9.2

I think that the RFC 7525 and 8253 references fall under the RECOMMENDED
guidance in Section 7, in which case per
https://www.ietf.org/about/groups/iesg/statements/normative-informative-references/
they should be classified as Normative references.  (It's probably also
good to cite RFC 7525 as BCP 195 and not just RFC 7525.)

The IEEE.754 reference is also normative since we use its wire encoding
(but it could practically be considered "well-known" at this point,
given its ubiquity).