draft-ietf-rtgwg-spf-uloop-pb-statement

Chris Bowers <chrisbowers.ietf@gmail.com> Mon, 16 April 2018 20:02 UTC

From: Chris Bowers <chrisbowers.ietf@gmail.com>
Date: Mon, 16 Apr 2018 15:02:02 -0500
Message-ID: <CAHzoHbtneDuiWLpTy+pA1zYRGxQJLRx_jPz2wKLiwpN3_vZ5mQ@mail.gmail.com>
Subject: draft-ietf-rtgwg-spf-uloop-pb-statement
To: draft-ietf-rtgwg-spf-uloop-pb-statement@ietf.org, RTGWG <rtgwg@ietf.org>
Archived-At: <https://mailarchive.ietf.org/arch/msg/rtgwg/NLThRm6Jz1JDC_VKXTrNsozPl1M>

As part of doing the shepherd write-up for this document, I did a review of
the draft.

My comments are shown below as a diff on
draft-ietf-rtgwg-spf-uloop-pb-statement-06.txt.

They can also be viewed at:
https://github.com/cbowers/outgoing-feedback-on-ietf-drafts-2018/commit/c1c5018f857e9c7c0f4123c3de1e87041178e387

Thanks,
Chris

=============

diff --git a/draft-ietf-rtgwg-spf-uloop-pb-statement-06.txt b/draft-ietf-rtgwg-spf-uloop-pb-statement-06.txt
index 353ce3c..3dff746 100644
--- a/draft-ietf-rtgwg-spf-uloop-pb-statement-06.txt
+++ b/draft-ietf-rtgwg-spf-uloop-pb-statement-06.txt
@@ -21,7 +21,16 @@ Abstract

    In this document, we are trying to analyze the impact of using
    different Link State IGP implementations in a single network in
-   regards of micro-loops.  The analysis is focused on the SPF triggers
+   regards of micro-loops.
+
+=======
+[CB]
+   In this document, we are trying to analyze the impact of using
+   different Link State IGP implementations in a single network, with
+   respect to micro-loops.
+
+========
+   The analysis is focused on the SPF triggers
    and SPF delay algorithm.

 Requirements Language
@@ -95,13 +104,39 @@ Table of Contents
    Link State IGP protocols are based on a topology database on which an
    SPF (Shortest Path First) algorithm like Dijkstra is implemented to
    find the optimal routing paths.
-
+
+   =====
+   [CB] proposed modified text since the Shortest Path First algorithm and
+   Dijkstra's algorithm are essentially synonymous.  Also propose to use
+   "consistent set of non-looping routing paths", since shortest path
+   routing is often not optimal from a traffic engineering perspective.
+
+   [proposed text]
+   Link State IGP protocols are based on a topology database on which the
+   SPF (Shortest Path First) algorithm is run to
+   find a consistent set of non-looping routing paths.
+
+   =====
+
    Specifications like IS-IS ([RFC1195]) propose some optimizations of
    the route computation (See Appendix C.1) but not all the
    implementations are following those not mandatory optimizations.

+============
+[CB]  [proposed text]
+but not all implementations follow those non-mandatory
+optimizations.
+=============
+
    We will call "SPF trigger", the events that would lead to a new SPF
    computation based on the topology.
+
+============
+[CB]  [proposed text]
+   We will call "SPF triggers", the events that would lead to a new SPF
+   computation based on the topology.
+=============
+

    Link State IGP protocols, like OSPF ([RFC2328]) and IS-IS
    ([RFC1195]), are using multiple timers to control the router behavior
@@ -118,11 +153,27 @@ Internet-Draft                spf-microloop                 January 2018

    Some of those timers are standardized in protocol specification, some
    are not especially the SPF computation related timers.
+
+============
+[CB] [proposed text]
+   Some of those timers are standardized in protocol specification,
+   while some are not.  The SPF computation related timers have generally
+   remained unspecified.
+=============

    For non standardized timers, implementations are free to implement it
    in any way.  For some standardized timer, we can also see that rather
    than using static configurable values for such timer, implementations
    may offer dynamically adjusted timers to help controlling the churn.
+
+============
+[CB] In the discussion above, it is unclear what the meaning of "timer" is.
+Is it the numerical value of a timer?  Is it the trigger conditions and
+logic for a timer to start or be reset?  Is it the action taken when the
+timer expires?  Perhaps the text could be clarified by referring to
+"timer behavior" and "timer values".
+
+=============
+

    We will call "SPF delay", the timer that exists in most
    implementations that specifies the required delay before running SPF
@@ -138,6 +189,17 @@ Internet-Draft                spf-microloop                 January 2018
    Some micro-loop mitigation techniques have been defined by IETF (e.g.
    [RFC6976], [I-D.ietf-rtgwg-uloop-delay]) but are not implemented due
    to complexity or are not providing a complete mitigation.
+
+==========
+[CB]
+This paragraph needs to be clearer.
+[proposed text]
+   Two micro-loop mitigation techniques have been defined by the IETF.
+   [RFC6976] has not been widely implemented, presumably due to the
+   complexity of the technique.  [I-D.ietf-rtgwg-uloop-delay] has been
+   implemented.  However, it does not prevent all micro-loops that can
+   occur for a given topology and failure scenario.
+==========

    In multi-vendor networks, using different implementations of a link
    state protocol may favor micro-loops creation during the convergence
@@ -185,17 +247,24 @@ Internet-Draft                spf-microloop                 January 2018
    will forward the traffic to C through B, but as B as not converged
    yet, B will loop back traffic to A, leading to a micro-loop.

+========
+[CB]
+Figure 1 and Figure 4 are essentially the same topology, but the nodes
+have different names.  I think it would be much better for the reader of
+this document to consolidate the two figures into a single figure.
+========
+
    The micro-loop appears due to the asynchronous convergence of nodes
    in a network when an event occurs.

-   Multiple factors (and combination of these factors) may increase the
+   Multiple factors (or a combination of these factors) may increase the
    probability for a micro-loop to appear:

    o  the delay of failure notification: the more B is advised of the
       failure later than A, the more a micro-loop may have a chance to
       appear.

-   o  the SPF delay: most of the implementations supports a delay for
+   o  the SPF delay: most implementations support a delay for
       the SPF computation to try to catch as many events as possible.
       If A uses an SPF delay timer of x msec and B uses an SPF delay
       timer of y msec and x < y, B would start converging after A
@@ -204,8 +273,8 @@ Internet-Draft                spf-microloop                 January 2018
    o  the SPF computation time: mostly a matter of CPU power and
       optimizations like incremental SPF.  If A computes its SPF faster
       than B, there is a chance for a micro-loop to appear.  CPUs are
-      today faster enough to consider SPF computation time as
-      negligeable (order of msec in a large network).
+      today fast enough to consider SPF computation time as
+      negligible (on the order of milliseconds in a large network).

    o  the SPF computation order: an SPF trigger can be common to
       multiple IGP areas or levels (e.g., IS-IS Level1/Level2) or for
@@ -215,8 +284,8 @@ Internet-Draft                spf-microloop                 January 2018
       done in A and B for each area/level/topology/SPF-algorithm is
       different, there is a possibility for a micro-loop to appear.

-   o  the RIB and FIB prefix insertion speed or ordering: highly
-      implementation dependant.
+   o  the RIB and FIB prefix insertion speed or ordering.  This is highly
+      dependent on the implementation.
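
Illustrative note (not part of the draft or of the review comments): the
interaction between the failure notification delay and the SPF delay listed
above can be sketched in a few lines of Python.  This is a simplification
under stated assumptions: SPF computation and RIB/FIB update times are
treated as zero, and the function names and numeric values are hypothetical,
chosen only for illustration.

```python
def convergence_time_ms(notification_delay_ms: int, spf_delay_ms: int) -> int:
    """Time at which a node has converged, measured from the failure.

    Assumption: SPF computation and RIB/FIB update times are ignored.
    """
    return notification_delay_ms + spf_delay_ms


def microloop_window_ms(a_converged_ms: int, b_converged_ms: int) -> int:
    """Duration for which A has converged but B has not.

    During this window A may forward traffic to B while B still
    forwards it back to A.
    """
    return max(0, b_converged_ms - a_converged_ms)


# Hypothetical values: A learns of the failure immediately and uses
# x = 50 ms; B learns of it 10 ms later and uses y = 200 ms.
a_done = convergence_time_ms(notification_delay_ms=0, spf_delay_ms=50)
b_done = convergence_time_ms(notification_delay_ms=10, spf_delay_ms=200)
window = microloop_window_ms(a_done, b_done)
print(window)  # 160
```

With x < y, the window grows with both the difference in SPF delay and the
extra notification delay seen by B, which matches the first two factors in
the list.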



@@ -225,22 +294,21 @@ Litkowski, et al.         Expires July 28, 2018                [Page 4]
 Internet-Draft                spf-microloop                 January 2018


-   This document will focus on analysis SPF delay (and associated
-   triggers).
+   This document will focus on analysis of the SPF delay behavior and the
+   associated triggers.

 3.  SPF trigger strategies

-   Depending of the change advertised in LSP/LSA, the topology may be
+   Depending on the change advertised in an LSPDU or LSA, the topology may be
    affected or not.  An implementation may avoid running the SPF
    computation (and may only run IP reachability computation instead) if
-   the advertised change is not affecting topology.
+   the advertised change does not affect the topology.

    Different strategies exists to trigger the SPF computation:

-   1.  An implementation may always run a full SPF whatever the change
-       to process.
+   1.  An implementation may always run a full SPF for any type of change.

-   2.  An implementation may run a full SPF only when required: e.g. if
+   2.  An implementation may run a full SPF only when required.  For example, if
        a link fails, a local node will run an SPF for its local LSP
        update.  If the LSP from the neighbor (describing the same
        failure) is received after SPF has started, the local node can
@@ -250,26 +318,28 @@ Internet-Draft                spf-microloop                 January 2018
    3.  If the topology does not change, an implementation may only
        recompute the IP reachability.

-   As pointed in Section 1, SPF optimizations are not mandatory in
-   specifications, leading to multiple strategies to be implemented.
+   As noted in Section 1, SPF optimizations are not mandatory in
+   specifications.  This has led to the implementation of
+   different strategies.

 4.  SPF delay strategies

    Implementations of link state routing protocols use different
-   strategies to delay the SPF computation.  We usually see the
-   following:
+   strategies to delay the SPF computation.  The two most
+   common SPF delay behaviors are the following.

-   1.  Two steps delay.
+   1.  Two phase delay.

    2.  Exponential backoff delay.

-   Those behavior will be explained in the next sections.
+   These behaviors are described in the following sections.

-4.1.  Two steps SPF delay
+4.1.  Two phase SPF delay

-   The SPF delay is managed by four parameters:
+   For the two phase SPF delay, the SPF delay is managed by four parameters:

-   o  Rapid delay: amount of time to wait before running SPF.
+   o  Rapid delay: amount of time to wait before running SPF, after the
+      initial SPF trigger event.



@@ -281,13 +351,13 @@ Litkowski, et al.         Expires July 28, 2018                [Page 5]
 Internet-Draft                spf-microloop                 January 2018


-   o  Rapid runs: amount of consecutive SPF runs that can use the rapid
-      delay.  When the amount is exceeded the delay moves to the slow
+   o  Rapid runs: the number of consecutive SPF runs that can use the rapid
+      delay.  When the number is exceeded, the delay moves to the slow
       delay value .

    o  Slow delay: amount of time to wait before running SPF.

-   o  Wait time: amount of time to wait without events before going back
+   o  Wait time: amount of time to wait without receiving SPF trigger events before going back
       to the rapid delay.

    Example: Rapid delay = 50msec, Rapid runs = 3, Slow delay = 1sec,
@@ -308,7 +378,9 @@ Internet-Draft                spf-microloop                 January 2018
            |  |   |  | || |            |
                            < wait time >

-                   Figure 2 - Two steps delay algorithm
+                   Figure 2 - Two phase delay algorithm
+
+
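
Illustrative note (not part of the draft or of the review comments): the
four parameters of the two phase SPF delay described above can be sketched
as a small state machine in Python.  This is a minimal sketch, not any
vendor's implementation.  It assumes that every trigger leads to exactly one
SPF run (no batching of triggers that arrive during a pending delay) and
that the wait time is measured from the previous trigger.  The class and
method names are hypothetical.  The rapid delay, rapid runs and slow delay
values are taken from the example in the draft; the wait time of 2000 ms is
an assumed value, because the example line is cut off in this diff.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TwoPhaseSpfDelay:
    rapid_delay_ms: int   # delay used for the first SPF runs
    rapid_runs: int       # number of consecutive runs allowed at rapid delay
    slow_delay_ms: int    # delay used once rapid_runs is exceeded
    wait_time_ms: int     # quiet period needed to return to the rapid delay
    consecutive_runs: int = 0
    last_trigger_ms: Optional[int] = None

    def delay_for_trigger(self, now_ms: int) -> int:
        """Return the SPF delay to apply for a trigger seen at now_ms."""
        # A quiet period of at least wait_time_ms resets to the rapid phase.
        if (self.last_trigger_ms is not None
                and now_ms - self.last_trigger_ms >= self.wait_time_ms):
            self.consecutive_runs = 0
        self.last_trigger_ms = now_ms

        if self.consecutive_runs < self.rapid_runs:
            delay = self.rapid_delay_ms
        else:
            delay = self.slow_delay_ms
        self.consecutive_runs += 1
        return delay


timer = TwoPhaseSpfDelay(rapid_delay_ms=50, rapid_runs=3,
                         slow_delay_ms=1000, wait_time_ms=2000)
delays = [timer.delay_for_trigger(t) for t in (0, 100, 200, 300, 400, 2400)]
print(delays)  # [50, 50, 50, 1000, 1000, 50]
```

The first three triggers use the rapid delay, the next two fall back to the
slow delay, and the trigger that arrives after a quiet period equal to the
wait time returns to the rapid delay.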

 4.2.  Exponential backoff

@@ -394,13 +466,20 @@ Internet-Draft                spf-microloop                 January 2018


    for delaying PRC.  We consider that E is using a SPF trigger strategy
-   that always compute Full SPF and exponential backoff strategy for SPF
+   that always computes a Full SPF for any change, and uses the exponential backoff strategy for SPF
    delay (start=150ms, inc=150ms, max=1s)

    We also consider the following sequence of events (note : the time
    scale does not intend to represent a real router time scale where
    jitters are introduced to all timers) :

+==========
+[CB]
+This note about jitter and time scale (or timeline) is not clear.  I
+suggest describing it in more detail or deleting it.
+==========
+
+
    o  t0=0 ms: a prefix is declared down in the network.  We consider
       this event to happen at time=0.

@@ -487,12 +566,12 @@ Internet-Draft                spf-microloop                 January 2018
                     Route computation event time scale

    In the table above, we can see that due to discrepancies in the SPF
-   management, after multiple events (of a different type), the values
-   of the SPF delay are completely misaligned between nodes leading to
-   long micro-loops creation.
+   management, after multiple events of a different type, the values
+   of the SPF delay are completely misaligned between node S and node E,
+   leading to the creation of micro-loops.

-   The same issue can also appear with only single type of events as
-   displayed below:
+   The same issue can also appear with only a single type of event as
+   shown below:

    +--------+--------------------+------------------+------------------+
    |  Time  |   Network Event    | Router S events  | Router E events  |
@@ -587,6 +666,28 @@ Internet-Draft                spf-microloop                 January 2018

 6.  Proposed work items

+===============
+[CB]
+Since we are publishing this document after the SPF backoff algorithm
+draft is published, I think the list of three proposed work items below
+will be confusing.  Someone reading this RFC will wonder why the
+SPF backoff algorithm RFC (which will have an earlier RFC number)
+doesn't satisfy the list of proposed work items.
+
+Perhaps this section should be renamed something like
+"Benefits of standardized SPF delay behavior", and the list of proposed
+work items should be removed.
+
+It may also make sense to explicitly say that the
+SPF backoff algorithm draft/RFC is a solution that
+satisfies this problem statement, and that we are
+publishing this document in order to capture the
+reasoning that led to that draft.  Text to this
+effect should probably go in the introduction, instead
+of this section.
+
+===============
+
    In order to enhance the current Link State IGP behavior, authors
    would encourage working on standardization of some behaviours.

@@ -603,14 +704,23 @@ Internet-Draft                spf-microloop                 January 2018

    Using the same event sequence as in figure 2, we may expect fewer
    and/or shorter micro-loops using standardized implementations.
+
+===========
+[CB] I think the text should refer to one of the previous tables and not
+Figure 2.  Figure 2 shows the two step delay algorithm.
+===========

    +--------+--------------------+------------------+------------------+
    |  Time  |   Network Event    | Router S events  | Router E events  |
    +--------+--------------------+------------------+------------------+
    |  t0=0  |    Prefix DOWN     |                  |                  |
    |  10ms  |                    | Schedule PRC (in | Schedule SPF (in |
-
-
+
+===========
+[CB]
+It seems like there is a typo here.  Presumably router E should schedule a
+PRC (not an SPF) at 10ms in this table.
+===========

 Litkowski, et al.         Expires July 28, 2018                [Page 11]
 ^L
@@ -677,13 +787,48 @@ Internet-Draft                spf-microloop                 January 2018
    +--------+--------------------+------------------+------------------+

                     Route computation event time scale
-
+
+=============
+[CB]
+I think the term "time scale" throughout this document is not the right
+one.  Perhaps the term "timeline" would be better or the phrase
+"sequence of events".
+=============
+[CB]
+There are several different tables with the same caption
+"Route computation event time scale".
+Regardless of the replacement term for "time scale", it would be helpful
+to distinguish the tables by giving each its own caption.  For example,
+this last table could have a caption like "Route computation when S and E
+use the same standardized behavior".
+
+==========
    As displayed above, there could be some other parameters like router
    computation power, flooding timers that may also influence micro-
    loops.  In Figure 4, we consider E to be a bit slower than S, leading
-   to micro-loop creation.  Despite of this, we expect that by aligning
+   to micro-loop creation.
+
+=================
+[CB]
+There is nothing in Figure 4 that shows that E is slower than S.
+Perhaps it would be clearer to say something like:
+"In all of the
+examples in this document comparing the SPF timer behavior of
+router S and router E, we have made router E a bit slower than
+router S.  This can lead to microloops even when both S and E use
+a common standardized SPF behavior."
+=================
+
+
+   Despite of this, we expect that by aligning
    implementations at least on SPF trigger and SPF delay, service
    provider may reduce the number and the duration of micro-loops.
+===================
+[CB]
+"Despite of this" should read "In spite of this" or "Despite this".
+Or in this case "However" might be better.
+
+s/service provider/service providers/
+==================

 7.  Security Considerations