Re: [Rift] AD Review of draft-ietf-rift-rift-12 (Part 1)

Alvaro, after the author meetings, answers are inline.

On Fri, Jan 15, 2021 at 8:56 PM Alvaro Retana <aretana.ietf@gmail.com>
wrote:

>
>
>
> Part 1 includes the introductory text through the end of §4.1 (Overview).
>
> While I appreciate the overview of the topologies and the protocol, I
> think that this extended introductory material is at times too
> long/wordy and complex (for the average reader, sometimes being forced
> to make assumptions or jump ahead), but also incomplete.  To be more
> specific:
>
> - The description of the general topology (§4.1.2) gets into significant
>   detail and complexity, including the way in which the concepts are
>   depicted in the figures.  (ASCII art is not the best, you might want to
>   take advantage of using SVG.)  I noticed that none of the
>   figures/sections in §4.1.2 are referenced elsewhere (but the simple
>   Figures 2 and 3 are), which makes me think about the value of the
>   in-depth treatment if it will not be explicitly considered later on.
>
>    In several places I was under the impression that a DC design guide was
>    being presented.  Which brings up the question: does RIFT require the
>    topologies to be exactly as the ones described to operate correctly?
>
> - My main expectation of the overview was to get a high-level idea of the
>   operation of RIFT, but that is not done there.  Besides a quick mention
>   in §4.1.1 and some text in the introduction, the focus of the overview
>   is on fallen leaves/disaggregation.  I understand that may be a
>   significant issue/feature, but it shouldn't be the dominating topic in
>   the overview.  Maybe some of the other pieces are more "well-known"
>   (neighbor discovery, flooding, etc.), but even ZTP (even if optional) is
>   not mentioned.
>

Alvaro, after a meeting of the authors, here is a multi-part answer on how
we will address that:

a) Jordan will extend the document with SVG figures (it seems SVG finally
   works). The FSMs will get SVG figures, which we already have but
   couldn't get to work before.
b) We will flatten the ToC in front and add a reader's guide to tell people
   which parts they have to read. We have the "what part you need to
   implement" text in a section already but will probably move it higher
   up in the doc.
c) The abstract will be shortened as you suggest, and we will put that
   material into the intro (what the protocol does and provides).
d) I'll rewrite into third person.
e) All your comments/nits are fine and will be taken care of, except the
   forward references to sections in the glossary. IMO this will simply
   muddle the draft, and at a certain point we will get calls for an
   "index" with terms and sections, which the RFC format does not support
   and which IME never served any purpose in a book. A glossary is simply
   a glossary: go over the terms, read the definitions (some of which may
   not be clear yet), and keep it close when reading the following
   sections.

-- tony

>
>
> In general, I don't think that the deep-/complex treatment is
> necessary.  You may still decide to keep it (see specific comments
> inline below), but I think it will represent a significant distraction
> for other reviewers.
>
>
> Thanks!
>
> Alvaro.
>
> [1] https://datatracker.ietf.org/doc/ad/alvaro.retana
>
>
> [Line numbers from idnits.]
>
>
> ...
> 17      Abstract
>
> 19         This document defines a specialized, dynamic routing protocol for
> 20         Clos and fat-tree network topologies optimized towards minimization
> 21         of configuration and operational complexity.  The protocol
>
> [opinion] While this is a nice Abstract, I think that it is too long
> and is not completely reflected in the Introduction.  Personally, I
> consider the first paragraph enough for an Abstract.  It would be very
> nice if the list below was instead moved to the Introduction with
> pointers to where these protocol characteristics are
> specified/explained.
>
>
> 23         o  deals with no configuration, fully automated construction of fat-
> 24            tree topologies based on detection of links,
>
> 26         o  minimizes the amount of routing state held at each level,
>
> 28         o  automatically prunes and load balances topology flooding exchanges
> 29            over a sufficient subset of links,
>
> 31         o  supports automatic disaggregation of prefixes on link and node
> 32            failures to prevent black-holing and suboptimal routing,
>
> 34         o  allows traffic steering and re-routing policies,
>
> 36         o  allows loop-free non-ECMP forwarding,
>
> 38         o  automatically re-balances traffic towards the spines based on
> 39            bandwidth available and finally
>
> 41         o  provides mechanisms to synchronize a limited key-value data-store
> 42            that can be used after protocol convergence to e.g. bootstrap
> 43            higher levels of functionality on nodes.
>
>
> ...
> 77      Table of Contents
>
> 79         1.  Authors . . . . . . . . . . . . . . . . . . . . . . . . . . .   6
> 80         2.  Introduction  . . . . . . . . . . . . . . . . . . . . . . . .   6
> 81           2.1.  Requirements Language . . . . . . . . . . . . . . . . . .   8
> 82         3.  Reference Frame . . . . . . . . . . . . . . . . . . . . . . .   8
> 83           3.1.  Terminology . . . . . . . . . . . . . . . . . . . . . . .   8
> 84           3.2.  Topology  . . . . . . . . . . . . . . . . . . . . . . . .  13
> 85         4.  RIFT: Routing in Fat Trees  . . . . . . . . . . . . . . . . .  15
> 86           4.1.  Overview  . . . . . . . . . . . . . . . . . . . . . . . .  16
> 87             4.1.1.  Properties  . . . . . . . . . . . . . . . . . . . . .  16
> 88             4.1.2.  Generalized Topology View . . . . . . . . . . . . . .  17
> 89               4.1.2.1.  Terminology . . . . . . . . . . . . . . . . . . .  17
> 90               4.1.2.2.  Clos as Crossed Crossbars . . . . . . . . . . . .  18
> 91             4.1.3.  Fallen Leaf Problem . . . . . . . . . . . . . . . . .  28
> 92             4.1.4.  Discovering Fallen Leaves . . . . . . . . . . . . . .  30
> 93             4.1.5.  Addressing the Fallen Leaves Problem  . . . . . . . .  31
> 94           4.2.  Specification . . . . . . . . . . . . . . . . . . . . . .  32
> 95             4.2.1.  Transport . . . . . . . . . . . . . . . . . . . . . .  33
> 96             4.2.2.  Link (Neighbor) Discovery (LIE Exchange)  . . . . . .  33
> 97               4.2.2.1.  LIE FSM . . . . . . . . . . . . . . . . . . . . .  36
>
> [nit] I don't think we need all this detail in the TOC.  Maybe
> limiting the entries to 2 or 3 levels is enough (e.g. 4.2 or 4.2.2).
>
>
> ...
> 256     1.  Authors
>
> 258        This work is a product of a list of individuals which are all
> to be
> 259        considered major contributors independent of the fact whether
> their
> 260        name made it to the limited boilerplate author's list or not.
>
> [minor] Please move this section to one called "Contributors" and
> place it after the Acknowledgments.  Only the people not on the front
> page should be listed there.
>
> https://tools.ietf.org/html/rfc7322#section-4.11
>
>
> ...
> 273     2.  Introduction
>
> 275        Clos [CLOS] and Fat-Tree [FATTREE] topologies have gained
> prominence
> 276        in today's networking, primarily as result of the paradigm shift
> 277        towards a centralized data-center based architecture that is
> poised
> 278        to deliver a majority of computation and storage services in the
> 279        future.  Today's current routing protocols were geared towards a
> 280        network with an irregular topology and low degree of
> connectivity
> 281        originally but given they were the only available options,
> 282        consequently several attempts to apply those protocols to Clos
> have
> 283        been made.  Most successfully BGP [RFC4271] [RFC7938] has been
> 284        extended to this purpose, not as much due to its inherent
> suitability
> 285        but rather because the perceived capability to easily modify
> BGP and
> 286        the immanent difficulties with link-state [DIJKSTRA] based
> protocols
> 287        to optimize topology exchange and converge quickly in large
> scale
> 288        densely meshed topologies.  The incumbent protocols precondition
> 289        normally extensive configuration or provisioning during bring
> up and
> 290        re-dimensioning.  This tends to be viable only for a set of
> 291        organizations with according networking operation skills and
> budgets.
> 292        For many IP fabric builders a desirable protocol would be one
> that
> 293        auto-configures itself and deals with failures and
> mis-configurations
> 294        with a minimum of human intervention only.  Such a solution
> would
> 295        allow local IP fabric bandwidth to be consumed in a 'standard
> 296        component' fashion, i.e. provision it much faster and operate
> it at
> 297        much lower costs than today, much like compute or storage is
> consumed
> 298        already.
>
> [nit] s/Fat-Tree/Fat Tree/g    To be consistent with the terminology
> section.
>
>
> ...
> 318        For the visually oriented reader, Figure 1 presents a first
> level
> 319        simplified view of the resulting information and routes on a
> RIFT
> 320        fabric.  The top of the fabric is holding in its link-state
> database
> 321        the nodes below it and the routes to them.  In the second row
> of the
> 322        database table we indicate that partial information of other
> nodes in
> 323        the same level is available as well.  The details of how this is
> 324        achieved will be postponed for the moment.  When we look at the
> 325        "bottom" of the fabric, the leaves, we see that the topology is
> 326        basically empty and they only hold a load balanced default
> route to
> 327        the next level under normal conditions.
>
> [nit] s/holding...the nodes below/holding...information about the nodes
> below
>
>
> [style nit] Some portions of the text are written in first person ("we
> indicate").  Personally, in this type of documents I prefer to not see
> that treatment ("the table indicates").  This is just a personal
> preference, a nit.  No need to take any action -- unless you really
> want to. ;-)
>
>
> [minor] "details of how this is achieved will be postponed for the
> moment."  Sure, this is just the Introduction.  A pointer to where the
> details are would be very nice.
>
>
> [nit] s/and they only hold a load balanced default route to the next
> level under normal conditions./and, under normal conditions, they only
> hold a load balanced default route to the next level.
>
>
> 329        The balance of this document details a dedicated IP fabric
> routing
> 330        protocol, fills in the specification details and ultimately
> includes
> 331        resulting security considerations.
>
> [] As I mentioned above, moving the list from the Abstract to the
> Introduction would be beneficial.  Given that this is a long document,
> providing some type of roadmap/reader's guide (based on that list, for
> example) would be great!
>
>
> ...
> 357     2.1.  Requirements Language
>
> 359        The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL
> NOT",
> 360        "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in
> this
> 361        document are to be interpreted as described in RFC 8174
> [RFC8174].
>
> [major] Use the template exactly as written in rfc8174.
>
>
> 363     3.  Reference Frame
>
> 365     3.1.  Terminology
>
> [minor] Where possible/appropriate, please add forward references to
> the sections where the terms are further specified.
>
>
> 367        This section presents the terminology used in this document.
> It is
> 368        assumed that the reader is thoroughly familiar with the terms
> and
> 369        concepts used in OSPF [RFC2328] and IS-IS
> [ISO10589-Second-Edition],
> 370        [ISO10589] as well as the according graph theoretical concepts
> of
> 371        shortest path first (SPF) [DIJKSTRA] computation and DAGs.
>
> [minor] The two references to ISO10589 look the same to me.  Do we need
> both?
>
>
> ...
> 383        Directed Acyclic Graph (DAG):  A finite directed graph with no
> 384           directed cycles (loops).  If links in Clos are considered as
> 385           either being all directed towards the top or vice versa,
> each of
> 386           such two graphs is a DAG.
>
> [nit] s/in Clos/in a Clos
>
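
As a rough illustration of the DAG definition quoted above, here is a
minimal Python sketch. It is only a sketch under stated assumptions: the
four-node fabric, the node names and the helpers `orient` and `is_dag` are
made up for the example and are not taken from the draft. It directs every
link towards the top (or towards the bottom) and checks that the result has
no directed cycles.

```python
from collections import deque

# Hypothetical two-level fabric: node name -> level (0 = leaf).
LEVELS = {"leaf1": 0, "leaf2": 0, "spine1": 1, "spine2": 1}
# Undirected north-south links (illustrative only).
LINKS = [("leaf1", "spine1"), ("leaf1", "spine2"),
         ("leaf2", "spine1"), ("leaf2", "spine2")]


def orient(links, levels, northbound=True):
    """Direct every link from the lower to the higher level, or the reverse."""
    edges = []
    for a, b in links:
        low, high = sorted((a, b), key=lambda n: levels[n])
        edges.append((low, high) if northbound else (high, low))
    return edges


def is_dag(nodes, edges):
    """Kahn's algorithm: the graph is acyclic iff every node can be
    removed in topological order."""
    indegree = {n: 0 for n in nodes}
    successors = {n: [] for n in nodes}
    for src, dst in edges:
        successors[src].append(dst)
        indegree[dst] += 1
    queue = deque(n for n, d in indegree.items() if d == 0)
    removed = 0
    while queue:
        node = queue.popleft()
        removed += 1
        for nxt in successors[node]:
            indegree[nxt] -= 1
            if indegree[nxt] == 0:
                queue.append(nxt)
    return removed == len(indegree)


north = orient(LINKS, LEVELS, northbound=True)
south = orient(LINKS, LEVELS, northbound=False)
print(is_dag(LEVELS, north), is_dag(LEVELS, south))
```

Both orientations pass the check because every directed link strictly
changes level in one direction, so no walk can return to its start.
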
>
> 388        Folded Spine-and-Leaf:  In case Clos fabric input and output
> stages
> 389           are analogous, the fabric can be "folded" to build a
> "superspine"
> 390           or top which we will call Top of Fabric (ToF) in this
> document.
>
> [nit] s/In case Clos/In case the Clos
>
>
> [minor] This term is only used in this section.  Does it really need
> to be defined?  I'm wondering if the terminology can be simplified.
>
>
> ...
> 401        Superspine vs. Aggregation and Spine vs. Edge/Leaf:
> 402           Traditional level names in 5-stages folded Clos for Level 2,
> 1 and
> 403           0 respectively.  We normalize this language to talk about
> top-of-
> 404           fabric (ToF), top-of-pod (ToP) and leaves.
>
> [minor] Instead of adding an entry (even if the name uses "vs.", it is
> not a comparison) for these names just to reference the normalized
> names, which are also defined later, just mention these traditional
> names in those entries.
>
>
> 406        Zero Touch Provisioning (ZTP):  Optional RIFT mechanism which
> allows
> 407           to derive node levels automatically based on minimum
> configuration
> 408           (only ToF property has to be provisioned on according nodes).
>
> [?] I can't parse the text in parenthesis.
>
>
> 410        Point of Delivery (PoD):  A self-contained vertical slice or
> subset
> 411           of a Clos or Fat Tree network containing normally only level
> 0 and
> 412           level 1 nodes.  A node in a PoD communicates with nodes in
> other
> 413           PoDs via the Top-of-Fabric.  We number PoDs to distinguish
> them
> 414           and use PoD #0 to denote "undefined" PoD.
>
> [minor] "level 0 and level 1"   The definition of level doesn't
> mention numbers -- maybe add something there to point at the fact that
> level 0 and a leaf are equivalent (position-wise).
>
>
> ...
> 429        Leaf:  A node without southbound adjacencies.  Its level is 0
> (except
> 430           cases where it is deriving its level via ZTP and is running
> 431           without LEAF_ONLY which will be explained in Section 4.2.7).
>
> [minor] s/(except...Section 4.2.7)./(see Section 4.2.7).
>
>
> 433        Top-of-fabric Plane or Partition:  In large fabrics
> top-of-fabric
> 434           switches may not have enough ports to aggregate all switches
> south
> 435           of them and with that, the ToF is 'split' into multiple
> 436           independent planes.  Introduction and Section 4.1.2 explains
> the
> 437           concept in more detail.  A plane is subset of ToF nodes that
> see
> 438           each other through south reflection or E-W links.
>
> [minor] "Introduction..."   I didn't see related text there.
>
>
> [nit] s/is subset/is a subset
>
>
> 440        Radix:  A radix of a switch is basically number of switching
> ports it
> 441           provides.  It's sometimes called fanout as well.
>
> [nit] s/A radix of a switch is basically number of switching ports it
> provides./The number of switching ports it provides.
>
>
> ...
> 460        East-West Link:  A link between two nodes at the same level.
> East-
> 461           West links are normally not part of Clos or "fat-tree"
> topologies.
>
> [minor] s/East-West Link/East-West (E-W) Link
>
>
> [minor] "...normally not part of Clos or "fat-tree" topologies."   But
> they are used by RIFT in several places.  To avoid confusion maybe the
> last sentence is not needed.
>
>
> ...
> 476        South Reflection:  Often abbreviated just as "reflection" it
> defines
> 477           a mechanism where South Node TIEs are "reflected" from the
> level
> 478           south back up north to allow nodes in the same level without
> E-W
> 479           links to "see" each other's node TIEs.
>
> [nit] s/"reflection" it/"reflection", it
>
>
> [minor] Please expand TIE.  I know the definition is in the very next
> paragraph, but it is good practice to expand on first use.
>
>
> ...
> 490        Node TIE:  This stands as acronym for a "Node Topology
> Information
> 491           Element" that contains all adjacencies the node discovered
> and
> 492           information about node itself.  Node TIE should NOT be
> confused
> 493           with a North TIE since "node" defines the type of TIE rather
> than
> 494           its direction.
>
> [nit] s/This stands as acronym for a "Node Topology Information
> Element" that contains/An acronym for a "Node Topology Information
> Element", which contains
>
>
> [nit] s/about node/about the node
>
>
> [minor] s/NOT/not/g   This is not one of the rfc2119 keywords, so it
> should not be capitalized.  I know you're doing it for emphasis, but
> it will be eventually changed -- so let's take care of it now.
>
>
> ...
> 501        Key Value TIE:  A South TIE that is carrying a set of key value
> pairs
> 502           [DYNAMO].  It can be used to distribute information in the
> 503           southbound direction within the protocol.
>
> [minor] "Key Value TIE" is not used anywhere else in the document.
> Also, the definition talks about a South TIE carrying (only? -- that's
> what the definition sounds like) key value pairs...but §4.2.3.2
> mentions other information and even North TIEs carrying key value
> pairs.
>
>
> 505        TIDE:  Topology Information Description Element, equivalent to
> CSNP
> 506           in ISIS.
>
> [minor] Please expand CSNP.
>
>
> 508        TIRE:  Topology Information Request Element, equivalent to PSNP
> in
> 509           ISIS.  It can both confirm received and request missing TIEs.
>
> [minor] Please expand PSNP.
>
>
> 511        De-aggregation/Disaggregation:  Process in which a node decides
> to
> 512           advertise more specific prefixes Southwards, either
> positively to
> 513           attract the corresponding traffic, or negatively to repel it.
> 514           Disaggregation is performed to prevent black-holing and
> suboptimal
> 515           routing to the more specific prefixes.
>
> [nit] "De-aggregation/Disaggregation"  It would be very nice if you
> settled on one word.  Disaggregation seems to be used the most, but
> dis-aggregation also shows up a couple of times.
>
>
> ...
> 521        Flood Repeater (FR):  A node can designate one or more
> northbound
> 522           neighbor nodes to be flood repeaters.  The flood repeaters
> are
> 523           responsible for flooding northbound TIEs further north.
> They are
> 524           similar to MPR in OSLR.  The document sometimes calls them
> flood
> 525           leaders as well.
>
> [minor] Please expand both MPR and OLSR.
>
>
> [minor] Also, please add a reference.  I see that MPR/OLSR are only
> mentioned once more (§4.2.3.9), and wonder if we even need to make
> reference to them.  The first paragraph in §4.2.3.9 already pretty
> much covers the intent of an MPR.
>
>
> 527        Bandwidth Adjusted Distance (BAD):  Each RIFT node can
> calculate the
> 528           amount of northbound bandwidth available towards a node
> compared
> 529           to other nodes at the same level and can modify the route
> distance
> 530           accordingly to allow for the lower level to adjust their load
> 531           balancing towards spines.
>
> [minor] A reference to §4.3.6.1 would be very nice.
>
>
> 533        Overloaded:  Applies to a node advertising `overload` attribute
> as
> 534           set.  The semantics closely follow the meaning of the same
> 535           attribute in [ISO10589-Second-Edition].
>
> [nit] s/advertising `overload` attribute/advertising the `overload`
> attribute
>
>
> [minor] There is no overload attribute in ISO10589, just an Overload
> Bit.  Also, §4.3.1 (please add a reference) calls it the overload bit.
>
>
> ...
> 540        Three-Way Adjacency:  RIFT tries to form a unique adjacency
> over an
> 541           interface and exchange local configuration and necessary ZTP
> 542           information.  An adjacency is only advertised in node TIEs
> and
> 543           used for computations after it achieved three-way state,
> i.e. both
> 544           routers reflected each other in LIEs including relevant
> security
> 545           information.  LIEs before three-way state is reached may
> carry ZTP
> 546           related information already.
>
> [minor] s/tries to form a unique adjacency/forms a unique adjacency
>
>
> [nit] s/and exchange local/and exchanges local
>
>
> [nit] s/after it achieved three-way state/after the three-way state is
> achieved
>
>
> [minor] Note that three-way, threeway and three way are all used in
> different places.  Please be consistent.
>
>
> ...
> 554        Neighbor:  Once a three-way adjacency has been formed a
> neighborship
> 555           relationship contains the neighbor's properties.  Multiple
> 556           adjacencies can be formed to a remote node via parallel
> interfaces
> 557           but such adjacencies are NOT sharing a neighbor structure.
> Saying
> 558           "neighbor" is thus equivalent to saying "a three-way
> adjacency".
>
> [] How is load balancing achieved through parallel links between the
> same pair of routers?  Just putting this comment here so I don't
> forget it later.
>
>
> ...
> 566        Shortest-Path First (SPF):  A well-known graph algorithm
> attributed
> 567           to Dijkstra that establishes a tree of shortest paths from a
> 568           source to destinations on the graph.  We use SPF acronym due
> to
> 569           its familiarity as general term for the node reachability
> 570           calculations RIFT can employ to ultimately calculate routes
> of
> 571           which Dijkstra algorithm is one.
>
> 573        North SPF (N-SPF):  A reachability calculation that is
> progressing
> 574           northbound, as example SPF that is using South Node TIEs
> only.
> 575           Normally it progresses a single hop only and installs default
> 576           routes.
>
> 578        South SPF (S-SPF):  A reachability calculation that is
> progressing
> 579           southbound, as example SPF that is using North Node TIEs
> only.
>
> [minor] Please add a reference to where the specific algorithm used by
> RIFT is specified.
>
>
> ...
> 585     3.2.  Topology
> 586                    ^ N      +--------+          +--------+
> 587     Level 2        |        |ToF   21|          |ToF   22|
> 588                E <-*-> W    ++-+--+-++          ++-+--+-++
> 589                    |         | |  | |            | |  | |
> 590                  S v      P111/2  P121/2         | |  | |
> 591                              ^ ^  ^ ^            | |  | |
> 592                              | |  | |            | |  | |
> 593               +--------------+ |  +-----------+  | |  |
> +---------------+
> 594               |                |    |         |  | |  |
>   |
> 595              South +-----------------------------+ |  |
>   ^
> 596               |    |           |    |         |    |  |
> All TIEs
> 597               0/0  0/0        0/0   +-----------------------------+
>   |
> 598               v    v           v              |    |  |           |
>   |
> 599               |    |           +-+    +<-0/0----------+           |
>   |
> 600               |    |             |    |       |    |              |
>   |
> 601             +-+----++ optional +-+----++     ++----+-+
> ++-----++
> 602     Level 1 |       | E/W link |       |     |       |           |
>   |
> 603             |Spin111+----------+Spin112|     |Spin121|
> |Spin122|
> 604             +-+---+-+          ++----+-+     +-+---+-+
> ++---+--+
> 605               |   |             |   South      |   |              |   |
> 606               |   +---0/0--->-----+ 0/0        |   +----------------+ |
> 607              0/0                | |  |         |                  | | |
> 608               |   +---<-0/0-----+ |  v         |   +--------------+ | |
> 609               v   |               |  |         |   |                | |
> 610             +-+---+-+          +--+--+-+     +-+---+-+
>  +---+-+-+
> 611     Level 0 |       |  (L2L)   |       |     |       |          |
>   |
> 612             |Leaf111+~~~~~~~~~~+Leaf112|     |Leaf121|
>  |Leaf122|
> 613             +-+-----+          +-+---+-+     +--+--+-+
>  +-+-----+
> 614               +                  +    \        /   +              +
> 615               Prefix111   Prefix112    \      /   Prefix121
>  Prefix122
> 616                                       multi-homed
> 617                                         Prefix
> 618             +---------- PoD 1 ---------+     +---------- PoD 2
> ---------+
>
> 620                   Figure 2: A Three Level Spine-and-Leaf Topology
> 621                         .+--------+  +--------+  +--------+  +--------+
> 622                         .|ToF   A1|  |ToF   B1|  |ToF   B2|  |ToF   A2|
> 623                         .++-+-----+  ++-+-----+  ++-+-----+  ++-+-----+
> 624                         . | |         | |         | |         | |
> 625                         . | |         | |         | +---------------+
> 626                         . | |         | |         |           | |   |
> 627                         . | |         | +-------------------------+ |
> 628                         . | |         |           |           | | | |
> 629                         . | +-----------------------+         | | | |
> 630                         . |           |           | |         | | | |
> 631                         . |           | +---------+ | +---------+ | |
> 632                         . |           | |           | |       |   | |
> 633                         . | +---------------------------------+   | |
> 634                         . | |         | |           | |           | |
> 635                         .++-+-----+  ++-+-----+  +--+-+---+  +----+-+-+
> 636                         .|Spine111|  |Spine112|  |Spine121|  |Spine122|
> 637                         .+-+---+--+  ++----+--+  +-+---+--+  ++---+---+
> 638                         .  |   |      |    |       |   |      |   |
> 639                         .  |   +--------+  |       |   +--------+ |
> 640                         .  |          | |  |       |          | | |
> 641                         .  |   -------+ |  |       |   +------+ | |
> 642                         .  |   |        |  |       |   |        | |
> 643                         .+-+---+-+   +--+--+-+   +-+---+-+  +---+-+-+
> 644                         .|Leaf111|   |Leaf112|   |Leaf121|  |Leaf122|
> 645                         .+-------+   +-------+   +-------+  +-------+
>
> 647                       Figure 3: Topology with Multiple Planes
>
> 649        We will use topology in Figure 2 (called commonly a fat
> tree/network
> 650        in modern IP fabric considerations [VAHDAT08] as homonym to the
> 651        original definition of the term [FATTREE]) in all further
> 652        considerations.  This figure depicts a generic "single plane
> fat-
> 653        tree" and the concepts explained using three levels apply by
> 654        induction to further levels and higher degrees of connectivity.
> 655        Further, this document will deal also with designs that provide
> only
> 656        sparser connectivity and "partitioned spines" as shown in
> Figure 3
> 657        and explained further in Section 4.1.2.
>
> [minor] The first sentence introduces another source to define fat
> tree, which is not mentioned in the Introduction nor in the
> Terminology.  This is not a huge deal, but it would be nice to keep
> consistency throughout.  IOW, include the new reference somewhere in
> the first couple of sections, settle on one, or simply just don't add
> a new reference.
>
>
> [minor] For completeness, it would be nice to explain that the figures
> are incomplete: for example, Figure 2 shows only some of the TIEs,
> Figure 3 shows none of them, etc...
>
>
> [] BTW, SVG graphics are now supported in xml2rfc v3.  Some of the
> figures might be easier to visualize that way than using ASCII art.
>
>
> 659     4.  RIFT: Routing in Fat Trees
>
> 661        We present here a detailed outline of a protocol optimized for
> 662        Routing in Fat Trees (RIFT) that in most abstract terms has many
> 663        properties of a modified link-state protocol
> 664        [RFC2328][ISO10589-Second-Edition] when distributing information
> 665        northbound and distance vector [RFC4271] protocol when
> distributing
> 666        information southbound.  While this is an unusual combination,
> it
> 667        does quite naturally exhibit the desirable properties we seek.
>
> [minor] s/detailed outline/detailed specification
>
>
> [nit] s/and distance vector/and a distance vector
>
>
> [] The references to OSPF/ISIS/BGP seem superfluous because those
> documents don't define generic link-state or distance vector protocols
> -- in fact, many would argue that BGP is a path vector protocol.  Just
> my opinion.  I would be very happy if the references are not included.
> I wonder if there are generic references that can be used instead of
> specific ones (for information purposes).
>
>
> 669     4.1.  Overview
>
> 671     4.1.1.  Properties
>
> 673        The most singular property of RIFT is that it floods flat
> link-state
> 674        information northbound only so that each level obtains the full
> 675        topology of levels south of it.  Link-State information is,
> with some
> 676        exceptions, never flooded East-West or back South again.
> Exceptions
> 677        like south reflection is explained in detail in Section 4.2.5.1
> and
> 678        east-west flooding at ToF level in multi-plane fabrics is
> outlined in
> 679        Section 4.1.2.  In southbound direction, the protocol operates
> like a
> 680        "fully summarizing, unidirectional" path vector protocol or
> rather a
> 681        distance vector with implicit split horizon.  Routing
> information,
> 682        normally just the default route, propagates one hop south and
> is 're-
> 683        advertised' by nodes at next lower level.  However, RIFT uses
> 684        flooding in the southern direction as well to avoid the
> overhead of
> 685        building an update per adjacency.  We omit describing the
> East-West
> 686        direction for the moment.
>
> [minor] What is "flat link-state information"?  It looks like this is
> the only place where "flat" is used.  Maybe s/flat/
>
>
> [nit] s/In southbound direction/In the southbound direction
>
>
> [] "...the protocol operates like a "fully summarizing,
> unidirectional" path vector protocol or rather a distance vector with
> implicit split horizon."  I hope that the operation is specified
> elsewhere, and that the document doesn't depend on these descriptions.
> Personal opinion: simple and direct language may serve you better.
>
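
To make the quoted §4.1.1 text a bit more concrete, here is a small Python
sketch of the information split it describes. It is a toy model under loud
assumptions, not the protocol: the fabric, the node names and the helpers
`neighbors`, `southern_view` and `default_next_hops` are hypothetical, and
exceptions such as south reflection, E-W links and disaggregation are left
out. It models northbound-only flooding as "a node sees everything strictly
south of it" and the southbound side as "a node holds a default route over
its northbound neighbors".

```python
# Hypothetical three-level fabric; names and links are illustrative only.
LEVELS = {"leaf1": 0, "leaf2": 0, "spine1": 1, "spine2": 1, "tof1": 2}
LINKS = [("leaf1", "spine1"), ("leaf1", "spine2"),
         ("leaf2", "spine1"), ("leaf2", "spine2"),
         ("spine1", "tof1"), ("spine2", "tof1")]


def neighbors(node, direction):
    """Adjacent nodes one level up (+1) or one level down (-1)."""
    result = set()
    for a, b in LINKS:
        for me, other in ((a, b), (b, a)):
            if me == node and LEVELS[other] - LEVELS[me] == direction:
                result.add(other)
    return result


def southern_view(node):
    """Nodes whose link-state reaches `node` when flooding is northbound
    only: everything reachable by walking strictly south."""
    seen, stack = set(), [node]
    while stack:
        for child in neighbors(stack.pop(), -1):
            if child not in seen:
                seen.add(child)
                stack.append(child)
    return seen


def default_next_hops(node):
    """Southbound, only a default route propagates one hop; it is modelled
    here as the set of northbound neighbors to load balance over."""
    return neighbors(node, +1)


for name in ("tof1", "spine1", "leaf1"):
    print(name, sorted(southern_view(name)), sorted(default_next_hops(name)))
```

In this toy fabric the top node sees all four nodes below it and has no
default route, while a leaf sees no topology at all and only holds the
default towards both spines.
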
>
> 688        Those information flow constraints create not only an
> anisotropic
> 689        protocol (i.e. the information is not distributed "evenly" or
> 690        "clumped" but summarized along the N-S gradient) but also a
> "smooth"
> 691        information propagation where nodes do not receive the same
> 692        information from multiple directions at the same time.
> Normally,
> 693        accepting the same reachability on any link, without
> understanding
> 694        its topological significance, forces tie-breaking on some kind
> of
> 695        distance metric.  And such tie-breaking leads ultimately in
> hop-by-
> 696        hop forwarding to shortest paths only.  In contrast to that,
> RIFT,
> 697        under normal conditions, does not need to tie-break same
> reachability
> 698        information from multiple directions.  Its computation
> principles
> 699        (south forwarding direction is always preferred) leads to
> valley-free
> 700        forwarding behavior.  And since valley free routing is
> loop-free, it
> 701        can use all feasible paths which is another highly desirable
> property
> 702        if available bandwidth should be utilized to the maximum extent
> 703        possible.
>
> [] "anisotropic"   This is my word of the day.  I learned a new one! :-)
>
>
> [nit] s/tie-break same/tie-break the same
>
>
> [minor] "valley-free"  Reference?
>
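
For readers of this thread, here is a minimal sketch of what "valley-free" is taken to mean in the paragraph quoted above. It assumes the usual definition: once a path has started to go south, it never goes north again. The function name and the encoding of a path as a list of fabric levels (0 = leaf, higher = further north) are illustrative assumptions, not anything taken from the draft.

```python
def is_valley_free(levels):
    """Return True if the path never climbs north after it has gone south.

    `levels` is the fabric level of each node on the path, in order
    (hypothetical encoding: 0 = leaf, larger numbers = further north).
    Hops between equal levels (East-West) are ignored in this sketch.
    """
    descended = False
    for prev, cur in zip(levels, levels[1:]):
        if cur < prev:
            descended = True          # path has turned south
        elif cur > prev and descended:
            return False              # northbound hop after a southbound one
    return True


print(is_valley_free([0, 1, 2, 1, 0]))  # leaf -> ToP -> ToF -> ToP -> leaf
print(is_valley_free([0, 1, 0, 1, 0]))  # goes south, then north again
```

The first path is accepted and the second is rejected, which is the property the quoted text relies on when it says valley-free routing is loop-free.
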
>
> 705        To account for the "northern" and the "southern" information
> split
> 706        the link state database is partitioned accordingly into "north
> 707        representation" and "south representation" TIEs.  In simplest
> terms
> 708        the North TIEs contain a link state topology description of
> lower
> 709        levels and and South TIEs carry simply default routes towards
> the
> 710        level above.  This oversimplified view will be refined
> gradually in
> 711        following sections while introducing protocol procedures and
> state
> 712        machines at the same time.
>
> [nit] s/in following/in the following
>
>
> 714     4.1.2.  Generalized Topology View
>
> 716        This section will shed some light on the topologies RIFT
> addresses,
> 717        including multi plane fabrics and their implications.  Readers
> that
> 718        are only interested in single plane designs, i.e. all
> top-of-fabric
> 719        nodes being topologically equal and initially connected to all
> the
> 720        switches at the level below them, can skip the rest of Section
> 4.1.2
> 721        and resulting Section 4.2.5.2 as well.
>
> [minor] "Readers...can skip the rest of Section 4.1.2 and resulting
> Section 4.2.5.2 as well."  I can see how a reader can skip a part of
> the overview, but §4.2* is where the specification is.  Are you saying
> that §4.2.5.2 doesn't have to be implemented/supported in some cases?
> Are there other sections that are also not needed in some cases?  Does
> this result in the ability to implement subsets of RIFT to support
> specific topologies?  Where is that discussed?
>
>
> ...
> 737     4.1.2.1.  Terminology
> ...
> 746        K: Denotes the number of ports in radix of a switch pointing
> north or
> 747           south.  Further, K_LEAF denotes number of ports pointing
> south,
> 748           i.e. towards leaves, and K_TOP for number of ports pointing
> north
> 749           towards a higher spine level.  To simplify the visual aids,
> 750           notations and further considerations, K will be mostly set to
> 751           Radix/2.
>
> [minor] Radix is defined in §3.1 as the number of ports.  s/Denotes
> the number of ports in radix of a switch/Denotes the radix of a switch
>
>
> ...
> 757        N: Denote the number of independent ToF planes in a topology.
>
> [nit] s/Denote/Denotes
>
>
> ...
> 766     4.1.2.2.  Clos as Crossed Crossbars
>
> 768        The typical topology for which RIFT is defined is built of P
> number
> 769        of PoDs and connected together by S number of ToF nodes.  A PoD
> node
> 770        has K number of ports (also called Radix).  We consider half of
> them
> 771        (K=Radix/2) as connecting host devices from the south, and the
> other
> 772        half connecting to interleaved PoD Top-Level switches to the
> north.
> 773        Ratio K can be chosen differently without loss of generality
> when
> 774        port speeds differ or the fabric is oversubscribed but K=R/2
> allows
> 775        for more readable representation whereby there are as many ports
> 776        facing north as south on any intermediate node.  We represent a
> node
> 777        hence in a schematic fashion with ports "sticking out" to its
> north
> 778        and south rather than by the usual real-world front faceplate
> designs
> 779        of the day.
>
> [nit] s/Ratio K can be chosen differently/The K ratio can be chosen
> differently
>
>
> [minor] "K=R/2"  R is defined in §4.1.2.1 as the redundancy, not the radix.
>
>
> 781        Figure 4 provides a view of a leaf node as seen from the north,
> i.e.
> 782        showing ports that connect northbound.  For lack of a better
> symbol,
> 783        we have chosen to use the "o" as ASCII visualisation of a single
> 784        port.  In this example, K_LEAF has 6 ports.  Observe that the
> number
> 785        of PoDs is not related to Radix unless the ToF Nodes are
> constrained
> 786        to be the same as the PoD nodes in a particular deployment.
>
> [minor] "showing ports that connect northbound...K_LEAF has 6 ports"
> The ports that connect north are K_TOP.
>
>
> 788            Top view
> 789             +---+
> 790             |   |
> 791             | o |     e.g., Radix = 12, K_LEAF = 6
> 792             |   |
> 793             | o |
> 794             |   |      -------------------------
> 795             | o ------- Physical Port (Ethernet) ----+
> 796             |   |      -------------------------     |
> 797             | o |                                    |
> 798             |   |                                    |
> 799             | o |                                    |
> 800             |   |                                    |
> 801             | o |                                    |
> 802             |   |                                    |
> 803             +---+                                    |
>
> 805               ||             ||      ||      ||      ||      ||      ||
> 806             +----+       +------------------------------------------------+
> 807             |    |       |                                                |
> 808             +----+       +------------------------------------------------+
> 809               ||             ||      ||      ||      ||      ||      ||
> 810                   Side views
>
> 812                           Figure 4: A Leaf Node, K_LEAF=6
>
> 814        The Radix of a PoD's top node may be different than that of the
> leaf
> 815        node.  Though, more often than not, a same type of node is used
> for
> 816        both, effectively forming a square (K*K).  In general case, we
> could
> 817        have switches with K_TOP southern ports on nodes at the top of
> the
> 818        PoD which are not necessarily the same as K_LEAF.  For
> instance, in
> 819        the representations below, we pick a 6 port K_LEAF and a 8 port
> 820        K_TOP.  In order to form a crossbar, we need K_TOP Leaf Nodes as
> 821        illustrated in Figure 5.
>
> [nit] s/In general case/In the general case
>
>
> [minor] "K_TOP southern ports"  Aren't K_TOP the ports pointing north?
>  The description is confusing because the terminology from the last
> section is not used in the same way -- the description mixes the
> terminology with the number represented.  For example, "K_TOP Leaf
> Nodes" doesn't make sense if the terminology is strictly applied,
> where K_TOP is the "number of ports pointing north".  Also (if I
> understood Figure 4 correctly), each node below has 6 K_TOP ports --
> presumably the node at the top has 8 K_LEAF ports.
>
>
> 823                  +---+  +---+  +---+  +---+  +---+  +---+  +---+  +---+
> 824                  |   |  |   |  |   |  |   |  |   |  |   |  |   |  |   |
> 825                  | o |  | o |  | o |  | o |  | o |  | o |  | o |  | o |
> 826                  |   |  |   |  |   |  |   |  |   |  |   |  |   |  |   |
> 827                  | o |  | o |  | o |  | o |  | o |  | o |  | o |  | o |
> 828                  |   |  |   |  |   |  |   |  |   |  |   |  |   |  |   |
> 829                  | o |  | o |  | o |  | o |  | o |  | o |  | o |  | o |
> 830                  |   |  |   |  |   |  |   |  |   |  |   |  |   |  |   |
> 831                  | o |  | o |  | o |  | o |  | o |  | o |  | o |  | o |
> 832                  |   |  |   |  |   |  |   |  |   |  |   |  |   |  |   |
> 833                  | o |  | o |  | o |  | o |  | o |  | o |  | o |  | o |
> 834                  |   |  |   |  |   |  |   |  |   |  |   |  |   |  |   |
> 835                  | o |  | o |  | o |  | o |  | o |  | o |  | o |  | o |
> 836                  |   |  |   |  |   |  |   |  |   |  |   |  |   |  |   |
> 837                  +---+  +---+  +---+  +---+  +---+  +---+  +---+  +---+
>
> 839                      Figure 5: Southern View of a PoD, K_TOP=8
>
> 841        As further visualized in Figure 6 the K_TOP Leaf Nodes are fully
> 842        interconnected with the K_LEAF PoD-top nodes, providing
> connectivity
> 843        that can be represented as a crossbar when "looked at" from the
> 844        north.  The result is that, in the absence of a failure, a
> packet
> 845        entering the PoD from the north on any port can be routed to
> any port
> 846        in the south of the PoD and vice versa.  And that is precisely
> why it
> 847        makes sense to talk about a "switching matrix".
>
> [minor] "K_TOP Leaf Nodes are fully interconnected with the K_LEAF
> PoD-top nodes"  Same comment about the terminology...   I only see one
> "PoD top Node" with one connection to a switch, not a full
> interconnect.
>
>
> [minor] The figure also doesn't show the connection between the
> switches (if any)...and I'm not sure what the "connectors" (?) on the
> switch at the top/bottom are (there seem to be more of them than
> ports).
>
>
> 849                                           E<-*->W
>
> 851               +---+  +---+  +---+  +---+  +---+  +---+  +---+  +---+
> 852               |   |  |   |  |   |  |   |  |   |  |   |  |   |  |   |
> 853             +--------------------------------------------------------+
> 854             |   o      o      o      o      o      o      o      o   |
> 855             +--------------------------------------------------------+
> 856             +--------------------------------------------------------+
> 857             |   o      o      o      o      o      o      o      o   |
> 858             +--------------------------------------------------------+
> 859             +--------------------------------------------------------+
> 860             |   o      o      o      o      o      o      o      o   |
> 861             +--------------------------------------------------------+
> 862             +--------------------------------------------------------+
> 863             |   o      o      o      o      o      o      o      o   |
> 864             +--------------------------------------------------------+
> 865             +--------------------------------------------------------+
> 866             |   o      o      o      o      o      o      o      o   |<-+
> 867             +--------------------------------------------------------+  |
> 868             +--------------------------------------------------------+  |
> 869             |   o      o      o      o      o      o      o      o   |  |
> 870             +--------------------------------------------------------+  |
> 871               |   |  |   |  |   |  |   |  |   |  |   |  |   |  |   |    |
> 872               +---+  +---+  +---+  +---+  +---+  +---+  +---+  +---+    |
> 873                          ^                                              |
> 874                          |                                              |
> 875                          |     ----------        ---------------------  |
> 876                          +----- Leaf Node        PoD top Node (Spine) --+
> 877                                ----------        ---------------------
>
> 879                 Figure 6: Northern View of a PoD's Spines, K_TOP=8
>
> 881        Side views of this PoD is illustrated in Figure 7 and Figure 8.
>
> 883                           Connecting to Spine
>
> 885           ||      ||      ||      ||      ||      ||      ||      ||
> 886       +----------------------------------------------------------------+   N
> 887       |                    PoD top Node seen sideways                  |   ^
> 888       +----------------------------------------------------------------+   |
> 889           ||      ||      ||      ||      ||      ||      ||      ||       *
> 890         +----+  +----+  +----+  +----+  +----+  +----+  +----+  +----+     |
> 891         |    |  |    |  |    |  |    |  |    |  |    |  |    |  |    |     v
> 892         +----+  +----+  +----+  +----+  +----+  +----+  +----+  +----+     S
> 893           ||      ||      ||      ||      ||      ||      ||      ||
>
> 895                                Connecting to Client nodes
>
> 897                   Figure 7: Side View of a PoD, K_TOP=8, K_LEAF=6
>
> [minor] I count 8 connections to the south in the top node...and just
> one on the switches below it.
>
>
> 899                        Connecting to Spine
>
> 901               ||      ||      ||      ||      ||      ||
> 902             +----+  +----+  +----+  +----+  +----+  +----+              N
> 903             |    |  |    |  |    |  |    |  |    |  |   PoD top Nodes   ^
> 904             +----+  +----+  +----+  +----+  +----+  +----+              |
> 905               ||      ||      ||      ||      ||      ||                *
> 906           +------------------------------------------------+            |
> 907           |              Leaf seen sideways                |            v
> 908           +------------------------------------------------+            S
> 909               ||      ||      ||      ||      ||      ||
>
> 911                        Connecting to Client nodes
>
> [minor] A leaf doesn't have southbound ports/adjacencies.  What is
> this leaf connected to?
>
>
> 913         Figure 8: Other Side View of a PoD, K_TOP=8, K_LEAF=6, 90o
> turn in
> 914                                      E-W Plane
>
> [minor] In this case I count a leaf with 6 northbound interfaces.
>
>
> [minor] "90o turn in E-W Plane"  I don't know what that is.
>
>
> 916        As next step, let us observe that a resulting PoD can be
> abstracted
> 917        as a bigger node with a number K of K_POD= K_TOP * K_LEAF, and
> the
> 918        design can recurse.
>
> [minor] K is already defined as the number of ports (§4.1.2.1).
>
>
> [minor] Lost again.  If the PoD is abstracted as a single node, then
> it would have K_TOP + K_LEAF nodes, not sure where the "*" comes from
> or what it is trying to denote.
>
>
> 920        It will be critical at this point that, before progressing
> further,
> 921        the concept and the picture of "crossed crossbars" is clear.
> Else,
> 922        the following considerations might be difficult to comprehend.
>
> [] The concept is clear to me -- I don't find the explanation and the
> corresponding pictures especially helpful.
>
>
> ...
> 929        This topology is also referred to as a single plane
> configuration and
> 930        is quite popular due to its simplicity.  In order to reach a 1:1
> 931        connectivity ratio between the ToF and the leaves, it results
> that
> 932        there are K_TOP ToF nodes, because each port of a ToP node
> connects
> 933        to a different ToF node, and K_LEAF ToP nodes for the same
> reason.
> 934        Consequently, it will take (P * K_LEAF) ports on a ToF node to
> 935        connect to each of the K_LEAF ToP nodes of the P PoDs, as shown
> in
> 936        Figure 9.
>
> [minor] "there are K_TOP ToF nodes...and K_LEAF ToP nodes"  As with
> other places, the terminology is not used as defined earlier.  K_*
> refer to the number of ports in a specific switch, so their use is
> relative to that switch.  In this case each ToP has K_TOP links, and
> each ToF has K_LEAF links.  The use without the reference point is
> confusing.
>
>
> [minor] "(P * K_LEAF)"  This calculation is clear once one realizes
> that the previous discussion was for the number of ports per PoD, not
> total (as the definition of K_* suggests).
>
>
>
> 938             [ ] [ ] [ ] [ ] [ ] [ ] [ ] [ ] <-----+
> 939              |   |   |   |   |   |   |   |        |
> 940           [=================================]     |     -----------
> 941              |   |   |   |   |   |   |   |        +----- Top-of-Fabric
> 942             [ ] [ ] [ ] [ ] [ ] [ ] [ ] [ ]       +----- Node
>  -------+
> 943                                                   |     -----------
>     |
> 944                                                   |
>     v
> 945             +-+ +-+ +-+ +-+ +-+ +-+ +-+ +-+ <-----+
>    +-+
> 946             | | | | | | | | | | | | | | | |
>    | |
> 947           [ |o| |o| |o| |o| |o| |o| |o| |o| ]
>    | |
> 948           [ |o| |o| |o| |o| |o| |o| |o| |o| ]
> -------------------------  | |
> 949           [ |o| |o| |o| |o| |o| |o| |o| |o<--- Physical Port
> (Ethernet)  | |
> 950           [ |o| |o| |o| |o| |o| |o| |o| |o| ]
> -------------------------  | |
> 951           [ |o| |o| |o| |o| |o| |o| |o| |o| ]
>    | |
> 952           [ |o| |o| |o| |o| |o| |o| |o| |o| ]
>    | |
> 953             | | | | | | | | | | | | | | | |
>    | |
> 954           [ |o| |o| |o| |o| |o| |o| |o| |o| ]
>    | |
> 955           [ |o| |o| |o| |o| |o| |o| |o| |o| ]      --------------
>    | |
> 956           [ |o| |o| |o| |o| |o| |o| |o| |o| ] <---  PoD top level
>    | |
> 957           [ |o| |o| |o| |o| |o| |o| |o| |o| ]       node (Spine)  ---+
>   | |
> 958           [ |o| |o| |o| |o| |o| |o| |o| |o| ]      --------------    |
>   | |
> 959           [ |o| |o| |o| |o| |o| |o| |o| |o| ]                        |
>   | |
> 960             | | | | | | | | | | | | | | | |  -+           +-   +-+   v
>   | |
> 961           [ |o| |o| |o| |o| |o| |o| |o| |o| ] |           |  --| |--[
> ]--| |
> 962           [ |o| |o| |o| |o| |o| |o| |o| |o| ] |   -----   |  --| |--[
> ]--| |
> 963           [ |o| |o| |o| |o| |o| |o| |o| |o| ] +--- PoD ---+  --| |--[
> ]--| |
> 964           [ |o| |o| |o| |o| |o| |o| |o| |o| ] |   -----   |  --| |--[
> ]--| |
> 965           [ |o| |o| |o| |o| |o| |o| |o| |o| ] |           |  --| |--[
> ]--| |
> 966           [ |o| |o| |o| |o| |o| |o| |o| |o| ] |           |  --| |--[
> ]--| |
> 967             | | | | | | | | | | | | | | | |  -+           +-   +-+
>   | |
> 968             +-+ +-+ +-+ +-+ +-+ +-+ +-+ +-+
>    +-+
>
> 970           Figure 9: Fabric Spines and TOFs in Single Plane Design, 3
> PoDs
>
> [minor] I believe you when you say that this figure shows how "it will
> take (P * K_LEAF) ports on a ToF node to connect to each of the K_LEAF
> ToP nodes of the P PoDs", but the drawing is not straightforward to
> interpret, among other reasons because there seem to be 3 different
> connection types/interpretations/? to the ToF -- a different one for
> each PoD.
>
>
> 972        The top view can be collapsed into a third dimension where the
> hidden
> 973        depth index is representing the PoD number.  We can then show
> one PoD
> 974        as a class of PoDs and hence save one dimension in our
> 975        representation.  The Spine Node expands in the depth and the
> vertical
> 976        dimensions, whereas the PoD top level Nodes are constrained, in
> 977        horizontal dimension.  A port in the 2-D representation
> represents
> 978        effectively the class of all the ports at the same position in
> all
> 979        the PoDs that are projected in its position along the depth
> axis.
> 980        This is shown in Figure 10.
>
> [] Do we really need this extra representation?
>
>
> ...
> 1003       As simple as single plane deployment is it introduces a limit
> due to
> 1004       the bound on the available radix of the ToF nodes that has to
> be at
> 1005       least P * K_LEAF.  Nevertheless, we will see that a distinct
> 1006       advantage of a connected or non-partitioned Top-of-Fabric is
> that all
> 1007       failures can be resolved by simple, non-transitive, positive
> 1008       disaggregation (i.e. nodes advertising more specific prefixes
> with
> 1009       the default to the level below them that is however not
> propagated
> 1010       further down the fabric) as described in Section 4.2.5.1 . In
> other
> 1011       words; non-partitioned ToF nodes can always reach nodes below or
> 1012       withdraw the routes from PoDs they cannot reach unambiguously.
> And
> 1013       with this, positive disaggregation can heal all failures and
> still
> 1014       allow all the ToF nodes to see each other via south reflection.
> 1015       Disaggregation will be explained in further detail in Section
> 4.2.5.
>
> [nit] s/deployment is it introduces/deployment is, it introduces
>
>
> 1017       In order to scale beyond the "single plane limit", the
> Top-of-Fabric
> 1018       can be partitioned by a N number of identically wired planes
> where N
> 1019       is an integer divider of K_LEAF.  The 1:1 ratio and the desired
> 1020       symmetry are still served, this time with (K_TOP * N) ToF
> nodes, each
> 1021       of (P * K_LEAF / N) ports.  N=1 represents a non-partitioned
> Spine
> 1022       and N=K_LEAF is a maximally partitioned Spine.  Further, if R
> is any
> 1023       integer divisor of K_LEAF, then N=K_LEAF/R is a feasible number
> of
> 1024       planes and R a redundancy factor.  If proves convenient for
> 1025       deployments to use a radix for the leaf nodes that is a power
> of 2 so
> 1026       they can pick a number of planes that is a lower power of 2.
> The
> 1027       example in Figure 11 splits the Spine in 2 planes with a
> redundancy
> 1028       factor R=3, meaning that there are 3 non-intersecting paths
> between
> 1029       any leaf node and any ToF node.  A ToF node must have, in this
> case,
> 1030       at least 3*P ports, and be directly connected to 3 of the 6
> PoD-ToP
> 1031       nodes (spines) in each PoD.
>
> [nit] s/by a N number/by an N number
>
>
> [minor] "(K_TOP * N) ToF nodes, each of (P * K_LEAF / N) ports"
> Again, the use of the terminology without a reference assumes a
> specific interpretation by the reader.
>
>
> [minor] "if R is any integer divisor of K_LEAF, then N=K_LEAF/R is a
> feasible number of planes and R a redundancy factor."  Please expand
> on the meaning of the redundancy factor.
>
>
> [minor] "6 PoD-ToP nodes"  I count 8.
>
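
As a rough sanity check of the arithmetic in the paragraph quoted above, the sketch below enumerates every plane count N that divides K_LEAF and applies the quoted formulas as written: R = K_LEAF / N, (K_TOP * N) ToF nodes, and (P * K_LEAF / N) ports per ToF node. The function name and the choice of P = 3 are assumptions made for illustration; K_LEAF = 6 and K_TOP = 8 are the values used in the quoted figures.

```python
def plane_options(k_leaf, k_top, pods):
    """Enumerate feasible ToF partitionings per the quoted formulas.

    Returns a list of tuples (N, R, tof_nodes, ports_per_tof), one for
    each integer divisor N of k_leaf.
    """
    options = []
    for n in range(1, k_leaf + 1):
        if k_leaf % n:
            continue                      # N must divide K_LEAF
        r = k_leaf // n                   # redundancy factor R = K_LEAF / N
        tof_nodes = k_top * n             # (K_TOP * N) ToF nodes
        ports_per_tof = pods * r          # (P * K_LEAF / N) ports each
        options.append((n, r, tof_nodes, ports_per_tof))
    return options


for n, r, tofs, ports in plane_options(k_leaf=6, k_top=8, pods=3):
    print(f"N={n} R={r} ToF nodes={tofs} ports per ToF={ports}")
```

With these inputs, N=1 gives the single-plane case (P * K_LEAF = 18 ports per ToF node), N=2 gives R=3 and 3*P = 9 ports as in the quoted example, and N=6 gives the maximally partitioned case with P = 3 ports per ToF node.
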
>
> 1033            +---+  +---+  +---+  +---+  +---+  +---+  +---+  +---+
> 1034          +-|   |--|   |--|   |--|   |--|   |--|   |--|   |--|   |-+
> 1035          | | o |  | o |  | o |  | o |  | o |  | o |  | o |  | o | |
> 1036          +-|   |--|   |--|   |--|   |--|   |--|   |--|   |--|   |-+
> 1037          +-|   |--|   |--|   |--|   |--|   |--|   |--|   |--|   |-+
> 1038          | | o |  | o |  | o |  | o |  | o |  | o |  | o |  | o | |
> 1039          +-|   |--|   |--|   |--|   |--|   |--|   |--|   |--|   |-+
> 1040          +-|   |--|   |--|   |--|   |--|   |--|   |--|   |--|   |-+
> 1041          | | o |  | o |  | o |  | o |  | o |  | o |  | o |  | o | |
> 1042          +-|   |--|   |--|   |--|   |--|   |--|   |--|   |--|   |-+
> 1043            +---+  +---+  +---+  +---+  +---+  +---+  +---+  +---+
>
> 1045          Plane 1
> 1046         ----------- . ------------ . ------------ . ------------ . --------
> 1047          Plane 2
>
> 1049            +---+  +---+  +---+  +---+  +---+  +---+  +---+  +---+
> 1050          +-|   |--|   |--|   |--|   |--|   |--|   |--|   |--|   |-+
> 1051          | | o |  | o |  | o |  | o |  | o |  | o |  | o |  | o | |
> 1052          +-|   |--|   |--|   |--|   |--|   |--|   |--|   |--|   |-+
> 1053          +-|   |--|   |--|   |--|   |--|   |--|   |--|   |--|   |-+
> 1054          | | o |  | o |  | o |  | o |  | o |  | o |  | o |  | o | |
> 1055          +-|   |--|   |--|   |--|   |--|   |--|   |--|   |--|   |-+
> 1056          +-|   |--|   |--|   |--|   |--|   |--|   |--|   |--|   |-+
> 1057          | | o |  | o |  | o |  | o |  | o |  | o |  | o |  | o | |
> 1058          +-|   |--|   |--|   |--|   |--|   |--|   |--|   |--|   |-+
> 1059            +---+  +---+  +---+  +---+  +---+  +---+  +---+  +---+
> 1060                     ^
> 1061                     |
> 1062                     |      ----------------
> 1063                     +----- Top-of-Fabric node
> 1064                            "across" depth
> 1065                            ----------------
>
> 1067        Figure 11: Northern View of a Multi-Plane ToF Level, K_LEAF=6,
> N=2
>
> 1069       At the extreme end of the spectrum it is even possible to fully
> 1070       partition the spine with N = K_LEAF and R=1, while maintaining
> 1071       connectivity between each leaf node and each Top-of-Fabric
> node.  In
> 1072       that case the ToF node connects to a single Port per PoD, so it
> 1073       appears as a single port in the projected view represented in
> 1074       Figure 12.  The number of ports required on the Spine Node is
> more or
> 1075       equal to P, the number of PoDs.
>
> [minor] "more or equal to P"  ??
>
>
> ...
> 1121    4.1.3.  Fallen Leaf Problem
> ...
> 1140       In a maximally partitioned fabric, the redundancy factor is R=
> 1, so
> 1141       any breakage in the fabric may cause one or more fallen leaves.
> 1142       However, not all cases require disaggregation.  The following
> cases
> 1143       do not require particular action in such scenario:
>
> [major] A quick look at §4.2.5.1 doesn't explicitly mention how a node
> considers the redundancy factor...but that may be included in the "DAG
> computation" mentioned in the first step.  I'm putting this comment
> here so I don't forget later...
>
>
> 1145          If a southern link on a node goes down, then connectivity
> through
> 1146          that node is lost for all nodes south of it.  There is no
> need to
> 1147          disaggregate since the connectivity to this node is lost for
> all
> 1148          spine nodes in a same fashion.
>
> 1150          If a ToF Node goes down, then northern traffic towards it is
> 1151          routed via alternate ToF nodes in the same plane and there
> is no
> 1152          need to disaggregate routes.
> ...
> 1159          If the breakage is the last northern link from a ToP node to
> a ToF
> 1160          node going down, then the fallen leaf problem affects only
> The ToF
> 1161          node, and the connectivity to all the nodes in the PoD is
> lost
> 1162          from that ToF node.  This can be observed by other ToF nodes
> 1163          within the plane where the ToP node is located and positively
> 1164          disaggregated within that plane.
>
> [nit] s/only The ToF/only the ToF
>
>
> 1166       On the other hand, there is a need to disaggregate the routes to
> 1167       Fallen Leaves in a transitive fashion, all the way to the other
> 1168       leaves in the following cases:
>
> [] Without having seen the specific mechanism, this overview is hard to
> digest.
>
>
> 1170       o  If the breakage is the last northern link from a leaf node
> within
> 1171          a plane (there is only one such link in a maximally
> partitioned
> 1172          fabric) that goes down, then connectivity to all unicast
> prefixes
> 1173          attached to the leaf node is lost within the plane where the
> link
> 1174          is located.  Southern Reflection by a leaf node, e.g.,
> between ToP
> 1175          nodes, if the PoD has only 2 levels, happens in between
> planes,
> 1176          allowing the ToP nodes to detect the problem within the PoD
> where
> 1177          it occurs and positively disaggregate.  The breakage can be
> 1178          observed by the ToF nodes in the same plane through the North
> 1179          flooding of TIEs from the ToP nodes.  The ToF nodes however
> need
> 1180          to be aware of all the affected prefixes for the negative,
> 1181          possibly transitive disaggregation to be fully effective
> (i.e.  a
> 1182          node advertising in control plane that it cannot reach a
> certain
> 1183          more specific prefix than default whereas such
> disaggregation must
> 1184          in extreme condition propagate further down southbound).  The
> 1185          problem can also be observed by the ToF nodes in the other
> planes
> 1186          through the flooding of North TIEs from the affected leaf
> nodes,
> 1187          together with non-node North TIEs which indicate the affected
> 1188          prefixes.  To be effective in that case, the positive
> 1189          disaggregation must reach down to the nodes that make the
> plane
> 1190          selection, which are typically the ingress leaf nodes.  The
> 1191          information is not useful for routing in the intermediate
> levels.
>
> [nit] s/in control plane/in the control plane
>
>
> [nit] s/in extreme condition/in the extreme condition
>
>
> 1193       o  If the breakage is a ToP node in a maximally partitioned
> fabric -
> 1194          in which case it is the only ToP node serving the plane in
> that
> 1195          PoD - goes down, then the connectivity to all the nodes in
> the PoD
> 1196          is lost within the plane where the ToP node is located.
> 1197          Consequently, all leaves of the PoD fall in this plane.
> Since the
> 1198          Southern Reflection between the ToF nodes happens only
> within a
> 1199          plane, ToF nodes in other planes cannot discover fallen
> leaves in
> 1200          a different plane.  They also cannot determine beyond their
> local
> 1201          plane whether a leaf node that was initially reachable has
> become
> 1202          unreachable.  As the breakage can be observed by the ToF
> nodes in
> 1203          the plane where the breakage happened, the ToF nodes in the
> plane
> 1204          need to be aware of all the affected prefixes for the
> negative
> 1205          disaggregation to be fully effective.  The problem can also
> be
> 1206          observed by the ToF nodes in the other planes through the
> flooding
> 1207          of North TIEs from the affected leaf nodes, if there are
> only 3
> 1208          levels and the ToP nodes are directly connected to the leaf
> nodes,
> 1209          and then again it can only be effective it is propagated
> 1210          transitively to the leaf, and useless above that level.
>
> [nit] s/fabric -...- goes down,/fabric -...-,
>
>
> 1212       For the sake of easy comprehension let us roll the abstractions
> back
> 1213       into a simple example and observe that in Figure 3 the loss of
> link
> 1214       Spine 122 to Leaf 122 will make Leaf 122 a fallen leaf for
> Top-of-
> 1215       Fabric plane B.  Worse, if the cabling was never present in
> first
> 1216       place, plane B will not even be able to know that such a fallen
> leaf
> 1217       exists.  Hence partitioning without further treatment results
> in two
> 1218       grave problems:
>
> [] "For the sake of easy comprehension...Figure 3..."  Finally!
> Hmmm...sorry...I mean, it is a little ironic that after all the new
> terminology, detailed descriptions and figures, the clearer
> explanation uses the simplest drawing.
>
>
> [nit] s/in first place/in the first place
>
>
> 1220       o  Leaf 111 trying to route to Leaf 122 MUST choose Spine 111 in
> 1221          plane A as its next hop since plane B will inevitably
> blackhole
> 1222          the packet when forwarding using default routes or do
> excessive
> 1223          bow tying.  This information must be in its routing table.
>
> [major] s/MUST/must   This is not a Normative statement, just a
> statement of fact (inside an example).
>
>
> 1225       o  Any kind of "flooding" or distance vector trying to deal with the
> 1226          problem by distributing host routes will be able to converge only
> 1227          using paths through leaves.  The flooding of information on Leaf
> 1228          122 would have to go up to Top-of-Fabric A and then "loopback"
> 1229          over other leaves to ToF B leading in extreme cases to traffic for
> 1230          Leaf 122 when presented to plane B taking an "inverted fabric"
> 1231          path where leaves start to serve as TOFs, at least for the
> 1232          duration of a protocol's convergence.
>
> [] "Any kind of "flooding" or distance vector..."  I can guess the
> meaning, but it would be better that I don't have to.   Maybe
> something like: "Any advertisement..."
>
>
> [minor] "information on Leaf 122"  s/on/ about (?), or maybe from. ??
>
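
To make the fallen-leaf example concrete, here is a minimal Python sketch. The
topology is an assumed, simplified two-plane fabric that is only consistent
with the node names used in the quoted text (it is not a transcription of
Figure 3), and `fallen_leaves` is a hypothetical helper, not something RIFT
defines.

```python
# Assumed two-plane fabric: each leaf is cabled to one spine per plane.
# All names and links below are illustrative, not copied from Figure 3.
PLANE_OF_SPINE = {
    "Spine 111": "A", "Spine 121": "A",
    "Spine 112": "B", "Spine 122": "B",
}
LEAF_LINKS = {
    ("Spine 111", "Leaf 111"), ("Spine 112", "Leaf 111"),
    ("Spine 121", "Leaf 122"), ("Spine 122", "Leaf 122"),
}
# The fabric-wide leaf list must come from outside any single plane:
# a plane that was never cabled to a leaf cannot learn of it on its own.
ALL_LEAVES = {"Leaf 111", "Leaf 122"}


def fallen_leaves(all_leaves, links, plane_of_spine):
    """Return {plane: leaves that plane has no spine-to-leaf link to}."""
    reachable = {plane: set() for plane in set(plane_of_spine.values())}
    for spine, leaf in links:
        reachable[plane_of_spine[spine]].add(leaf)
    return {plane: set(all_leaves) - seen for plane, seen in reachable.items()}


# Lose the link Spine 122 <-> Leaf 122, as in the quoted paragraph.
after_failure = LEAF_LINKS - {("Spine 122", "Leaf 122")}
print(fallen_leaves(ALL_LEAVES, after_failure, PLANE_OF_SPINE))
```

With the link removed, only plane B reports Leaf 122 as fallen, which is why
Leaf 111 has to prefer its plane A spine for that destination.
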
>
> 1234    4.1.4.  Discovering Fallen Leaves
>
> 1236       As illustrated later, and without further proof, the way to deal with
> 1237       fallen leaves in multi-plane designs, when aggregation is used, is
> 1238       that RIFT requires all the ToF nodes to share the same north topology
> 1239       database.  This happens naturally in single plane design by the means
> 1240       of northbound flooding and south reflection but needs additional
> 1241       considerations in multi-plane fabrics.  To satisfy this RIFT, in
> 1242       multi-plane designs, relies at the ToF level on ring interconnection
> 1243       of switches in multiple planes.  Other solutions are possible but
> 1244       they either need more cabling or end up having much longer flooding
> 1245       paths and/or single points of failure.
>
> [minor] "As illustrated later..."  Where?
>
>
> [] "and without further proof"  I hope this is at least specified at
> that later point.
>
>
> [nit] s/To satisfy this RIFT, in multi-plane designs, relies/To
> satisfy this need in multi-plane designs, RIFT relies
>
>
> 1247       In detail, by reserving two ports on each Top-of-Fabric node it is
> 1248       possible to connect them together by interplane bi-directional rings
> 1249       as illustrated in Figure 13.  The rings will be used to exchange full
> 1250       north topology information between planes.  All ToFs having same
> 1251       north topology allows by the means of transitive, negative
> 1252       disaggregation described in Section 4.2.5.2 to efficiently fix any
> 1253       possible fallen leaf scenario.  Somewhat as a side-effect, the
> 1254       exchange of information fulfills the ask to present full view of the
> 1255       fabric topology at the Top-of-Fabric level, without the need to
> 1256       collate it from multiple points by additional complexity of
> 1257       technologies like [RFC7752].
>
> [nit] s/fulfills the ask to present full view/fulfills the requirement
> to have a full view
>
>
> [] "..., without the need to collate it from multiple points by
> additional complexity of technologies like [RFC7752]."  This last
> phrase is unnecessary: because carrying RIFT information in BGP-LS is
> not defined, and more importantly, there's no need to criticize other
> technology to make RIFT look better.
>
>
> 1259               +---+  +---+  +---+  +---+  +---+  +---+  +--------+
> 1260               |   |  |   |  |   |  |   |  |   |  |   |  |        |
> 1261               |      |      |      |      |      |      |        |
> 1262             +-o-+  +-o-+  +-o-+  +-o-+  +-o-+  +-o-+  +-o-+      |
> 1263           +-|   |--|   |--|   |--|   |--|   |--|   |--|   |-+    |
> 1264           | | o |  | o |  | o |  | o |  | o |  | o |  | o | |    | Plane A
> 1265           +-|   |--|   |--|   |--|   |--|   |--|   |--|   |-+    |
> 1266             +-o-+  +-o-+  +-o-+  +-o-+  +-o-+  +-o-+  +-o-+      |
> 1267              |      |      |      |      |      |      |         |
> 1268             +-o-+  +-o-+  +-o-+  +-o-+  +-o-+  +-o-+  +-o-+      |
> 1269           +-|   |--|   |--|   |--|   |--|   |--|   |--|   |-+    |
> 1270           | | o |  | o |  | o |  | o |  | o |  | o |  | o | |    | Plane B
> 1271           +-|   |--|   |--|   |--|   |--|   |--|   |--|   |-+    |
> 1272             +-o-+  +-o-+  +-o-+  +-o-+  +-o-+  +-o-+  +-o-+      |
> 1273               |      |      |      |      |      |      |        |
> 1274                                   ...                            |
> 1275               |      |      |      |      |      |      |        |
> 1276             +-o-+  +-o-+  +-o-+  +-o-+  +-o-+  +-o-+  +-o-+      |
> 1277           +-|   |--|   |--|   |--|   |--|   |--|   |--|   |-+    |
> 1278           | | o |  | o |  | o |  | o |  | o |  | o |  | o | |    | Plane X
> 1279           +-|   |--|   |--|   |--|   |--|   |--|   |--|   |-+    |
> 1280             +-o-+  +-o-+  +-o-+  +-o-+  +-o-+  +-o-+  +-o-+      |
> 1281               |      |      |      |      |      |      |        |
> 1282               |   |  |   |  |   |  |   |  |   |  |   |  |        |
> 1283               +---+  +---+  +---+  +---+  +---+  +---+  +--------+
> 1284        Rings    1      2      3      4      5      6      7
>
> 1286         Figure 13: Connecting Top-of-Fabric Nodes Across Planes by Rings
>
> [minor] Is that one ring per plane, multiple rings per plane or a big
> ring for all the planes?  The drawing is not clear to me. :-(
>
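
One possible reading of Figure 13, stated here as an assumption rather than as
the draft's definition, is that each numbered ring links the same-numbered ToF
node in every plane and then closes back on itself. The Python sketch below
builds rings under that reading; the `ToF <plane><n>` naming and both helper
functions are hypothetical.

```python
from collections import Counter


def build_rings(planes, tofs_per_plane):
    """Return {ring number: list of links}, one ring per ToF position.

    Assumed reading of Figure 13: ring n joins ToF n of each plane in
    plane order and wraps from the last plane back to the first.
    """
    rings = {}
    for n in range(1, tofs_per_plane + 1):
        members = [f"ToF {plane}{n}" for plane in planes]
        rings[n] = [
            (members[k], members[(k + 1) % len(members)])
            for k in range(len(members))
        ]
    return rings


def ring_ports_per_node(rings):
    """Count how many ring-facing ports each ToF node uses."""
    ports = Counter()
    for links in rings.values():
        for a, b in links:
            ports[a] += 1
            ports[b] += 1
    return ports


rings = build_rings(planes=["A", "B", "C"], tofs_per_plane=7)
ports = ring_ports_per_node(rings)
print(len(rings), "rings;", set(ports.values()), "ring ports per ToF node")
```

Under this reading every ToF node sits on exactly one ring and uses two ports
for it, which matches the "reserving two ports on each Top-of-Fabric node"
wording in the quoted paragraph.
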
>
> 1288    4.1.5.  Addressing the Fallen Leaves Problem
>
> 1290       One consequence of the "Fallen Leaf" problem is that some prefixes
> 1291       attached to the fallen leaf become unreachable from some of the ToF
> 1292       nodes.  RIFT proposes two methods to address this issue, the positive
> 1293       and the negative disaggregation.  Both methods flood South TIEs to
> 1294       advertise the impacted prefix(es).
>
> [nit] s/RIFT proposes two methods/RIFT defines two methods
>
>
> [End of Review - Part 1]
>