[Rift] Fwd: AD Review of draft-ietf-rift-rift-12 (Part 1)

Tony Przygienda <tonysietf@gmail.com> Mon, 10 May 2021 21:09 UTC

From: Tony Przygienda <tonysietf@gmail.com>
Date: Mon, 10 May 2021 23:09:01 +0200
Message-ID: <CA+wi2hO+F8iGPei-Fcy81G71TCkhFDG0bC_eBY-=cHHBWj8Dzw@mail.gmail.com>
To: rift@ietf.org
Archived-At: <https://mailarchive.ietf.org/arch/msg/rift/gZ8olgZoQLVDEMpzFYTyt5MAEqs>
Subject: [Rift] Fwd: AD Review of draft-ietf-rift-rift-12 (Part 1)

for posterity ...

---------- Forwarded message ---------
From: Tony Przygienda <tonysietf@gmail.com>
Date: Mon, May 10, 2021 at 11:08 PM
Subject: Re: [Rift] AD Review of draft-ietf-rift-rift-12 (Part 1)
To: Alvaro Retana <aretana.ietf@gmail.com>


detailed comments inline; version -13 will incorporate them

thanks

-- tony


On Fri, Jan 15, 2021 at 8:56 PM Alvaro Retana <aretana.ietf@gmail.com>
wrote:

> Dear authors:
>
> This is Part 1 of my review of this document.  Given that it is so
> long, in order to make progress and not completely block the rest of
> my publication queue [1], I decided to provide comments in parts.
> Please address each set of comments as you can and update the document
> accordingly -- I also expect a reply to this message, especially to
> any points/comments that you might not agree with or want to discuss
> further.  That will make my review of any changes easier/faster.  I
> will wait for a reply before looking at any document updates.
>
> OTOH, even while interleaving my reading with other documents, I won't
> wait for each part to be addressed before starting on the next one.
> But I may return to parts I've already reviewed as I get further into
> the document.
>
>
> Part 1 includes the introductory text through the end of §4.1 (Overview).
>
> While I appreciate the overview of the topologies and the protocol, I
> think that this extended introductory material is at times too
> long/wordy and complex (for the average reader, sometimes being forced
> to make assumptions or jump ahead), but also incomplete.  To be more
> specific:
>

a reader's guide up front should help with that, i.e. guide readers to the
correct section depending on what they need and tell them what needs reading


>
> - The description of the general topology (§4.1.2) gets into significant
>   detail and complexity, including the way in which the concepts are
>   depicted in the figures.  (ASCII art is not the best, you might want to
>   take advantage of using SVG.)  I noticed that none of the
>   figures/sections in §4.1.2 are referenced elsewhere (but the simple
>   Figures 2 and 3 are), which makes me think about the value of the
>   in-depth treatment if it will not be explicitly considered later on.
>

Jordan is preparing very nice SVGs. We will retain the ASCII form (most
authors feel it is actually easier on the mind _once_ one understands it)
while the SVGs will allow an "intuitive first understanding".

We will check the figure references and make sure all figures are referred to
once we have the negative disaggregation section reordered (example up front)
and the SVGs done.


>
>    In several places I was under the impression that a DC design guide was
>    being presented.  Which brings up the question: does RIFT require the
>    topologies to be exactly as the ones described to operate correctly?
>

AFAIS your input here is contradictory. There is a desire for an "operational
description" and, on the other hand, no desire for a "DC design guide". Both
are largely in the same category; operational considerations normally have a
lot to do with network topology. We try to introduce the reader to the CLOS
as a fundamental concept (since we don't want to assume readers know this; in
fact most reviewers needed it) but, on the other hand, not to start talking
about DC architectures too extensively since this is a protocol spec. The
variants are therefore treated in section 6.3. I brought that section up to
date since it was mildly behind with the addition of horizontal links &
leaf-2-leaf procedures, which are introduced gradually later in the spec so
as not to make the text overly complicated from the start.

Generally, other fabric topologies are not considered since they remain
largely research topics @ this point in time or are more amenable to
NUMA computation than to networking (e.g. hypercube).

Overall, based on agreement with the authors, all operational considerations
have now been removed from the main spec and moved to the
applicability/operational draft.


>
> - My main expectation of the overview was to get a high-level idea of the
>   operation of RIFT, but that is not done there.  Besides a quick mention
>   in §4.1.1 and some text in the introduction, the focus of the overview
>   is on fallen leaves/disaggregation.  I understand that may be a
>   significant issue/feature, but it shouldn't be the dominating topic in
>   the overview.  Maybe some of the other pieces are more "well-known"
>   (neighbor discovery, flooding, etc.), but even ZTP (even if optional) is
>   not mentioned.
>
>
> In general, I don't think that the deep-/complex treatment is
> necessary.  You may still decide to keep it (see specific comments
> inline below), but I think it will represent a significant distraction
> for other reviewers.
>

Yes, a section will be introduced describing up front what the protocol
provides, while removing that material from the abstract. The reader's guide
will indicate what needs to be read for what purpose, so the negative
disaggregation can be skipped by people using a single plane only.

However, given the complexity of negative disaggregation, this section
is sorely needed AFAIS


>
>
> Thanks!
>
> Alvaro.
>
> [1] https://datatracker.ietf.org/doc/ad/alvaro.retana
>
>
> [Line numbers from idnits.]
>
>
> ...
> 17      Abstract
>
> 19         This document defines a specialized, dynamic routing protocol
> for
> 20         Clos and fat-tree network topologies optimized towards
> minimization
> 21         of configuration and operational complexity.  The protocol
>
> [opinion] While this is a nice Abstract, I think that it is too long
> and is not completely reflected in the Introduction.  Personally, I
> consider the first paragraph enough for an Abstract.  It would be very
> nice if the list below was instead moved to the Introduction with
> pointers to where these protocol characteristics are
> specified/explained.
>

done.

* removed any OSPF/BGP references
* moved the abstract piece detailing what the protocol provides into the
introduction



>
>
> 23         o  deals with no configuration, fully automated construction of
> fat-
> 24            tree topologies based on detection of links,
>
> 26         o  minimizes the amount of routing state held at each level,
>
> 28         o  automatically prunes and load balances topology flooding
> exchanges
> 29            over a sufficient subset of links,
>
> 31         o  supports automatic disaggregation of prefixes on link and
> node
> 32            failures to prevent black-holing and suboptimal routing,
>
> 34         o  allows traffic steering and re-routing policies,
>
> 36         o  allows loop-free non-ECMP forwarding,
>
> 38         o  automatically re-balances traffic towards the spines based on
> 39            bandwidth available and finally
>
> 41         o  provides mechanisms to synchronize a limited key-value
> data-store
> 42            that can be used after protocol convergence to e.g.
>  bootstrap
> 43            higher levels of functionality on nodes.
>
>
> ...
> 77      Table of Contents
>
> 79         1.  Authors . . . . . . . . . . . . . . . . . . . . . . . . . . .   6
> 80         2.  Introduction  . . . . . . . . . . . . . . . . . . . . . . . .   6
> 81           2.1.  Requirements Language . . . . . . . . . . . . . . . . . .   8
> 82         3.  Reference Frame . . . . . . . . . . . . . . . . . . . . . . .   8
> 83           3.1.  Terminology . . . . . . . . . . . . . . . . . . . . . . .   8
> 84           3.2.  Topology  . . . . . . . . . . . . . . . . . . . . . . . .  13
> 85         4.  RIFT: Routing in Fat Trees  . . . . . . . . . . . . . . . . .  15
> 86           4.1.  Overview  . . . . . . . . . . . . . . . . . . . . . . . .  16
> 87             4.1.1.  Properties  . . . . . . . . . . . . . . . . . . . . .  16
> 88             4.1.2.  Generalized Topology View . . . . . . . . . . . . . .  17
> 89               4.1.2.1.  Terminology . . . . . . . . . . . . . . . . . . .  17
> 90               4.1.2.2.  Clos as Crossed Crossbars . . . . . . . . . . . .  18
> 91             4.1.3.  Fallen Leaf Problem . . . . . . . . . . . . . . . . .  28
> 92             4.1.4.  Discovering Fallen Leaves . . . . . . . . . . . . . .  30
> 93             4.1.5.  Addressing the Fallen Leaves Problem  . . . . . . . .  31
> 94           4.2.  Specification . . . . . . . . . . . . . . . . . . . . . .  32
> 95             4.2.1.  Transport . . . . . . . . . . . . . . . . . . . . . .  33
> 96             4.2.2.  Link (Neighbor) Discovery (LIE Exchange)  . . . . . .  33
> 97               4.2.2.1.  LIE FSM . . . . . . . . . . . . . . . . . . . . .  36
>
> [nit] I don't think we need all this detail in the TOC.  Maybe
> limiting the entries to 2 or 3 levels is enough (e.g. 4.2 or 4.2.2).
>

pruned


>
>
> ...
> 256     1.  Authors
>
> 258        This work is a product of a list of individuals which are all
> to be
> 259        considered major contributors independent of the fact whether
> their
> 260        name made it to the limited boilerplate author's list or not.
>
> [minor] Please move this section to one called "Contributors" and
> place it after the Acknowledgments.  Only the people not on the front
> page should be listed there.
>
> https://tools.ietf.org/html/rfc7322#section-4.11


done


>
>
>
> ...
> 273     2.  Introduction
>
> 275        Clos [CLOS] and Fat-Tree [FATTREE] topologies have gained
> prominence
> 276        in today's networking, primarily as result of the paradigm shift
> 277        towards a centralized data-center based architecture that is
> poised
> 278        to deliver a majority of computation and storage services in the
> 279        future.  Today's current routing protocols were geared towards a
> 280        network with an irregular topology and low degree of
> connectivity
> 281        originally but given they were the only available options,
> 282        consequently several attempts to apply those protocols to Clos
> have
> 283        been made.  Most successfully BGP [RFC4271] [RFC7938] has been
> 284        extended to this purpose, not as much due to its inherent
> suitability
> 285        but rather because the perceived capability to easily modify
> BGP and
> 286        the immanent difficulties with link-state [DIJKSTRA] based
> protocols
> 287        to optimize topology exchange and converge quickly in large
> scale
> 288        densely meshed topologies.  The incumbent protocols precondition
> 289        normally extensive configuration or provisioning during bring
> up and
> 290        re-dimensioning.  This tends to be viable only for a set of
> 291        organizations with according networking operation skills and
> budgets.
> 292        For many IP fabric builders a desirable protocol would be one
> that
> 293        auto-configures itself and deals with failures and
> mis-configurations
> 294        with a minimum of human intervention only.  Such a solution
> would
> 295        allow local IP fabric bandwidth to be consumed in a 'standard
> 296        component' fashion, i.e. provision it much faster and operate
> it at
> 297        much lower costs than today, much like compute or storage is
> consumed
> 298        already.
>
> [nit] s/Fat-Tree/Fat Tree/g    To be consistent with the terminology
> section.
>

done, this was the only instance

capitalized all



>
> ...
> 318        For the visually oriented reader, Figure 1 presents a first
> level
> 319        simplified view of the resulting information and routes on a
> RIFT
> 320        fabric.  The top of the fabric is holding in its link-state
> database
> 321        the nodes below it and the routes to them.  In the second row
> of the
> 322        database table we indicate that partial information of other
> nodes in
> 323        the same level is available as well.  The details of how this is
> 324        achieved will be postponed for the moment.  When we look at the
> 325        "bottom" of the fabric, the leaves, we see that the topology is
> 326        basically empty and they only hold a load balanced default
> route to
> 327        the next level under normal conditions.
>
> [nit] s/holding...the nodes below/holding...information about the nodes
> below
>

done



>
>
> [style nit] Some portions of the text are written in first person ("we
> indicate").  Personally, in this type of documents I prefer to not see
> that treatment ("the table indicates").  This is just a personal
> preference, a nit.  No need to take any action -- unless you really
> want to. ;-)
>

Pascal did work on that, he can comment

>
>
> [minor] "details of how this is achieved will be postponed for the
> moment."  Sure, this is just the Introduction.  A pointer to where the
> details are would be very nice.
>

I added forward refs in the introduction for every mechanism provided.

The reader's guide should be very helpful here.


>
>
> [nit] s/and they only hold a load balanced default route to the next
> level under normal conditions./and, under normal conditions, they only
> hold a load balanced default route to the next level.
>

done


>
>
> 329        The balance of this document details a dedicated IP fabric
> routing
> 330        protocol, fills in the specification details and ultimately
> includes
> 331        resulting security considerations.
>
> [] As I mentioned above, moving the list from the Abstract to the
> Introduction would be beneficial.  Given that this is a long document,
> providing some type of roadmap/reader's guide (based on that list, for
> example) would be great!
>

done

as said, Jordan is doing a readers' guide.


>
>
> ...
> 357     2.1.  Requirements Language
>
> 359        The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL
> NOT",
> 360        "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in
> this
> 361        document are to be interpreted as described in RFC 8174
> [RFC8174].
>
> [major] Use the template exactly as written in rfc8174.
>

done


>
>
> 363     3.  Reference Frame
>
> 365     3.1.  Terminology
>
> [minor] Where possible/appropriate, please add forward references to
> the sections where the terms are further specified.
>

Alvaro, the purpose here is a glossary, not an index.

Every term is defined before use in the document text, and if someone keeps
on reading and stumbles they can refer back. IMO forward references serve no
discernible purpose. Reading a glossary, then jumping to the section that
uses a term, and reading the document that way to understand it is something
I have never seen before.


>
> 367        This section presents the terminology used in this document.
> It is
> 368        assumed that the reader is thoroughly familiar with the terms
> and
> 369        concepts used in OSPF [RFC2328] and IS-IS
> [ISO10589-Second-Edition],
> 370        [ISO10589] as well as the according graph theoretical concepts
> of
> 371        shortest path first (SPF) [DIJKSTRA] computation and DAGs.
>
> [minor] The two references to ISO10589 look the same to me.  Do we need
> both?
>

As you suggested, I removed all references to other routing protocols.


>
>
> ...
> 383        Directed Acyclic Graph (DAG):  A finite directed graph with no
> 384           directed cycles (loops).  If links in Clos are considered as
> 385           either being all directed towards the top or vice versa,
> each of
> 386           such two graphs is a DAG.
>
> [nit] s/in Clos/in a Clos
>


done
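
As an aside on the DAG definition quoted above, the claim that either link orientation yields a DAG can be checked mechanically. The following is only a sketch under stated assumptions: the four-node fabric and its node names are made up for illustration, and the check uses Python's standard `graphlib` module rather than anything from the draft.

```python
from graphlib import TopologicalSorter, CycleError

# Hypothetical two-level fabric; node names are illustrative only.
# Each entry maps a node to the set of nodes directly south of it,
# i.e. every link is oriented northbound (south -> north).
NORTHBOUND = {
    "spine1": {"leaf1", "leaf2"},
    "spine2": {"leaf1", "leaf2"},
    "leaf1": set(),
    "leaf2": set(),
}


def is_dag(predecessors):
    """Return True if the directed graph has no directed cycle."""
    try:
        # Consuming static_order() raises CycleError when a cycle exists.
        list(TopologicalSorter(predecessors).static_order())
        return True
    except CycleError:
        return False


def reverse(predecessors):
    """Flip every edge, giving the southbound orientation."""
    flipped = {node: set() for node in predecessors}
    for node, preds in predecessors.items():
        for pred in preds:
            flipped[pred].add(node)
    return flipped
```

Calling `is_dag(NORTHBOUND)` and `is_dag(reverse(NORTHBOUND))` covers the two orientations named in the definition. A graph with mixed link directions, such as `{"a": {"b"}, "b": {"a"}}`, is rejected.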


>
> 388        Folded Spine-and-Leaf:  In case Clos fabric input and output
> stages
> 389           are analogous, the fabric can be "folded" to build a
> "superspine"
> 390           or top which we will call Top of Fabric (ToF) in this
> document.
>
> [nit] s/In case Clos/In case the Clos
>

done


>
>
> [minor] This term is only used in this section.  Does it really need
> to be defined?  I'm wondering if the terminology can be simplified.
>

not that I know of. CLOS looks very different from a folded CLOS, and without
the concept of folding there is no "top of CLOS", just input and output.
AFAIS this is minimally needed for someone who never saw those things to
understand what we talk about.

This is used in 2-3 further glossary terms as well.


>
>
> ...
> 401        Superspine vs. Aggregation and Spine vs. Edge/Leaf:
> 402           Traditional level names in 5-stages folded Clos for Level 2,
> 1 and
> 403           0 respectively.  We normalize this language to talk about
> top-of-
> 404           fabric (ToF), top-of-pod (ToP) and leaves.
>
> [minor] Instead of adding an entry (even if the name uses "vs.", it is
> not a comparison) for these names just to reference the normalized
> names, which are also defined later, just mention these traditional
> names in those entries.
>

updated the vs.

The entry is here based on discussions where people had a hard time simply
mapping their glossary onto the RIFT language.



>
>
> 406        Zero Touch Provisioning (ZTP):  Optional RIFT mechanism which
> allows
> 407           to derive node levels automatically based on minimum
> configuration
> 408           (only ToF property has to be provisioned on according nodes).
>
> [?] I can't parse the text in parenthesis.
>

updated

"

Optional RIFT mechanism which allows node levels to be derived automatically
based on minimum configuration.
Such a minimum configuration consists solely of
ToFs being configured as such.
"




>
>
> 410        Point of Delivery (PoD):  A self-contained vertical slice or
> subset
> 411           of a Clos or Fat Tree network containing normally only level
> 0 and
> 412           level 1 nodes.  A node in a PoD communicates with nodes in
> other
> 413           PoDs via the Top-of-Fabric.  We number PoDs to distinguish
> them
> 414           and use PoD #0 to denote "undefined" PoD.
>
> [minor] "level 0 and level 1"   The definition of level doesn't
> mention numbers -- maybe add something there to point at the fact that
> level 0 and a leaf are equivalent (position-wise).
>

yes, good catch. in fact level 0 is always a leaf but a leaf does NOT have to
be level 0 in RIFT.

updated level definition and leaf definition to clarify.



>
>
> ...
> 429        Leaf:  A node without southbound adjacencies.  Its level is 0
> (except
> 430           cases where it is deriving its level via ZTP and is running
> 431           without LEAF_ONLY which will be explained in Section 4.2.7).
>
> [minor] s/(except...Section 4.2.7)./(see Section 4.2.7).
>

done as follows in the level definition

"

RIFT counts levels from top-of-fabric (ToF) down.

Level 0 always implies a leaf in RIFT but a leaf does not have to be level 0.

A level in RIFT can be configured or automatically derived via ZTP
as explained in <xref target="ZTP"/>.
"




>
>
> 433        Top-of-fabric Plane or Partition:  In large fabrics
> top-of-fabric
> 434           switches may not have enough ports to aggregate all switches
> south
> 435           of them and with that, the ToF is 'split' into multiple
> 436           independent planes.  Introduction and Section 4.1.2 explains
> the
> 437           concept in more detail.  A plane is subset of ToF nodes that
> see
> 438           each other through south reflection or E-W links.
>
> [minor] "Introduction..."   I didn't see related text there.
>

removed introduction



>
>
> [nit] s/is subset/is a subset
>
>
> 440        Radix:  A radix of a switch is basically number of switching
> ports it
> 441           provides.  It's sometimes called fanout as well.
>
> [nit] s/A radix of a switch is basically number of switching ports it
> provides./The number of switching ports it provides.
>

done


>
>
> ...
> 460        East-West Link:  A link between two nodes at the same level.
> East-
> 461           West links are normally not part of Clos or "fat-tree"
> topologies.
>
> [minor] s/East-West Link/East-West (E-W) Link
>

done



>
>
> [minor] "...normally not part of Clos or "fat-tree" topologies."   But
> they are used by RIFT in several places.  To avoid confusion maybe the
> last sentence is not needed.
>

well, yes, we originally tried to limit RIFT to not support E-W but a bunch
of customers & IETF participants were very insistent, hence RIFT supports
E-W albeit strictly speaking clos/fat-tree do not have those.


>
>
> ...
> 476        South Reflection:  Often abbreviated just as "reflection" it
> defines
> 477           a mechanism where South Node TIEs are "reflected" from the
> level
> 478           south back up north to allow nodes in the same level without
> E-W
> 479           links to "see" each other's node TIEs.
>
> [nit] s/"reflection" it/"reflection", it
>

done



>
>
> [minor] Please expand TIE.  I know the definition is in the very next
> paragraph, but it is good practice to expand on first use.
>

done


>
>
> ...
> 490        Node TIE:  This stands as acronym for a "Node Topology
> Information
> 491           Element" that contains all adjacencies the node discovered
> and
> 492           information about node itself.  Node TIE should NOT be
> confused
> 493           with a North TIE since "node" defines the type of TIE rather
> than
> 494           its direction.
>
> [nit] s/This stands as acronym for a "Node Topology Information
> Element" that contains/An acronym for a "Node Topology Information
> Element", which contains
>

done


>
>
> [nit] s/about node/about the node
>

done



>
> [minor] s/NOT/not/g   This is not one of the rfc2119 keywords, so it
> should not be capitalized.  I know you're doing it for emphasis, but
> it will be eventually changed -- so let's take care of it now.
>

replaced non-normative NOT with *not* and AND with *and* in the whole document


>
>
> ...
> 501        Key Value TIE:  A South TIE that is carrying a set of key value
> pairs
> 502           [DYNAMO].  It can be used to distribute information in the
> 503           southbound direction within the protocol
>


>
> [minor] "Key Value TIE" is not used anywhere else in the document.
> Also, the definition talks about a South TIE carrying (only? -- that's
> what the definition sounds like) key value pairs...but §4.2.3.2
> mentions other information and even North TIEs carrying key value
> pairs.
>

changed

Key Value (KV) TIE:


Good catch on south, removed

and no, I think the spec is clear that it's key-values you carry around;
what the semantics of those key values are is not in the base spec, but we
already have drafts talking about that.



>
>
> 505        TIDE:  Topology Information Description Element, equivalent to
> CSNP
> 506           in ISIS.
>
> [minor] Please expand CSNP.
>

I removed all references to ISIS and OSPF per your suggestion that routing
protocol specs normally do not refer to other protocols.


>
>
> 508        TIRE:  Topology Information Request Element, equivalent to PSNP
> in
> 509           ISIS.  It can both confirm received and request missing TIEs.
>
> [minor] Please expand PSNP.
>

removed ISIS references


>
> 511        De-aggregation/Disaggregation:  Process in which a node decides
> to
> 512           advertise more specific prefixes Southwards, either
> positively to
> 513           attract the corresponding traffic, or negatively to repel it.
> 514           Disaggregation is performed to prevent black-holing and
> suboptimal
> 515           routing to the more specific prefixes.
>
> [nit] "De-aggregation/Disaggregation"  It would be very nice if you
> settled on one word.  Disaggregation seems to be used the most, but
> dis-aggregation also shows up a couple of times.
>

changed everything to disaggregation. thanks
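
For readers less familiar with why a more specific prefix "attracts" traffic, the positive case in the quoted definition comes down to longest-prefix match. The sketch below is illustrative only: the prefixes, the node names and the `lookup` helper are assumptions, not anything taken from the draft, and negative disaggregation is not modelled.

```python
import ipaddress


def lookup(routes, destination):
    """Longest-prefix match: return the next hops of the most specific
    route covering destination, or None if nothing matches."""
    addr = ipaddress.ip_address(destination)
    best = None
    for prefix, next_hops in routes.items():
        net = ipaddress.ip_network(prefix)
        if addr in net and (best is None or net.prefixlen > best[0].prefixlen):
            best = (net, next_hops)
    return None if best is None else best[1]


# Hypothetical leaf routing table: only a load-balanced default route.
routes = {"0.0.0.0/0": ["spine1", "spine2"]}
before = lookup(routes, "10.1.1.7")

# Assume spine2 lost its path to 10.1.1.0/24, so spine1 advertises the
# more specific prefix southwards (positive disaggregation).
routes["10.1.1.0/24"] = ["spine1"]
after = lookup(routes, "10.1.1.7")
other = lookup(routes, "10.2.0.1")
```

Here `before` uses both spines, `after` uses only `spine1`, and `other` is unaffected because it still matches only the default route.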


>
>
> ...
> 521        Flood Repeater (FR):  A node can designate one or more
> northbound
> 522           neighbor nodes to be flood repeaters.  The flood repeaters
> are
> 523           responsible for flooding northbound TIEs further north.
> They are
> 524           similar to MPR in OSLR.  The document sometimes calls them
> flood
> 525           leaders as well.
>
> [minor] Please expand both MPR and OLSR.
>

removed references to MPR and OLSR per your observation that the spec
should not reference other routing protocols.



>
> [minor] Also, please add a reference.  I see that MPR/OLSR are only
> mentioned once more (§4.2.3.9), and wonder if we even need to make
> reference to them.  The first paragraph in §4.2.3.9 already pretty
> much covers the intent of an MPR.
>

removed


>
>
> 527        Bandwidth Adjusted Distance (BAD):  Each RIFT node can
> calculate the
> 528           amount of northbound bandwidth available towards a node
> compared
> 529           to other nodes at the same level and can modify the route
> distance
> 530           accordingly to allow for the lower level to adjust their load
> 531           balancing towards spines.
>
> [minor] A reference to §4.3.6.1 would be very nice.
>

As I wrote, I consider turning a glossary into an index a low-value undertaking.


>
>
> 533        Overloaded:  Applies to a node advertising `overload` attribute
> as
> 534           set.  The semantics closely follow the meaning of the same
> 535           attribute in [ISO10589-Second-Edition].
>
> [nit] s/advertising `overload` attribute/advertising the `overload`
> attribute
>

done


>
>
> [minor] There is no overload attribute in ISO10589, just an Overload
> Bit.  Also, §4.3.1 (please add a reference) calls it the overload bit.
>

removed IS-IS references

clarified the semantics of the overload bit in the computation section.
referenced the attribute in the corresponding section.


>
> ...
> 540        Three-Way Adjacency:  RIFT tries to form a unique adjacency
> over an
> 541           interface and exchange local configuration and necessary ZTP
> 542           information.  An adjacency is only advertised in node TIEs
> and
> 543           used for computations after it achieved three-way state,
> i.e. both
> 544           routers reflected each other in LIEs including relevant
> security
> 545           information.  LIEs before three-way state is reached may
> carry ZTP
> 546           related information already.
>
> [minor] s/tries to form a unique adjacency/forms a unique adjacency
>

I keep it as it is since it's not guaranteed the adjacency will be
formed.


>
> [nit] s/and exchange local/and exchanges local
>

done


>
>
> [nit] s/after it achieved three-way state/after the three-way state is
> achieved
>

done


>
>
> [minor] Note that three-way, threeway and three way are all used in
> different places.  Please be consistent.
>

made all uniform as THREE_WAY in the whole document since that's the only way
the FSM allows it to be written. all those different versions emerged AFAIR
because every reviewer had a pony about underscores and whatnot.

same for one and two way
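
To make the quoted Three-Way Adjacency definition concrete, here is a rough Python sketch of the "both routers reflected each other in LIEs" condition. It is a deliberate simplification and not the LIE FSM from the draft: the `Lie` class, its fields and `next_state` are hypothetical names, and security information, timers and ZTP are ignored.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class AdjState(Enum):
    ONE_WAY = 1
    TWO_WAY = 2
    THREE_WAY = 3


@dataclass
class Lie:
    """Hypothetical, heavily reduced LIE: only what this sketch needs."""
    sender: str
    # Identifier of the neighbor the sender has heard on this link, if any.
    reflected_neighbor: Optional[str] = None


def next_state(my_id: str, received: Optional[Lie]) -> AdjState:
    """Derive the adjacency state from the last LIE seen on the interface."""
    if received is None:
        # Nothing heard from the other side yet.
        return AdjState.ONE_WAY
    if received.reflected_neighbor == my_id:
        # The neighbor's LIE reflects us back: both sides see each other.
        return AdjState.THREE_WAY
    # We hear the neighbor, but it does not reflect us (yet).
    return AdjState.TWO_WAY
```

For example, `next_state("A", Lie("B"))` gives `AdjState.TWO_WAY`, and `next_state("A", Lie("B", "A"))` gives `AdjState.THREE_WAY`. Only in the latter state would the adjacency be advertised in node TIEs, per the definition quoted above.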


>
> ...
> 554        Neighbor:  Once a three-way adjacency has been formed a
> neighborship
> 555           relationship contains the neighbor's properties.  Multiple
> 556           adjacencies can be formed to a remote node via parallel
> interfaces
> 557           but such adjacencies are NOT sharing a neighbor structure.
> Saying
> 558           "neighbor" is thus equivalent to saying "a three-way
> adjacency".
>
> [] How is load balancing achieved through parallel links between the
> same pair of routers?  Just putting this comment here so I don't
> forget it later.
>

up to the implementation. This is fast-path specific and does not need
specification.



>
>
> ...
> 566        Shortest-Path First (SPF):  A well-known graph algorithm
> attributed
> 567           to Dijkstra that establishes a tree of shortest paths from a
> 568           source to destinations on the graph.  We use SPF acronym due
> to
> 569           its familiarity as general term for the node reachability
> 570           calculations RIFT can employ to ultimately calculate routes
> of
> 571           which Dijkstra algorithm is one.
>

> 573        North SPF (N-SPF):  A reachability calculation that is
> progressing
> 574           northbound, as example SPF that is using South Node TIEs
> only.
> 575           Normally it progresses a single hop only and installs default
> 576           routes.
>
> 578        South SPF (S-SPF):  A reachability calculation that is
> progressing
> 579           southbound, as example SPF that is using North Node TIEs
> only.
>
> [minor] Please add a reference to where the specific algorithm used by
> RIFT is specified.
>

as stated previously, this is a glossary, not a document index.



>
>
> ...
> 585     3.2.  Topology
> 586                    ^ N      +--------+          +--------+
> 587     Level 2        |        |ToF   21|          |ToF   22|
> 588                E <-*-> W    ++-+--+-++          ++-+--+-++
> 589                    |         | |  | |            | |  | |
> 590                  S v      P111/2  P121/2         | |  | |
> 591                              ^ ^  ^ ^            | |  | |
> 592                              | |  | |            | |  | |
> 593               +--------------+ |  +-----------+  | |  |
> +---------------+
> 594               |                |    |         |  | |  |
>   |
> 595              South +-----------------------------+ |  |
>   ^
> 596               |    |           |    |         |    |  |
> All TIEs
> 597               0/0  0/0        0/0   +-----------------------------+
>   |
> 598               v    v           v              |    |  |           |
>   |
> 599               |    |           +-+    +<-0/0----------+           |
>   |
> 600               |    |             |    |       |    |              |
>   |
> 601             +-+----++ optional +-+----++     ++----+-+
> ++-----++
> 602     Level 1 |       | E/W link |       |     |       |           |
>   |
> 603             |Spin111+----------+Spin112|     |Spin121|
> |Spin122|
> 604             +-+---+-+          ++----+-+     +-+---+-+
> ++---+--+
> 605               |   |             |   South      |   |              |   |
> 606               |   +---0/0--->-----+ 0/0        |   +----------------+ |
> 607              0/0                | |  |         |                  | | |
> 608               |   +---<-0/0-----+ |  v         |   +--------------+ | |
> 609               v   |               |  |         |   |                | |
> 610             +-+---+-+          +--+--+-+     +-+---+-+
>  +---+-+-+
> 611     Level 0 |       |  (L2L)   |       |     |       |          |
>   |
> 612             |Leaf111+~~~~~~~~~~+Leaf112|     |Leaf121|
>  |Leaf122|
> 613             +-+-----+          +-+---+-+     +--+--+-+
>  +-+-----+
> 614               +                  +    \        /   +              +
> 615               Prefix111   Prefix112    \      /   Prefix121
>  Prefix122
> 616                                       multi-homed
> 617                                         Prefix
> 618             +---------- PoD 1 ---------+     +---------- PoD 2
> ---------+
>
> 620                   Figure 2: A Three Level Spine-and-Leaf Topology
> 621                         .+--------+  +--------+  +--------+  +--------+
> 622                         .|ToF   A1|  |ToF   B1|  |ToF   B2|  |ToF   A2|
> 623                         .++-+-----+  ++-+-----+  ++-+-----+  ++-+-----+
> 624                         . | |         | |         | |         | |
> 625                         . | |         | |         | +---------------+
> 626                         . | |         | |         |           | |   |
> 627                         . | |         | +-------------------------+ |
> 628                         . | |         |           |           | | | |
> 629                         . | +-----------------------+         | | | |
> 630                         . |           |           | |         | | | |
> 631                         . |           | +---------+ | +---------+ | |
> 632                         . |           | |           | |       |   | |
> 633                         . | +---------------------------------+   | |
> 634                         . | |         | |           | |           | |
> 635                         .++-+-----+  ++-+-----+  +--+-+---+  +----+-+-+
> 636                         .|Spine111|  |Spine112|  |Spine121|  |Spine122|
> 637                         .+-+---+--+  ++----+--+  +-+---+--+  ++---+---+
> 638                         .  |   |      |    |       |   |      |   |
> 639                         .  |   +--------+  |       |   +--------+ |
> 640                         .  |          | |  |       |          | | |
> 641                         .  |   -------+ |  |       |   +------+ | |
> 642                         .  |   |        |  |       |   |        | |
> 643                         .+-+---+-+   +--+--+-+   +-+---+-+  +---+-+-+
> 644                         .|Leaf111|   |Leaf112|   |Leaf121|  |Leaf122|
> 645                         .+-------+   +-------+   +-------+  +-------+
>
> 647                       Figure 3: Topology with Multiple Planes
>
> 649        We will use topology in Figure 2 (called commonly a fat
> tree/network
> 650        in modern IP fabric considerations [VAHDAT08] as homonym to the
> 651        original definition of the term [FATTREE]) in all further
> 652        considerations.  This figure depicts a generic "single plane
> fat-
> 653        tree" and the concepts explained using three levels apply by
> 654        induction to further levels and higher degrees of connectivity.
> 655        Further, this document will deal also with designs that provide
> only
> 656        sparser connectivity and "partitioned spines" as shown in
> Figure 3
> 657        and explained further in Section 4.1.2.
>
> [minor] The first sentence introduces another source to define fat
> tree, which is not mentioned in the Introduction nor in the
> Terminology.  This is not a huge deal, but it would be nice to keep
> consistency throughout.  IOW, include the new reference somewhere in
> the first couple of sections, settle on one, or simply just don't add
> a new reference.
>

moved the paper references into introduction.


>
>
> [minor] For completeness, it would be nice to explain that the figures
> are incomplete: for example, Figure 2 shows only some of the TIEs,
> Figure 3 shows none of them, etc...
>

done


>
>
> [] BTW, SVG graphics are now supported in xml2rfc v3.  Some of the
> figures might be easier to visualize that way than using ASCII art.
>

SVG is being done for the most complex figures and will be available.


>
>
> 659     4.  RIFT: Routing in Fat Trees
>
> 661        We present here a detailed outline of a protocol optimized for
> 662        Routing in Fat Trees (RIFT) that in most abstract terms has many
> 663        properties of a modified link-state protocol
> 664        [RFC2328][ISO10589-Second-Edition] when distributing information
> 665        northbound and distance vector [RFC4271] protocol when
> distributing
> 666        information southbound.  While this is an unusual combination,
> it
> 667        does quite naturally exhibit the desirable properties we seek.
>
> [minor] s/detailed outline/detailed specification
>

done


>
>
> [nit] s/and distance vector/and a distance vector
>

done


>
>
> [] The references to OSPF/ISIS/BGP seem superfluous because those
> documents don't define generic link-state or distance vector protocols
> -- in fact, many would argue that BGP is a path vector protocol.  Just
> my opinion.  I would be very happy if the references are not included.
> I wonder if there are generic references that can be used instead of
> specific ones (for information purposes).
>

all references to bgp/isis/ospf removed.

I am not aware of "generic link-state protocols" except some research
papers that will not contribute to clarity here AFAIS.


>
>
> 669     4.1.  Overview
>
> 671     4.1.1.  Properties
>
> 673        The most singular property of RIFT is that it floods flat
> link-state
> 674        information northbound only so that each level obtains the full
> 675        topology of levels south of it.  Link-State information is,
> with some
> 676        exceptions, never flooded East-West or back South again.
> Exceptions
> 677        like south reflection is explained in detail in Section 4.2.5.1
> and
> 678        east-west flooding at ToF level in multi-plane fabrics is
> outlined in
> 679        Section 4.1.2.  In southbound direction, the protocol operates
> like a
> 680        "fully summarizing, unidirectional" path vector protocol or
> rather a
> 681        distance vector with implicit split horizon.  Routing
> information,
> 682        normally just the default route, propagates one hop south and
> is 're-
> 683        advertised' by nodes at next lower level.  However, RIFT uses
> 684        flooding in the southern direction as well to avoid the
> overhead of
> 685        building an update per adjacency.  We omit describing the
> East-West
> 686        direction for the moment.
>
> [minor] What is "flat link-state information"?  It looks like this is
> the only place where "flat" is used.  Maybe s/flat/
>

typo, done


>
>
> [nit] s/In southbound direction/In the southbound direction
>

fixed in whole doc


>
> [] "...the protocol operates like a "fully summarizing,
> unidirectional" path vector protocol or rather a distance vector with
> implicit split horizon."  I hope that the operation is specified
> elsewhere, and that the document doesn't depend on these descriptions.
> Personal opinion: simple and direct language may serve you better.
>

simplified and removed language. procedures in the spec when implemented
will ensure this behavior.


>
>
> 688        Those information flow constraints create not only an
> anisotropic
> 689        protocol (i.e. the information is not distributed "evenly" or
> 690        "clumped" but summarized along the N-S gradient) but also a
> "smooth"
> 691        information propagation where nodes do not receive the same
> 692        information from multiple directions at the same time.
> Normally,
> 693        accepting the same reachability on any link, without
> understanding
> 694        its topological significance, forces tie-breaking on some kind
> of
> 695        distance metric.  And such tie-breaking leads ultimately in
> hop-by-
> 696        hop forwarding to shortest paths only.  In contrast to that,
> RIFT,
> 697        under normal conditions, does not need to tie-break same
> reachability
> 698        information from multiple directions.  Its computation
> principles
> 699        (south forwarding direction is always preferred) leads to
> valley-free
> 700        forwarding behavior.  And since valley free routing is
> loop-free, it
> 701        can use all feasible paths which is another highly desirable
> property
> 702        if available bandwidth should be utilized to the maximum extent
> 703        possible.
>
> [] "anisotropic"   This is my word of the day.  I learned a new one! :-)
>

Pascal's word


>
> [nit] s/tie-break same/tie-break the same
>

done


>
> [minor] "valley-free"  Reference?
>

added


>
>
> 705        To account for the "northern" and the "southern" information
> split
> 706        the link state database is partitioned accordingly into "north
> 707        representation" and "south representation" TIEs.  In simplest
> terms
> 708        the North TIEs contain a link state topology description of
> lower
> 709        levels and and South TIEs carry simply default routes towards
> the
> 710        level above.  This oversimplified view will be refined
> gradually in
> 711        following sections while introducing protocol procedures and
> state
> 712        machines at the same time.
>
> [nit] s/in following/in the following
>

done



>
>
> 714     4.1.2.  Generalized Topology View
>
> 716        This section will shed some light on the topologies RIFT
> addresses,
> 717        including multi plane fabrics and their implications.  Readers
> that
> 718        are only interested in single plane designs, i.e. all
> top-of-fabric
> 719        nodes being topologically equal and initially connected to all
> the
> 720        switches at the level below them, can skip the rest of Section
> 4.1.2
> 721        and resulting Section 4.2.5.2 as well.
>
> [minor] "Readers...can skip the rest of Section 4.1.2 and resulting
> Section 4.2.5.2 as well."  I can see how a reader can skip a part of
> the overview, but §4.2* is where the specification is.  Are you saying
> that §4.2.5.2 doesn't have to be implemented/supported in some cases?
> Are there other sections that are also not needed in some cases?  Does
> this result in the ability to implement subsets of RIFT to support
> specific topologies?  Where is that discussed?
>


yes, in section

5.  Implementation and Operation: Further Details
  5.1.  Considerations for Leaf-Only Implementation
  5.2.  Considerations for Spine Implementation

this will be further clarified by the reader's guide included in front of
the document.



>
>
> ...
> 737     4.1.2.1.  Terminology
> ...
> 746        K: Denotes the number of ports in radix of a switch pointing
> north or
> 747           south.  Further, K_LEAF denotes number of ports pointing
> south,
> 748           i.e. towards leaves, and K_TOP for number of ports pointing
> north
> 749           towards a higher spine level.  To simplify the visual aids,
> 750           notations and further considerations, K will be mostly set to
> 751           Radix/2.
>
> [minor] Radix is defined in §3.1 as the number of ports.  s/Denotes
> the number of ports in radix of a switch/Denotes the radix of a switch
>

rewritten to

K:  Denotes half of the radix of a symmetrical switch, meaning that
   the switch has K ports pointing north and K ports pointing south.



>
>
> ...
> 757        N: Denote the number of independent ToF planes in a topology.
>
> [nit] s/Denote/Denotes
>
>

ack


> ...
> 766     4.1.2.2.  Clos as Crossed Crossbars
>
> 768        The typical topology for which RIFT is defined is built of P
> number
> 769        of PoDs and connected together by S number of ToF nodes.  A PoD
> node
> 770        has K number of ports (also called Radix).  We consider half of
> them
> 771        (K=Radix/2) as connecting host devices from the south, and the
> other
> 772        half connecting to interleaved PoD Top-Level switches to the
> north.
> 773        Ratio K can be chosen differently without loss of generality
> when
> 774        port speeds differ or the fabric is oversubscribed but K=R/2
> allows
> 775        for more readable representation whereby there are as many ports
> 776        facing north as south on any intermediate node.  We represent a
> node
> 777        hence in a schematic fashion with ports "sticking out" to its
> north
> 778        and south rather than by the usual real-world front faceplate
> designs
> 779        of the day.
>
> [nit] s/Ratio K can be chosen differently/The K ratio can be chosen
> differently
>

done


>
> [minor] "K=R/2"  R is defined in §4.1.2.1 as the redundancy, not the radix.
>

done, definition corrected. good catch


>
> 781        Figure 4 provides a view of a leaf node as seen from the north,
> i.e.
> 782        showing ports that connect northbound.  For lack of a better
> symbol,
> 783        we have chosen to use the "o" as ASCII visualisation of a single
> 784        port.  In this example, K_LEAF has 6 ports.  Observe that the
> number
> 785        of PoDs is not related to Radix unless the ToF Nodes are
> constrained
> 786        to be the same as the PoD nodes in a particular deployment.
>


>
> [minor] "showing ports that connect northbound...K_LEAF has 6 ports"
> The ports that connect north are K_TOP.
>


clarified in glossary and text



>
>
> 788            Top view
> 789             +---+
> 790             |   |
> 791             | o |     e.g., Radix = 12, K_LEAF = 6
> 792             |   |
> 793             | o |
> 794             |   |      -------------------------
> 795             | o ------- Physical Port (Ethernet) ----+
> 796             |   |      -------------------------     |
> 797             | o |                                    |
> 798             |   |                                    |
> 799             | o |                                    |
> 800             |   |                                    |
> 801             | o |                                    |
> 802             |   |                                    |
> 803             +---+                                    |
>
> 805               ||             ||      ||      ||      ||      ||      ||
> 806             +----+
> +------------------------------------------------+
> 807             |    |       |
>    |
> 808             +----+
> +------------------------------------------------+
> 809               ||             ||      ||      ||      ||      ||      ||
> 810                   Side views
>
> 812                           Figure 4: A Leaf Node, K_LEAF=6
>
> 814        The Radix of a PoD's top node may be different than that of the
> leaf
> 815        node.  Though, more often than not, a same type of node is used
> for
> 816        both, effectively forming a square (K*K).  In general case, we
> could
> 817        have switches with K_TOP southern ports on nodes at the top of
> the
> 818        PoD which are not necessarily the same as K_LEAF.  For
> instance, in
> 819        the representations below, we pick a 6 port K_LEAF and a 8 port
> 820        K_TOP.  In order to form a crossbar, we need K_TOP Leaf Nodes as
> 821        illustrated in Figure 5.
>
> [nit] s/In general case/In the general case
>

done



>
>
> [minor] "K_TOP southern ports"  Aren't K_TOP the ports pointing north?
>

definition says that switches are symmetrical, i.e. a ToP switch will have
K_TOP ports both towards the leaves and northbound.


>  The description is confusing because the terminology from the last
> section is not used in the same way -- the description mixes the
> terminology with the number represented.  For example, "K_TOP Leaf
> Nodes" doesn't make sense if the terminology is strictly applied,
> where K_TOP is the "number of ports pointing north".  Also (if I
> understood Figure 4 correctly), each node below has 6 K_TOP ports --
> presumably the node at the top has 8 K_LEAF ports.
>

new definition should clarify that:

K:  To simplify the visual aids, notations and further
   considerations, we assume that the switches are symmetrical, i.e.
   an equal number of ports point northbound and southbound.  With that
   simplification, K denotes half of the radix of a symmetrical
   switch, meaning that the switch has K ports pointing north and K
   ports pointing south.  K_LEAF (K of a leaf) thus represents both
   the number of access ports in a leaf Node and the maximum number
   of planes in the fabric, whereas K_TOP (K of a ToP) represents the
   number of leaves in the PoD and the number of ports pointing north
   in a ToP Node towards a higher spine level, thus the number of ToF
   nodes in a plane.



>
>
> 823                  +---+  +---+  +---+  +---+  +---+  +---+  +---+  +---+
> 824                  |   |  |   |  |   |  |   |  |   |  |   |  |   |  |   |
> 825                  | o |  | o |  | o |  | o |  | o |  | o |  | o |  | o |
> 826                  |   |  |   |  |   |  |   |  |   |  |   |  |   |  |   |
> 827                  | o |  | o |  | o |  | o |  | o |  | o |  | o |  | o |
> 828                  |   |  |   |  |   |  |   |  |   |  |   |  |   |  |   |
> 829                  | o |  | o |  | o |  | o |  | o |  | o |  | o |  | o |
> 830                  |   |  |   |  |   |  |   |  |   |  |   |  |   |  |   |
> 831                  | o |  | o |  | o |  | o |  | o |  | o |  | o |  | o |
> 832                  |   |  |   |  |   |  |   |  |   |  |   |  |   |  |   |
> 833                  | o |  | o |  | o |  | o |  | o |  | o |  | o |  | o |
> 834                  |   |  |   |  |   |  |   |  |   |  |   |  |   |  |   |
> 835                  | o |  | o |  | o |  | o |  | o |  | o |  | o |  | o |
> 836                  |   |  |   |  |   |  |   |  |   |  |   |  |   |  |   |
> 837                  +---+  +---+  +---+  +---+  +---+  +---+  +---+  +---+
>
> 839                      Figure 5: Southern View of a PoD, K_TOP=8
>
> 841        As further visualized in Figure 6 the K_TOP Leaf Nodes are fully
> 842        interconnected with the K_LEAF PoD-top nodes, providing
> connectivity
> 843        that can be represented as a crossbar when "looked at" from the
> 844        north.  The result is that, in the absence of a failure, a
> packet
> 845        entering the PoD from the north on any port can be routed to
> any port
> 846        in the south of the PoD and vice versa.  And that is precisely
> why it
> 847        makes sense to talk about a "switching matrix".
>
> [minor] "K_TOP Leaf Nodes are fully interconnected with the K_LEAF
> PoD-top nodes"  Same comment about the terminology...   I only see one
> "PoD top Node" with one connection to a switch, not a full
> interconnect.
>




>
>
> [minor] The figure also doesn't show the connection between the
> switches (if any)...and I'm not sure what the "connectors" (?) on the
> switch at the top/bottom are (there seem to be more of them than
> ports).
>


next figure visualizes the connections


>
> 849                                           E<-*->W
>
> 851               +---+  +---+  +---+  +---+  +---+  +---+  +---+  +---+
> 852               |   |  |   |  |   |  |   |  |   |  |   |  |   |  |   |
> 853             +--------------------------------------------------------+
> 854             |   o      o      o      o      o      o      o      o   |
> 855             +--------------------------------------------------------+
> 856             +--------------------------------------------------------+
> 857             |   o      o      o      o      o      o      o      o   |
> 858             +--------------------------------------------------------+
> 859             +--------------------------------------------------------+
> 860             |   o      o      o      o      o      o      o      o   |
> 861             +--------------------------------------------------------+
> 862             +--------------------------------------------------------+
> 863             |   o      o      o      o      o      o      o      o   |
> 864             +--------------------------------------------------------+
> 865             +--------------------------------------------------------+
> 866             |   o      o      o      o      o      o      o      o
> |<-+
> 867             +--------------------------------------------------------+
>  |
> 868             +--------------------------------------------------------+
>  |
> 869             |   o      o      o      o      o      o      o      o   |
>  |
> 870             +--------------------------------------------------------+
>  |
> 871               |   |  |   |  |   |  |   |  |   |  |   |  |   |  |   |
>  |
> 872               +---+  +---+  +---+  +---+  +---+  +---+  +---+  +---+
>  |
> 873                          ^
>  |
> 874                          |
>  |
> 875                          |     ----------        ---------------------
>  |
> 876                          +----- Leaf Node        PoD top Node (Spine)
> --+
> 877                                ----------        ---------------------
>
> 879                 Figure 6: Northern View of a PoD's Spines, K_TOP=8
>
> 881        Side views of this PoD is illustrated in Figure 7 and Figure 8.
>
> 883                           Connecting to Spine
>
> 885           ||      ||      ||      ||      ||      ||      ||      ||
> 886
>  +----------------------------------------------------------------+   N
> 887       |                    PoD top Node seen sideways
>  |   ^
> 888
>  +----------------------------------------------------------------+   |
> 889           ||      ||      ||      ||      ||      ||      ||      ||
>     *
> 890         +----+  +----+  +----+  +----+  +----+  +----+  +----+  +----+
>     |
> 891         |    |  |    |  |    |  |    |  |    |  |    |  |    |  |    |
>     v
> 892         +----+  +----+  +----+  +----+  +----+  +----+  +----+  +----+
>     S
> 893           ||      ||      ||      ||      ||      ||      ||      ||
>
> 895                                Connecting to Client nodes
>
> 897                   Figure 7: Side View of a PoD, K_TOP=8, K_LEAF=6
>
> [minor] I count 8 connections to the south in the top node...and just
> one on the switches below it.
>

well, it's a side view so all the nodes are in the same plane, I put
"Nodes" instead of "Node"



>
>
> 899                        Connecting to Spine
>
> 901               ||      ||      ||      ||      ||      ||
> 902             +----+  +----+  +----+  +----+  +----+  +----+
>  N
> 903             |    |  |    |  |    |  |    |  |    |  |   PoD top Nodes
>   ^
> 904             +----+  +----+  +----+  +----+  +----+  +----+
>  |
> 905               ||      ||      ||      ||      ||      ||
>  *
> 906           +------------------------------------------------+
>  |
> 907           |              Leaf seen sideways                |
>  v
> 908           +------------------------------------------------+
>  S
> 909               ||      ||      ||      ||      ||      ||
>
> 911                        Connecting to Client nodes
>
> [minor] A leaf doesn't have southbound ports/adjacencies.  What is
> this leaf connected to?
>


leaf is connected to nothing southbound, correct. I removed the bottom row


AFAIS Pascal's representation of CLOS topologies as crossbars of crossbars
allows abstracting the cabling completely and talking much more meaningfully
about CLOS and the necessary algorithms than the usual isometric
piles-of-wires pictures. Baffling at first? yeah, all new, better
representations of familiar things are ;-)


>
>
> 913         Figure 8: Other Side View of a PoD, K_TOP=8, K_LEAF=6, 90o
> turn in
> 914                                      E-W Plane
>
> [minor] In this case I count a leaf with 6 northbound interfaces.
>

because you look at 8 leaves from the side, all covering each other, and
along PoD switches with K_TOP=8


>
>
> [minor] "90o turn in E-W Plane"  I don't know what that is.
>

there is a compass in the picture, E-W is obviously perpendicular to that
axis, picture is turned 90 degrees compared to the previous figure (I added
the "previous figure" part)


>
>
> 916        As next step, let us observe that a resulting PoD can be
> abstracted
> 917        as a bigger node with a number K of K_POD= K_TOP * K_LEAF, and
> the
> 918        design can recurse.
>
> [minor] K is already defined as the number of ports (§4.1.2.1).
>


K=radix/2


>
>
> [minor] Lost again.  If the PoD is abstracted as a single node, then
> it would have K_TOP + K_LEAF nodes, not sure where the "*" comes from
> or what is trying to denote.
>

A PoD has K_TOP leaves with K_LEAF ports each, so seen from the top it's a
rectangle with K_POD = K_TOP * K_LEAF ports, and that can be abstracted as a
single crossbar with K_POD ports again in a wider context.

Text is pretty clear albeit it needs careful reading, maybe more than once.
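
As a minimal sketch of that arithmetic, assuming the K_TOP=8 and K_LEAF=6
values used in the figures (the function name pod_ports is purely
illustrative and not part of the draft):

```python
def pod_ports(k_top: int, k_leaf: int) -> int:
    """Ports of a PoD abstracted as one crossbar.

    The PoD has k_top leaves with k_leaf ports each, so seen from the
    top it is a k_top x k_leaf rectangle of ports.
    """
    return k_top * k_leaf


# Values taken from the figures: K_TOP=8, K_LEAF=6.
k_pod = pod_ports(k_top=8, k_leaf=6)
print(k_pod)  # 48
```

The product, rather than the sum, falls out because every one of the K_TOP
leaves contributes its own K_LEAF ports to the rectangle.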


>
>
> 920        It will be critical at this point that, before progressing
> further,
> 921        the concept and the picture of "crossed crossbars" is clear.
> Else,
> 922        the following considerations might be difficult to comprehend.
>
> [] The concept is clear to me -- I don't find the explanation and the
> corresponding pictures specially helpful.
>

acknowledged but in that case I'm somewhat surprised you stumbled over the
natural K_TOP * K_LEAF conclusion. AFAIS text is pretty clear and was to
most readers. I do not see much sense in rewriting it over again, the
content will stay the same; it needs reading, maybe more than once, and
parses clearly.

If you don't like the abstractions, IME there is no other way to explain
negative disaggregation especially easily.



>
>
> ...
> 929        This topology is also referred to as a single plane
> configuration and
> 930        is quite popular due to its simplicity.  In order to reach a 1:1
> 931        connectivity ratio between the ToF and the leaves, it results
> that
> 932        there are K_TOP ToF nodes, because each port of a ToP node
> connects
> 933        to a different ToF node, and K_LEAF ToP nodes for the same
> reason.
> 934        Consequently, it will take (P * K_LEAF) ports on a ToF node to
> 935        connect to each of the K_LEAF ToP nodes of the P PoDs, as shown
> in
> 936        Figure 9.
>
> [minor] "there are K_TOP ToF nodes...and K_LEAF ToP nodes"  As with
> other places, the terminology is not used as defined earlier.  K_*
> refer to the number of ports in a specific switch, so their use is
> relative to that switch.  In this case each ToP has K_TOP links, and
> each ToF has K_LEAF links.  The use without the reference point is
> confusing.
>


I'll let Pascal comment here; the new definition of K includes:

K_LEAF (K of a leaf) thus represents both the number of access ports in a
leaf Node and the maximum number of planes in the fabric, whereas K_TOP (K
of a ToP) represents the number of leaves in the PoD and the number of
ports pointing north in a ToP Node towards a higher spine level, thus the
number of ToF nodes in a plane.
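
To make the reference points explicit, here is a rough sketch of the counts
that follow from this definition in the single plane case, assuming
symmetrical switches and the example values K_LEAF=6, K_TOP=8 and 3 PoDs
from the figures. The class and property names are illustrative assumptions,
not terminology from the draft.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SinglePlaneFabric:
    """Hypothetical model of a single plane fabric with symmetrical switches."""

    k_leaf: int  # K of a leaf: access ports per leaf
    k_top: int   # K of a ToP: ports pointing north on a ToP node
    pods: int    # P: number of PoDs

    @property
    def leaves_per_pod(self) -> int:
        # K_TOP also represents the number of leaves in the PoD.
        return self.k_top

    @property
    def top_nodes_per_pod(self) -> int:
        # Each north port of a leaf connects to a different ToP node.
        return self.k_leaf

    @property
    def tof_nodes(self) -> int:
        # Each north port of a ToP node connects to a different ToF node.
        return self.k_top

    @property
    def tof_ports_needed(self) -> int:
        # A ToF node connects to each of the K_LEAF ToP nodes of all P PoDs.
        return self.pods * self.k_leaf


fabric = SinglePlaneFabric(k_leaf=6, k_top=8, pods=3)
print(fabric.tof_nodes, fabric.top_nodes_per_pod, fabric.tof_ports_needed)
# 8 6 18
```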




>
>
> [minor] "(P * K_LEAF)"  This calculation is clear once one realizes
> that the previous discussion was for the number of ports per PoD, not
> total (as the definition of K_* suggests).
>

Pascal again


>
>
>
> 938             [ ] [ ] [ ] [ ] [ ] [ ] [ ] [ ] <-----+
> 939              |   |   |   |   |   |   |   |        |
> 940           [=================================]     |     -----------
> 941              |   |   |   |   |   |   |   |        +----- Top-of-Fabric
> 942             [ ] [ ] [ ] [ ] [ ] [ ] [ ] [ ]       +----- Node
>  -------+
> 943                                                   |     -----------
>     |
> 944                                                   |
>     v
> 945             +-+ +-+ +-+ +-+ +-+ +-+ +-+ +-+ <-----+
>    +-+
> 946             | | | | | | | | | | | | | | | |
>    | |
> 947           [ |o| |o| |o| |o| |o| |o| |o| |o| ]
>    | |
> 948           [ |o| |o| |o| |o| |o| |o| |o| |o| ]
> -------------------------  | |
> 949           [ |o| |o| |o| |o| |o| |o| |o| |o<--- Physical Port
> (Ethernet)  | |
> 950           [ |o| |o| |o| |o| |o| |o| |o| |o| ]
> -------------------------  | |
> 951           [ |o| |o| |o| |o| |o| |o| |o| |o| ]
>    | |
> 952           [ |o| |o| |o| |o| |o| |o| |o| |o| ]
>    | |
> 953             | | | | | | | | | | | | | | | |
>    | |
> 954           [ |o| |o| |o| |o| |o| |o| |o| |o| ]
>    | |
> 955           [ |o| |o| |o| |o| |o| |o| |o| |o| ]      --------------
>    | |
> 956           [ |o| |o| |o| |o| |o| |o| |o| |o| ] <---  PoD top level
>    | |
> 957           [ |o| |o| |o| |o| |o| |o| |o| |o| ]       node (Spine)  ---+
>   | |
> 958           [ |o| |o| |o| |o| |o| |o| |o| |o| ]      --------------    |
>   | |
> 959           [ |o| |o| |o| |o| |o| |o| |o| |o| ]                        |
>   | |
> 960             | | | | | | | | | | | | | | | |  -+           +-   +-+   v
>   | |
> 961           [ |o| |o| |o| |o| |o| |o| |o| |o| ] |           |  --| |--[
> ]--| |
> 962           [ |o| |o| |o| |o| |o| |o| |o| |o| ] |   -----   |  --| |--[
> ]--| |
> 963           [ |o| |o| |o| |o| |o| |o| |o| |o| ] +--- PoD ---+  --| |--[
> ]--| |
> 964           [ |o| |o| |o| |o| |o| |o| |o| |o| ] |   -----   |  --| |--[
> ]--| |
> 965           [ |o| |o| |o| |o| |o| |o| |o| |o| ] |           |  --| |--[
> ]--| |
> 966           [ |o| |o| |o| |o| |o| |o| |o| |o| ] |           |  --| |--[
> ]--| |
> 967             | | | | | | | | | | | | | | | |  -+           +-   +-+
>   | |
> 968             +-+ +-+ +-+ +-+ +-+ +-+ +-+ +-+
>    +-+
>
> 970           Figure 9: Fabric Spines and TOFs in Single Plane Design, 3 PoDs
>
> [minor] I believe you when you say that this figure shows how "it will
> take (P * K_LEAF) ports on a ToF node to connect to each of the K_LEAF
> ToP nodes of the P PoDs", but the drawing is not straightforward to
> interpret, among other reasons because there seem to be 3 different
> connection types or interpretations to the ToF, a different one for
> each PoD.
>
>
> 972        The top view can be collapsed into a third dimension where the
> hidden
> 973        depth index is representing the PoD number.  We can then show
> one PoD
> 974        as a class of PoDs and hence save one dimension in our
> 975        representation.  The Spine Node expands in the depth and the
> vertical
> 976        dimensions, whereas the PoD top level Nodes are constrained, in
> 977        horizontal dimension.  A port in the 2-D representation
> represents
> 978        effectively the class of all the ports at the same position in
> all
> 979        the PoDs that are projected in its position along the depth
> axis.
> 980        This is shown in Figure 10.
>
> [] Do we really need this extra representation?
>

Pascal to answer



>
>
> ...
> 1003       As simple as single plane deployment is it introduces a limit
> due to
> 1004       the bound on the available radix of the ToF nodes that has to
> be at
> 1005       least P * K_LEAF.  Nevertheless, we will see that a distinct
> 1006       advantage of a connected or non-partitioned Top-of-Fabric is
> that all
> 1007       failures can be resolved by simple, non-transitive, positive
> 1008       disaggregation (i.e. nodes advertising more specific prefixes
> with
> 1009       the default to the level below them that is however not
> propagated
> 1010       further down the fabric) as described in Section 4.2.5.1 . In
> other
> 1011       words; non-partitioned ToF nodes can always reach nodes below or
> 1012       withdraw the routes from PoDs they cannot reach unambiguously.
> And
> 1013       with this, positive disaggregation can heal all failures and
> still
> 1014       allow all the ToF nodes to see each other via south reflection.
> 1015       Disaggregation will be explained in further detail in Section
> 4.2.5.
>
> [nit] s/deployment is it introduces/deployment is, it introduces
>

done


>
>
> 1017       In order to scale beyond the "single plane limit", the
> Top-of-Fabric
> 1018       can be partitioned by a N number of identically wired planes
> where N
> 1019       is an integer divider of K_LEAF.  The 1:1 ratio and the desired
> 1020       symmetry are still served, this time with (K_TOP * N) ToF
> nodes, each
> 1021       of (P * K_LEAF / N) ports.  N=1 represents a non-partitioned
> Spine
> 1022       and N=K_LEAF is a maximally partitioned Spine.  Further, if R
> is any
> 1023       integer divisor of K_LEAF, then N=K_LEAF/R is a feasible number
> of
> 1024       planes and R a redundancy factor.  If proves convenient for
> 1025       deployments to use a radix for the leaf nodes that is a power
> of 2 so
> 1026       they can pick a number of planes that is a lower power of 2.
> The
> 1027       example in Figure 11 splits the Spine in 2 planes with a
> redundancy
> 1028       factor R=3, meaning that there are 3 non-intersecting paths
> between
> 1029       any leaf node and any ToF node.  A ToF node must have, in this
> case,
> 1030       at least 3*P ports, and be directly connected to 3 of the 6
> PoD-ToP
> 1031       nodes (spines) in each PoD.
>
> [nit] s/by a N number/by an N number
>

done


>
>
> [minor] "(K_TOP * N) ToF nodes, each of (P * K_LEAF / N) ports"
> Again, the use of the terminology without a reference assumes a
> specific interpretation by the reader.
>

Pascal


>
>
> [minor] "if R is any integer divisor of K_LEAF, then N=K_LEAF/R is a
> feasible number of planes and R a redundancy factor."  Please expand
> on the meaning of the redundancy factor.
>
>
> [minor] "6 PoD-ToP nodes"  I count 8.
>

The 8 are ToF nodes.
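
To make the plane arithmetic in the quoted paragraph concrete, here is a
minimal Python sketch. It only restates the formulas from the draft text
(N = K_LEAF / R planes, K_TOP * N ToF nodes, P * K_LEAF / N ports per ToF
node). The function names and the sample values P=4 and K_TOP=8 are
assumptions chosen for illustration, not values taken from the draft.

```python
def feasible_planes(k_leaf):
    """Return every (N, R) pair with N * R == k_leaf.

    N is the number of planes and R the redundancy factor, following the
    quoted draft text: any integer divisor R of K_LEAF gives N = K_LEAF / R.
    """
    return [(k_leaf // r, r) for r in range(1, k_leaf + 1) if k_leaf % r == 0]


def tof_sizing(p, k_leaf, k_top, n):
    """Size the ToF level for P PoDs split into N planes (illustrative only)."""
    if k_leaf % n != 0:
        raise ValueError("N must be an integer divisor of K_LEAF")
    return {
        "planes": n,
        "redundancy": k_leaf // n,          # R = K_LEAF / N
        "tof_nodes": k_top * n,             # (K_TOP * N) ToF nodes
        "ports_per_tof": p * k_leaf // n,   # (P * K_LEAF / N) ports each
    }


# Hypothetical fabric: P=4 PoDs, K_LEAF=6, K_TOP=8, split into N=2 planes.
sizing = tof_sizing(p=4, k_leaf=6, k_top=8, n=2)
print(feasible_planes(6))
print(sizing)
```

With K_LEAF=6 and N=2 this yields R=3 and 3*P ports per ToF node, which
is the case the Figure 11 paragraph describes; N=K_LEAF yields R=1 and P
ports per ToF node, the maximally partitioned case.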


>
>
> 1033            +---+  +---+  +---+  +---+  +---+  +---+  +---+  +---+
> 1034          +-|   |--|   |--|   |--|   |--|   |--|   |--|   |--|   |-+
> 1035          | | o |  | o |  | o |  | o |  | o |  | o |  | o |  | o | |
> 1036          +-|   |--|   |--|   |--|   |--|   |--|   |--|   |--|   |-+
> 1037          +-|   |--|   |--|   |--|   |--|   |--|   |--|   |--|   |-+
> 1038          | | o |  | o |  | o |  | o |  | o |  | o |  | o |  | o | |
> 1039          +-|   |--|   |--|   |--|   |--|   |--|   |--|   |--|   |-+
> 1040          +-|   |--|   |--|   |--|   |--|   |--|   |--|   |--|   |-+
> 1041          | | o |  | o |  | o |  | o |  | o |  | o |  | o |  | o | |
> 1042          +-|   |--|   |--|   |--|   |--|   |--|   |--|   |--|   |-+
> 1043            +---+  +---+  +---+  +---+  +---+  +---+  +---+  +---+
>
> 1045          Plane 1
> 1046         ----------- . ------------ . ------------ . ------------ . --------
> 1047          Plane 2
>
> 1049            +---+  +---+  +---+  +---+  +---+  +---+  +---+  +---+
> 1050          +-|   |--|   |--|   |--|   |--|   |--|   |--|   |--|   |-+
> 1051          | | o |  | o |  | o |  | o |  | o |  | o |  | o |  | o | |
> 1052          +-|   |--|   |--|   |--|   |--|   |--|   |--|   |--|   |-+
> 1053          +-|   |--|   |--|   |--|   |--|   |--|   |--|   |--|   |-+
> 1054          | | o |  | o |  | o |  | o |  | o |  | o |  | o |  | o | |
> 1055          +-|   |--|   |--|   |--|   |--|   |--|   |--|   |--|   |-+
> 1056          +-|   |--|   |--|   |--|   |--|   |--|   |--|   |--|   |-+
> 1057          | | o |  | o |  | o |  | o |  | o |  | o |  | o |  | o | |
> 1058          +-|   |--|   |--|   |--|   |--|   |--|   |--|   |--|   |-+
> 1059            +---+  +---+  +---+  +---+  +---+  +---+  +---+  +---+
> 1060                     ^
> 1061                     |
> 1062                     |      ----------------
> 1063                     +----- Top-of-Fabric node
> 1064                            "across" depth
> 1065                            ----------------
>
> 1067        Figure 11: Northern View of a Multi-Plane ToF Level, K_LEAF=6, N=2
>
> 1069       At the extreme end of the spectrum it is even possible to fully
> 1070       partition the spine with N = K_LEAF and R=1, while maintaining
> 1071       connectivity between each leaf node and each Top-of-Fabric
> node.  In
> 1072       that case the ToF node connects to a single Port per PoD, so it
> 1073       appears as a single port in the projected view represented in
> 1074       Figure 12.  The number of ports required on the Spine Node is
> more or
> 1075       equal to P, the number of PoDs.
>
> [minor] "more or equal to P"  ??
>

corrected to "more than or equal" as in >=

done


>
>
> ...
> 1121    4.1.3.  Fallen Leaf Problem
> ...
> 1140       In a maximally partitioned fabric, the redundancy factor is R=
> 1, so
> 1141       any breakage in the fabric may cause one or more fallen leaves.
> 1142       However, not all cases require disaggregation.  The following
> cases
> 1143       do not require particular action in such scenario:
>
> [major] A quick look at §4.2.5.1 doesn't explicitly mention how a node
> considers the redundancy factor...but that may be included in the "DAG
> computation" mentioned in the first step.  I'm putting this comment
> here so I don't forget later...
>

A node does not need to consider the redundancy factor. The algorithms
compute once over the full graph and once over the node's own plane only;
everything missing in the own plane is negatively disaggregated. The
computation algorithms are given in the corresponding section.
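
As a rough illustration of that idea (not the normative computation from
the draft), the Python sketch below walks a southbound graph twice, once
over all planes and once over a single plane, and treats the difference
as the candidates for negative disaggregation. The breadth-first walk is
a stand-in for the real computation, and the node names, the prefixes
and the function names are all hypothetical.

```python
from collections import deque


def reachable_prefixes(southbound, prefixes, roots):
    """Collect prefixes attached to every node reachable southbound from roots."""
    seen = set(roots)
    queue = deque(roots)
    found = set()
    while queue:
        node = queue.popleft()
        found |= prefixes.get(node, set())
        for child in southbound.get(node, ()):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return found


def negative_disaggregation_candidates(full_graph, plane_graph, prefixes,
                                       all_tofs, own_tof):
    """Prefixes seen over the full graph but missing in the node's own plane."""
    over_full = reachable_prefixes(full_graph, prefixes, all_tofs)
    over_plane = reachable_prefixes(plane_graph, prefixes, [own_tof])
    return over_full - over_plane


# Hypothetical two-plane fabric: the link Spine 122 -> Leaf 122 is down,
# so Leaf 122 is a fallen leaf for plane B.
plane_a = {"ToF A": ["Spine 121"], "Spine 121": ["Leaf 121", "Leaf 122"]}
plane_b = {"ToF B": ["Spine 122"], "Spine 122": ["Leaf 121"]}
full = {**plane_a, **plane_b}
prefixes = {
    "Leaf 121": {"2001:db8:121::/64"},
    "Leaf 122": {"2001:db8:122::/64"},
}

missing_in_b = negative_disaggregation_candidates(
    full, plane_b, prefixes, ["ToF A", "ToF B"], "ToF B")
print(missing_in_b)
```

In this toy fabric only the prefix behind Leaf 122 is missing from plane
B, so it is the only candidate; no redundancy factor enters the
calculation.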



>
>
> 1145          If a southern link on a node goes down, then connectivity
> through
> 1146          that node is lost for all nodes south of it.  There is no
> need to
> 1147          disaggregate since the connectivity to this node is lost for
> all
> 1148          spine nodes in a same fashion.
>
> 1150          If a ToF Node goes down, then northern traffic towards it is
> 1151          routed via alternate ToF nodes in the same plane and there
> is no
> 1152          need to disaggregate routes.
> ...
> 1159          If the breakage is the last northern link from a ToP node to
> a ToF
> 1160          node going down, then the fallen leaf problem affects only
> The ToF
> 1161          node, and the connectivity to all the nodes in the PoD is
> lost
> 1162          from that ToF node.  This can be observed by other ToF nodes
> 1163          within the plane where the ToP node is located and positively
> 1164          disaggregated within that plane.
>
> [nit] s/only The ToF/only the ToF
>

done



>
>
> 1166       On the other hand, there is a need to disaggregate the routes to
> 1167       Fallen Leaves in a transitive fashion, all the way to the other
> 1168       leaves in the following cases:
>
> [] Without having seen the specific mechanism, this overview is hard to
> digest.
>

The next section explains the mechanism, but this is a chicken-and-egg
problem: as far as I can see, it is better to show the problem first than
to present the solution without having explained the problem.


>
>
> 1170       o  If the breakage is the last northern link from a leaf node
> within
> 1171          a plane (there is only one such link in a maximally
> partitioned
> 1172          fabric) that goes down, then connectivity to all unicast
> prefixes
> 1173          attached to the leaf node is lost within the plane where the
> link
> 1174          is located.  Southern Reflection by a leaf node, e.g.,
> between ToP
> 1175          nodes, if the PoD has only 2 levels, happens in between
> planes,
> 1176          allowing the ToP nodes to detect the problem within the PoD
> where
> 1177          it occurs and positively disaggregate.  The breakage can be
> 1178          observed by the ToF nodes in the same plane through the North
> 1179          flooding of TIEs from the ToP nodes.  The ToF nodes however
> need
> 1180          to be aware of all the affected prefixes for the negative,
> 1181          possibly transitive disaggregation to be fully effective
> (i.e.  a
> 1182          node advertising in control plane that it cannot reach a
> certain
> 1183          more specific prefix than default whereas such
> disaggregation must
> 1184          in extreme condition propagate further down southbound).  The
> 1185          problem can also be observed by the ToF nodes in the other
> planes
> 1186          through the flooding of North TIEs from the affected leaf
> nodes,
> 1187          together with non-node North TIEs which indicate the affected
> 1188          prefixes.  To be effective in that case, the positive
> 1189          disaggregation must reach down to the nodes that make the
> plane
> 1190          selection, which are typically the ingress leaf nodes.  The
> 1191          information is not useful for routing in the intermediate
> levels.
>
> [nit] s/in control plane/in the control plane
>

done


>
>
> [nit] s/in extreme condition/in the extreme condition
>
>
> 1193       o  If the breakage is a ToP node in a maximally partitioned
> fabric -
> 1194          in which case it is the only ToP node serving the plane in
> that
> 1195          PoD - goes down, then the connectivity to all the nodes in
> the PoD
> 1196          is lost within the plane where the ToP node is located.
> 1197          Consequently, all leaves of the PoD fall in this plane.
> Since the
> 1198          Southern Reflection between the ToF nodes happens only
> within a
> 1199          plane, ToF nodes in other planes cannot discover fallen
> leaves in
> 1200          a different plane.  They also cannot determine beyond their
> local
> 1201          plane whether a leaf node that was initially reachable has
> become
> 1202          unreachable.  As the breakage can be observed by the ToF
> nodes in
> 1203          the plane where the breakage happened, the ToF nodes in the
> plane
> 1204          need to be aware of all the affected prefixes for the
> negative
> 1205          disaggregation to be fully effective.  The problem can also
> be
> 1206          observed by the ToF nodes in the other planes through the
> flooding
> 1207          of North TIEs from the affected leaf nodes, if there are
> only 3
> 1208          levels and the ToP nodes are directly connected to the leaf
> nodes,
> 1209          and then again it can only be effective it is propagated
> 1210          transitively to the leaf, and useless above that level.
>
> [nit] s/fabric -...- goes down,/fabric -...-,
>

replaced with parentheses


>
>
> 1212       For the sake of easy comprehension let us roll the abstractions
> back
> 1213       into a simple example and observe that in Figure 3 the loss of
> link
> 1214       Spine 122 to Leaf 122 will make Leaf 122 a fallen leaf for
> Top-of-
> 1215       Fabric plane B.  Worse, if the cabling was never present in
> first
> 1216       place, plane B will not even be able to know that such a fallen
> leaf
> 1217       exists.  Hence partitioning without further treatment results
> in two
> 1218       grave problems:
>
> [] "For the sake of easy comprehension...Figure 3..."  Finally!
> Hmmm...sorry...I mean, it is a little ironic that after all the new
> terminology, detailed descriptions and figures, the clearer
> explanation uses the simplest drawing.
>
>
> [nit] s/in first place/in the first place
>

done


>
>
> 1220       o  Leaf 111 trying to route to Leaf 122 MUST choose Spine 111 in
> 1221          plane A as its next hop since plane B will inevitably
> blackhole
> 1222          the packet when forwarding using default routes or do
> excessive
> 1223          bow tying.  This information must be in its routing table.
>
> [major] s/MUST/must   This is not a Normative statement, just a
> statement of fact (inside an example).
>

yes, done


>
>
> 1225       o  Any kind of "flooding" or distance vector trying to deal
> with the
> 1226          problem by distributing host routes will be able to converge
> only
> 1227          using paths through leaves.  The flooding of information on
> Leaf
> 1228          122 would have to go up to Top-of-Fabric A and then
> "loopback"
> 1229          over other leaves to ToF B leading in extreme cases to
> traffic for
> 1230          Leaf 122 when presented to plane B taking an "inverted
> fabric"
> 1231          path where leaves start to serve as TOFs, at least for the
> 1232          duration of a protocol's convergence.
>
> [] "Any kind of "flooding" or distance vector..."  I can guess the
> meaning, but it would be better that I don't have to.   Maybe
> something like: "Any advertisement..."
>

rewritten


>
>
> [minor] "information on Leaf 122"  s/on/ about (?), or maybe from. ??
>

rewritten


>
>
> 1234    4.1.4.  Discovering Fallen Leaves
>
> 1236       As illustrated later, and without further proof, the way to
> deal with
> 1237       fallen leaves in multi-plane designs, when aggregation is used,
> is
> 1238       that RIFT requires all the ToF nodes to share the same north
> topology
> 1239       database.  This happens naturally in single plane design by the
> means
> 1240       of northbound flooding and south reflection but needs additional
> 1241       considerations in multi-plane fabrics.  To satisfy this RIFT, in
> 1242       multi-plane designs, relies at the ToF level on ring
> interconnection
> 1243       of switches in multiple planes.  Other solutions are possible
> but
> 1244       they either need more cabling or end up having much longer
> flooding
> 1245       paths and/or single points of failure.
>
> [minor] "As illustrated later..."  Where?
>

rewritten and figures referenced


>
>
> [] "and without further proof"  I hope this is at least specified at
> that later point.
>


rewritten and figures referenced



>
>
> [nit] s/To satisfy this RIFT, in multi-plane designs, relies/To
> satisfy this need in multi-plane designs, RIFT relies
>
>
> 1247       In detail, by reserving two ports on each Top-of-Fabric node it
> is
> 1248       possible to connect them together by interplane bi-directional
> rings
> 1249       as illustrated in Figure 13.  The rings will be used to
> exchange full
> 1250       north topology information between planes.  All ToFs having same
> 1251       north topology allows by the means of transitive, negative
> 1252       disaggregation described in Section 4.2.5.2 to efficiently fix
> any
> 1253       possible fallen leaf scenario.  Somewhat as a side-effect, the
> 1254       exchange of information fulfills the ask to present full view
> of the
> 1255       fabric topology at the Top-of-Fabric level, without the need to
> 1256       collate it from multiple points by additional complexity of
> 1257       technologies like [RFC7752].
>
> [nit] s/fulfills the ask to present full view/fulfills the requirement
> to have a full view
>

rewritten


>
>
> [] "..., without the need to collate it from multiple points by
> additional complexity of technologies like [RFC7752]."  This last
> phrase is unnecessary: because carrying RIFT information in BGP-LS is
> not defined, and more importantly, there's no need to criticize other
> technology to make RIFT look better.
>

It was not meant as a critique; all references to other protocols have been removed.


>
>
> 1259               +---+  +---+  +---+  +---+  +---+  +---+  +--------+
> 1260               |   |  |   |  |   |  |   |  |   |  |   |  |        |
> 1261               |      |      |      |      |      |      |        |
> 1262             +-o-+  +-o-+  +-o-+  +-o-+  +-o-+  +-o-+  +-o-+      |
> 1263           +-|   |--|   |--|   |--|   |--|   |--|   |--|   |-+    |
> 1264           | | o |  | o |  | o |  | o |  | o |  | o |  | o | |    |
> Plane A
> 1265           +-|   |--|   |--|   |--|   |--|   |--|   |--|   |-+    |
> 1266             +-o-+  +-o-+  +-o-+  +-o-+  +-o-+  +-o-+  +-o-+      |
> 1267              |      |      |      |      |      |      |         |
> 1268             +-o-+  +-o-+  +-o-+  +-o-+  +-o-+  +-o-+  +-o-+      |
> 1269           +-|   |--|   |--|   |--|   |--|   |--|   |--|   |-+    |
> 1270           | | o |  | o |  | o |  | o |  | o |  | o |  | o | |    |
> Plane B
> 1271           +-|   |--|   |--|   |--|   |--|   |--|   |--|   |-+    |
> 1272             +-o-+  +-o-+  +-o-+  +-o-+  +-o-+  +-o-+  +-o-+      |
> 1273               |      |      |      |      |      |      |        |
> 1274                                   ...                            |
> 1275               |      |      |      |      |      |      |        |
> 1276             +-o-+  +-o-+  +-o-+  +-o-+  +-o-+  +-o-+  +-o-+      |
> 1277           +-|   |--|   |--|   |--|   |--|   |--|   |--|   |-+    |
> 1278           | | o |  | o |  | o |  | o |  | o |  | o |  | o | |    |
> Plane X
> 1279           +-|   |--|   |--|   |--|   |--|   |--|   |--|   |-+    |
> 1280             +-o-+  +-o-+  +-o-+  +-o-+  +-o-+  +-o-+  +-o-+      |
> 1281               |      |      |      |      |      |      |        |
> 1282               |   |  |   |  |   |  |   |  |   |  |   |  |        |
> 1283               +---+  +---+  +---+  +---+  +---+  +---+  +--------+
> 1284        Rings    1      2      3      4      5      6      7
>
> 1286         Figure 13: Connecting Top-of-Fabric Nodes Across Planes by Rings
>
> [minor] Is that one ring per plane, multiple rings per plane or a big
> ring for all the planes?  The drawing is not clear to me. :-(
>

new figure provided with extensive ASCII art ingenuity ...
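
For readers of this thread, one possible reading of the quoted Figure 13
can be sketched in Python: each numbered ring joins the ToF nodes that
sit at the same position in every plane and then closes back on the first
plane, so every ToF node uses exactly two reserved ports. This is an
assumption about the figure, and the names (build_rings, the plane
labels) are hypothetical.

```python
def build_rings(planes, tofs_per_plane):
    """Return one ring per ToF position; each ring is a list of links.

    A ToF node is identified as (plane, position). Ring k links position k
    in consecutive planes and wraps from the last plane to the first.
    """
    rings = []
    for position in range(1, tofs_per_plane + 1):
        members = [(plane, position) for plane in planes]
        links = [
            (members[i], members[(i + 1) % len(members)])
            for i in range(len(members))
        ]
        rings.append(links)
    return rings


# Hypothetical fabric with three planes and 7 ToF nodes per plane,
# mirroring the "Rings 1 .. 7" row of the quoted figure.
rings = build_rings(["A", "B", "X"], tofs_per_plane=7)
for number, links in enumerate(rings, start=1):
    print(number, links)
```

The sketch assumes at least three planes; with only two, the wrap-around
link would duplicate the single A-B link.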


>
>
> 1288    4.1.5.  Addressing the Fallen Leaves Problem
>
> 1290       One consequence of the "Fallen Leaf" problem is that some
> prefixes
> 1291       attached to the fallen leaf become unreachable from some of the
> ToF
> 1292       nodes.  RIFT proposes two methods to address this issue, the
> positive
> 1293       and the negative disaggregation.  Both methods flood South TIEs
> to
> 1294       advertise the impacted prefix(es).
>
> [nit] s/RIFT proposes two methods/RIFT defines two methods
>

done


>
>
> [End of Review - Part 1]
>
> _______________________________________________
> RIFT mailing list
> RIFT@ietf.org
> https://www.ietf.org/mailman/listinfo/rift
>