Re: AD review of draft-ietf-rtgwg-bgp-pic-18

Ahmed Bashandy <abashandy.ietf@gmail.com> Mon, 31 October 2022 15:35 UTC

Return-Path: <abashandy.ietf@gmail.com>
X-Original-To: rtgwg@ietfa.amsl.com
Delivered-To: rtgwg@ietfa.amsl.com
Received: from localhost (localhost [127.0.0.1]) by ietfa.amsl.com (Postfix) with ESMTP id A8856C152578; Mon, 31 Oct 2022 08:35:56 -0700 (PDT)
X-Virus-Scanned: amavisd-new at amsl.com
X-Spam-Flag: NO
X-Spam-Score: 0.139
X-Spam-Level:
X-Spam-Status: No, score=0.139 tagged_above=-999 required=5 tests=[BAYES_00=-1.9, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, FREEMAIL_FROM=0.001, FREEMAIL_REPLY=1, HTML_MESSAGE=0.001, NORMAL_HTTP_TO_IP=0.001, NUMERIC_HTTP_ADDR=1.242, RCVD_IN_ZEN_BLOCKED_OPENDNS=0.001, SPF_HELO_NONE=0.001, SPF_PASS=-0.001, T_SCC_BODY_TEXT_LINE=-0.01, URIBL_BLOCKED=0.001, URIBL_DBL_BLOCKED_OPENDNS=0.001, URIBL_ZEN_BLOCKED_OPENDNS=0.001] autolearn=no autolearn_force=no
Authentication-Results: ietfa.amsl.com (amavisd-new); dkim=pass (2048-bit key) header.d=gmail.com
Received: from mail.ietf.org ([50.223.129.194]) by localhost (ietfa.amsl.com [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id R1uyVItEPiSJ; Mon, 31 Oct 2022 08:35:50 -0700 (PDT)
Received: from mail-pf1-x434.google.com (mail-pf1-x434.google.com [IPv6:2607:f8b0:4864:20::434]) (using TLSv1.3 with cipher TLS_AES_128_GCM_SHA256 (128/128 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by ietfa.amsl.com (Postfix) with ESMTPS id 37C9BC1524D9; Mon, 31 Oct 2022 08:35:50 -0700 (PDT)
Received: by mail-pf1-x434.google.com with SMTP id f140so11015784pfa.1; Mon, 31 Oct 2022 08:35:50 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20210112; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:from:to:cc:subject:date:message-id:reply-to; bh=KQnL0qKadcE1fVqZmIZvGMfQfrPw6cbZSVZKsTtIRzo=; b=hUrLlS9kfXsz+pMedda6flwFgt4tNzQlZDTQS0D25v2w5+oKPm1x6g6aXtzlkBOzLM NHECLU4w51hiq1Gb/p5vqXmWIpJYngcIdv3w2v0gxisEOZEdCquOAFa8WBsvgUTovteL QIVH4jgh4ejafI1cbKiqZ8sq32OPCLafzTU7rVfrByWIRXT7tRhhDbmEtPhM8+Wt/ggU OUmlbbHwViYY/x+DI6d4W+ffB2G5HLX+EQXFE+p9YhiwnZus5iuyw2T4KE4IyzNqgRz7 fPdfx2YtqeV/CcYrvZL/tKfZw1trMhkzx0U2z1Z6DVbWSWRjX0VEOI5pze32guZ/Fnss N0lg==
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:x-gm-message-state:from:to:cc:subject:date:message-id :reply-to; bh=KQnL0qKadcE1fVqZmIZvGMfQfrPw6cbZSVZKsTtIRzo=; b=KHeGKY46tM9s4ay119HDAONbKxT6EM5A3eW6z1BAfoyOtgUKJbdYW8XF0CdGDgoI0V im46j8b0A9dmfRla3iIqzqj5G/dpSfFnXCX5sbE5vBwMPZRRCkGNWPXZqYwa2WQfXh4A BOfU/qoDzZ2Hvpe3FM05css6/AMnWjvJg8ngKy63dz5GfeQ1ccpe3yFyFaHXXesfy4f2 kZlZZdnEgpKmC4MJeqoqYc7MJJyv+yvP6yP+w+u3kTa1qCsnoZ+ozhBQcVcOwjBrWFu1 iAMXv49ovSntoY0IGNdl0PyK6yqy252gWjjKAqPITcxLIcTchweLfyLn5lAIrdoHjejB a0EA==
X-Gm-Message-State: ACrzQf3OqWUoFVqpNrPE1ZKnLZ4duIwHaOYtFiZROARynTpwfpyyz1/Z 18b2Mt5nKY0hta0goObemofwuqDUOBYRU0tVngo=
X-Google-Smtp-Source: AMsMyM5vrAdyk7Qv5QenCyRv1Dz4c+lG5oi/uiClYKWQXHHmJYGwPctMG8iItrBwScbmVfZk1cNJ1y6wFzDHBDeiTHs=
X-Received: by 2002:a63:4d0c:0:b0:46f:932:450a with SMTP id a12-20020a634d0c000000b0046f0932450amr12981605pgb.204.1667230549052; Mon, 31 Oct 2022 08:35:49 -0700 (PDT)
MIME-Version: 1.0
References: <CAMMESsw1bdYc-p7TD7GtHNHa6zrO3Ms20VZX89Hza_JtWpK=Bw@mail.gmail.com> <CAMMESszTT8bRMt1UMS+OvsqePhZDJMHtOJ9sfcUs8zu19rtjsQ@mail.gmail.com>
In-Reply-To: <CAMMESszTT8bRMt1UMS+OvsqePhZDJMHtOJ9sfcUs8zu19rtjsQ@mail.gmail.com>
From: Ahmed Bashandy <abashandy.ietf@gmail.com>
Date: Mon, 31 Oct 2022 08:35:36 -0700
Message-ID: <CAAgvS4pkKBnezbbzGd4uiRezJFEGOWDBShd26v2vR-18tM3a8g@mail.gmail.com>
Subject: Re: AD review of draft-ietf-rtgwg-bgp-pic-18
To: Alvaro Retana <aretana.ietf@gmail.com>
Cc: draft-ietf-rtgwg-bgp-pic@ietf.org, Yingzhen Qu <yingzhen.ietf@gmail.com>, RTGWG <rtgwg@ietf.org>, rtgwg-chairs <rtgwg-chairs@ietf.org>
Content-Type: multipart/alternative; boundary="000000000000a291f605ec565c5a"
Archived-At: <https://mailarchive.ietf.org/arch/msg/rtgwg/jnU3efuusiSZkBwEEpuQlD9Iagc>
X-BeenThere: rtgwg@ietf.org
X-Mailman-Version: 2.1.39
Precedence: list
List-Id: Routing Area Working Group <rtgwg.ietf.org>
List-Unsubscribe: <https://www.ietf.org/mailman/options/rtgwg>, <mailto:rtgwg-request@ietf.org?subject=unsubscribe>
List-Archive: <https://mailarchive.ietf.org/arch/browse/rtgwg/>
List-Post: <mailto:rtgwg@ietf.org>
List-Help: <mailto:rtgwg-request@ietf.org?subject=help>
List-Subscribe: <https://www.ietf.org/mailman/listinfo/rtgwg>, <mailto:rtgwg-request@ietf.org?subject=subscribe>
X-List-Received-Date: Mon, 31 Oct 2022 15:35:56 -0000

Sorry for the late reply. I am wrote busy
I will try to address comments in the next 1-2 weeks

Ahmed

On Thu, Sep 8, 2022, 8:46 AM Alvaro Retana <aretana.ietf@gmail.com> wrote:

> Ping!
>
> On August 2, 2022 at 4:23:53 PM, Alvaro Retana (aretana.ietf@gmail.com)
> wrote:
>
>
> Dear authors:
>
> Thank you for documenting this important feature.
>
> After reading the document, I think that it still needs work:
>
> (1) The terminology used is not aligned with rfc4271.  Of major importance
> is
>     the description of the Routing Table, the Decision Process, the use of
> best
>     routes (not paths!), etc.  I pointed out multiple occurrences below,
> buy I
>     need you to check the whole docuent for consistency.
>
> (2) The choice to describe the mechanism using a complex scenario (VPNs,
> etc.)
>     has resulted in the text being more complex than it needs to be.
>
>     It is probably too late for this, but a better approach (especially
> because
>     this is an Informational document) would have been to describe the
>     functionality using a simple IP-only network.  A more elaborate
> example
>     could have then been included in an Appendix.
>
> (3) The text (mainly in §5) related to hardware limitations is interesting
> but
>     also highly speculative.  Given that it is not part of the main
> mechanism,
>     I would suggest putting this information in an appendix.
>
>
> I put detailed comments below.
>
>
> To the Chairs: I couldn't find a request to the idr WG for review.  Did I
> miss it?  Given the subject, we need to give idr the opportunity to
> comment.
>
>
> Thanks!
>
> Alvaro.
>
>
> [Line numbers from idnits.]
>
> ...
> 12 Abstract
>
> 14 In a network comprising thousands of BGP peers exchanging millions of
> 15 routes, many routes are reachable via more than one next-hop. Given
> 16 the large scaling targets, it is desirable to restore traffic after
> 17 failure in a time period that does not depend on the number of BGP
> 18 prefixes. In this document we proposed an architecture by which
> 19 traffic can be re-routed to ECMP or pre-calculated backup paths in a
> 20 timeframe that does not depend on the number of BGP prefixes. The
> 21 objective is achieved through organizing the forwarding data
> 22 structures in a hierarchical manner and sharing forwarding elements
> 23 among the maximum possible number of routes. The proposed technique
> 24 achieves prefix independent convergence while ensuring incremental
> 25 deployment, complete automation, and zero management and provisioning
> 26 effort. It is noteworthy to mention that the benefits of BGP Prefix
> 27 Independent Convergence (BGP-PIC) are hinged on the existence of more
> 28 than one path whether as ECMP or primary-backup.
>
> [style nit] "we proposed"
>
> Please don't write using the first person ("we propose"), use the third
> person instead ("this document proposes").
>
>
> [minor] s/we proposed an architecture/describes an architecture
>
> Other places also say that the document "proposes" -- at this point in the
> process we're past proposals. :-)
>
>
> [minor] Please expand ECMP in the Abstract, and later on first use.
>
>
> [] "The proposed technique achieves prefix independent convergence while
> ensuring incremental deployment, complete automation, and zero management
> and provisioning effort."
>
> Great marketing!  This is a technical document, please avoid selling the
> solution.
>
>
> [minor] Consider breaking the Abstract into two paragraphs.
>
>
>
> ...
> 109 1. Introduction
>
> 111   BGP speakers exchange reachability information about
> 112   prefixes[1][2] and, for labeled address families, namely AFI/SAFI
> 113   1/4, 2/4, 1/128, and 2/128, an edge router assigns local labels to
> 114   prefixes and associates the local label with each advertised
> 115   prefix using technologies such as L3VPN [9], 6PE [10], and
> 116   Softwire [8] using BGP label unicast (BGP-LU) technique[3]. A BGP
> 117   speaker then applies the path selection steps to choose the best
> 118   path. In modern networks, it is not uncommon to have a prefix
> 119   reachable via multiple edge routers. In addition to proprietary
> 120   techniques, multiple techniques have been proposed to allow for
> 121   BGP to advertise more than one path for a given prefix
> 122   [7][12][13], whether in the form of equal cost multipath or
> 123   primary-backup. Another common and widely deployed scenario is
> 124   L3VPN with multi-homed VPN sites with unique Route Distinguisher.
> 125   It is advantageous to utilize the commonality among paths used by
> 126   NLRIs[1] to significantly improve convergence in case of topology
> 127   modifications.
>
> [] I still haven't read the rest of the document...  PIC is a valuable and
> simple technique.  Just from the first sentence (and peeking ahead at some
> of the figures and examples), it seems to me that the description could
> have been significantly simpler. :-(
>
>
> [major] rfc7322 requires that RFCs be referenced as [RFCXXXX], please
> update the citations.
>
>
> [minor nit] Multiple citations don't have spaces around the brackets
> ("[]").
>
>
> [minor] The reference to rfc4271 is the general reference for BGP, no need
> to include a reference to rfc4760 with it.
>
>
> [] "for labeled address families...L3VPN with multi-homed VPN sites with
> unique Route Distinguisher"
>
> The detail about labeled AFs/L3VPN seems unnecessary given that PIC
> doesn't just apply to them.
>
>
> [major] "BGP label unicast (BGP-LU) technique[3]"
>
> Note that rfc8277 doesn't use the terms BGP-LU, "label unicast", or
> "labeled unicast".  I know that BGP-LU is commonly used, but it is not
> defined in any IETF document (that I could find).  Please define it in the
> terminology section.  Also, don't refer to rfc8277 every time you use the
> term.
>
>
> [major] "A BGP speaker then applies the path selection steps to choose the
> best path."
>
> rfc4271 doesn't talk about "path selection" or finding a "best path".  It
> specifies a Decision Process that results in best routes.  Please use
> consistent terminology.
>
>
> [minor] "In addition to proprietary techniques..."
>
> These proprietary techniques are not described anywhere.  Please don't
> mention them -- it will only raise unnecessary questions.
>
>
> [nit] "techniques have been proposed...[7][12][13]"
>
> Add-path and Diverse paths are not just "proposed".
>
>
> [minor] "It is advantageous to utilize the commonality among paths used by
> NLRIs[1] to significantly improve convergence in case of topology
> modifications."
>
> You may have to be more explicit about which commonality -- mention it
> explicitly.
>
>
>
> 129   This document proposes a hierarchical and shared forwarding chain
> 130   organization that allows traffic to be restored to pre-calculated
> 131   alternative equal cost primary path or backup path in a time
> 132   period that does not depend on the number of BGP prefixes. The
> 133   technique relies on internal router behavior that is completely
> 134   transparent to the operator and can be incrementally deployed and
> 135   enabled with zero operator intervention. In other words, once it
> 136   is implemented and deployed on a router, nothing is required from
> 137   the operator to make it work. It is noteworthy to mention that
> 138   this document describes FIB architecture that can be implemented
> 139   in both hardware and/or software.
>
> [nit] s/to pre-calculated/to a pre-calculated
>
>
> [nit] s/describes FIB/describes a FIB
>
>
> [minor] Expand FIB on first mention.
>
>
>
> 141 1.1. Terminology
>
> [major] Please don't deviate (at least) from terminology used in standards
> track specifications.  Please don't redefine terms to mean something
> different in this document.
>
> If using terminology defined elsewhere, please just reference those
> documents, and don't redefine the terms here.
>
>
>
> 143   This section defines the terms used in this document. For ease of
> 144   use, we will use terms similar to those used by L3VPN [9].
>
> [major] Similar?  If you're using the terminology from rfc4364, then
> please just use it.  Redefining terms doesn't make anything easier.
>
> To be clear: remove any terms that are already defined in rfc4364 and make
> the reference Normative.
>
>
>
> 146   o  BGP prefix: A prefix P/m (of any AFI/SAFI) that a BGP speaker
> 147      has a path for.
>
> [major] rfc4271 uses the concept of an "IP prefix" carried in a "route"
> and described by a "path" (from §1.1):
>
>    Route
>       A unit of information that pairs a set of destinations with the
>       attributes of a path to those destinations.  The set of
>       destinations are systems whose IP addresses are contained in one
>       IP address prefix carried in the Network Layer Reachability
>       Information (NLRI) field of an UPDATE message.  The path is the
>       information reported in the path attributes field of the same
>       UPDATE message.
>
>
>
> 149   o  IGP prefix: A prefix P/m (of any AFI/SAFI) that is learnt via
> 150      an Interior Gateway Protocol (IGP), such as OSPF and ISIS. The
> 151      prefix may be learnt directly through the IGP or redistributed
> 152      from other protocol(s).
>
> [major] rfc4271 uses the term "IGP route".
>
>
> [minor] The "P/m" notation is not explained -- and is not used outside of
> this section.
>
>
> [major] "is learnt via an...IGP...may be learnt directly through the IGP
> or redistributed from other protocol(s)"
>
> There's a conflict in the definition: does it come directly from the IGP
> or from something else?  Might might those "other protocol(s)" be?
>
>
>
> 154   o  CE[7]: An external router through which an egress PE can reach
> 155      a prefix P/m.
>
> [minor] Please expand CE.
>
>
> [major] The reference to draft-ietf-idr-best-external seems incorrect;
> that draft doesn't use CE at all.  This seems to be a term that comes from
> RFC4364.
>
>
> [minor] Please expand PE in first use.
>
>
>
> 157   o  Egress PE[7], "ePE": A BGP speaker that learns about a prefix
> 158      through an eBGP peer and chooses that eBGP peer as the next-hop
> 159      for that prefix.
>
> [major] The reference to draft-ietf-idr-best-external seems incorrect;
> that draft doesn't use PE at all.  This seems to be a term that comes from
> RFC4364.
>
>
> [minor] s/eBGP/EBGP/g
> That is how rfc4271 uses it.
>
>
> [minor] Expand EBGP on first use.
>
>
>
> 161   o  Ingress PE, "iPE": A BGP speaker that learns about a prefix
> 162      through a iBGP peer and chooses an egress PE as the next-hop for
> 163      the prefix.
>
> [minor] s/iBGP/IBGP/g
> That is how rfc4271 uses it.
>
>
> [minor] Expand IBGP on first use.
>
>
>
> 165   o  Path: The next-hop in a sequence of nodes starting from the
> 166      current node and ending with the destination node or network
> 167      identified by the prefix. The nodes may not be directly
> 168      connected.
>
> [major] See the definition above from rfc4271.
>
>
> [major] Some of the definitions refer to BGP, but others to the FIB
> characteristics....differentiate.  "FIB Path" vs "BGP Path", for example.
> Note that rfc4271 uses the term "Routing Table" as follows (from §3.2:
> Routing Information Base):
>
>    Routing information that the BGP speaker uses to forward packets (or
>    to construct the forwarding table used for packet forwarding) is
>    maintained in the Routing Table.  The Routing Table accumulates
>    routes to directly connected networks, static routes, routes learned
>    from the IGP protocols, and routes learned from BGP.  Whether a
>    specific BGP route should be installed in the Routing Table, and
>    whether a BGP route should override a route to the same destination
>    installed by another source, is a local policy decision, and is not
>    specified in this document.  In addition to actual packet forwarding,
>    the Routing Table is used for resolution of the next-hop addresses
>    specified in BGP updates (see Section 5.1.3).
>
>
>
> 170   o  Recursive path: A path consisting only of the IP address of the
> 171      next-hop without the outgoing interface. Subsequent lookups are
> 172      necessary to determine the outgoing interface and a directly
> 173      connected next-hop.
>
> [major] rfc4271 already used "recursive route".
>
>
>
> 175   o  Non-recursive path: A path consisting of the IP address of a
> 176      directly connected next-hop and outgoing interface.
>
> [] See above.
>
>
>
> ...
> 181   o  Primary path: A recursive or non-recursive path that can be
> 182      used all the time as long as a walk starting from this path can
> 183      end to an adjacency. A prefix can have more than one primary
> 184      path.
>
> [minor] "can be used all the time"
>
> I'm not sure what you mean here.
>
>
> [minor] The concept of a "walk" hasn't been introduced -- please put a
> pointer to where that is discussed.
>
>
>
> ...
> 189   o  Leaf: A container data structure for a prefix or local label.
> 190      Alternatively, it is the data structure that contains prefix
> 191      specific information.
>
> [] So a leaf can be many things: "prefix or local label...[or] prefix
> specific information".  Hmmm....
>
>
>
> ...
> 198   o  Pathlist: An array of paths used by one or more prefixes to
> 199      forward traffic to destination(s) covered by an IP prefix. Each
> 200      path in the pathlist carries its "path-index" that identifies its
> 201      position in the array of paths. In general, the value of the
> 202      "path-index" stored in path may not necessarily have the same
> 203      value of the location of the path in the pathlist. For example
> 204      the 3rd path may carry path-index value of 1. A pathlist may
> 205      contain a mix of primary and backup paths.
>
> [nit] s/its "path-index"/a "path-index"
>
>
>
> 207   o  OutLabel-List: Each labeled prefix is associated with an
> 208      OutLabel-List. The OutLabel-List is an array of one or more
> 209      outgoing labels and/or label actions where each label or label
> 210      action has 1-to-1 correspondence to a path in the pathlist.
> 211      Label actions[6] are: push the label, pop the label, swap the
> 212      incoming label with the label in the OutLabel-List entry, or
> 213      don't push anything at all in case of "unlabeled". The prefix
> 214      may be an IGP or BGP prefix.
>
> [major] If the label label actions are defined in rfc3031, please just
> point to it -- is the "swap the incoming label with the label in the
> OutLabel-List entry" defined there too?
>
> It seems to me that the action defined here is part of a general action
> taken by the FIB, and not specific only to labels (as in a different
> encapsulation, for example).
>
>
>
> 216   o  Forwarding chain: It is a compound data structure consisting of
> 217      multiple connected block that a forwarding engine walks one
> 218      block at a time to forward the packet out of an interface.
> 219      Section 2.2 explains an example of a forwarding chain.
> 220      Subsequent sections provide additional examples
>
> [nit] s/multiple connected block/multiple connected blocks
>
>
> [nit] s/additional examples/additional examples.
>
>
> [minor] Note that both "forwarding chain" and "forwarding Chain" are used
> throughout the text.  Please be consistent.
>
>
>
> ...
> 229   o  Route: A prefix with one or more paths associated with it.  The
> 230      minimum set of objects needed to construct a route is a leaf
> 231      and a pathlist.
>
> [major] See the comments above about rfc4271 and the FIB in general.
>
>
>
> 233 2. Overview
>
> 235   The idea of BGP-PIC is based on two pillars
>
> [minor] The descriptions below are very thick (hard to read through)
> without more context.  Suggestion: keep the description simple in this
> section and point to where there is more detail.
>
>
>
> 236   o  A shared hierarchical forwarding chain: It is not uncommon to see
> 237      multiple destinations reachable via the same list of next-hops.
> 238      Instead of having a separate list of next-hops for each
> 239      destination, all destinations sharing the same list of next-hops
> 240      can point to a single copy of this list thereby allowing fast
> 241      convergence by making changes to a single shared list of next-
> 242      hops rather than possibly a large number of destinations. Because
> 243      paths in a pathlist may be recursive, a hierarchy is formed
> 244      between pathlist and the resolving prefix whereby the pathlist
> 245      depends on the resolving prefix.
>
> 247   o  A forwarding plane that supports multiple levels of indirection:
> 248      A forwarding chain that starts with a destination and ends with
> 249      an outgoing interface is not a simple flat structure. Instead a
> 250      forwarding entry is constructed via multiple levels of
> 251      dependency. A BGP NLRI uses a recursive next-hop, which in turn
> 252      resolves via an IGP next-hop, which in turn resolves via an
> 253      adjacency consisting of one or more outgoing interface(s) and
> 254      next-hop(s).
>
> [nit] s/Instead a/Instead, a
>
>
> [minor] Are "multiple levels of indirection" and "multiple levels of
> dependency" the same thing?  The latter is used to describe the former.
>
>
> [] Next-hop is also a term that needs to be differentiated.  In this
> paragraph is it used in the context of a BGP route, an IGP route, and a
> local address on the FIB.  See rfc4271.
>
>
>
> 256   Designing a forwarding plane that constructs multi-level forwarding
> 257   chains with maximal sharing of forwarding objects allows rerouting a
> 258   large number of destinations by modifying a small number of objects
> 259   thereby achieving convergence in a time frame that does not depend
> 260   on the number of destinations. For example, if the IGP prefix that
> 261   resolves a recursive next-hop is updated there is no need to update
> 262   the possibly large number of BGP NLRIs that use this recursive next-
> 263   hop.
>
> [] The example is orders of magnitue simpler that the actual explanation.
> BTW, none of the words used ("multi-level forwarding chains with maximal
> sharing of forwarding objects") are mentioned in the description of the
> pillars.
>
>
>
> 265 2.1. Dependency
>
> 267   This section describes the required functionalities in the
> 268   forwarding and control planes to support BGP-PIC described in this
> 269   document.
>
> [nit] s/BGP-PIC described/BGP-PIC as described
>
>
>
> 271 2.1.1. Hierarchical Hardware FIB (Forwarding Information Base)
>
> [major] The Introduction says that "this document describes FIB
> architecture that can be implemented in both hardware and/or software", but
> this section requires a hardware implementation.  ??
>
>
>
> 273   BGP-PIC requires a hierarchical hardware FIB support: for each BGP
> 274   forwarded packet, a BGP leaf is looked up, then a BGP Pathlist is
> 275   consulted, then an IGP Pathlist, then an Adjacency.
>
> [major] "BGP forwarded packet"
>
> I think you mean something along the lines of a packet destined to an
> external destination, or to a destination learned through BGP, or ...  ??
>
>
> [major] The definitions of both a leaf and a pathlist don't assume
> protocol origin.  IOW, the reader has to make significant assumptions to
> understand that you mean "a leaf/pathlist that resulted from
> BGP/IGP-learned information" (or something like that).  If the
> identification of the origin protocol is required, please also say so.
>
>
> [major] "Pathlist" and "Adjancency" are capitalized, but "leaf" isn't.
> Personally, I like capitalizing terms with specific meaning.  Not all
> instances (throughout the document) are treated the same.  Please be
> consistent throughout.
>
>
> [major] The general process above: lookup the destination (BGP leaf),
> which points to possible alternatives (BGP pathlist), which recursively
> resolve...  assumes knowledge of how a FIB is generally built and the
> operations that forwarding a packet requires.  This may not be common
> knowledge to every reader.  Please either explain the assumptions or
> reference a place where the explanation exists already.
>
>
>
> 277   An alternative method consists in "flattening" the dependencies when
> 278   programming the BGP destinations into HW FIB resulting in
> 279   potentially eliminating both the BGP Path-List and IGP Path-List
> 280   consultation. Such an approach decreases the number of memory
> 281   lookups per forwarding operation at the expense of HW FIB memory
> 282   increase (flattening means less sharing thereby less duplication),
> 283   loss of ECMP properties (flattening means less pathlist entropy) and
> 284   loss of BGP-PIC properties.
>
> [minor] HW is mentioned here again...and the concept of programming, which
> goes back to the general knowledge assumptions that you're making.
>
>
> [minor] s/Path-List/Pathlist/g
>
>
> [major] Am I to assume that this "alternative method" shouldn't be
> considered/used, especially if it results in loss of the "BGP-PIC
> properties"??  This section started by mentioning a requirement -- if this
> other approach is not what is required the why even mention it?
>
>
>
> 286 2.1.2. Availability of more than one BGP next-hops
>
> 288   When the primary BGP next-hop fails, BGP-PIC depends on the
> 289   availability of one or more pre-computed and pre-installed secondary
> 290   BGP next-hop(s) in the BGP Pathlist.
>
> [major] "primary BGP next-hop"
>
> At this point in the document you've hinted at the possibility of having
> multiple BGP routes (with potentially different NEXT_HOPs), but the concept
> of "primary BGP next-hop" hasn't been explained.  I'm assuming that by this
> you mean the BGP best route [rfc4271].  (Again, you're making assumptions
> of the knowledge of the reader.)
>
> I can see that this section talks about "secondary next-hops" later.
> Perhaps change the order of the text so that the conditions precede the
> explanation of the actions.
>
>
> [major] "BGP next-hop fails"
>
> rfc4271 talks about the the address in the NEXT_HOP attribute becoming not
> resolvable.  Please use established terminology!
>
>
> [major] "pre-computed and pre-installed...in the BGP Pathlist"
>
> There's an example of "pre-computed" later on, but I couldn't find an
> explanation of "pre-installed...in the BGP Pathlist".
>
>
>
> 292   The existence of a secondary next-hop is clearly required for the
> 293   following reason: a service caring for network availability will
> 294   require two disjoint network connections resulting in two BGP next-
> 295   hops.
>
> [minor] You mean "clearly required" for BGP-PIC, right?  Not for the
> service.
>
>
>
> 297   The BGP distribution of secondary next-hops is available thanks to
> 298   the following BGP mechanisms: Add-Path [12], BGP Best-External [7],
> 299   diverse path [13], and the frequent use in VPN deployments of
> 300   different VPN RD's per PE. Another option to learn multiple BGP
> 301   NH/path is simply to receive IBGP paths from multiple BGP RR
> 302   selection a different path as best. It is noteworthy to mention that
> 303   the availability of another BGP path does not mean that all failure
> 304   scenarios can be covered by simply forwarding traffic to the
> 305   available secondary path. The discussion of how to cover various
> 306   failure scenarios is beyond the scope of this document.
>
> [minor] "BGP NH/path" is a new term.  Even if using "NH" seems obvious,
> please mention it at least once.  Also, you've been talking (incorrectly)
> about BGP paths (instead of routes) or BGP next-hops and now you're
> combining the two.  This is the only instance.
>
>
> [nit] s/is simply to receive/is to receive
>
>
> [nit] s/multiple BGP RR selection/multiple BGP RRs selecting
>
>
> [minor] You might want to reference rfc9107 as an example of "multiple BGP
> RRs selecting a different path as best".
>
>
>
> 308 2.2. BGP-PIC Illustration
>
> 310   To illustrate the two pillars above as well as the platform
> 311   dependency, we will use an example of a simple multihomed L3VPN [9]
> 312   prefix in a BGP-free core running LDP [4] or segment routing over
> 313   MPLS forwarding plane [5].
>
> [nit] s/a simple multihomed/a multihomed
>
> What is "simple" to you may not be simple to others.  Just state the
> facts.
>
>
> [minor] s/L3VPN [9]/L3VPN
> The citation is not needed all the time.
>
>
> [minor] "core running LDP [4] or segment routing over MPLS forwarding
> plane [5]"
>
> This is an example.  To simplify, pick one.
>
>
>
> 315    +--------------------------------+
> 316    |                                |
> 317    |                               ePE2 (IGP-IP1 192.0.2.1, Loopback)
> 318    |                                |  \
> 319    |                                |   \
> 320    |                                |    \
> 321   iPE                               |    CE....VRF "Blue", ASnum 65000
> 322    |                                |    /   (VPN-IP1 198.51.100.0/24)
>
> 323    |                                |   /    (VPN-IP2 203.0.113.0/24)
> 324    |   LDP/Segment-Routing Core     |  /
> 325    |                               ePE1 (IGP-IP2 192.0.2.2, Loopback)
> 326    |                                |
> 327    +--------------------------------+
> 328             Figure 1 VPN prefix reachable via multiple PEs
>
> [nit] s/ASnum/ASN
>
> Expand on first use.
>
>
>
> 330   Referring to Figure 1, suppose the iPE (the ingress PE) receives
> 331   NLRIs for the VPN prefixes VPN-IP1 and VPN-IP2 from two egress PEs,
> 332   ePE1 and ePE2 with next-hop BGP-NH1 and BGP-NH2, respectively.
> 333   Assume that ePE1 advertise the VPN labels VPN-L11 and VPN-L12 while
> 334   ePE2 advertise the VPN labels VPN-L21 and VPN-L22 for VPN-IP1 and
> 335   VPN-IP2, respectively. Suppose that BGP-NH1 and BGP-NH2 are resolved
> 336   via the IGP prefixes IGP-IP1 and IGP-IP2, where each happen to have
> 337   2 equal cost paths with IGP-NH1 and IGP-NH2 reachable via the
> 338   interfaces I1 and I2, respectively. Suppose that local labels
> 339   (whether LDP [4] or segment routing [5]) on the downstream LSRs[4]
> 340   for IGP-IP1 are IGP-L11 and IGP-L12 while for IGP-IP2 are IGP-L21
> 341   and IGP-L22. As such, the routing table at iPE is as follows:
>
> [minor] "receives NLRIs"
>
> I other places you're talked about paths or routes...please be consistent!
>
>
> [minor] "IGP-NH1 and IGP-NH2 reachable via the interfaces I1 and I2"
>
> I'm sure you're talking about interfaces on the iPE, but others may find
> that the picture is lacking detail when compared to the description: the
> core is just a box..
>
>
>
> 343         65000: 198.51.100.0/24
> 344            via ePE1 (192.0.2.1), VPN Label: VPN-L11
> 345            via ePE2 (192.0.2.2), VPN Label: VPN-L21
>
> [minor] You need to explain the notation and, for example, why some
> entries have an ASN and others don't.
>
>
> [minor] It might help if the description included the IP addresses.  For
> example, when mentioning BGP-NH1, indicate that it refers to 192.0.2.1.
>
>
>
> ...
> 362   IP Leaf:    Pathlist:    IP Leaf:           Pathlist:
> 363   --------  +-------+     --------          +----------+
> 364   VPN-IP1-->|BGP-NH1|-->IGP-IP1(BGP NH1)--->|IGP-NH1,I1|--->Adjacency1
> 365     |       |BGP-NH2|-->....      |         |IGP-NH2,I2|--->Adjacency2
> 366     |       +-------+             |         +----------+
> 367     |                             |
> 368     |                             |
> 369     v                             v
> 370   OutLabel-List:                OutLabel-List:
> 371   +----------------------+      +----------------------+
> 372   |VPN-L11 (VPN-IP1, NH1)|      |IGP-L11 (IGP-IP1, NH1)|
> 373   |VPN-L21 (VPN-IP1, NH2)|      |IGP-L12 (IGP-IP1, NH2)|
> 374   +----------------------+      +----------------------+
>
> 376           Figure 2 Shared Hierarchical Forwarding Chain at iPE
>
> 378   The forwarding chain depicted in Figure 2 illustrates the first
> 379   pillar, which is sharing and hierarchy. We can see that the BGP
> 380   pathlist consisting of BGP-NH1 and BGP-NH2 is shared by all NLRIs
> 381   reachable via ePE1 and ePE2. As such, it is possible to make changes
> 382   to the pathlist without having to make changes to the NLRIs. For
> 383   example, if BGP-NH2 becomes unreachable, there is no need to modify
> 384   any of the possibly large number of NLRIs. Instead only the shared
> 385   pathlist needs to be modified. Likewise, due to the hierarchical
> 386   structure of the forwarding chain, it is possible to make
> 387   modifications to the IGP routes without having to make any changes
> 388   to the BGP NLRIs. For example, if the interface "I2" goes down, only
> 389   the shared IGP pathlist needs to be updated, but none of the IGP
> 390   prefixes sharing the IGP pathlist nor the BGP NLRIs using the IGP
> 391   prefixes for resolution need to be modified.
>
> [minor] "We can see that the BGP pathlist consisting of BGP-NH1 and
> BGP-NH2 is shared by all NLRIs reachable via ePE1 and ePE2."
>
> I only see VPN-IP1 using that pathlist.
>
> I know it is not easy to draw using ASCII, but you may want to provide a
> conceptual figure showing how the leafs and Pathlists relate to each other
> and then describe the contents.  OR  You can also integrate an SVI drawing
> in the XML.
>
>
> [minor] The representation of the Pathlists is not consistent: the first
> one shows just BGP-NHs, while the second one has an extra element.  It may
> be obvious to you that the interface element points at the corresponding
> Adjacencies, but please explain that too.
>
>
> [major] The description is not complete -- it jumps from using one
> Pathlist to the other, but it doesn't cover the recursive nature of the
> resolution: resolving the BGP-NHs in the Pathlist through another leaf.
>
>
> [major] The outlabel-list is not explained anywhere.
>
>
>
> 393   Figure 2 can also be used to illustrate the second BGP-PIC pillar.
> 394   Having a deep forwarding chain such as the one illustrated in Figure
> 395   2 requires a forwarding plane that is capable of accessing multiple
> 396   levels of indirection in order to calculate the outgoing
> 397   interface(s) and next-hops(s). While a deeper forwarding chain
> 398   minimizes the re-convergence time on topology change, there will
> 399   always exist platforms with limited capabilities and hence imposing
> 400   a limit on the depth of the forwarding chain. Section 5 describes
> 401   how to gracefully trade off convergence speed with the number of
> 402   hierarchical levels to support platforms with different
> 403   capabilities.
>
> [minor] "Figure 2 can also be used to illustrate the second BGP-PIC
> pillar."
>
> Ok.  But the rest of the paragraph doesn't explain how that is needed in
> the example.
>
>
>
> 405   Another example using IPv6 addresses can be something like the
> 406   following
>
> 408         65000: 2001:DB8:1::/48
> 409            via ePE1 (192::1), VPN Label: VPN6-L11
> 410            via ePE2 (192::2), VPN Label: VPN6-L21
>
> 412         65000: 2001:DB8:2:/48
> 413            via ePE1 (192::1), VPN Label: VPN6-L12
> 414            via ePE2 (192::2), VPN Label: VPN6-L22
>
> 416         192::1/128
> 417            via Core, Label:  IGP6-L11
> 418            via Core, Label:  IGP6-L12
>
> 420         192::2/128
> 421            via Core, Label:  IGP6-L21
> 422            via Core, Label:  IGP6-L22
>
> 424   The same hierarchical forwarding chain described can be constructed
> 425   for IPv6 addresses/prefixes.
>
> [major] Please use the addresses from rfc3849.
>
>
> [] BTW, sorry to be blunt, but this is a lazy attempt at using IPv6 in an
> example.  You should at least show a figure and maybe a different
> scenario...not just similar numbers.
>
>
>
> 427 3. Constructing the Shared Hierarchical Forwarding Chain
>
> 429   Constructing the forwarding chain is an application of the two
> 430   pillars described in Section 2. This section describes how to
> 431   construct the forwarding chain in hierarchical shared manner.
>
> [nit] s/in hierarchical/in a hierarchical
>
>
>
> 433 3.1. Constructing the BGP-PIC forwarding Chain
>
> 435   The whole process starts when BGP downloads a prefix to FIB. The
> 436   prefix contains one or more outgoing paths. For certain labeled
> 437   prefixes, such as VPN [9] prefixes, each path may be associated with
> 438   an outgoing label and the prefix itself may be assigned a local
> 439   label. The list of outgoing paths defines a pathlist. If such
> 440   pathlist does not already exist, then FIB manager (software or
> 441   hardware entity responsible for managing the FIB) creates a new
> 442   pathlist, otherwise the existing pathlist is used. The BGP prefix is
> 443   added as a dependent of the pathlist.
>
> [major] "BGP downloads a prefix to FIB"
>
> rfc4271 talks about "routes...installed in the Routing Table".  Please, as
> much as possible, use the terminology from established RFCS.  Note that
> this document doesn't define what a FIB is.
>
>
> [major] "The prefix contains one or more outgoing paths."
>
> This may be another terminology issue: a BGP route doesn't contain
> multiple next-hops.  If the routes came from ADD_PATH or ORR, etc., there
> will be multiple BGP routes with a common NLRI and different next-hops.
> IOW, it looks like the Pathlist would be built incrementally as routes for
> the same NLRI are installed.
>
> The end result should be the same.  Terminology matters!
>
>
> [minor] "VPN [9]"
>
> The other references to rfc4364 use "L3VPN", should this one use it too?
>
>
> [major] "If such pathlist does not already exist..."
>
> It should be mentioned somewhere the reason for this Pathlist to already
> exist: there's probably another BGP route using it.  This is where the
> connection to the shared nature of the chain needs to be made.  Otherwise,
> the description sounds as if there might simply be a Pathlist structure
> available...
>
>
> [nit] s/FIB manager/the FIB manager/g
>
>
> [] It might be easier/better/clearer to explain the process as a series of
> steps.  Note that the description above is really a series of steps: BGP
> has a route to install, the route is accepted in the routing table (local
> policy), a leaf for this NLRI exists (or not), the Pathlist exists (or
> not), etc...
>
>
>
> 445   The previous step constructs the upper part of the hierarchical
> 446   forwarding chain. The forwarding chain is completed by resolving the
> 447   paths of the pathlist. A BGP path usually consists of a next-hop.
> 448   The next-hop is resolved by finding a matching IGP prefix.
>
> [major] "The forwarding chain is completed by resolving the paths of the
> pathlist."
>
> The example in the last section shows that this "lower part" (is that the
> right term?) could also include a Pathlist.  The explanation of how the
> "upper part" is constructed didn't mention that the process was to
> "resolve" the BGP NLRI.  As a result, there's a disconnect between what was
> done before and what is assumed here.
>
> Again, a series of steps would serve the purpose better.
>
> Also, be mindful of the use of "resolve" in rfc4271.
>
>
> [major] "A BGP path usually consists of a next-hop."
>
> As opposed to what?  I mean the NEXT_HOP attribute is mandatory, a route
> includes multiple other Path Attributes...  Not sure what the point it.
>
>
> [minor] "The next-hop is resolved by finding a matching IGP prefix."
>
> Yes, as an extreme simplification.  §9.1.2.1/rfc4271 offers a better
> description.
>
>
>
> 450   The end result is a hierarchical shared forwarding chain where the
> 451   BGP pathlist is shared by all BGP prefixes that use the same list of
> 452   paths and the IGP prefix is shared by all pathlists that have a path
> 453   resolving via that IGP prefix. It is noteworthy to mention that the
> 454   forwarding chain is constructed without any operator intervention at
> 455   all.
>
> [minor] "The end result is a hierarchical shared forwarding chain where
> the BGP pathlist is shared..."
>
> As I mentioned above, more emphasis should be places on (when explaining
> the steps) the shared nature of the Pathlists and how its dependents are
> independent of each other too.
>
>
> [minor] "without any operator intervention"
>
> Yes, this was mentioned as an advantage somewhere.  It would be better if
> the document was explicit from the start on the point that it is describing
> a set of abstract structures and procedures to be internally instantiated.
> Just like protocol structures (neighbors, routes, etc.) that are
> transparent to the user.
>
>
>
> ...
> 460 3.2. Example: Primary-Backup Path Scenario
>
> 462   Consider the egress PE ePE1 in the case of the multi-homed VPN
> 463   prefixes in the BGP-free core depicted in Figure 1. Suppose ePE1
> 464   determines that the primary path is the external path but the backup
> 465   path is the iBGP path to the other PE ePE2 with next-hop BGP-NH2.
> 466   ePE2 constructs the forwarding chain depicted in Figure 3. We are
> 467   only showing a single VPN prefix for simplicity. But all prefixes
> 468   that are multihomed to ePE1 and ePE2 share the BGP pathlist.
>
> [minor] "BGP-free core depicted in Figure 1"
>
> It doesn't really make a difference, but it wasn't mentioned before that
> Figure 1 has a BGP-free core.  I'm mentioning this because there are
> inconsistencies in the description.  Also, this is a detail that is not
> important to the description, but that can create confusion for other
> readers and generate unnecessary questions and comments.
>
>
> [minor] "Suppose ePE1 determines..."
>
> It would be great if there was either a pointer to how ePE1 does all this
> (where are backups specified?), or a little bit more explanation of the
> assumptions.
>
>
> [nit] s/external path but the backup/external path, while the backup
>
>
>
> 470                    BGP OutLabel-List
> 471     VPN-L11            +---------+
> 472   (Label-leaf)---+---->|Unlabeled|
> 473                  |     +---------+
> 474                  v     | VPN-L21 |
> 475                  |     | (swap)  |
> 476                  |     +---------+
> 477                  |
> 478                  |                    BGP Pathlist
> 479                  |                   +------------+    Connected route
> 480                  |                   |   CE-NH    |------>(to the CE)
> 481                  |                   |path-index=0|
> 482                  |                   +------------+
> 483                  |                   |  VPN-NH2   |
> 484     VPN-IP1 -----+------------------>|  (backup)  |------>IGP Leaf
> 485   (IP leaf)                          |path-index=1|    (Towards ePE2)
> 486        |                             +------------+
> 487        |
> 488        |           BGP OutLabel-List
> 489        |              +---------+
> 490        +------------->|Unlabeled|
> 491                       +---------+
> 492                       | VPN-L21 |
> 493                       | (push)  |
> 494                       +---------+
>
> 496   Figure 3 : VPN Prefix Forwarding Chain with eiBGP paths on egress PE
>
> [minor] Figure 2 provided an easy-to-follow dependency order: Leaf to
> Pathlist  to Leaf to Pathlist.  The dependencies in Figure 3 are not
> straightforward, please provide a description of the figure before
> explaining the differences with respect to the other example.
>
>
> [major] The definition of Pathlist (§1.1) implies that all its entries can
> be used for forwarding, but the Pathlist above (and this use case) clearly
> indicate that not all the entries in a Pathlist are used in the same way,
> or at the same time.
>
>
>
> 498   The example depicted in Figure 3 differs from the example in Figure
> 499   2 in two main aspects. First, as long as the primary path towards
> 500   the CE (external path) is useable, it will be the only path used for
> 501   forwarding while the OutLabel-List contains both the unlabeled label
> 502   (primary path) and the VPN label (backup path) advertised by the
> 503   backup path ePE2. The second aspect is presence of the label leaf
> 504   corresponding to the VPN prefix. This label leaf is used to match
> 505   VPN traffic arriving from the core. Note that the label leaf shares
> 506   the pathlist with the IP prefix.
>
> [major] "the...path...is useable"
>
> What does "useable" mean?  Please describe this term in the terminology
> section.
>
>
> [major] "primary path towards the CE (external path) is useable, it will
> be the only path used"
>
> The arrow from VPN-IP1 points to the VPN-NH2 entry in the Pathlist, so the
> description doesn't seem to match.
>
>
> [major] "the OutLabel-List"
>
> There are two boxes named "BGP OutLabel-List", and they contain different
> backup actions (swap vs push).
>
>
> [minor] "unlabeled label"
>
> rfc3031 talks about unlabeled packets, not labels.
>
>
> [minor] A third difference is that this figure includes a "path-index" --
> but that is not explained.
>
>
>
> 508 4. Forwarding Behavior
> ...
> 513   When a packet arrives at a router, it matches a leaf. A labeled
> 514   packet matches a label leaf while an IP packet matches an IP leaf.
> 515   The forwarding engines walks the forwarding chain starting from the
> 516   leaf until the walk terminates on an adjacency. Thus when a packet
> 517   arrives, the chain is walked as follows:
>
> [minor] Instead of summarizing, simply start with the steps.
>
>
>
> 519   1. Lookup the leaf based on the destination address or the label at
> 520      the top of the packet.
>
> [major] Lookup how?  How do you know you found the right leaf?
>
> I think rfc3031 has some text related to the forwarding of labeled
> packets.  But I don't know of a reference for IP forwarding, so you should
> at least mention something around a longest-prefix-match.
>
>
>
> 522   2. Retrieve the parent pathlist of the leaf.
>
> [minor] While it is related, the terminology section defines the
> dependency relationship as the leaf being the child of the Pathlist --
> keeping consistency is important.
>
>
>
> 524   3. Pick the outgoing path "Pi" from the list of resolved paths in
> 525      the pathlist. The method by which the outgoing path is picked is
> 526      beyond the scope of this document (e.g. flow-preserving hash
> 527      exploiting entropy within the MPLS stack and IP header). Let the
> 528      "path-index" of the outgoing path "Pi" be "j".
>
> [minor] s/Pick the outgoing path "Pi"/Pick an outgoing path "Pi"
>
>
> [] "Let the "path-index" of the outgoing path "Pi" be "j"."
>
> I'm not sure if you mean that "j is the path-index of Pi" or "assign a
> path-index of j".  If the former, how is the path-index assigned?
>
>
>
> 530   4. If the prefix is labeled, use the "path-index" "j" to retrieve
> 531      the jth label "Lj" stored the jth entry in the OutLabel-List and
> 532      apply the label action of the label on the packet (e.g. for VPN
> 533      label on the ingress PE, the label action is "push"). As
> 534      mentioned in Section 1.1 the value of the "path-index" stored
> 535      in path may not necessarily be the same value of the location of
> 536      the path in the pathlist.
>
> [minor] "retrieve the jth label "Lj" stored the jth entry in the
> OutLabel-List"
>
> This sounds as if there were (at least) j labels in each of (at least) j
> entries in the OutLabel-List.   But I assume that there is really only one
> label (or label stack) in that entry.   ??
>
>
>
> 538   5. Move to the parent of the chosen path "Pi".
>
> 540   6. If the chosen path "Pi" is recursive, move to its parent prefix
> 541      and go to step 2.
>
> 543   7. If the chosen path is non-recursive move to its parent adjacency.
> 544      Otherwise go to the next step.
>
> [minor] These 2 steps say the same thing: "move to the parent of..."Pi"",
> "path "Pi"...move to its parent", and "move to its parent".  I think you
> need to better differentiate (?) the options.
>
>
>
> ...
> 549   Let's apply the above forwarding steps to the forwarding chain
> 550   depicted in Figure 2 in Section 2. Suppose a packet arrives at
> 551   ingress PE iPE from an external neighbor. Assume the packet matches
> 552   the VPN prefix VPN-IP1. While walking the forwarding chain, the
> 553   forwarding engine applies a hashing algorithm to choose the path and
> 554   the hashing at the BGP level yields path 0 while the hashing at the
> 555   IGP level yields path 1. In that case, the packet will be sent out
> 556   of interface I2 with the label stack "IGP-L12,VPN-L11".
>
> [minor] s/path 0...path 1/path-index 0...path-index 1
>
>
> [minor] The example would be better understood if it was presented as
> steps matching the process described above (vs a narrative explanation).
> Also, be specific in which path is the one with each path-index...
>
>
> [major] Figure 2 doesn't contain any indication of a path-index.
>
> 558 5. Handling Platforms with Limited Levels of Hierarchy
>
>
> [] This whole section, which is longer than the text describing the
> "normal" mechanism (!), seems to be more appropriate for an Appendix than
> the main body of the document.  Please move it.
>
>
>
> ...
> 564   o  Being able to reduce the number of hierarchical levels from any
> 565      arbitrary value to a smaller arbitrary value that can be
> 566      supported by the forwarding engine.
>
> [?] What is the worst case in terms of number of levels?  3?  Just
> curious. I'm asking simply because an "arbitrary value" sounds like it
> could be 10s o even 1000s, when in reality it is not more than a handful.
> Are there many platforms out there that wouldn't support 2-3 levels (for
> the number of routes/paths they normally would carry)?
>
>
> [minor] "smaller arbitrary value"
>
> This smaller value is not really arbitrary (as in random), it is
> determined by the platform.  Maybe something along the lines of a "value
> supported by the platform" would be better.
>
>
>
> ...
> 571 5.1. Flattening the Forwarding Chain
> ...
> 584   1. The forwarding engine is now at leaf "R1".
> ...
> 605   9. Now the forwarding engine continues the walk to the parent of
> 606      "Qj".
>
> [minor] This list of steps is just a restatement of the process in the
> last section.  IMO there's no need to repeat the steps here.  Point at the
> 2-level example, and jump directly to the next paragraph.
>
> To reference from the explanation below, "a picture is worth a thousand
> words". ;-)
>
>
>
> ...
> 614   1. FIB manager wants to reduce the number of levels used by "Pi" by
> 615      1.
>
> [nit] This is not really a step...
>
>
>
> ...
> 622   4. FIB manager also extracts the OutLabel-list(R2) associated with
> 623      the leaf "R2". Remember that OutLabel-list(R2) = <L1, L2,...,
> 624      Lm>.
>
> [] New syntax: "OutLabel-list(R2)".
>
>
>
> ...
> 643   It is noteworthy to mention that the labels in the OutLabel-list
> 644   associated with the "flattened" pathlist may be stored in the same
> 645   memory location as the path itself to avoid additional memory
> 646   access. But that is an implementation detail that is beyond the
> 647   scope of this document.
>
> [] Why even mention it then?
>
>
>
> 649   The same steps can be applied to all paths in the pathlist <P1,
> 650   P2,..., Pn> so that all paths are "flattened" thereby reducing the
> 651   number of hierarchical levels by one. Note that that "flattening" a
> 652   pathlist pulls in all paths of the parent paths, a desired feature
> 653   to utilize all ECMP/UCMP paths at all levels. A platform that has a
> 654   limit on the number of paths in a pathlist for any given leaf may
> 655   choose to reduce the number paths using methods that are beyond the
> 656   scope of this document.
>
> [minor] Expand UCMP.
>
>
>
> ...
> 663   Because a flattened pathlist may have an associated OutLabel-list
> 664   the forwarding behavior has to be slightly modified. The
> 665   modification is done by adding the following step right after step 4
> 666   in Section 4.
>
> 668   5. If there is an OutLabel-list associated with the pathlist, then
> 669      if the path "Pi" is chosen by the hashing algorithm, retrieve the
> 670      label at location "i" in that OutLabel-list and apply the label
> 671      action of that label on the packet.
>
> [major] Step 4 says this:
>
>    4. If the prefix is labeled, use the "path-index" "j" to retrieve
>       the jth label "Lj" stored the jth entry in the OutLabel-List and
>       apply the label action of the label on the packet...
>
> Besides using "j" instead of "i" as the path-index, both seem to say the
> same thing.  Am I missing something?
>
>
>
> ...
> 676 5.2. Example: Flattening a forwarding chain.
>
> 678   This example uses a case of inter-AS option C [9] where there are 3
> 679   levels of hierarchy. Figure 4 illustrates the sample topology. To
> 680   force 3 levels of hierarchy, the ASBRs[9] on the ingress domain
> 681   (domain 1) advertise the core routers of the egress domain (domain
> 682   2) to the ingress PE (iPE) via BGP-LU [3] instead of redistributing
> 683   them into the IGP of domain 1. The end result is that the ingress PE
> 684   (iPE) has 2 levels of recursion for the VPN prefixes VPN-IP1 and
> 685   VPN-IP2.
>
> [] Suggestion:
>
> OLD>
>    To force 3 levels of hierarchy, the ASBRs[9] on the ingress
>    domain (domain 1) advertise the core routers of the egress
>    domain (domain 2) to the ingress PE (iPE) via BGP-LU [3]
>    instead of redistributing them into the IGP of domain 1.
>
> NEW>
>    The ASBRs on the ingress domain (Domain 1) use BGP to advertise
>    the core routers (ASBRs and ePEs) of the egress domain (Domain
>    2) to the iPE.
>
>
> [minor] Expand ASBR on first use.  No need to add a reference.
>
>
>
> 687       Domain 1                 Domain 2
> 688   +-------------+          +-------------+
> 689   |             |          |             |
> 690   | LDP/SR Core |          | LDP/SR core |
> 691   |             |          |             |
> 692   |     (192.0.2.4)        |             |
> 693   |         ASBR11-------ASBR21........ePE1(192.0.2.1)
> 694   |             | \      / |   .      .  |\
> 695   |             |  \    /  |    .    .   | \
> 696   |             |   \  /   |     .  .    |  \
> 697   |             |    \/    |      ..     |   \VPN-IP1(198.51.100.0/24)
>
> 698   |             |    /\    |      . .    |   /VRF "Blue" ASn: 65000
> 699   |             |   /  \   |     .   .   |  /
> 700   |             |  /    \  |    .     .  | /
> 701   |             | /      \ |   .       . |/
> 702   iPE        ASBR12-------ASBR22........ePE2 (192.0.2.2)
> 703   |     (192.0.2.5)        |             |\
> 704   |             |          |             | \
> 705   |             |          |             |  \
> 706   |             |          |             |   \VRF "Blue" ASn: 65000
> 707   |             |          |             |   /VPN-IP2(203.0.113.0/24)
> 708   |             |          |             |  /
> 709   |             |          |             | /
> 710   |             |          |             |/
> 711   |         ASBR13-------ASBR23........ePE3(192.0.2.3)
> 712   |     (192.0.2.6)        |             |
> 713   |             |          |             |
> 714   |             |          |             |
> 715   +-------------+          +-------------+
> 716    <===========  <=========  <============
> 717   Advertise ePEx  Advertise   Redistribute
> 718   Using iBGP-LU   ePEx Using    IGP into
> 719                    eBGP-LU        BGP
>
> 721               Figure 4 : Sample 3-level hierarchy topology
>
> [minor] s/ASn/ASN/g
>
>
> [major] "Redistribute IGP into BGP"  ??
>
> I'm sure you don't mean that, but probably more something like
> "redistribute the ePEx routes only"...
>
>
>
> 723   We will make the following assumptions about connectivity
>
> [nit] s/connectivity/connectivity:
>
>
>
> 725   o  In "domain 2", both ASBR21 and ASBR22 can reach both ePE1 and
> 726      ePE2 using the same distance.
>
> [nit] The figure capitalizes "Domain", but the text doesn't.  Please be
> consistent!
>
>
> [minor] By "distance" I assume you mean "IGP metric".  Please use that
> term instead.   Apply to other occurences.
>
>
>
> ...
> 733   We will make the following assumptions about the labels
>
> [nit] s/labels/labels:
>
>
>
> ...
> 740   o  The labels advertised by ASBR11 to iPE using BGP-LU [3] for the
> 741      egress PEs ePE1 and ePE2 are LASBR111(ePE1) and LASBR112(ePE2),
> 742      respectively.
>
> [] Why change the notation of the labels?  This new notation
> ("LASBR111(ePE1)") just makes the explanation more complex. :-(
>
>
>
> ...
> 764         65000: 198.51.100.0/24
> 765            via ePE1 (192.0.2.1), VPN Label: VPN-L11
> 766            via ePE2 (192.0.2.2), VPN Label: VPN-L21
> 767         65000: 203.0.113.0/24
> 768            via ePE2 (192.0.2.2), VPN Label: VPN-L22
> 769            via ePE3 (192.0.2.3), VPN Label: VPN-L32
>
> 771         192.0.2.1/32 (ePE1)
> 772            Via ASBR11, BGP-LU Label: LASBR111(ePE1)
> 773            Via ASBR12, BGP-LU Label: LASBR121(ePE1)
> 774         192.0.2.2/32 (ePE2)
> 775            Via ASBR11, BGP-LU Label: LASBR112(ePE2)
> 776            Via ASBR12, BGP-LU Label: LASBR122(ePE2)
> 777         192.0.2.3/32 (ePE3)
> 778            Via ASBR13, BGP-LU Label: LASBR13(ePE3)
>
> 780         192.0.2.4/32 (ASBR11)
> 781            via Core, Label:  IGP-L11
> 782         192.0.2.5/32 (ASBR12)
> 783            via Core, Label:  IGP-L12
> 784         192.0.2.6/32 (ASBR13)
> 785            via Core, Label:  IGP-L13
>
> [nit] s/Via/via/g
>
>
> [minor] Only some of the addresses are annotated ("192.0.2.1/32
> (ePE1)").  This is helpful, but inconsistent -- both within this example
> and with previous ones.
>
>
>
> ...
> 797   o  The index inside the pathlist entry indicates the label that will
> 798      be picked from the OutLabel-List associated with the child leaf
> 799      if that path is chosen by the forwarding engine hashing function.
>
> [] s/index/path-index
>
>
>
> 801   OutLabel-List                                      OutLabel-List
> 802     For VPN-IP1                                         For VPN-IP2
> 803   +------------+    +--------+           +-------+   +------------+
> 804   |  VPN-L11   |<---| VPN-IP1|           |VPN-IP2|-->|  VPN-L22   |
> 805   +------------+    +---+----+           +---+---+   +------------+
> 806   |  VPN-L21   |        |                    |       |  VPN-L32   |
> 807   +------------+        |                    |       +------------+
> 808                         |                    |
> 809                         V                    V
> 810                    +---+---+            +---+---+
> 811                    | 0 | 1 |            | 0 | 1 |
> 812                    +-|-+-\-+            +-/-+-\-+
> 813                      |    \              /     \
> 814                      |     \            /       \
> 815                      |      \          /         \
> 816                      |       \        /           \
> 817                      v        \      /             \
> 818                 +-----+       +-----+             +-----+
> 819            +----+ ePE1|       |ePE2 +-----+       | ePE3+-----+
> 820            |    +--+--+       +-----+     |       +--+--+     |
> 821            v       |            /         v          |        v
> 822   +--------------+ |           /   +--------------+  | +-------------+
> 823   |LASBR111(ePE1)| |          /    |LASBR112(ePE2)|  | |LASBR13(ePE3)|
> 824   +--------------+ |         /     +--------------+  | +-------------+
> 825   |LASBR121(ePE1)| |        /      |LASBR122(ePE2)|  | OutLabel-List
> 826   +--------------+ |       /       +--------------+  |    For ePE3
> 827   OutLabel-List    |      /        OutLabel-List     |
> 828       For ePE1     |     /           For ePE2        |
> 829                    |    /                            |
> 830                    |   /                             |
> 831                    |  /                              |
> 832                    v /                               v
> 833                +---+---+  Shared Pathlist          +---+  Pathlist
> 834                | 0 | 1 | For ePE1 and ePE2         | 0 |  For ePE3
> 835                +-|-+-\-+                           +-|-+
> 836                  |    \                              |
> 837                  |     \                             |
> 838                  |      \                            |
> 839                  |       \                           |
> 840                  v        \                          v
> 841               +------+    +------+               +------+
> 842           +---+ASBR11|    |ASBR12+--+            |ASBR13+---+
> 843           |   +------+    +------+  |            +------+   |
> 844           v                         v                       v
> 845      +-------+                  +-------+              +-------+
> 846      |IGP-L11|                  |IGP-L12|              |IGP-L13|
> 847      +-------+                  +-------+              +-------+
>
> 849       Figure 5 : Forwarding Chain for hardware supporting 3 Levels
>
> [] The OutLabel-Lists at the bottom are not tagged as such.
>
>
> [] Some down arrows ("v") are missing -- on the non-vertical lines.
>
>
>
> 851   Now suppose the hardware on iPE (the ingress PE) supports 2 levels
> 852   of hierarchy only. In that case, the 3-levels forwarding chain in
> 853   Figure 5 needs to be "flattened" into 2 levels only.
>
> 855   OutLabel-List                                  OutLabel-List
> 856     For VPN-IP1                                    For VPN-IP2
> 857   +------------+    +-------+      +-------+     +------------+
> 858   |  VPN-L11   |<---|VPN-IP1|      | VPN-IP2|--->|  VPN-L22   |
> 859   +------------+    +---+---+      +---+---+     +------------+
> 860   |  VPN-L21   |        |              |         |  VPN-L32   |
> 861   +------------+        |              |         +------------+
> 862                         |              |
> 863                         |              |
> 864                         |              |
> 865          Flattened      |              |  Flattened
> 866          pathlist       V              V   pathlist
> 867                    +===+===+        +===+===+===+     +==============+
> 868           +--------+ 0 | 1 |        | 0 | 0 | 1 +---->|LASBR112(ePE2)|
> 869           |        +=|=+=\=+        +=/=+=/=+=\=+     +==============+
> 870           v          |    \          /   /     \      |LASBR122(ePE2)|
> 871    +==============+  |     \  +-----+   /       \     +==============+
> 872    |LASBR111(ePE1)|  |      \/         /         \    |LASBR13(ePE3) |
> 873    +==============+  |      /\        /           \   +==============+
> 874    |LASBR121(ePE1)|  |     /  \      /             \
> 875    +==============+  |    /    \    /               \
> 876                      |   /      \  /                 \
> 877                      |  /       +  +                  \
> 878                      |  +       |  |                   \
> 879                      |  |       |  |                    \
> 880                      v  v       v  v                     \
> 881                    +------+    +------+              +------+
> 882               +----|ASBR11|    |ASBR12+---+          |ASBR13+---+
> 883               |    +------+    +------+   |          +------+   |
> 884               v                           v                     v
> 885           +-------+                  +-------+              +-------+
> 886           |IGP-L11|                  |IGP-L12|              |IGP-L13|
> 887           +-------+                  +-------+              +-------+
>
> [] Same comments as above.
>
>
>
> 889     Figure 6 : Flattening 3 levels to 2 levels of Hierarchy on iPE
>
> 891   Figure 6 represents one way to "flatten" a 3 levels hierarchy into
> 892   two levels. There are few important points:
>
> [nit] s/are few/are a few
>
>
>
> ...
> 908   o  Let's take a look at the flattened pathlist used by the prefix
> 909      "VPN-IP2", The pathlist associated with the prefix "VPN-IP2" has
> 910      three entries.
>
> [nit] s/"VPN-IP2", The/"VPN-IP2". The
>
>
>
> 912       o The first and second entry have index "0". This is because
> 913         both entries correspond to ePE2. Thus when hashing performed
> 914         by the forwarding engine results in using first or the second
> 915         entry in the pathlist, the forwarding engine will pick the
> 916         correct VPN label "VPN-L22", which is the label advertised by
> 917         ePE2 for the prefix "VPN-IP2".
>
> [nit] s/using first/using the first
>
>
>
> ...
> 952   o  So the packet is forwarded towards the ASBR "ASBR12" and the IGP
> 953      label at the top will be "L12".
>
> [minor] The figure calls the label "IGP-L12", not just "L12".  There are
> other instances of this below.
>
>
>
> ...
> 982 6. Forwarding Chain Adjustment at a Failure
>
> 984   The hierarchical and shared structure of the forwarding chain
> 985   explained in the previous section allows modifying a small number of
> 986   forwarding chain objects to re-route traffic to a pre-calculated
> 987   equal-cost or backup path without the need to modify the possibly
> 988   very large number of BGP prefixes. In this section, we go over
> 989   various core and edge failure scenarios to illustrate how FIB
> 990   manager can utilize the forwarding chain structure to achieve BGP
> 991   prefix independent convergence.
>
> [nit] s/how FIB manager/how FIB the manager
>
>
>
> 993 6.1. BGP-PIC core
>
> [minor] The meaning of "core" and "edge" should be explained somwehre.
>
>
>
> ...
> 1001   When a remote link or node fails, IGP on the ingress PE receives
> 1002   advertisement indicating a topology change so IGP re-converges to
> 1003   either find a new next-hop and/or outgoing interface or remove the
> 1004   path completely from the IGP prefix used to resolve BGP next-hops.
> 1005   IGP and/or LDP download the modified IGP leaves with modified
> 1006   outgoing labels for labeled core.
>
> [nit] s/IGP/the IGP/g
>
>
> [nit] s/receives advertisement/receives an advertisement
>
>
> [nit] s/for labeled core/for the labeled core
>
>
> [] "download"
>
> This is the second time that the "download" concept is used without
> explanation.  See my comment in §3.1.
>
>
>
> 1008   When a local link fails, FIB manager detects the failure almost
> 1009   immediately. The FIB manager marks the impacted path(s) as unusable
> 1010   so that only useable paths are used to forward packets. Hence only
> 1011   IGP pathlists with paths using the failed local link need to be
> 1012   modified. All other pathlists are not impacted. Note that in this
> 1013   particular case there is actually no need even to backwalk to IGP
> 1014   leaves to adjust the OutLabel-Lists because FIB can rely on the
> 1015   path-index stored in the useable paths in the pathlist to pick the
> 1016   right label.
>
> [nit] s/is actually no need even to/is no need to
>
>
> [nit] s/to IGP leaves/to the IGP leaves
>
>
> [major] The term "backwalk" is introduced here, but not explained.
>
>
>
> 1018   It is noteworthy to mention that because FIB manager modifies the
> 1019   forwarding chain starting from the IGP leaves only. BGP pathlists
> 1020   and leaves are not modified. Hence traffic restoration occurs
> within
> 1021   the time frame of IGP convergence, and, for local link failure,
> 1022   assuming a backup path has been precomputed, within the timeframe
> of
> 1023   local detection (e.g. 50ms). Examples of solutions that pre-
> 1024   computing backup paths are IP FRR [16] remote LFA [17], Ti-LFA [15]
> 1025   and MRT [18] or eBGP path having a backup path [11].
>
> [major] The case on the previous paragraph deals with a local link failure
> that results in the routes being "unusable", but this paragraph mentions
> restoration "assuming a backup path has been precomputed" -- the
> restoration timeframe also applies to the case where there is no backup
> path (as explained above).
>
> Note that the next paragraph also talks about a backup route and seems to
> ignore the "unusable" case above.
>
>
> [nit] s/solutions that pre- computing/solutions that can pre-compute
>
>
> [nit] s/Ti-LFA/TI-LFA
>
>
> [minor] Expand FRR, LFA, TI-LFA, and MRT on first use.
>
>
> [nit] s/IP FRR [16] remote LFA...[18] or eBGP/IP FRR [16], remote
> LFA...[18], or eBGP
>
>
> [minor] "eBGP path having a backup path [11]"
>
> I don't think that's the right reference.
>
>
>
> 1027   Let's apply the procedure mentioned in this subsection to the
> 1028   forwarding chain depicted in Figure 2. Suppose a remote link
> failure
> 1029   occurs and impacts the first ECMP IGP path to the remote BGP next-
> 1030   hop. Upon IGP convergence, the IGP pathlist used by the BGP
> next-hop
> 1031   is updated to reflect the new topology (one path instead of two).
> As
> 1032   soon as the IGP convergence is effective for the BGP next-hop
> entry,
> 1033   the new forwarding state is immediately available to all dependent
> 1034   BGP prefixes. The same behavior would occur if the failure was
> local
> 1035   such as an interface going down. As soon as the IGP convergence is
> 1036   complete for the BGP next-hop IGP route, all its BGP depending
> 1037   routes benefit from the new path. In fact, upon local failure, if
> 1038   LFA protection is enabled for the IGP route to the BGP next-hop and
> 1039   a backup path was pre-computed and installed in the pathlist, upon
> 1040   the local interface failure, the LFA backup path is immediately
> 1041   activated (e.g. sub-50msec) and thus protection benefits all the
> 1042   depending BGP traffic through the hierarchical forwarding
> dependency
> 1043   between the routes.
>
> [minor] "Upon IGP convergence... As soon as the IGP convergence is
> effective for the BGP next-hop entry,"
>
> There's some redundancy in the text.  Also, by using different text,
> you're implying that there is a difference between "IGP convergence" and
> "IGP convergence...effective for the BGP next-hop entry".
>
>
> [major] "The same behavior would occur if the failure was local such as an
> interface going down. As soon as the IGP convergence..."
>
> According to the text above (2 or 3 paragraphs before), IGP convergence is
> not necessary in the local failure case.
>
>
>
> ...
> 1050 6.2.1. Adjusting forwarding Chain in egress node failure
>
> [minor] Most of the text talks about an "egress node".  I assume that is
> the same as an "edge node" -- is it?  If so, please be consistent and/or
> explain somewhere that you're referring to the same thing.
>
>
> 1052   When an edge node fails, IGP on neighboring core nodes send route
> 1053   updates indicating that the edge node is no longer reachable. IGP
> 1054   running on the iPE instructs FIB to remove the IP and label leaves
> 1055   corresponding to the failed edge node from FIB. So FIB manager
> 1056   performs the following steps:
>
> [major] "IGP on neighboring core nodes send route updates indicating that
> the edge node is no longer reachable"
>
> §6.1 says that "Node failures are treated as link failures."   Is that the
> same in this section?
>
> The neighbor nodes can't detect that the node is down -- they can only
> detect the link being down.  The rest of the explanation assumes that there
> was a single link connecting the egress node to the core, or that all the
> links were reported as down.  Neither case is explained.
>
> If there is more than one link there shouldn't be a failure in reaching
> the next-hop -- except for the change in the local router(s), similar to a
> core failure.
>
>
>
> ...
> 1073 6.2.2. Adjusting Forwarding Chain on PE-CE link Failure
> ...
> 1082   In the first case, the rest of iBGP peers will remain unaware of
> the
> 1083   link failure and will continue to forward traffic to the edge node
> 1084   until the edge node attached to the failed link withdraws the BGP
> 1085   prefixes. If the destination prefixes are multi-homed to another
> 1086   iBGP peer, say ePE2, then FIB manager on the edge router detecting
> 1087   the link failure applies the following steps (see Figure 3):
>
> [minor] Figure 3 is an example -- it does show a sample chain.  Figure 4
> may be used as a reference for the type of backup described here.
>
>
>
> ...
> 1100       o The label entry in OutLabel-List corresponding to the
> 1101          internal path to backup egress PE has swap action to the
> 1102          label advertised by backup egress PE.
>
> [nit] s/has swap action/has a swap action
>
>
> [nit] s/by backup/by the backup
>
>
>
> 1104       o For an arriving label packet (e.g. VPN), the top label is
> 1105          swapped with the label advertised by backup egress PE and
> the
> 1106          packet is sent towards that backup egress PE.
>
> [nit] s/label packet/labeled packet
>
>
> [major] Even though you're expanding on an example, it is important for
> the text to be precise.
>
> - The label mentioned is "e.g. VPN", but you're been using more specific
> labels -- from figure 3, "VPN-L21"...
>
> - The OutLabel-Lists in Figure 3 show both "VPN-L21 (swap)" and "VPN-L21
> (push)".  The same label is used, only the swap action is mentioned here.
> What am I missing?
>
> - As I mentioned before, the diagram in Figure 3 is not clear.
>
>
>
> 1108   o  For unlabeled traffic, packets are simply redirected towards
> 1109      backup egress PE.
>
> [nit] s/backup/the backups
>
>
> [major] "ackets are simply redirected towards backup egress PE"
>
> How?  The backup ePE is presumably not connected (which is why a label was
> needed)...??
>
>
>
> 1111   In the second case where the edge router uses the IP address of the
> 1112   failed link as the BGP next-hop, the edge router will still perform
> 1113   the previous steps. But, unlike the case of next-hop self, IGP on
> 1114   failed edge node informs the rest of the iBGP peers that IP address
> 1115   of the failed link is no longer reachable. Hence the FIB manager on
> 1116   iBGP peers will delete the IGP leaf corresponding to the IP prefix
> 1117   of the failed link. The behavior of the iBGP peers will be
> identical
> 1118   to the case of edge node failure outlined in Section 6.2.1.
>
> [nit] s/IGP on failed edge node/the IGP on the failed edge node
>
>
> [nit] s/that IP address/that the IP address
>
>
>
> 1120   It is noteworthy to mention that because the edge link failure is
> 1121   local to the edge router, sub-50 msec convergence can be achieved
> as
> 1122   described in [11].
>
> [] But only at the edge router -- not in other places of the network where
> the IGP has to converge.
>
> Please keep the numbers in the Appendix.  IOW, there's no need to claim
> this performance here.
>
>
>
> 1124   Let's try to apply the case of next-hop self to the forwarding
> chain
> 1125   depicted in Figure 3. After failure of the link between ePE1 and
> CE,
> 1126   the forwarding engine will route traffic arriving from the core
> 1127   towards VPN-NH2 with path-index=1. A packet arriving from the core
> 1128   will contain the label VPN-L11 at top. The label VPN-L11 is swapped
> 1129   with the label VPN-L21 and the packet is forwarded towards ePE2.
>
> [minor] This paragraph is out of place: you already tried to explain this
> case a few paragraphs above.  Please move the text and deduplicate.
>
>
>
> 1131 6.3. Handling Failures for Flattened Forwarding Chains
>
> 1133   As explained in the in Section 5 if the number of hierarchy levels
> 1134   of a platform cannot support the native number of hierarchy levels
> 1135   of a recursive forwarding chain, the instantiated forwarding chain
> 1136   is constructed by flattening two or more levels. Hence a 3 levels
> 1137   chain in Figure 5 is flattened into the 2 levels chain in Figure 6.
>
> [nit] s/a 3 levels/the 3-level
>
>
> [nit] s/2 levels/2-level
>
>
>
> 1139   While reducing the benefits of BGP-PIC, flattening one hierarchy
> 1140   into a shallower hierarchy does not always result in a complete
> loss
> 1141   of the benefits of the BGP-PIC. To illustrate this fact suppose
> 1142   ASBR12 is no longer reachable in domain 1. If the platform supports
> 1143   the full hierarchy depth, the forwarding chain is the one depicted
> 1144   in Figure 5 and hence the FIB manager needs to backwalk one level
> to
> 1145   the pathlist shared by "ePE1" and "ePE2" and adjust it. If the
> 1146   platform supports 2 levels of hierarchy, then a useable forwarding
> 1147   chain is the one depicted in Figure 6. In that case, if ASBR12 is
> no
> 1148   longer reachable, the FIB manager has to backwalk to the two
> 1149   flattened pathlists and updates both of them.
>
> 1151   The main observation is that the loss of convergence speed due to
> 1152   the loss of hierarchy depth depends on the structure of the
> 1153   forwarding chain itself. To illustrate this fact, let's take two
> 1154   extremes. Suppose the forwarding objects in level i+1 depend on the
> 1155   forwarding objects in level i. If every object on level i+1 depends
> 1156   on a separate object in level i, then flattening level i into level
> 1157   i+1 will not result in loss of convergence speed. Now let's take
> the
> 1158   other extreme. Suppose "n" objects in level i+1 depend on 1 object
> 1159   in level i. Now suppose FIB flattens level i into level i+1. If a
> 1160   topology change results in modifying the single object in level i,
> 1161   then FIB has to backwalk and modify "n" objects in the flattened
> 1162   level, thereby losing all the benefit of BGP-PIC. Experience shows
> 1163   that flattening forwarding chains usually results in moderate loss
> 1164   of BGP-PIC benefits. Further analysis is needed to corroborate and
> 1165   quantify this statement.
>
> [major] "The main observation is that the loss of convergence speed due to
> the loss of hierarchy depth... Further analysis is needed to corroborate
> and quantify this statement."
>
> The explanation above sounds plausible.  However, if you really don't
> know, and there is a dependency on the structure, and it depends on the
> failure, and there's no alternative (if the platform supports less
> levels)...  Why even include this information?  It feels like the
> equivalent of "hand waving". :-(
>
>
>
> 1167 7. Properties
>
> [minor] This section is really a summary.  I wonder if it would serve a
> better purpose at the start of the document as a "road map".
>
>
>
> 1169 7.1. Coverage
>
> 1171   All the possible failures, except CE node failure, are covered,
> 1172   whether they impact a local or remote IGP path or a local or remote
> 1173   BGP next-hop as described in Section 6. This section provides
> 1174   details for each failure and how the hierarchical and shared FIB
> 1175   structure proposed in this document allows recovery that does not
> 1176   depend on number of BGP prefixes.
>
> [major] "except CE node failure"
>
> Isn't that covered in §6.2.2?
>
>
>
> 1178 7.1.1. A remote failure on the path to a BGP next-hop
>
> 1180   Upon IGP convergence, the IGP leaf for the BGP next-hop is updated
> 1181   upon IGP convergence and all the BGP depending routes leverage the
> 1182   new IGP forwarding state immediately. Details of this behavior can
> 1183   be found in Section 6.1.
>
> [minor] Redundant text: "Upon IGP convergence, the IGP leaf for the BGP
> next-hop is updated upon IGP convergence..."
>
>
>
> 1185   This BGP resiliency property only depends on IGP convergence and is
> 1186   independent of the number of BGP prefixes impacted.
>
> [] "BGP resiliency property"
>
> The rest of the text (and the name of the document!) refers to
> convergence.  It isn't until this section that you change the terminology.
> Please be consistent.
>
>
>
> 1188 7.1.2. A local failure on the path to a BGP next-hop
>
> 1190   Upon LFA protection, the IGP leaf for the BGP next-hop is updated
> to
> 1191   use the precomputed LFA backup path and all the BGP depending
> routes
> 1192   leverage this LFA protection. Details of this behavior can be found
> 1193   in Section 6.1.
>
> [minor] Redundant text: LFA is mentioned 3 times in the first sentence.
>
>
> [major] What about the case where an LFA doesn't exist?
>
>
>
> ...
> 1238 7.3. Automated
>
> 1240   The BGP-PIC solution does not require any operator involvement. The
> 1241   process is entirely automated as part of the FIB implementation.
>
> [] Ok -- it is not as much an automation as it is an implementation
> choice.
>
>
>
> 1243   The salient points enabling this automation are:
>
> [minor] I"m not sure if you're saying that the list below is what enables
> the "automation", or if the "automation" results in that.
>
>
>
> 1245   o  Extension of the BGP Best Path to compute more than one primary
> 1246      ([12]and [13]) or backup BGP next-hop ([7] and [14]).
>
> [major] This functionality is available regardless of the structure of the
> FIB.
>
>
>
> 1248   o  Sharing of BGP Path-list across BGP destinations with same
> 1249      primary and backup BGP next-hop.
>
> [nit] s/same/the same
>
>
>
> 1251   o  Hierarchical indirection and dependency between BGP pathlist and
> 1252      IGP pathlist.
>
> [] The last two bullets are a characteristic of the implementation.
>
>
>
> 1254 7.4. Incremental Deployment
>
> 1256   As soon as one router supports BGP-PIC solution, it benefits from
> 1257   all its benefits (most notably convergence that does not depend in
> 1258   the number of prefixes) without any requirement for other routers
> to
> 1259   support BGP-PIC.
>
> [nit] Reduncant text: "it benefits from all its benefits"
>
>
> [major] The assertion above is true.  However, (in the extreme) a single
> router supporting BGP-PIC doesn't necessarily translate in to better
> overall convergence in the network (it depends on where the router is, the
> failure, etc.).  Also, having a mix of routers could result in transient
> micro loops as the speed of convergence is not the same.
>
>
>
> 1261 8. Security Considerations
>
> 1263   The behavior described in this document is internal functionality
> 1264   to a router that result in significant improvement to convergence
> 1265   time as well as reduction in CPU and memory used by FIB while not
> 1266   showing change in basic routing and forwarding functionality. As
> 1267   such no additional security risk is introduced by using the
> 1268   mechanisms proposed in this document.
>
> [] I can't think of anything else to say here, but you did not show (nor
> do I need you to) "significant...reduction in CPU and memory used".  It
> sounds more like a marketing statement. :-(
>
>
>
> 1270 9. IANA Considerations
>
> 1272   No requirements for IANA
>
> [major] s/.../This document has no IANA actions.
>
>
>
> 1274 10. Conclusions
>
> [] This section is not needed, it contains redundant information.  Please
> remove it.
>
>
>
> ...
> 1283 11. References
>
> [major] The RFC Editor requires that the OID be listed for all references
> [1].  Please update the references -- here's an example of what they should
> look like [2]:
>
>    [RFC4271]  Rekhter, Y., Ed., Li, T., Ed., and S. Hares, Ed., "A Border
>               Gateway Protocol 4 (BGP-4)", RFC 4271, DOI 10.17487/RFC4271,
>               January 2006, <https://www.rfc-editor.org/info/rfc4271>.
>
> [1] https://www.rfc-editor.org/styleguide/part2/
> [2] https://www.rfc-editor.org/refs/ref4271.txt
>
>
>
> 1285 11.1. Normative References
> ...
> 1293   [3]   E. Rosen, " Carrying Label Information in BGP-4", RFC 8277,
> 1294         October 2017
>
> [minor] This reference can be Informative.
>
>
>
> 1296   [4]   Andersson, L., Minei, I., and B. Thomas, "LDP Specification",
> 1297         RFC 5036, October 2007
>
> [minor] This reference can be Informative.
>
>
>
> 1299   [5]   A. Bashandy, C. Filsfils, S. Previdi, B. Decraene, S.
> 1300         Litkowski, M. Horneffer, R. Shakir, "Segment Routing with
> MPLS
> 1301         data plane", RFC 8660, December 2019
>
> [minor] This reference can be Informative.
>
>
>
> ...
> 1377 Appendix A.                 Perspective
> ...
> 1392   o  BGP Convergence per BGP destination ~ 200usec conservative,
>
> 1394                                          ~ 100usec optimistic
>
> [minor] "BGP Convergence"
>
> The explanation below talks about recalculating the best route -- which is
> different than convergence (which includes sending BGP Updates, etc.).
>
> [EoR -18]
>
>