Re: Document Shepherd Review of draft-ietf-rtgwg-bgp-pic-02

"Ahmed Bashandy (bashandy)" <bashandy@cisco.com> Wed, 23 November 2016 05:41 UTC

Return-Path: <bashandy@cisco.com>
X-Original-To: rtgwg@ietfa.amsl.com
Delivered-To: rtgwg@ietfa.amsl.com
Received: from localhost (localhost [127.0.0.1]) by ietfa.amsl.com (Postfix) with ESMTP id 70781129401; Tue, 22 Nov 2016 21:41:47 -0800 (PST)
X-Virus-Scanned: amavisd-new at amsl.com
X-Spam-Flag: NO
X-Spam-Score: -16.018
X-Spam-Level:
X-Spam-Status: No, score=-16.018 tagged_above=-999 required=5 tests=[BAYES_00=-1.9, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, HTML_MESSAGE=0.001, RCVD_IN_DNSWL_HI=-5, RCVD_IN_MSPIKE_H3=-0.01, RCVD_IN_MSPIKE_WL=-0.01, RP_MATCHES_RCVD=-1.497, SPF_HELO_PASS=-0.001, SPF_PASS=-0.001, USER_IN_DEF_DKIM_WL=-7.5] autolearn=ham autolearn_force=no
Authentication-Results: ietfa.amsl.com (amavisd-new); dkim=pass (1024-bit key) header.d=cisco.com
Received: from mail.ietf.org ([4.31.198.44]) by localhost (ietfa.amsl.com [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id NochpRZoK9WX; Tue, 22 Nov 2016 21:41:44 -0800 (PST)
Received: from rcdn-iport-7.cisco.com (rcdn-iport-7.cisco.com [173.37.86.78]) (using TLSv1.2 with cipher DHE-RSA-SEED-SHA (128/128 bits)) (No client certificate requested) by ietfa.amsl.com (Postfix) with ESMTPS id 7E29A1294A9; Tue, 22 Nov 2016 21:41:44 -0800 (PST)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=cisco.com; i=@cisco.com; l=120965; q=dns/txt; s=iport; t=1479879704; x=1481089304; h=message-id:date:from:mime-version:to:cc:subject: references:in-reply-to; bh=ts8euwOr2TUE9uaHZLHoukCBISJFzILythxSvCzlU2k=; b=cB6PTCvX7GwufcaC3PIFhRhTTZlrzG5K8+jeneQnDo52cHLrRD942W0h wltPvHXGyjPIg6vml9jZJ+xApaOqqy9mfCJODMplVcnBPMxrDexx6FhQa 4hSNIG0tNMiqoNA83T+nwdYlYv+LT4kArsIlFNGXDjVPdKTDvfIAB5HbM c=;
X-IronPort-AV: E=Sophos;i="5.31,684,1473120000"; d="scan'208,217";a="172850616"
Received: from rcdn-core-3.cisco.com ([173.37.93.154]) by rcdn-iport-7.cisco.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 23 Nov 2016 05:41:43 +0000
Received: from [10.24.108.139] ([10.24.108.139]) by rcdn-core-3.cisco.com (8.14.5/8.14.5) with ESMTP id uAN5fg2H024816; Wed, 23 Nov 2016 05:41:43 GMT
Message-ID: <58352C16.8030607@cisco.com>
Date: Tue, 22 Nov 2016 21:41:42 -0800
From: "Ahmed Bashandy (bashandy)" <bashandy@cisco.com>
User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:24.0) Gecko/20100101 Thunderbird/24.3.0
MIME-Version: 1.0
To: Rob Shakir <rjs@rob.sh>, draft-ietf-rtgwg-bgp-pic.all@ietf.org
Subject: Re: Document Shepherd Review of draft-ietf-rtgwg-bgp-pic-02
References: <C58CC6E1-763D-4D2A-A583-E3AFCC30B0F9@rob.sh>
In-Reply-To: <C58CC6E1-763D-4D2A-A583-E3AFCC30B0F9@rob.sh>
Content-Type: multipart/alternative; boundary="------------070509030306020603070601"
Archived-At: <https://mailarchive.ietf.org/arch/msg/rtgwg/nyFKOI54mdoQQ6oSF_8fot6MeAo>
Cc: rtgwg@ietf.org
X-BeenThere: rtgwg@ietf.org
X-Mailman-Version: 2.1.17
Precedence: list
List-Id: Routing Area Working Group <rtgwg.ietf.org>
List-Unsubscribe: <https://www.ietf.org/mailman/options/rtgwg>, <mailto:rtgwg-request@ietf.org?subject=unsubscribe>
List-Archive: <https://mailarchive.ietf.org/arch/browse/rtgwg/>
List-Post: <mailto:rtgwg@ietf.org>
List-Help: <mailto:rtgwg-request@ietf.org?subject=help>
List-Subscribe: <https://www.ietf.org/mailman/listinfo/rtgwg>, <mailto:rtgwg-request@ietf.org?subject=subscribe>
X-List-Received-Date: Wed, 23 Nov 2016 05:41:47 -0000

Sorry for the late reply.  I have uploaded a new version to address your 
very useful comments:)

https://datatracker.ietf.org/doc/draft-ietf-rtgwg-bgp-pic/


See replies below "#Ahmed"

Thanks a lot

Ahmed



On 8/11/2016 6:39 PM, Rob Shakir wrote:
> Ahmed, Clarence, Pradosh,
>
> I did a review of draft-ietf-rtgwg-bgp-pic-02 as part of the process of doing a shepherd write-up for this document. I have a number of comments which I want to try and address first.
>
> General:
>
> 1) This document feels really difficult to get the concepts from to me. The key intent (in my view) is to describe the way that one can structure a FIB to be able to reduce the number of updates that are required to reconverge. Unfortunately, the extensive use of examples (to me) does not make the core concepts that are being described super-clear. To this end, what I would propose is re-working the overview section. In this section the two “pillars” that are described can be outlined in more detail. For example:
>
> 	BGP PIC is based on two FIB design concepts:
> 	
> 	  1. Rather than implementing a forwarding construct each BGP NLRI has
> 	     a corresponding next-hop property, the FIB is implemented as a series of linked
> 	     entries, whereby many BGP NLRIs can link to a single next-hop entry (or set of
>               next-hop entries).
>
> 	  2. Forwarding entries are represented as a chain - which may have multiple
> 	     levels of hierarchy representing the forwarding dependencies for packets
> 	     matching that criteria, for example, allowing a BGP NLRI to map to a BGP
> 	     next-hop which itself maps to an IGP next-hop, which maps to an interface.
>
> 	Designing the FIB in this manner allows commonalities between different prefixes
> 	to be exploited to improve convergence time. Particularly, where a particular
> 	element is updated (e.g., IGP path to a BGP next-hop), then these changes are
> 	inherited by the other entries which are linked to the updated element (i.e.,
> 	all BGP NLRIs which share the BGP next-hop above).
#Ahmed: I modified the first part of the "overview" section to indicate 
what you mentioned. I used prefix instead of NLRI because the concept of 
BGP-pic is not only applicable to BGP, even though it is called "BGP"-pic:)
>
> This high-level overview then allows a reader to be able parse *what* is really implemented prior to getting stuck into the detailed examples and walk-throughs. Defining this high-level approach would also help with the definitions section, such that there is some context before trying to understand the definitions.
>
> 2) The “Properties” section feels misplaced to me — it seems to be a back reference to the rest of the document, whereas it would feel more natural if it were a description of the benefits of implementing the concepts that are described in the document, and a link to the sections where these are described in more detail.
#Ahmed: I updated the "Properties" section to refer to the individual 
sections as you suggested.
>
> Section 4 has a similar issue, insofar that it now informs us of how to implement some specifics of the actual forwarding behaviour following a number of examples. I’d propose that the layout of the document should be something more like:
>
> 	Intro — why do we need PIC?
#Ahmed: This is what the abstract introduction does. First paragraph 
says that BGP convergence is limited by its serial nature and that we 
have multiple paths in many cases. The second paragraph says BGP-PIC is 
solving this problem by not waiting to till updates reach the place to 
take action
> 	Overview — what is PIC?
> 		a) FIB layout
> 		b) FIB operation with PIC
> 		c) Dealing with limited hierarchy FIBs
#Ahmed: I modified the document as follows
- Section 3 talks about constructing the forwarding chain and 
illustrates it using an example
- Section 4 talks about the forwarding algorithm that uses the 
hierarchical forwarding chain
- Section 5 talks about "flattening" the forwarding chain then applies 
the "flattening" algorithm to an inter-AS example for illustration

> 	Characteristics of PIC - what does PIC require, what does it not require?
#Ahmed: I made the following modification
The two pillars of BGP-PIC is are mentioned at the beginning of the 
"Overview" section. I added Section 2.1 to outline the requirements of 
BGP. Section 2.1 used to be Section 7 on the previous version. I hope 
this outlines the characteristics of BGP-PIC
> 	Examples - demonstrating how PIC deals with various scenarios.
> 		a) Basic example (as per Figure 2)
> 			- Operation with iPE link failure
> 			- Operation with remote link failure in the core
> 			- Operation with ePE failure
> 			- Operation with ePE-CE link failure
> 		b) Optionally have the more complex example here.
#Ahmed: This what we have in Section "6" in the new version (Section 5 
in the old version). There is really no "more complex" scenarios once 
the hierarchical forwarding chain is constructed as described.
>
> 3) Use of the phrase “Prefix Independent Convergence”. I understand that this is an established concept, which is widely implemented, however, if we are trying to be more technically robust then I think that we need to work on how this document describes the advantages of the methods that it proposes. In a network that has 1:1 mapping of prefix to next-hop then the mechanisms that are provided do not really mean that convergence is “prefix independent”. A more robust way to describe this (IMHO) is that convergence time is now related to the number of next-hop updates that need to be made, rather than the number of NLRI. So - in the case that we have our 1:1 mapping of prefix to next-hop, then we describe the number of updates we need to make is still # of prefixes == # of next-hops. However, since we expect prefix to next-hop to be N:1 in most cases, then we decouple of total number of *prefix* entries, but we’re still coupled to the number of next-hops. I’d encourage that if we’re going to have this document as output of the RTG WG then we are a little more robust in describing the characteristics of the solution - even if it does not sound quite so marketing-friendly as “prefix independent”.
#Ahmed: The existence of scenarios where convergence depends on prefixes 
does not mean the algorithm itself depends on the  number of prefixes.
As a general statement, the convergence of BGP-PIC depends on the 
"degree of sharing" as well as the "depth of hierarchy". Exact modeling 
and quantification of the "degree of sharing" and "depth of hierarchy" 
was never a topic of this document. Most likely it requires more 
mathematical modeling that would be a topic for another document that 
would be closer to an academic research paper than an IETF informational RFC
On the other hand, we exactly mentioned what you said in this comment in 
the last paragraph of Section 6.3 in the new version (5.3 in the 
previous version) where we give two extreme abstract examples covering 
both ends of the dependency spectrum
We also show in the same section that flattening (and thus reducing the 
number of levels of hierarchy), reduces, but does not completely 
eliminate, the benefit of BGP-PIC by comparing the number of objects to 
modify in the flattened and normal forwarding chain for the same failure
>
> 4) Examples — in the doc, there is quite a lot of text which describes - in words - what the routing table looks like for certain scenarios. It was my feeling reviewing the document that actually adding this wording makes the examples difficult to understand. Could one rather describe the RIB by writing out what the entries would look like, or putting it in the context of each device. It might be worth trying to use some actual documentation prefixes in the examples, rather than the current approach, which creates some confusion as to what kind of information is being represented by the notation used.
>
> For instance, Figure 2 feels more readable to me if written as follows:
>
>
>             +--------------------------------+
>             |                                |
>             |                               ePE2 (loopback 192.0.2.1)
>             |                                |  \
>             |                                |   \
>             |                                |    \
>            iPE                               |    CE.......VRF "Blue"
>             |                                |    /       (VPN-IP1 - 65000:1:2001:DB8:1::/64)
>             |                                |   /        (VPN-IP2 - 65000:1:2001:DB8:2::/64)
>             |   LDP/Segment-Routing Core     |  /
>             |                               ePE1 (loopback 192.0.2.2)
>             |                                |
>             +--------------------------------+
>
>
> 	iPE’s routing table:
> 	
> 		65000:1:2001:DB8:1::/64
> 			via ePE1 (192.0.2.1), VPN Label: VPN-L11
> 			via ePE2 (192.0.2.2), VPN Label:VPN-L21
>
> 		65000:1:2001:DB8:2::/64
> 			via ePE1 (192.0.2.1), VPN Label: VPN-L12
> 			via ePE2 (192.0.2.2), VPN Label: VPN-L22
>
> 		192.0.2.1/32
> 			via Core, Label: 		IGP-L11
> 			via Core, Label:		IGP-L12
> 	
> 		192.0.2.2/32
> 			via Core, Label:		IGP-L21
> 			via Core, Label:		IGP-L22
>
>
> Tabulating the routing information then makes it much easier to refer to in the subsequent diagram of the FIB layout.
#Ahmed: Thanks a lot for the suggestion. I added the RIB entries in 
Section 2.2 as you suggested except I used IPv4 addresses for the BGP 
NLRIs instead of IPv6 to fit inside the 80 character line with:). I also 
added the the routing table for the flattened forwarding chain example 
shown in figure 4 section 5.2 (Section 3.2 in the previous version)
>
> 5) In Section 3.2 - it seems like we go into a lot of detail of quite a complex forwarding topology to explain the idea of how we can flatten entries together in FIB hierarchy. I would propose that some text is added (ideally in the same section as the proposal in 1)  that describes the fundamentals of this section. It also does not seem clear from the text (or the diagrams) what the “0” and “1” in the pathlist and then subsequent flattened pathlist actually represent? Are these next-hop IP addresses, or are they labels, or something different? (I presume that they are next-hop addresses, but this doesn’t seem super-evident).
The entire idea of flattening has been re-organized and moved to 
separate section (Section 5). Section 5.1 explains the steps. Section 
5.2 applies these steps to an example. The "0" and "1" are path- 
indices. The second bullet right before Figure 5 starts with " The index 
inside the pathlist entry", Also the definition of "pathlist" in the 
"terminology" section says "Each path in the pathlist carries its 
"path-index"  and the step 4 in Section 4 (in the new version) explains 
how that index is used
>
> 6) There are some assertions in this document which I think are quite implementation specific:
> 	- 50msec failure detection of links — one could indicate that this might be a typical time, but nothing in the mechanism that is described in this document guarantees this.
> 	- “Perspective” - 6.2.1 — I think quite a few things here depend on how the implementation works. I would say that this section would be best moved to an appendix as something that demonstrates the potential advantages based on some example implementation and network topology.
#Ahmed: I moved that Section to an appendix and referred to it
> At the moment, this feels like marketing numbers rather than anything that I’d really expect to be in a technical publication from this WG.
>
> Specifics:
>
> 1) OutLabel-List: There is a definition here that says that the array can be of “one or more outgoing labels and/or label actions“, there is no provision within this definition that the label can be unlabelled (or a no-op) - but this is clearly implemented and used in a later example. I presume it should be included in this definition.
#Ahmed: Modified the definition of the outlabel-list as you suggested
>
> 2) The definition of dependency seems to say that an object X is dependent on object Y if Y can’t be deleted without X not being a dependency. It seems to me like we might want to make this definition less self-referential in the text. Perhaps something akin to:
>
> 	One object X is determined to be a dependent of another object Y if a link within
> 	a specified forwarding chain exists between the two objects.
#Ahmed: The definition that you're proposing indicates that if X and Y 
are in any forwarding chain, then they depend on each other because 
there will be link between them. Instead I modified the dependency 
definition to say that X depends on Y if the forwarding engine visits X 
before Y while walking a forwarding chain
>
> 3) Definition of primary path — the definition here states that the path “can be used all the time”. However, we have cases where the primary BGP path cannot be used because the next-hop is unreachable in the IGP (for example). I think the definition stating that path can be used all the time is not correct, and rather one wants to indicate that the primary path is the first usable path within a path list or similar.
#Ahmed: I modified the definition to say "can be used all the time as 
long as a walk starting from this path can end with an adjacency". This 
way it a primary path is a *resolved* path that can be used all the time
>
> 4) Definition of path — the definition here states that the path is the “next hop in a sequence of unique connected nodes”. Is this really what we mean - that a path needs to represent A-B-C-D. This doesn’t seem to tally with the definition (and use) of PathList which is simply the next-hop. Is a more rigorous definition that the a Path is an entry that corresponds to the first hop (i.e., the local device’s next-hop) of a series of nodes that end at the destination prefix?
#Ahmed: the definition starts with "It is the next-hop". So it coincides 
directly with the pathlist, which is a list of "next-hops". I removed 
"connected" and "unique" and added the statement " The nodes may not be 
directly connected." to match how we use the term "next-hop" in the draft
>
> 5) Figure 3: It’s generally unclear to me what this figure is trying to show with some of the arrows. We have VPN-L11 which is (I assume) an ingress LFIB entry, and then VPN-IP1 which is an ingress IP leaf (within a VRF, I presume). What do the arrows from the top to the bottom indicate?
#Ahmed: Both the IP and the label leaf use the same pathlist and 
OutLabel-list but the label actions is different for the IP and the 
label leaf. So I modified the diagram to show two different 
OutLabel-lists for the IP and the label leaves but sharing the same pathlist
>
> 6) “BGP is inherently slow” — this seems to require some subjective view of what ‘slow’ is. Would it not be better to state that the speed of convergence of BGP is limited by the time taken to serially propagate reachability information from the point of failure to the device that must reconverge.
#Ahmed: Modified the text as you suggested
>
> 7) “Another more common and widely deployed scenario is L3VPN” — do you want to state that this is another *common* deployment scenario, otherwise, what is this L3VPN deployment more common than?
#Ahmed: Removed the word "more"
>
> 8) There are some statements that this is “enabled with zero operator intervention” — it is probably enabled by some software upgrade, so I’m not sure that this is quite zero operator intervention - however, given that this is mentioned, do you also need to discuss why this is something that can be done on a per-node basis and what the potential negative impacts could be (if any) of this happening? In general, having new things turn on by default and not being able to test them, or know that they’ve changed is not a good way to ensure that you can resolve faults in a network swiftly and effectively.
#Ahmed: The introduction of any feature on any product requires an 
upgrade. However act of upgrade itself is never considered part of the 
operator effort to "enable" that feature. Instead it is considered 
capital expenditure. Hence the statement “enabled with zero operator 
intervention” states an actual property of BGP-PIC. What you're refering 
to is controlling the operation of BGP-PIC so as to be able to validate 
it. The method of controlling or validating an implementation of a 
feature is operator-dependent and implementation dependent. Also the 
choice of how to enable/disable/control a feature (Whether to enable it 
by default, disable it by default, have it enabled all the time without 
possibility of disable) is also implementation dependent. So I do not 
think this draft is the right place to discuss such implementation and 
operator dependent properties. These discussion are more suitable in a 
vendor-specific white paper or a 3rd party testing report
>
> 9) (nit) Through the “terminology” section - the definitions probably don’t need to start with “it is” - one can simply say “BGP prefix: A prefix p/m…”
#Ahmed: Done:)