Re: [RTG-DIR] Routing Directorate Review of "Framework for Loop-free convergence using oFIB"

Acee Lindem <acee.lindem@ericsson.com> Fri, 01 February 2013 16:41 UTC

From: Acee Lindem <acee.lindem@ericsson.com>
To: "<stbryant@cisco.com>" <stbryant@cisco.com>
Subject: Re: [RTG-DIR] Routing Directorate Review of "Framework for Loop-free convergence using oFIB"
Date: Fri, 01 Feb 2013 16:41:42 +0000
Message-ID: <94A203EA12AECE4BA92D42DBFFE0AE470B7F70@eusaamb101.ericsson.se>
References: <94A203EA12AECE4BA92D42DBFFE0AE470B5BF8@eusaamb101.ericsson.se> <510ACA65.9010307@cisco.com>
In-Reply-To: <510ACA65.9010307@cisco.com>
Cc: "rtg-dir@ietf.org" <rtg-dir@ietf.org>, Olivier Bonaventure <olivier.bonaventure@uclouvain.be>, "Stefano Previdi (sprevidi)" <sprevidi@cisco.com>, "Mike Shand (mshand)" <mshand@cisco.com>, "rtgwg@ietf.org" <rtgwg@ietf.org>
List-Id: Routing Area Working Group <rtgwg.ietf.org>

Hi Stewart,
I'm looking at the -09 version. I'm happy with it. I only have one comment that I think could be a further improvement - see #1 below. 
 
On Jan 31, 2013, at 2:47 PM, Stewart Bryant wrote:

> On 30/01/2013 18:33, Acee Lindem wrote:
>> Authors, et al,
>> I have been selected as the Routing Directorate reviewer for this draft. The Routing Directorate seeks to review all routing or routing-related drafts as they pass through IETF last call and IESG review, and sometimes on special request. The purpose of the review is to provide assistance to the Routing ADs. For more information about the Routing Directorate, please see http://www.ietf.org/iesg/directorate/routing.html
>> Although these comments are primarily for the use of the Routing ADs, it would be helpful if you could consider them along with any other IETF Last Call comments that you receive, and strive to resolve them through discussion or by updating the draft.
>> Document: draft-ietf-rtgwg-ipfrr-ordered-fib-08
>> Reviewer: Acee Lindem
>> Review Date: January 30, 2013
>> IETF LC End Date: January 31, 2013
>> Intended Status: informational
>> Summary: This document is basically ready for publication, but has clarifications that should be considered prior to publication.
>> Comments: The document accomplishes what it sets out to achieve in documenting the ordered FIB mechanism for avoidance of transient loops. While Appendix B is useful, I think the document would be better without Appendix A. Of course, this is just my opinion.
> 
> AAH took a lot of thought, so we would rather not lose it. If it works better, I could reverse the order of the appendixes.

Ok - it does take a lot of effort to understand the AAH state machines but since it is in an appendix, the reader can choose to skip it. 

>> 
>> Major Issues: None
>> 
>> Minor Issues:
>>             1. The document could benefit from a precise definition of a "non-urgent topology change". From what I gathered, this is any change that can be deferred during the ordered FIB delay.
> The abstract now says
> "This mechanism can be used in the case of non-urgent (management action) link or node shutdowns and restarts or link metric changes."
> 
> 3.1.1 says
> 
> "First consider the non-urgent failure of a link (i.e. where an operator or a network management system (NMS) shuts down a link thereby removing it from the currently active topology) or the increase of a link metric by the operator or NMS."
> 
> The point is of course that although a link is shut down rather than failing, a remote node cannot distinguish the two cases.

Ok. For both 3.1.1 and 3.1.2, you may also want to point out the use case of reconvergence of an IPFRR-repaired path.

> 
>>             2. Similarly, the document could benefit from a precise definition of the "rSPF". I checked RFC 5715 and it is not defined there either. I believe our discussions indicate that this is simply an SPF where the shortest path back to us is used as the cost. For example, for the first pass, the SPF would use our neighbor's link cost rather than our own.
> We use rSPT in here.
> 
> As you know we are looking for a formal definition.
> 
> I have put in Section 5
> 
> This computation required the introduction of the concept of a reverse Shortest Path Tree (rSPT). The rSPT uses the cost towards the root rather than from it and yields the best paths towards the root from other nodes in the network
> [I-D.draft-bryant-ipfrr-tunnels].

Ok. I think this will suffice for anyone who has been remotely following the IPFRR discussion. 
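
As a rough illustration of the rSPT definition quoted above (costs towards the root rather than from it), here is a minimal Python sketch. It is not taken from the draft: the function name `reverse_spt`, the dict-of-directed-links input, and the sample topology are all hypothetical. It runs Dijkstra from the root over reversed edges, so each entry is the cost of the best path from that node towards the root, plus the next hop on that path.

```python
import heapq

def reverse_spt(links, root):
    """Return {node: (cost_to_root, next_hop_toward_root)}.

    links: {(u, v): cost} gives the cost of the directed link u -> v.
    (Hypothetical input format, chosen for illustration only.)
    """
    # Index links by their head so they can be walked backwards from the root.
    incoming = {}
    for (u, v), cost in links.items():
        incoming.setdefault(v, []).append((u, cost))

    best = {root: (0, None)}
    heap = [(0, root)]
    while heap:
        dist, node = heapq.heappop(heap)
        if dist > best[node][0]:
            continue  # stale queue entry
        for prev, cost in incoming.get(node, []):
            cand = dist + cost
            if prev not in best or cand < best[prev][0]:
                best[prev] = (cand, node)
                heapq.heappush(heap, (cand, prev))
    return best

# Hypothetical asymmetric topology: A->R is expensive, R->A is cheap.
links = {
    ("A", "R"): 10, ("R", "A"): 1,
    ("A", "B"): 2,  ("B", "A"): 2,
    ("B", "R"): 3,  ("R", "B"): 3,
}
rspt = reverse_spt(links, "R")
```

With these made-up costs, a forward SPT rooted at R would reach A directly at cost 1, whereas the rSPT has A reaching R via B at cost 5. That asymmetry is the reason the direction of the cost matters.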

> 
> If there is a better ref we can swap in Auth48.
> 
> 
> 
> 
>>             3. It would be good to state early on that the current oFIB mechanism is limited to a single link or node failure and that multiple unrelated failures result in reversion to normal FIB convergence.
> Done (in section 2).
>>             4. Make sure the hold down timer is defined precisely and early in the document. Currently, this doesn't happen until section 8.2.
> On first use it says:
> 
> If all events received within some hold-down period (the time that a router waits to acquire a set of LSPs which should be processed together) h

Right. I see that now. 

>>             5. Upon the initial reading, one may think there is some correspondence between the Router (R) in sections 4 and the Router (R) in section 5. Can this be clearer? Perhaps, (R) is not needed in section 4 since in all other sections, it refers to the computing router.
> R is now removed.
> 
> As has been described, a single event such as the failure or restoration of a single link, single router or a linecard may be notified to the rest of the network as a set of individual link change events. It is necessary to deduce from this collection of link state notifications the type of event that has occurred in the network and hence the required ordering.
> 
> When a link change event is received which impacts the receiving router's FIB, the routers at the near and far end of the link are noted.
> 
> If all events received within some hold-down period (the time that a router waits to acquire a set of LSPs which should be processed together) have a single router in common, then it is assumed that the change reflects an event (line-card or router change) concerning that router.
> 
> In the case of a link change event, the router at the far end of the link is deemed to be the common router.
> 
> All ordering computations are based on treating the common router as the root for both link and node events.

Good. 
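
For readers who prefer code, the common-router rule quoted above could be sketched roughly as follows. This is only an illustration under assumptions: the function name `common_router` and the (near_end, far_end) tuple format are hypothetical, and returning None is used here to stand for "no single common router", i.e. the case of multiple unrelated changes where the router reverts to normal FIB convergence.

```python
def common_router(link_events):
    """Return the presumed root for ordering, or None if no single router is common.

    link_events: list of (near_end, far_end) router pairs collected
    during one hold-down period (hypothetical representation).
    """
    if not link_events:
        return None
    # Collapse repeated reports of the same link (e.g. one from each end).
    unique_links = {frozenset(event) for event in link_events}
    if len(unique_links) == 1:
        # Single link change: the far-end router is deemed the common router.
        return link_events[0][1]
    shared = set(link_events[0])
    for near, far in link_events[1:]:
        shared &= {near, far}
    # Exactly one shared router suggests a node or line-card event on it.
    return next(iter(shared)) if len(shared) == 1 else None

single_link = common_router([("A", "B")])
node_event = common_router([("A", "N"), ("B", "N"), ("N", "C")])
unrelated = common_router([("A", "B"), ("C", "D")])
```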

> 
> 
>>             6. In section 5, I have trouble envisioning a case where a router would not be in a pre- or post-failure SPT. I guess if it had no loopbacks and only unnumbered interfaces or only interfaces to broadcast links offering a longer path???
> This is the case where you have an unused link (due to costs), i.e. a link not on any shortest path.

That would make sense for a link failure. For a node failure, it is harder to visualize such a topology. 

> 
>>             7. In section 6.2, it would be instructive to say that a Link Down condition is represented by an infinite metric (or otherwise cover this condition).
> I have added "Inclusion or removal of link" to the information list.

Explicit specification works as well. 

>>             8. In section 8.5, I believe this is a different hold down timer than the one used to group LSPs related to the same failure.
> Yes.
> 
> I have changed this to AAH_Hold_down_timer expires

Great. 

> 
> Pierre and Mike please check this.
>> 
>> Nits:
>> 
>>       1. Abstract - replace "However mechanism" with "However the mechanism". I chose singular since it is singular in the preceding text.
> done
>>       2. Introduction - replace "base (FIB)" with "bases (FIBs)" in the first sentence.
> Done
> 
>>       3. Page 5, replace "change order no" with "change order, no".
> Done
>>       4. Page 9, suggest adding "IGP " to "reverse connectivity check".
> Done
>>       5. Page 10, suggest using parentheses rather than relying on arithmetic precedence for equations, e.g., T0 + H + (rank * MAX_FIB)
> Done
>>       6. There is a mixture of "neighbor" and "neighbour" in the document. Of course, I prefer the US English to UK English since this is what all the OSPF RFCs use.
> I will deal - but given we are Europeans... :)

I figured that. ;^)  You did miss one: 

A.3.2.  Per Neighbor State Machine

> 
>>       7. Section 8.1, the actions are formatted inconsistently: in one case as a paragraph, and in the other as a list.
> Done
>>       8. Page 19, replace "algorithms i.e." with "algorithms, i.e.".
> Done
>>       9. Page 19 and Page 22, use of (PNSM) and (PN) is inconsistent.
> Done
>>    10. Page 23, Run-on sentence beginning "Manual configuration...".
> Done
>>    11. There are some instances where the opening clause of a sentence is followed by a comma and some where it is not. I prefer the former. For example, section 4.2 appears to be written in a different style with missing punctuation.
>> 
>> 
> I think I got them, but RFC Editor will get this
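
On nit 5, a tiny worked example of the parenthesized form T0 + H + (rank * MAX_FIB) may help. The reading of the symbols is an assumption on my part for illustration: T0 as the time of the first event, H as the hold-down period, rank as the router's position in the ordering, and MAX_FIB as the worst-case FIB update time. The function name and the millisecond values are made up.

```python
def fib_update_time(t0, hold_down, rank, max_fib):
    """Earliest time at which a router of the given rank updates its FIB.

    All arguments share one time unit (milliseconds here, by assumption).
    """
    # Explicit parentheses: the rank term is grouped, not left to precedence.
    return t0 + hold_down + (rank * max_fib)

# Hypothetical values: event at 1000 ms, 500 ms hold-down, rank 3, MAX_FIB 200 ms.
when = fib_update_time(t0=1000, hold_down=500, rank=3, max_fib=200)
```

Here the rank term contributes 3 * 200 = 600 ms, giving 1000 + 500 + 600 = 2100 ms.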

Thanks,
Acee 



> 
> Thanks for the review
> 
> Stewart
> 
>> Thanks,
>> Acee
>> 
>> 
>> 
>> 
> 
> 
> -- 
> For corporate legal information go to:
> 
> http://www.cisco.com/web/about/doing_business/legal/cri/index.html
>