Re: Progressing with draft-litkowski-rtgwg-uloop-delay-00 ?

Pierre Francois <pierre.francois@imdea.org> Mon, 20 May 2013 15:27 UTC

Subject: Re: Progressing with draft-litkowski-rtgwg-uloop-delay-00 ?
From: Pierre Francois <pierre.francois@imdea.org>
In-Reply-To: <CAG4d1rfmP8R02311FJGqwW84kZEAwqupAoCS4yDp=LcVjxXZcA@mail.gmail.com>
Date: Mon, 20 May 2013 17:27:40 +0200
Message-Id: <F7B4C907-297E-4F5B-AC22-36CA75688C35@imdea.org>
References: <830E8EED-9BBB-4E58-8C17-BBA721B114D3@imdea.org> <CAG4d1rfmP8R02311FJGqwW84kZEAwqupAoCS4yDp=LcVjxXZcA@mail.gmail.com>
To: Alia Atlas <akatlas@gmail.com>
X-Mailer: Apple Mail (2.1503)
Cc: "rtgwg@ietf.org" <rtgwg@ietf.org>

Alia, 

Thanks for your quick feedback. 

Let me check with my co-authors on whether we should change the doc to answer your comments and come back 
for a discussion based on ink-on-paper, or answer on the list. I am afraid of a never-ending thread for the latter :)

Cheers,

Pierre.

On May 20, 2013, at 5:04 PM, Alia Atlas <akatlas@gmail.com> wrote:

> Hi Pierre,
> 
> Thank you for starting the conversation and a quick intro on the differences.
> 
> When I look at this draft and PLSN, what I see is that the PLR is definitionally a type B router (since
> it has an alternate that is safe for forwarding traffic or, for link up, its old primary) and that the PLR is then the
> only router to apply the basic procedure.  However, the PLR may not have an alternate available, unless MRT is used.
> 
> As draft-ietf-rtgwg-microloop-analysis-01 says in Sec 3.3:
> 
> " Another distinct situation is when the router does not support IPFRR or could not repair the failure, the new primary next-hops do not satisfy the safety condition, and there's no other neighbor that does, i.e. a type-C situation. Unlike other routers in the network, the router directly connected to the network does not have the old next-hop any more, and cannot continue using it. Immediately switching to the new next-hops, on the other hand, may result in a micro-loop. In this situation, the router MUST discard traffic forwarded along the affected route for the duration of DELAY_TYPEC, and then update the routes. Implementations MAY have a configuration option to allow switching immediately to the new next-hops for situations where this type of a micro-loop is not a concern. If implemented, this option MUST be disabled by default."
> 
> Granted, this discarding becomes the default behavior for draft-litkowski-rtgwg-uloop-delay-00, but the reasoning and trade-offs are not discussed.
> 
> In the analysis given in draft-litkowski-rtgwg-uloop-delay-00, the benefit discussed is only in terms of local
> microloops and completely ignores non-local microloops.  I know that this particular technique is not solving
> the remote microloops problem - but those are a real problem and without even attempting to characterize that,
> there's little way of telling whether the local microloops are 1% of the problem or 99%.
> 
> That the technique can apply when only the PLR does it is not as interesting as having a more general technique 
> that works for traffic from routers that implement it and does not cause problems.
> 
> Obviously, the WG debated this issue quite some time ago and was willing to go for a simpler partial solution (PLSN)
> over OFIB that gave similar coverage to RLFA.
> 
> Is your current argument that this even simpler and more partial a solution might gain some traction?  Or is it that this
> was simpler to implement and provides some mitigation?
> 
> In addition to lacking any guidance on the scale of the total problem that it solves, the draft also lacks details to handle
> the cases where the network hasn't been stable.  Granted, the latter is not deeply complex - but the solution isn't safely
> usable without it.
> 
> I think that we as a WG need to do 4 things:
>     a) Understand the scope of the total microloop problem and what fraction of it draft-litkowski-rtgwg-uloop-delay-00 can actually solve.  Does it handle asymmetric link-costs and multi-hop micro-loops?  Better examples of what types of local microloops are handled and why other types aren't protected would be useful.  How would an operator be certain as to what protection would be provided or how to engineer a network to obtain it?
>     b) Have a draft that fully describes the problem, the trade-offs, and the solution in detail rather than just a brief conceptual overview.
>     c) Understand the computation and complexity trade-offs between the different solutions - given that LFA is already assumed for it to be useful.
>     d) Discuss how partial a solution is desirable to standardize and the pros/cons of having a worse solution standardized.   Implementations aren't free - and by standardizing a more partial solution, this can delay implementations of a better solution.
> 
> I understand the desire to standardize something and to take something that seems straightforward and is likely useful to at least one network, but given the WG track record, at a minimum, I think we must have a more complete draft that fully documents the solution in detail and compares it fairly. 
> 
> Regards,
> Alia
> 
> 
> On Mon, May 20, 2013 at 7:57 AM, Pierre Francois <pierre.francois@imdea.org> wrote:
> 
> 
> Dear rtgwg list members,
> 
> I would like to know your opinion about what we should do with http://tools.ietf.org/html/draft-litkowski-rtgwg-uloop-delay-00, which we presented in Orlando.
> 
> The idea was to avoid microloops occurring in the direct neighbourhood of a node shutting down or bringing up a link in an IGP topology, by introducing some
> fixed delay in the update of the FIB in the down case, and introducing a fixed delay in the propagation of the LSP describing the link as up in the up case.
> 
> The solution is simple, will be released by some in the upcoming months, and the Orlando audience seemed to find it interesting to work on.
> 
> Alia mentioned that it would be worth comparing this solution with the state of the art before going further with the doc, so here it comes.
> 
> Generally, compared to other solutions, local-delay does not provide full coverage, as it avoids all, but only, the microloops occurring locally to the affected node. However,
> in many networks, as shown by Stephane's analysis, it is already highly beneficial to have loop avoidance there. Considering the simplicity of the approach,
> this looks like low-hanging fruit.
> 
> Alia was considering a comparison with PLSN (described in http://tools.ietf.org/html/draft-ietf-rtgwg-microloop-analysis-01, expired 7 years ago ;) ).
> 
> The differences with the PLSN approach are the following:
> 
> PLSN lets every router that has to converge for some destinations assess the safety of its new next hops, for each destination.
> Based on this assessment, each router either
> 
> 1. Transiently uses a safe, non post-convergence set of next hops, to finally converge to the post-convergence one, or
> 2. Transiently uses its old next-hops, to finally converge to the post-convergence ones.
> 
> Local delay can be defined as a subset of this approach:
> Only the node local to the event applies the procedure.
> Step 1 in PLSN is not applied: we only suggest that the node wait for a fixed time, with no transient FIB state.
> 
> I was considering a comparison with oFIB, draft-ietf-rtgwg-ordered-fib, submitted to IESG as informational.
> Local-delay can be defined as a subset of this approach:
> 
> oFIB defines an ordering among all the nodes of the network, telling which node should wait for which neighbours to be done with their update before performing its own. Local-delay only tells the local node to wait until fast convergence has happened in the rest of the network.
> 
> I think that despite the close relationship between these approaches, local-delay is worth documenting on its own because:
> 
> It's simple, on its way to being supported, and provides loop avoidance where loops happen to be the most annoying.
> 
> Cheers,
> 
> Pierre.
> 
> 
>
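
For readers who want the local-delay rule from the original post in concrete form, here is a minimal sketch, assuming only what the thread states: the node local to the event delays its FIB update when a link goes down, and delays propagating the LSP that describes the link as up when a link comes up; every other node converges as usual. The names `LinkEvent`, `Plan` and `local_delay_plan`, and the 2000 ms default, are hypothetical and not taken from the draft.

```python
from dataclasses import dataclass
from enum import Enum


class LinkEvent(Enum):
    DOWN = "down"
    UP = "up"


@dataclass(frozen=True)
class Plan:
    fib_update_delay_ms: int  # wait before installing the new next hops
    lsp_flood_delay_ms: int   # wait before advertising the link as up


def local_delay_plan(event: LinkEvent, is_local: bool, delay_ms: int = 2000) -> Plan:
    """Return the delays a node applies for a link event.

    delay_ms is an illustrative value (assumption), not one from the draft.
    """
    if not is_local:
        # Only the node local to the event applies the procedure.
        return Plan(fib_update_delay_ms=0, lsp_flood_delay_ms=0)
    if event is LinkEvent.DOWN:
        # Down case: fixed delay before the local FIB update.
        return Plan(fib_update_delay_ms=delay_ms, lsp_flood_delay_ms=0)
    # Up case: fixed delay before propagating the LSP describing the link as up.
    return Plan(fib_update_delay_ms=0, lsp_flood_delay_ms=delay_ms)
```

Calling `local_delay_plan(LinkEvent.DOWN, is_local=True)` gives a plan with a non-zero FIB delay and no LSP delay; the up case is the mirror image. The sketch deliberately has no transient FIB state, which is the difference from step 1 of PLSN noted in the thread.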