Re: Progressing with draft-litkowski-rtgwg-uloop-delay-00 ?

Alia Atlas <akatlas@gmail.com> Mon, 20 May 2013 15:37 UTC

Subject: Re: Progressing with draft-litkowski-rtgwg-uloop-delay-00 ?
From: Alia Atlas <akatlas@gmail.com>
To: Pierre Francois <pierre.francois@imdea.org>
Cc: "rtgwg@ietf.org" <rtgwg@ietf.org>

Pierre,

The end result should be an updated draft.  The list is good for discussing
what should go in and why :-)

Regards,
Alia


On Mon, May 20, 2013 at 11:27 AM, Pierre Francois <pierre.francois@imdea.org> wrote:

>
> Alia,
>
> Thanks for your quick feedback.
>
> Let me check with my co-authors on whether we should change the doc to
> answer your comments and come back
> for a discussion based on ink-on-paper, or answer on the list. I am afraid
> of a never-ending thread for the latter :)
>
> Cheers,
>
> Pierre.
>
> On May 20, 2013, at 5:04 PM, Alia Atlas <akatlas@gmail.com> wrote:
>
> Hi Pierre,
>
> Thank you for starting the conversation and a quick intro on the
> differences.
>
> When I look at this draft and PLSN, what I see is that the PLR is by
> definition a type B router (since it has an alternate that is safe for
> forwarding traffic or, for link up, its old primary) and that the PLR is
> then the only router to apply the basic procedure.  However, the PLR may
> not have an alternate available, unless MRT is used.
>
> As draft-ietf-rtgwg-microloop-analysis-01 says in Sec 3.3:
>
> " Another distinct situation is when the router does not support IPFRR or
> could not repair the failure, the new primary next-hops do not satisfy
> the safety condition, and there's no other neighbor that does, i.e. a
> type-C situation. Unlike other routers in the network, the router
> directly connected to the network does not have the old next-hop any more,
> and cannot continue using it. Immediately switching to the new next-hops,
> on the other hand, may result in a micro-loop. In this situation, the
> router MUST discard traffic forwarded along the affected route for the
> duration of DELAY_TYPEC, and then update the routes. Implementations MAY
> have a configuration option to allow switching immediately to the new
> next-hops for situations where this type of a micro-loop is not a concern.
> If implemented, this option MUST be disabled by default."
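The type-C rule quoted above can be sketched as a small decision function. This is only an illustration of the quoted text: the names `TypeCConfig`, `type_c_action`, and the 1000 ms default for DELAY_TYPEC are assumptions made for the example, not values taken from either draft.

```python
from dataclasses import dataclass


@dataclass
class TypeCConfig:
    # Hypothetical default; the quoted text names DELAY_TYPEC but gives no value.
    delay_typec_ms: int = 1000
    # Optional knob from the quoted text; it MUST be disabled by default.
    switch_immediately: bool = False


def type_c_action(elapsed_ms: int, cfg: TypeCConfig) -> str:
    """Return what a type-C router does with traffic on the affected route.

    elapsed_ms is the time since the router detected the failure.
    """
    if cfg.switch_immediately:
        # Operator accepted the micro-loop risk: use the new next-hops at once.
        return "forward_new"
    if elapsed_ms < cfg.delay_typec_ms:
        # The old next-hop is gone and the new one is not safe: discard.
        return "discard"
    # DELAY_TYPEC has expired: routes are updated to the new next-hops.
    return "forward_new"
```

With the default configuration, traffic on the affected route is dropped until the timer expires, which is the trade-off the rest of this message asks the draft to discuss.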
>
> Granted, this discarding becomes the default behavior
> for draft-litkowski-rtgwg-uloop-delay-00, but the reasoning and trade-offs
> are not discussed.
>
> In the analysis given in draft-litkowski-rtgwg-uloop-delay-00, the benefit
> discussed is only in terms of local
> microloops and completely ignores non-local microloops.  I know that this
> particular technique is not solving
> the remote microloops problem - but those are a real problem and without
> even attempting to characterize that,
> there's little way of telling whether the local microloops are 1% of the
> problem or 99%.
>
> That the technique can apply when only the PLR does it is not as
> interesting as having a more general technique
> that works for traffic from routers that implement it and does not cause
> problems.
>
> Obviously, the WG debated this issue quite some time ago and was willing
> to go for a simpler partial solution (PLSN)
> over OFIB that gave similar coverage to RLFA.
>
> Is your current argument that this even simpler and more partial
> solution might gain some traction?  Or is it that this
> is simpler to implement and provides some mitigation?
>
> In addition to lacking any guidance on the scale of the total problem that
> it solves, the draft also lacks details to handle
> the cases where the network hasn't been stable.  Granted, the latter is
> not deeply complex - but the solution isn't safely
> usable without it.
>
> I think that we as a WG need to do 4 things:
>     a) Understand the scope of the total microloop problem and what
> fraction of it draft-litkowski-rtgwg-uloop-delay-00 can actually
> solve.  Does it handle asymmetric link-costs and multi-hop micro-loops?
> Better examples of what types of local microloops are handled and why
> other types aren't protected would be useful.  How would an operator be
> certain as to what protection would be provided or how to engineer a
> network to obtain it?
>     b) Have a draft that fully describes the problem, the trade-offs, and
> the solution in detail rather than just a brief conceptual overview.
>     c) Understand the computation and complexity trade-offs between the
> different solutions - given that LFA is already assumed for it to be useful.
>     d) Discuss how partial a solution is desirable to standardize and the
> pros/cons of having a worse solution standardized.   Implementations aren't
> free - and by standardizing a more partial solution, this can delay
> implementations of a better solution.
>
> I understand the desire to standardize something and to take something
> that seems straightforward and is likely useful to at least one network,
> but given the WG track record, at a minimum, I think we must have a more
> complete draft that fully documents the solution in detail and compares it
> fairly.
>
> Regards,
> Alia
>
>
> On Mon, May 20, 2013 at 7:57 AM, Pierre Francois <pierre.francois@imdea.org> wrote:
>
>>
>>
>> Dear rtgwg list members,
>>
>> I would like to know your opinion about what we should do with
>> http://tools.ietf.org/html/draft-litkowski-rtgwg-uloop-delay-00 , that
>> we presented in Orlando.
>>
>> The idea was to avoid microloops occurring in the direct neighbourhood of
>> a node shutting down or bringing up a link in an IGP topology, by
>> introducing some
>> fixed delay in the update of the FIB in the down case, and introducing a
>> fixed delay in the propagation of the LSP describing the link as up in the
>> up case.
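The two cases described above can be sketched as a schedule of actions at the node local to the event. This is a minimal illustration, not the draft's procedure: the names `Action`, `local_delay_plan`, and the 2000 ms value are hypothetical, and flooding the LSP immediately in the down case is an assumption (the message only states that the FIB update is delayed).

```python
from dataclasses import dataclass
from typing import List

# Hypothetical value; the message does not say how long the fixed delay is.
LOCAL_DELAY_MS = 2000


@dataclass(frozen=True)
class Action:
    name: str   # "flood_lsp" or "update_fib"
    at_ms: int  # offset from the local link event, in milliseconds


def local_delay_plan(event: str, delay_ms: int = LOCAL_DELAY_MS) -> List[Action]:
    """Schedule the local node's actions for a link 'down' or 'up' event."""
    if event == "down":
        # Assumption: the change is advertised at once so that the rest of
        # the network converges while the local FIB update is held back.
        return [Action("flood_lsp", 0), Action("update_fib", delay_ms)]
    if event == "up":
        # The LSP describing the link as up is held for the fixed delay.
        # The message does not say when the local FIB changes in this case,
        # so no FIB action is scheduled here.
        return [Action("flood_lsp", delay_ms)]
    raise ValueError(f"unknown link event: {event!r}")
```

For example, `local_delay_plan("down", 500)` schedules the flood at offset 0 and the FIB update at offset 500.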
>>
>> The solution is simple, will be released by some in the upcoming months,
>> and the Orlando audience seemed to find it interesting to work on.
>>
>> Alia mentioned the interest of comparing this solution with the state of
>> the art before going further with the doc, so here it comes.
>>
>> Generally, compared to other solutions, local-delay does not provide full
>> coverage, as it avoids all, but only, the microloops occurring locally to
>> the affected node. However,
>> in many networks, as shown by Stephane's analysis, it is already highly
>> beneficial to have loop avoidance there. Considering the simplicity of the
>> approach,
>> this looks like low-hanging fruit.
>>
>> Alia was considering a comparison with PLSN (described in
>> http://tools.ietf.org/html/draft-ietf-rtgwg-microloop-analysis-01,
>> expired 7 years ago ;) ).
>>
>> The differences with the PLSN approach are the following:
>>
>> PLSN has every router that must converge for some destinations assess
>> the safety of its new next hops, for each destination.
>> Based on this assessment, each router either
>>
>> 1. Transiently use a safe, non post-convergence, set of next hops, to
>> finally converge to the post-convergence one, or
>> 2. Transiently use old next-hops, to finally converge to the
>> post-convergence ones.
>>
>> Local delay can be defined as a subset of this approach:
>> Only the node local to the event applies the procedure.
>> Step 1 of PLSN is not applied; we only suggest that the node wait for a
>> fixed time, with no transient FIB state.
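The comparison above can be sketched as the sequence of FIB states a router goes through for one destination. This is a rough illustration under stated assumptions: the function `fib_stages` and its string labels are hypothetical, and the case where a router's new next hops are already safe is left out because the message does not describe it.

```python
from typing import List


def fib_stages(scheme: str, is_local_to_event: bool,
               has_safe_transient: bool) -> List[str]:
    """List the FIB states a router passes through for one destination."""
    if scheme == "plsn":
        # Every router that must converge applies the procedure.
        if has_safe_transient:
            # Step 1: a safe, non post-convergence set of next hops first.
            return ["safe_transient", "post_convergence"]
        # Step 2: keep the old next hops first.
        return ["old", "post_convergence"]
    if scheme == "local-delay":
        if is_local_to_event:
            # Step 1 is never applied: the local node keeps its current
            # state for a fixed time, then converges.
            return ["old", "post_convergence"]
        # Other routers do not apply the procedure and converge directly.
        return ["post_convergence"]
    raise ValueError(f"unknown scheme: {scheme!r}")
```

Under these assumptions, local-delay only ever produces the "old, then post-convergence" sequence, and only at the node local to the event.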
>>
>> I was considering a comparison with oFIB, draft-ietf-rtgwg-ordered-fib ,
>> submitted to IESG as informational.
>> local-delay can be defined as a subset of this approach:
>>
>> While oFIB defines an ordering among all the nodes of the network,
>> telling which node should wait for which neighbours to be done with their
>> update before performing its own, local-delay tells the local node to
>> wait until fast convergence has happened in the rest of the network.
>>
>> I think that despite the close relationships between these approaches,
>> local-delay is worth being documented on its own because:
>>
>> It's simple, on its way to being supported, and provides loop avoidance
>> where microloops happen to be the most annoying.
>>
>> Cheers,
>>
>> Pierre.
>>