Progressing with draft-litkowski-rtgwg-uloop-delay-00 ?

Pierre Francois <pierre.francois@imdea.org> Mon, 20 May 2013 11:57 UTC

Dear rtgwg list members, 

I would like your opinion on what we should do with http://tools.ietf.org/html/draft-litkowski-rtgwg-uloop-delay-00, which we presented in Orlando.

The idea is to avoid the microloops that occur in the direct neighbourhood of a node shutting down or bringing up a link in an IGP topology. In the down case, the node delays the update of its FIB by a fixed time. In the up case, it delays by a fixed time the propagation of the LSP describing the link as up.
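To make the down case concrete, here is a minimal sketch on a hypothetical three-node topology (S, N, D with made-up metrics; none of these names or values come from the draft). It assumes symmetric link costs, and it assumes that traffic can still use the old next hop while S waits (for instance a planned shutdown or a protected link). It compares the transient FIB state reached when S updates before its neighbour N with the one reached when S waits for N.

```python
import heapq

def next_hops(graph, dest):
    """Shortest-path next hop of every node towards dest (symmetric costs assumed)."""
    dist = {dest: 0}
    hop = {}
    heap = [(0, dest)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale heap entry
        for v, cost in graph[u].items():
            if d + cost < dist.get(v, float("inf")):
                dist[v] = d + cost
                hop[v] = u  # v reaches dest through u
                heapq.heappush(heap, (d + cost, v))
    return hop

def loops(fib, src, dest):
    """True if a packet from src to dest revisits a node under the given FIB state."""
    seen, node = set(), src
    while node != dest:
        if node in seen:
            return True
        seen.add(node)
        node = fib[node]
    return False

# Hypothetical topology: S shuts down its link to D.
before = {"S": {"D": 1, "N": 1}, "N": {"S": 1, "D": 5}, "D": {"S": 1, "N": 5}}
after = {"S": {"N": 1}, "N": {"S": 1, "D": 5}, "D": {"N": 5}}
old = next_hops(before, "D")  # N reaches D through S
new = next_hops(after, "D")   # S reaches D through N

# S updates first, N still uses its old next hop: S and N point at each other.
no_delay = {"S": new["S"], "N": old["N"]}
# S waits, N converges first: both still deliver to D.
with_delay = {"S": old["S"], "N": new["N"]}

print(loops(no_delay, "S", "D"))    # True
print(loops(with_delay, "S", "D"))  # False
```

Once N has converged, S can install its new next hop and the network is in its post-convergence state without having gone through the looping state.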

The solution is simple, will be released by some in the coming months, and the Orlando audience seemed to find it interesting to work on.

Alia suggested comparing this solution with the state of the art before going further with the doc, so here it comes.

Generally, compared to other solutions, local-delay does not provide full coverage: it avoids all of the microloops occurring locally to the affected node, but only those. However, in many networks, as shown by Stephane's analysis, it is already highly beneficial to have loop avoidance there. Considering the simplicity of the approach, this looks like low-hanging fruit.

Alia was considering a comparison with PLSN (described in http://tools.ietf.org/html/draft-ietf-rtgwg-microloop-analysis-01, expired 7 years ago ;) ).

The differences with the PLSN approach are the following: 

With PLSN, every router that has to converge for some destinations assesses, for each destination, the safety of its new next hops.
Based on this assessment, it either

1. transiently uses a safe set of next hops that is not the post-convergence one, and then converges to the post-convergence set, or
2. transiently keeps its old next hops, and then converges to the post-convergence ones.

Local-delay can be defined as a subset of this approach:
Only the node local to the event applies the procedure.
Step 1 of PLSN is not applied: we only suggest that the node wait for a fixed time, with no transient FIB state.

I was considering a comparison with oFIB (draft-ietf-rtgwg-ordered-fib), submitted to the IESG as Informational.
Local-delay can be defined as a subset of this approach as well:

oFIB defines an ordering among all the nodes of the network, telling each node which neighbours must be done with their update before it performs its own. Local-delay only tells the local node to wait until fast convergence has happened in the rest of the network.
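The timing condition behind local-delay can be sketched as follows. This is only an illustration: the two-router next-hop tables, the 300 ms convergence time of N and the delay values are hypothetical, and the sketch assumes traffic can still follow the old next hop until a router updates. It reports the time intervals during which traffic from S to D loops, given the instant at which each router installs its new FIB.

```python
def loop_intervals(old, new, update_time, src, dest, horizon):
    """Intervals (start, end) during which src -> dest traffic loops.

    old/new map each router to its next hop before/after the event;
    update_time maps each router to the instant it installs the new one.
    """
    def loops(now):
        fib = {n: (new[n] if now >= update_time[n] else old[n]) for n in old}
        seen, node = set(), src
        while node != dest:
            if node in seen:
                return True
            seen.add(node)
            node = fib[node]
        return False

    # The FIB state is constant between two consecutive update instants.
    marks = {0, horizon} | {t for t in update_time.values() if 0 < t < horizon}
    times = sorted(marks)
    return [(a, b) for a, b in zip(times, times[1:]) if loops(a)]

# Hypothetical next hops towards D: S loses its link to D, N used to go via S.
old = {"S": "D", "N": "S"}
new = {"S": "N", "N": "D"}

# S updates after 50 ms, before N has converged (300 ms): transient loop.
print(loop_intervals(old, new, {"S": 50, "N": 300}, "S", "D", 1000))
# S waits 500 ms, longer than N's convergence time: no loop.
print(loop_intervals(old, new, {"S": 500, "N": 300}, "S", "D", 1000))
```

The fixed delay only helps if it is longer than the time the neighbours need to converge, which is why it is tied to fast convergence in the rest of the network rather than to an explicit ordering as in oFIB.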

I think that, despite the close relationship between these approaches, local-delay is worth documenting on its own:

It is simple, on its way to being supported, and provides loop avoidance where microloops happen to be the most annoying.

Cheers,

Pierre.