Re: discussion on fast notification work

Anton Smirnov <asmirnov@cisco.com> Thu, 07 July 2011 12:02 UTC

    Hi András,

 > 2. near instantaneous update of the FIB
 >

    I am no specialist in FIB implementations, but it appears to me
that implementations and their requirements vary so much that the very
intention of improving them all is misguided and bound to fail.


 > 1. near instantaneous notification of failures to neighbour and remote nodes

    Here is my view of the problem:
    My logic says that good inter-router notification cannot be made as
fast as good intra-router API notification. So all good local repair
techniques are intrinsically superior to [even a good] inter-router
notification approach: superior first of all in speed of restoration,
but things like ease of deployment obviously add to their appeal.
    That is, the niche for remote notification is being squeezed: it can
be applied as an aid to local repair techniques in those cases where the
network topology provides redundancy but local repair techniques can't
use it. Since more elaborate local repair techniques are being developed
that expand their coverage, the niche for remote notification is
contracting to the point where people don't want to bother with it (or
even care to criticize it :-) ).

    I am guessing that the authors of the proposal don't agree with this
part: "My logic says that good inter-router notification cannot be made
as fast as good intra-router API."
    May I suggest that the authors work on this perception? Otherwise I
am afraid there will again be total misunderstanding and disinterest.

Anton


On 07/07/2011 12:52 PM, András Császár wrote:
> Dear All,
>
> As a recap, the basic idea was to explore how one could approximate
> 1. near instantaneous notification of failures to neighbour and remote nodes
> 2. near instantaneous update of the FIB
>
> 1 is approximated by a completely dataplane-based fast notification (FN) framework.
> 2 is approximated by pre-calculating and pre-downloading backup routes for RELEVANT failures and doing the FIB update from within the linecard.
>
> Since last IETF, based on the comments we received, we have been working on (and prototyping) a method where FNs are propagated on the shortest path and each hop performs SHA256 authentication in the dataplane before forwarding the packet.
>
> Important highlights proving feasibility:
>
> - In a 1000-node area with a diameter of 20 hops and 500k external routes, the backup FIB even in a very bad case is not bigger than 30MB with very diverse ECMP (10 ECMP alternatives for each destination). The download of this backup FIB size should be no problem.
>
> - A naive serial FIB update procedure after a failure in the above network takes less than 15ms within a dataplane card (assuming 5MT/sec memory performance and 1 memory controller). But there may be more intelligent approaches, such as a lazy (on-demand) FIB update.
>
> - In reality, our calculations show that typically only nodes between 1 and 3 hops away need to prepare for a failure, i.e. only failures 1 to 3 hops away are RELEVANT (the above calculation assumes that, for each destination, a node needs to prepare for all failures within the 20-hop diameter)
>
> - Very important: the FN packet always proceeds AHEAD OF normal data packets, so re-routed data packets typically find nodes on their way which have finished or almost finished reconfiguring. (In this way long links do not cause problems as both FN and normal data packets are delayed the same.)
>
> - Pre-calculation complexity is in the same order of magnitude as with Not-Via, and it's done "offline"
>
>
> Conclusions of our naïve implementation are the following:
>
> - The solution can be implemented on a current platform, and we don't seem to use any operation that would make it less useful on other platforms including e.g. EZChip NP-4
>
> - An FN packet can be originated in less than 200us (micro-sec) after failure detection
>
> - An FN packet can be forwarded at each hop in ca. 180us (this already includes SHA256 verification and duplicate check!)
>
>
> András
>
>
>> -----Original Message-----
>> From: rtgwg-bounces@ietf.org [mailto:rtgwg-bounces@ietf.org]
>> On Behalf Of Alia Atlas
>> Sent: 6 July 2011 22:57
>> To: rtgwg@ietf.org
>> Subject: discussion on fast notification work
>>
>> The last 2 IETFs, we have had discussions about the idea of fast
>> notification, as described in
>> draft-lu-fast-notification-framework, draft-lu-fn-transport-00, and
>> draft-csaszar-ipfrr-fn-00.
>>
>> Since then, I have not seen substantial discussion or interest on the
>> mailing list.  If you are
>> interested in this work, have questions about it, or would like to see
>> RTGWG continue to discuss it,
>> please send email to this mailing list.  I'd like to see this
>> conversation happening here before IETF.
>>
>> Thanks,
>> Alia
>> _______________________________________________
>> rtgwg mailing list
>> rtgwg@ietf.org
>> https://www.ietf.org/mailman/listinfo/rtgwg
>>
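
As a rough illustration of the timing figures quoted above, the
following Python sketch adds up the FN processing delay for a node a
given number of hops from the failure. Only the three constants come
from the thread (origination in under 200us, ca. 180us per hop, under
15ms for a naive serial FIB update). The function names and the simple
additive model are assumptions made for illustration, link propagation
delay is ignored, and the result should be read as a back-of-the-
envelope bound rather than a measurement.

```python
# Figures taken from the quoted message; treated here as rough upper bounds.
FN_ORIGINATION_US = 200    # "less than 200us" after failure detection
FN_PER_HOP_US = 180        # "ca. 180us" per hop, incl. SHA256 and duplicate check
FIB_UPDATE_US = 15_000     # "less than 15ms" for a naive serial FIB update


def fn_processing_delay_us(hops: int) -> int:
    """FN processing delay (microseconds) to reach a node `hops` hops away.

    Assumption (not from the thread): the per-hop cost is charged once per
    hop, including at the receiving node, and costs simply add up.
    """
    if hops < 0:
        raise ValueError("hops must be non-negative")
    return FN_ORIGINATION_US + hops * FN_PER_HOP_US


def remote_repair_delay_us(hops: int) -> int:
    """Notification delay plus the naive FIB update at the notified node."""
    return fn_processing_delay_us(hops) + FIB_UPDATE_US


if __name__ == "__main__":
    for hops in (1, 3, 20):
        print(f"{hops:2d} hops: notify {fn_processing_delay_us(hops)} us, "
              f"notify + FIB update {remote_repair_delay_us(hops)} us")
```

Under these assumptions the notification share is 740us at 3 hops and
3800us across the full 20-hop diameter, so the naive FIB update, not
the notification, dominates the total in this simple model.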