Re: discussion on fast notification work

Anton Smirnov <asmirnov@cisco.com> Fri, 08 July 2011 09:47 UTC

Message-ID: <4E16CFA0.1010206@cisco.com>
Date: Fri, 08 Jul 2011 11:36:32 +0200
From: Anton Smirnov <asmirnov@cisco.com>
Organization: Cisco Systems
To: curtis@occnc.com
Subject: Re: discussion on fast notification work
References: <201107072343.p67NhcIJ035360@harbor.orleans.occnc.com>
In-Reply-To: <201107072343.p67NhcIJ035360@harbor.orleans.occnc.com>
Cc: "rtgwg@ietf.org" <rtgwg@ietf.org>

    Curtis,
    Yes, that was my message, though without giving too much credit to
LFA versus other local repair methods and without being too MPLS-centric.

Anton


On 07/08/2011 01:43 AM, Curtis Villamizar wrote:
> Anton,
>
> A key point made below is:
>
>    Protection at the PLR is always faster than inter-node protection.
>
> That said, if someone wants fast protection with full coverage, run
> MPLS with RSVP-TE and enable FRR and join the rest of the world that
> has already figured that out.  Then there is complete (single failure)
> protection at every PLR.  If multiple failures occur, then precomputed
> protection is impractical, notification schemes don't work,
> flooding needs to work well, and SPF and/or CSPF and installation of
> the new FIB need to be fast.
>
> Of course, those that think the idea of running MPLS is way too scary
> or think that reinventing the wheel is great fun can continue this
> conversation as long as they like.  After all if people didn't think
> MPLS was scary we wouldn't even have IPFRR in the first place.
>
> Curtis
>
>
> In message <4E159E72.1000400@cisco.com>
> Anton Smirnov writes:
>
>      Hi András,
>
>   >  2. near instantaneous update of the FIB
>   >
>
>      I am no specialist in FIB implementations, but it would appear to me
> that implementations and their requirements vary so much that the very
> intention of improving them all is misguided and bound to fail.
>
>
>   >  1. near instantaneous notification of failures to neighbour and
>   >  remote nodes
>
>      Here is my view of the problem:
>      My logic says that good inter-router notification cannot be made as
> fast as good intra-router API notification. So all good local repair
> techniques are intrinsically superior to an [even good] inter-router
> notification approach. They are superior first of all in speed of
> restoration, but things like ease of deployment obviously add to their
> attractiveness.
>      That is, the niche for a remote notification technique is squeezed:
> it can be applied as an aid to local repair techniques in those cases
> where the network topology provides redundancy but local repair
> techniques can't use it. Since more elaborate local repair techniques
> with expanded coverage are being developed, the niche for remote
> notification is contracting to the point where people don't want to
> bother with it (or even care to criticize it :-) )
>
>      I am guessing that the authors of the proposal don't agree with this
> part: "My logic says that good inter-router notification cannot be made
> as fast as good intra-router API."
>      May I suggest that the authors work on this perception? Otherwise I am
> afraid there will again be total misunderstanding and disinterest.
>
> Anton
>
>
> On 07/07/2011 12:52 PM, András Császár wrote:
>> Dear All,
>>
>> As a recap, the basic idea was to explore how one could approximate
>> 1. near instantaneous notification of failures to neighbour and remote nodes
>> 2. near instantaneous update of the FIB
>>
>> 1 is approximated by a completely dataplane-based fast notification (FN) framework.
>> 2 is approximated by pre-calculating and pre-downloading backup routes for RELEVANT failures and doing the FIB update from within the linecard.
>>
>> Since last IETF, based on the comments we received, we have been working on (and prototyping) a method where FNs are propagated on the shortest path and each hop performs SHA256 authentication in the dataplane before forwarding the packet.
>>
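[Illustration, not part of the original message.] The per-hop step described above (authenticate the FN, then forward it) can be sketched in software as follows. This is only a minimal sketch under stated assumptions: the message says "SHA256 authentication" without naming a construction, so HMAC-SHA256 with a pre-shared key is assumed here, and the names FnHop, sign_fn, origin and seq are hypothetical. The duplicate check mentioned later in the message is included as a simple seen-set.

```python
import hashlib
import hmac


def _encode(origin: str, seq: int, payload: bytes) -> bytes:
    # Hypothetical wire encoding, used only so both sides hash the same bytes.
    return origin.encode() + b"|" + str(seq).encode() + b"|" + payload


def sign_fn(key: bytes, origin: str, seq: int, payload: bytes) -> bytes:
    """Compute the tag the originator would attach (assumed: HMAC-SHA256)."""
    return hmac.new(key, _encode(origin, seq, payload), hashlib.sha256).digest()


class FnHop:
    """Hypothetical per-hop handler: authenticate, drop duplicates, forward."""

    def __init__(self, key: bytes):
        self.key = key
        self.seen = set()  # (origin, seq) pairs this hop has already accepted

    def process(self, origin: str, seq: int, payload: bytes, tag: bytes) -> bool:
        """Return True if the notification should be forwarded onwards."""
        expected = sign_fn(self.key, origin, seq, payload)
        if not hmac.compare_digest(expected, tag):
            return False  # authentication failed: drop
        if (origin, seq) in self.seen:
            return False  # already handled: drop the duplicate
        self.seen.add((origin, seq))
        return True
```

A hop built with the shared key accepts the first copy of a correctly signed notification and rejects both replays and packets whose tag does not match.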
>> Important highlights proving feasibility:
>>
>> - In a 1000-node area with a diameter of 20 hops and 500k external routes, the backup FIB even in a very bad case is not bigger than 30MB with very diverse ECMP (10 ECMP alternatives for each destination). The download of this backup FIB size should be no problem.
>>
>> - A naive serial FIB update procedure after a failure in the above network takes less than 15ms within a dataplane card (assuming 5MT/sec memory performance and 1 memory controller). But there may be more intelligent approaches, such as a lazy (on-demand) FIB update.
>>
>> - In reality, our calculations show that typically only nodes between 1 and 3 hops away need to prepare for a failure, i.e. only failures 1 to 3 hops away are RELEVANT (the above calculation assumes that, for each destination, a node needs to prepare for all failures within the 20-hop diameter).
>>
>> - Very important: the FN packet always proceeds AHEAD OF normal data packets, so re-routed data packets typically find nodes on their way which have finished or almost finished reconfiguring. (In this way long links do not cause problems as both FN and normal data packets are delayed the same.)
>>
>> - Pre-calculation complexity is in the same order of magnitude as with Not-Via, and it's done "offline"
>>
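[Illustration, not part of the original message.] The figures in the list above can be turned into two rough per-unit numbers. This is a back-of-envelope sketch only: it assumes decimal megabytes (1 MB = 10^6 bytes) and reads "5MT/sec" as 5 million memory transfers per second. How many transfers a single FIB entry update costs is not stated in the message, so no entry count is derived.

```python
# Figures quoted in the message (units interpreted as noted above).
BACKUP_FIB_BYTES = 30 * 10**6      # "not bigger than 30MB" (decimal MB assumed)
EXTERNAL_ROUTES = 500_000          # "500k external routes"
UPDATE_WINDOW_US = 15_000          # "less than 15ms", in microseconds
TRANSFERS_PER_US = 5               # "5MT/sec" = 5 transfers per microsecond


def transfer_budget(window_us: int, transfers_per_us: int) -> int:
    """Memory transfers available to a serial update within the window."""
    return window_us * transfers_per_us


def bytes_per_route(total_bytes: int, routes: int) -> int:
    """Average backup-FIB bytes per external route (integer division)."""
    return total_bytes // routes


print(transfer_budget(UPDATE_WINDOW_US, TRANSFERS_PER_US))  # 75000
print(bytes_per_route(BACKUP_FIB_BYTES, EXTERNAL_ROUTES))   # 60
```

Under these assumptions the 15 ms bound corresponds to at most 75,000 memory transfers, and the 30 MB bound averages 60 bytes of backup state per external route.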
>>
>> Conclusions of our naïve implementation are the following:
>>
>> - The solution can be implemented on a current platform, and we don't seem to use any operation that would make it less useful on other platforms including e.g. EZChip NP-4
>>
>> - An FN packet can be originated in less than 200us (micro-sec) after failure detection
>>
>> - An FN packet can be forwarded at each hop in ca. 180us (this already includes SHA256 verification and duplicate check!)
>>
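[Illustration, not part of the original message.] The two timing figures above combine into a simple estimate of how long an FN spends being processed on its way to a node some hops away. This is a sketch that assumes the delays add up linearly, takes 200us as the origination bound and 180us as the per-hop forwarding cost, and ignores link propagation and queuing delay. The function name and the meaning of `forwarding_hops` (number of nodes that forward the packet) are assumptions for illustration.

```python
ORIGINATION_US = 200   # "less than 200us" after failure detection
PER_HOP_US = 180       # "ca. 180us" per forwarding hop, incl. SHA256 check


def fn_processing_delay_us(forwarding_hops: int) -> int:
    """Processing delay (microseconds) accumulated by an FN that is
    originated once and then forwarded by `forwarding_hops` nodes.
    Link propagation and queuing are deliberately left out."""
    if forwarding_hops < 0:
        raise ValueError("forwarding_hops must be non-negative")
    return ORIGINATION_US + forwarding_hops * PER_HOP_US


for hops in (1, 3, 20):
    print(hops, fn_processing_delay_us(hops))
# 1 380
# 3 740
# 20 3800
```

With these numbers, a notification forwarded by 3 nodes accumulates about 0.74 ms of processing delay, and one crossing the full 20-hop diameter about 3.8 ms.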
>>
>> András
>>
>>
>>> -----Original Message-----
>>> From: rtgwg-bounces@ietf.org [mailto:rtgwg-bounces@ietf.org]
>>> On Behalf Of Alia Atlas
>>> Sent: 6 July 2011 22:57
>>> To: rtgwg@ietf.org
>>> Subject: discussion on fast notification work
>>>
>>> The last 2 IETFs, we have had discussions about the idea of fast
>>> notification, as described in
>>> draft-lu-fast-notification-framework, draft-lu-fn-transport-00, and
>>> draft-csaszar-ipfrr-fn-00.
>>>
>>> Since then, I have not seen substantial discussion or interest on the
>>> mailing list.  If you are
>>> interested in this work, have questions about it, or would like to see
>>> RTGWG continue to discuss it,
>>> please send email to this mailing list.  I'd like to see this
>>> conversation happening here before IETF.
>>>
>>> Thanks,
>>> Alia
>>> _______________________________________________
>>> rtgwg mailing list
>>> rtgwg@ietf.org
>>> https://www.ietf.org/mailman/listinfo/rtgwg
>>>
>