Re: [Idr] What are the solutions to address large number of routes convergence caused by Cloud Infrastructure failure described in draft-ietf-rtgwg-net2cloud-problem-statement?

Gyan Mishra <hayabusagsm@gmail.com> Thu, 30 June 2022 08:07 UTC

From: Gyan Mishra <hayabusagsm@gmail.com>
Date: Thu, 30 Jun 2022 03:41:08 -0400
Message-ID: <CABNhwV0hQUZLKeLPqYoYQyV0KyQRHW93o8QNBXnKQekKPyGcqg@mail.gmail.com>
To: Jeff Tantsura <jefftant.ietf@gmail.com>
Cc: Linda Dunbar <linda.dunbar@futurewei.com>, Robert Raszuk <robert@raszuk.net>, idr@ietf.org, rtgwg@ietf.org
Archived-At: <https://mailarchive.ietf.org/arch/msg/idr/y6Q0MaisX1jerb0zNoWj0cS2Iho>

Hi Linda

Mass MAC withdrawal is an optimization similar to BGP PIC. With BGP PIC,
instead of updating the next hop of every prefix, only the next hop in the
hierarchical FIB (H-FIB) is tracked. Similarly, to combat slow convergence in
MSDC use cases with massive MAC-VRFs, EVPN withdraws the Ethernet A-D per ES
route rather than individual MAC addresses, and any MAC pointing to that ES
is marked invalid and purged from the MAC-VRF on the receiving PE.

In a DC NVO Clos fabric with an n-way ECMP scaled-out spine, whether in the
RFC 8365 non-MPLS NVO EVPN use case or the RFC 7432 MPLS EVPN use case, I
have not seen the mass MAC withdrawal convergence optimization cause any
service degradation, outages, or other problems.
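A similarly minimal sketch of the multipath point made by Robert and Jeff in
the quoted replies (multipath eBGP as in RFC 7938): the `EcmpRoute` class and
the bandwidth figures are hypothetical, and `advertise_bandwidth` is only a
stand-in for re-advertising a cumulative bandwidth via BGP BW communities.
Losing one spine path leaves the prefix reachable, so in this model nothing
propagates downstream unless a cumulative bandwidth is being advertised.

```python
from dataclasses import dataclass, field


@dataclass
class EcmpRoute:
    """One prefix learned over several equally preferred eBGP paths (toy model).

    `paths` maps each next hop (e.g. a spine) to a link bandwidth in Gbit/s;
    the values are illustrative. `advertise_bandwidth` stands in for sending
    a cumulative bandwidth value downstream (BGP BW communities).
    """

    prefix: str
    paths: dict[str, float] = field(default_factory=dict)
    advertise_bandwidth: bool = False

    def cumulative_bw(self) -> float:
        return sum(self.paths.values())

    def remove_path(self, next_hop: str) -> bool:
        """Drop a failed path; return True if an UPDATE must go downstream."""
        if next_hop not in self.paths:
            return False
        old_bw = self.cumulative_bw()
        del self.paths[next_hop]
        if not self.paths:
            return True  # last path gone: the prefix must be withdrawn
        if self.advertise_bandwidth:
            return self.cumulative_bw() != old_bw  # advertised cumulative BW changed
        return False  # still reachable via remaining paths: a purely local event


spines = {f"spine{i}": 100.0 for i in range(1, 5)}
plain = EcmpRoute("10.1.0.0/24", dict(spines))
with_bw = EcmpRoute("10.1.0.0/24", dict(spines), advertise_bandwidth=True)

print(plain.remove_path("spine1"))    # False: 3 paths left, nothing to propagate
print(with_bw.remove_path("spine1"))  # True: cumulative BW dropped 400 -> 300
```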

Kind Regards

Gyan

On Wed, Jun 29, 2022 at 9:21 PM Jeff Tantsura <jefftant.ietf@gmail.com>
wrote:

> Linda,
>
> EVPN mass withdraw is an EVPN technology (as the name suggests) and, to my
> knowledge, is supported by all implementations.
>
> With respect to RFC 7938 (and to rephrase Robert), in the presence of
> multiple equally preferred routes towards a destination, the failure of one
> of the routes need not be propagated downstream, since the destination is
> still reachable. If you happen to use BGP BW communities, then there is
> going to be an update every time the cumulative BW towards the destination
> changes.
>
> Hope this helps
>
> Cheers,
> Jeff
>
> On Jun 29, 2022, at 15:03, Robert Raszuk <robert@raszuk.net> wrote:
>
>
> Hi Linda,
>
> The most important premise for why BGP can be used in data center fabrics
> (not that this is a good idea in the vast majority of deployments) is the
> critical assumption that multipath eBGP is in place.
>
> So a single link or switch failure is really a local event and does not
> need to be reflected in any protocol action.
>
> Otherwise, the use of BGP would be a fatal idea when the number of underlay
> routes is relatively high.
>
> With that, your email is a bit confusing: you quote RFC 7938, which talks
> about how to construct the underlay, yet suddenly you bring up EVPN, which
> is an overlay. You could more plausibly bring up the BGP aggregate withdraw
> idea, but again, while that is applicable to WANs, correctly built DCs
> should have no need for it.
>
> Thx,
> R.
>
>
> On Wed, Jun 29, 2022 at 11:49 PM Linda Dunbar <linda.dunbar@futurewei.com>
> wrote:
>
>> BGP experts:
>>
>> Section 3.2 of
>> https://datatracker.ietf.org/doc/draft-ietf-rtgwg-net2cloud-problem-statement/
>> describes a problem in which a Cloud DC infrastructure failure may lead to
>> massive route changes.
>>
>>    As described in RFC7938, Cloud DC BGP might not have an IGP to route
>>    around link/node failures within the Assess. Fiber-cut is not uncommon
>>    within Cloud DCs or between sites. Sometimes, an entire cloud data
>>    center goes dark caused by a variety of reasons, such as too many
>>    changes and updates at once, changes of outside of maintenance
>>    windows, cybersecurity threats attacks, cooling failures,
>>    insufficient backup power, etc. When those events happen, massive
>>    numbers of routes need to be changed.
>>
>>    The large number of routes switching over to another site can also
>>    cause overloading that triggers more failures.
>>
>>    In addition, the routes (IP addresses) in a Cloud DC cannot be
>>    aggregated nicely, triggering very large number of BGP UPDATE
>>    messages when a failure occurs.
>>
>> EVPN [RFC7432] defines a mass withdraw mechanism to signal a large number
>> of route changes to remote PE nodes.
>>
>> Is mass withdraw supported by all networks?
>>
>> Thank you
>>
>> Linda Dunbar
>>
>> _______________________________________________
>> rtgwg mailing list
>> rtgwg@ietf.org
>> https://www.ietf.org/mailman/listinfo/rtgwg
>>
>
-- 

*Gyan Mishra*

*Network Solutions Architect*

*Email gyan.s.mishra@verizon.com*

*M 301 502-1347*