Re: [Idr] What are the solutions to address large number of routes convergence caused by Cloud Infrastructure failure described in draft-ietf-rtgwg-net2cloud-problem-statement?

Jeff Tantsura <jefftant.ietf@gmail.com> Thu, 30 June 2022 01:21 UTC

From: Jeff Tantsura <jefftant.ietf@gmail.com>
Date: Wed, 29 Jun 2022 18:21:24 -0700
Message-Id: <EA65A88E-B068-4810-A42D-2A721BDAAF2C@gmail.com>
References: <CAOj+MMFoa2eXRc2DpaV_5PVssg18fFBWzwvSN5d9P_8YmSj4yQ@mail.gmail.com>
Cc: Linda Dunbar <linda.dunbar@futurewei.com>, rtgwg@ietf.org, idr@ietf.org
In-Reply-To: <CAOj+MMFoa2eXRc2DpaV_5PVssg18fFBWzwvSN5d9P_8YmSj4yQ@mail.gmail.com>
To: Robert Raszuk <robert@raszuk.net>
Archived-At: <https://mailarchive.ietf.org/arch/msg/idr/xWXn637CdveYm4gn3OH4d5NUSJw>
Subject: Re: [Idr] What are the solutions to address large number of routes convergence caused by Cloud Infrastructure failure described in draft-ietf-rtgwg-net2cloud-problem-statement?

Linda,

EVPN mass withdraw is an EVPN technology (as the name suggests) and, as far as I remember, is supported by all implementations.
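For illustration only, here is a toy Python model of what mass withdraw buys a remote PE. It assumes a heavily simplified remote PE in which each MAC route is bound to an Ethernet Segment Identifier (ESI) and resolved through whichever PEs currently advertise the Ethernet A-D per ES route for that segment, loosely following the fast-convergence procedure of RFC 7432. The class and method names are hypothetical and do not come from any implementation.

```python
from collections import defaultdict


class RemotePE:
    """Toy model of how a remote PE resolves MACs on a multi-homed segment.

    Hypothetical simplification: each MAC route carries an Ethernet Segment
    Identifier (ESI), and its usable next hops are the PEs that currently
    advertise an Ethernet A-D per ES route for that ESI.
    """

    def __init__(self):
        self.mac_to_esi = {}                   # MAC address -> ESI
        self.es_next_hops = defaultdict(set)   # ESI -> PEs with an A-D per ES route

    def receive_ad_per_es(self, esi, pe):
        self.es_next_hops[esi].add(pe)

    def receive_mac_route(self, mac, esi):
        self.mac_to_esi[mac] = esi

    def withdraw_ad_per_es(self, esi, pe):
        """Handle one A-D per ES withdrawal; return how many MACs it affected.

        No per-MAC withdrawals are needed: every MAC bound to the ESI is
        re-resolved through the shrunken next-hop set.
        """
        self.es_next_hops[esi].discard(pe)
        return sum(1 for e in self.mac_to_esi.values() if e == esi)

    def next_hops(self, mac):
        return set(self.es_next_hops[self.mac_to_esi[mac]])


if __name__ == "__main__":
    remote = RemotePE()
    for peer in ("PE1", "PE2"):           # ESI-1 is dual-homed to PE1 and PE2
        remote.receive_ad_per_es("ESI-1", peer)
    for i in range(10_000):               # 10k MACs behind the same segment
        remote.receive_mac_route(f"00:00:00:00:{i // 256:02x}:{i % 256:02x}", "ESI-1")
    affected = remote.withdraw_ad_per_es("ESI-1", "PE1")  # a single BGP withdrawal
    print(affected, sorted(remote.next_hops("00:00:00:00:00:01")))
```

In this sketch a single withdrawal of PE1's A-D per ES route moves all 10,000 MACs onto PE2, rather than waiting for 10,000 individual MAC/IP route withdrawals.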

With regard to RFC 7938 (and to rephrase Robert): in the presence of multiple equally preferred routes toward a destination, the failure of one of those routes need not be propagated downstream, since the destination is still reachable.
If you happen to use BGP bandwidth (BW) communities, then there will be an update every time the cumulative bandwidth toward the destination changes.
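A toy Python sketch of the point above, assuming one prefix on a BGP speaker running eBGP multipath where every path in the set is equally preferred and yields the same advertised attributes. The `use_link_bandwidth` flag is a hypothetical stand-in for advertising the aggregate bandwidth of the set in a bandwidth community; none of the names come from a real implementation.

```python
class MultipathPrefix:
    """Toy model of one prefix on a BGP speaker running eBGP multipath.

    Hypothetical simplifications: all paths in the set are equally preferred
    and yield the same advertised attributes, and `use_link_bandwidth`
    stands in for advertising the aggregate bandwidth of the set in a BGP
    bandwidth community.
    """

    def __init__(self, use_link_bandwidth=False):
        self.paths = {}               # upstream neighbor -> link bandwidth (Gbit/s)
        self.use_link_bandwidth = use_link_bandwidth
        self.updates_sent = []        # what would be sent to downstream peers

    def _advertised(self):
        """What downstream peers would see, or None if unreachable."""
        if not self.paths:
            return None
        if self.use_link_bandwidth:
            return ("reachable", sum(self.paths.values()))
        return ("reachable",)

    def _maybe_advertise(self, before):
        # Only send something downstream if the advertised state changed.
        after = self._advertised()
        if after != before:
            self.updates_sent.append("withdraw" if after is None else after)

    def add_path(self, neighbor, bandwidth_gbps):
        before = self._advertised()
        self.paths[neighbor] = bandwidth_gbps
        self._maybe_advertise(before)

    def remove_path(self, neighbor):
        before = self._advertised()
        self.paths.pop(neighbor, None)
        self._maybe_advertise(before)


if __name__ == "__main__":
    for use_bw in (False, True):
        prefix = MultipathPrefix(use_link_bandwidth=use_bw)
        for spine in ("S1", "S2", "S3", "S4"):
            prefix.add_path(spine, 100)
        prefix.remove_path("S2")      # single link or spine failure
        print(use_bw, prefix.updates_sent)
```

Without the bandwidth community, the S2 failure sends nothing downstream (only the initial advertisement is ever sent); with it, the same failure produces one more update carrying the reduced aggregate of 300 Gbit/s.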

Hope this helps

Cheers,
Jeff

> On Jun 29, 2022, at 15:03, Robert Raszuk <robert@raszuk.net> wrote:
> 
> 
> Hi Linda, 
> 
> The most important premise for why BGP can be used in data center fabrics (not that this is a good idea in the vast majority of deployments) is the critical assumption that multipath eBGP is in place. 
> 
> So a single link or switch failure is really a local event and does not need to be reflected in any protocol action. 
> 
> Otherwise, using BGP would be a fatal idea when the number of underlay routes is relatively high. 
> 
> With that, your email is a bit confusing: you quote RFC 7938, which talks about how to construct the underlay, yet suddenly you bring up EVPN, which is an overlay. You could more plausibly bring up the BGP aggregate withdraw idea, but again, while that is applicable to WANs, correctly built DCs should have no need for it. 
> 
> Thx,
> R.
> 
> 
>> On Wed, Jun 29, 2022 at 11:49 PM Linda Dunbar <linda.dunbar@futurewei.com> wrote:
>> BGP experts:
>> 
>> Section 3.2 of https://datatracker.ietf.org/doc/draft-ietf-rtgwg-net2cloud-problem-statement/ describes the problem of a Cloud DC infrastructure failure that may lead to massive route changes:
>> 
>>    As described in RFC7938, Cloud DC BGP might not have an IGP to route
>>    around link/node failures within the ASes. Fiber cuts are not uncommon
>>    within Cloud DCs or between sites. Sometimes, an entire cloud data
>>    center goes dark for a variety of reasons, such as too many
>>    changes and updates at once, changes outside of maintenance
>>    windows, cybersecurity threats and attacks, cooling failures,
>>    insufficient backup power, etc. When those events happen, massive
>>    numbers of routes need to be changed.
>> 
>>    The large number of routes switching over to another site can also
>>    cause overloading that triggers more failures.
>> 
>>    In addition, the routes (IP addresses) in a Cloud DC cannot be
>>    aggregated nicely, triggering a very large number of BGP UPDATE
>>    messages when a failure occurs.
>> 
>> EVPN [RFC7432] defines a mass withdraw mechanism to signal to remote PE nodes that a large number of routes have changed.
>> 
>> Is mass withdraw supported by all networks?
>> 
>> Thank you
>> 
>> Linda Dunbar
>> 
>> _______________________________________________
>> rtgwg mailing list
>> rtgwg@ietf.org
>> https://www.ietf.org/mailman/listinfo/rtgwg