Re: [OPSAWG] Solicit feedback for the cloud site failure impact to forwarding for workloads hosted in Cloud DCs described in draft-ietf-rtgwg-net2cloud-problem-statement

Hesham ElBakoury <helbakoury@gmail.com> Tue, 31 January 2023 17:34 UTC

From: Hesham ElBakoury <helbakoury@gmail.com>
Date: Tue, 31 Jan 2023 09:34:18 -0800
Message-ID: <CAFvDQ9qfkzAGQrgd2HQFF22rbvTjUCiqDMU4BT4qzfggRWKpVw@mail.gmail.com>
To: Linda Dunbar <linda.dunbar@futurewei.com>
Cc: opsawg <opsawg@ietf.org>, rtgwg@ietf.org
Archived-At: <https://mailarchive.ietf.org/arch/msg/opsawg/My7gjkC488C1FyYUXTmA1SIQcds>

Hi Linda,
I am using Google Cloud, so I can't speak for other cloud providers such as
Microsoft Azure and Amazon AWS.

This page describes Google Cloud's approach to resilience. I hope it provides
useful information for you:
https://cloud.google.com/architecture/disaster-recovery#resilience_and_availability_approach

Hesham

On Mon, Jan 30, 2023, 2:31 PM Linda Dunbar <linda.dunbar@futurewei.com> wrote:

> Opsawg,
>
> Section 3.2 of
> https://datatracker.ietf.org/doc/draft-ietf-rtgwg-net2cloud-problem-statement/
> describes the impact of Cloud site failures on traffic to and from
> enterprise workloads hosted in Cloud DCs.
>
> We would really appreciate your feedback on this description.
>
> ----------
>
> *3.2. Site failures and Methods to Minimize Impacts*
>
> Site failures include, but are not limited to, degradation of a site's
> capacity or an entire site going down. Causes vary: fiber cuts on links
> connecting to the site or among pods within the site, cooling failures,
> insufficient backup power, cyber attacks, too many changes outside of the
> maintenance window, etc. Fiber cuts are not uncommon within a Cloud site
> or between sites.
>
> As described in [RFC7938], a Cloud DC that uses BGP as its only routing
> protocol might not run an IGP to route around link/node failures within
> the ASes.
>
> When such failures happen, the Cloud DC GWs that are visible to clients
> are still running fine. Therefore, the Client GW can't use BFD to detect
> the failures: its BFD sessions only monitor the path to the Cloud DC GW,
> which remains up.
>
> When a site's capacity degrades or the site goes dark, a massive number
> of routes need to be changed.
>
> The large number of routes switching over to another site can also cause
> overload that triggers further failures.
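
To illustrate how shifted load can cascade, here is a minimal Python sketch
under loudly hypothetical assumptions: each site has a fixed capacity and a
current load in arbitrary units, traffic from a failed site is spread evenly
over the surviving sites, and any site pushed above its capacity fails in
turn. The function name `cascade` and the site names are illustrative only;
real traffic shifts follow routing policy and are rarely this uniform.

```python
"""Toy model of cascading overload after a site failure (hypothetical)."""


def cascade(capacity: dict[str, float], load: dict[str, float],
            first_failure: str) -> list[str]:
    """Return the sites that fail, in order, starting with first_failure.

    Assumption: displaced load is split evenly across all surviving sites.
    """
    load = dict(load)  # work on a copy so the caller's dict is untouched
    failed = [first_failure]
    alive = {s for s in capacity if s != first_failure}
    displaced = load.pop(first_failure)
    while displaced > 0 and alive:
        share = displaced / len(alive)
        displaced = 0.0
        for site in alive:
            load[site] += share
        # Sites now above capacity fail; their load is displaced next round.
        for site in sorted(s for s in alive if load[s] > capacity[s]):
            alive.remove(site)
            failed.append(site)
            displaced += load.pop(site)
    return failed


sites = {"site-a": 100.0, "site-b": 100.0, "site-c": 100.0}
# Heavily loaded survivors: one failure takes down every site.
print(cascade(sites, {"site-a": 60.0, "site-b": 80.0, "site-c": 50.0}, "site-a"))
# Enough headroom: the failure stays contained.
print(cascade(sites, {"site-a": 40.0, "site-b": 50.0, "site-c": 50.0}, "site-a"))
```

The only difference between the two runs is the headroom left on the
surviving sites, which is the point the paragraph above makes.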
>
> In addition, the routes (IP addresses) in a Cloud DC cannot be aggregated
> well, so a failure triggers a very large number of BGP UPDATE messages.
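
A minimal Python sketch of why poor aggregation matters, using the standard
`ipaddress` module. The prefixes are made-up private addresses, and the
hypothetical helper `prefixes_to_withdraw` counts prefixes (NLRI), not
UPDATE messages: BGP can pack several withdrawn prefixes into one UPDATE,
but the number of routes every peer must process still grows with the
number of prefixes that cannot be collapsed.

```python
import ipaddress


def prefixes_to_withdraw(prefixes: list[str]) -> int:
    """Count the prefixes left after collapsing adjacent/contained ones."""
    nets = [ipaddress.ip_network(p) for p in prefixes]
    return len(list(ipaddress.collapse_addresses(nets)))


# Four contiguous /24s collapse into a single 10.1.0.0/22 ...
contiguous = ["10.1.0.0/24", "10.1.1.0/24", "10.1.2.0/24", "10.1.3.0/24"]
# ... but /32s scattered across the address space do not collapse at all.
scattered = ["10.1.0.7/32", "10.9.4.20/32", "172.16.3.9/32", "192.168.40.1/32"]

print(prefixes_to_withdraw(contiguous))  # 1
print(prefixes_to_withdraw(scattered))   # 4
```
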
>
> It might be more effective to use a mass reroute mechanism, similar to the
> mass withdrawal mechanism defined for EVPN [RFC7432], to signal a large
> number of route changes to remote PE nodes as quickly as possible.
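
The difference between per-prefix withdrawal and a mass-withdrawal style
signal can be sketched in Python as below. This is a hypothetical model, not
the EVPN or BGP wire format: each route a remote PE learns is tagged with a
failure-group identifier (standing in for the role the Ethernet Segment
Identifier plays in EVPN mass withdrawal), and the class and method names
are illustrative only.

```python
from collections import defaultdict


class RemotePE:
    """Hypothetical remote PE that indexes learned routes by failure group."""

    def __init__(self) -> None:
        self.routes: dict[str, str] = {}  # prefix -> failure group
        self.by_group: defaultdict[str, set[str]] = defaultdict(set)
        self.messages_received = 0

    def learn(self, prefix: str, group: str) -> None:
        self.routes[prefix] = group
        self.by_group[group].add(prefix)

    def withdraw_prefix(self, prefix: str) -> None:
        """Per-prefix withdrawal: one signal removes one route."""
        self.messages_received += 1
        group = self.routes.pop(prefix)
        self.by_group[group].discard(prefix)

    def mass_withdraw(self, group: str) -> int:
        """Mass withdrawal: one signal removes every route in the group."""
        self.messages_received += 1
        removed = self.by_group.pop(group, set())
        for prefix in removed:
            del self.routes[prefix]
        return len(removed)


per_prefix, mass = RemotePE(), RemotePE()
for pe in (per_prefix, mass):
    for i in range(1000):  # 1000 distinct /32s hosted at the failed site
        pe.learn(f"10.{i // 256}.{i % 256}.1/32", group="site-a")

for prefix in list(per_prefix.routes):
    per_prefix.withdraw_prefix(prefix)
mass.mass_withdraw("site-a")

print(per_prefix.messages_received, mass.messages_received)  # 1000 1
```

The design choice is simply indirection: because the remote PE already
indexes routes by group, a single signal can invalidate all of them instead
of requiring one withdrawal per prefix.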
>
> -------------------------------------
>
> Thank you very much
>
> Linda Dunbar
>