Re: [Lsr] Multiple failures in Dynamic Flooding
tony.li@tony.li Mon, 11 March 2019 17:41 UTC
Sender: Tony Li <tony1athome@gmail.com>
From: tony.li@tony.li
Message-Id: <10A1CA48-0D09-44FF-95ED-8D52FB867B8B@tony.li>
Date: Mon, 11 Mar 2019 10:41:08 -0700
In-Reply-To: <5316A0AB3C851246A7CA5758973207D463B76FDD@sjceml521-mbx.china.huawei.com>
Cc: "lsr@ietf.org" <lsr@ietf.org>, "lsr-chairs@ietf.org" <lsr-chairs@ietf.org>, "lsr-ads@ietf.org" <lsr-ads@ietf.org>
To: Huaimo Chen <huaimo.chen@huawei.com>
References: <sa6lg2md2ok.fsf@chopps.org> <SN6PR11MB284553735B2351FB584BE792C17F0@SN6PR11MB2845.namprd11.prod.outlook.com> <5316A0AB3C851246A7CA5758973207D463B5858A@sjceml521-mbx.china.huawei.com> <420ed1b5-d849-99cc-bcb0-d159783e4de2@cisco.com> <5316A0AB3C851246A7CA5758973207D463B59041@sjceml521-mbx.china.huawei.com> <0B4DF2AC-8EE1-41CA-B357-98325067CA30@gmail.com> <5316A0AB3C851246A7CA5758973207D463B66FE9@sjceml521-mbx.china.huawei.com> <78A866F4-9AF0-481A-9DEC-B04DE72AFDA3@tony.li> <5316A0AB3C851246A7CA5758973207D463B76FDD@sjceml521-mbx.china.huawei.com>
Archived-At: <https://mailarchive.ietf.org/arch/msg/lsr/VHyR7YrNT4ftMrnXufhGa7mm89w>
Subject: Re: [Lsr] Multiple failures in Dynamic Flooding

Hi Huaimo,

> In summary for multiple failures, two issues below in draft-li-lsr-dynamic-flooding are discussed:
> 1) how to determine the current flooding topology is split; and
> 2) how to repair/connect the flooding topology split.
> For the first issue, the discussions are still going on.
> For the second issue, repairing/connecting the flooding topology split through Hello protocol extensions does not work. When a “backup path”/connection of multiple hops is needed to connect/repair the flooding topology split, Hello can not go beyond one hop, thus can not repair the flooding topology split in this case.

You do not try to repair things remotely; they are always repaired locally.

If there are multiple failures in the flooding topology and it is partitioned, then it follows that there are multiple remaining connected components of the flooding topology. Nodes that are adjacent to the failures will update their LSPs and flood them throughout their connected component. Each component will see at least two link failures if there is a partition of the FT, and each node in the component can detect that the FT has partitioned. Each node is then capable of enabling temporary flooding on one or more links that will traverse the partition, thereby restoring a functioning FT. The Area Leader then recomputes and redistributes the revised FT.

To put it yet another way, repair is fully distributed. You should like that. :-)

> >We are not requiring it, but a system could also do a more extensive computation and compare the links between itself and the neighbor
> >by tracing the path in the FT and then confirming that each link is up in the LSDB.
>
> It normally takes a long time such as more than ten minutes to age out and remove an LSP/LSA for the neighbor from the LSDB even though the neighbor is disconnected physically.
> How can you decide quickly in tens of milliseconds that the flooding topology is disconnected?

You do not wait for LSP/LSA removal. You look for link changes in the LSPs that you do get, or local link changes.

> >As we have discussed, this is not a solution. In fact, this is more dangerous than anything else that has been proposed and
> >seems highly likely to trigger a cascade failure. You are enabling full flooding for many nodes. In dense topologies, even
> >a radius of 3 is very high. For example, in a LS topology, a radius of 3 is sufficient to enable full flooding throughout the
> >entire topology. If that were stable, we would not need Dynamic Flooding at all.
>
> This full flooding is enabled only for a very short time.

All it takes is enabling it at sufficient density to create a cascade failure. Milliseconds are sufficient for a collapse.

> How do you get that this is more dangerous than anything else and seems highly likely to trigger a cascade failure? Can you give some explanations in details?

Again, we do not have absolute metrics on what triggers a cascade failure today. We have several data points of several different implementations at different points in time. We know that in the early '90s, a full mesh of 20 neighbors running L1L2 was sufficient. Obviously things have changed somewhat, but even more modern implementations have had problems. This is why the MSDC went to BGP.

As a result, we need to be very conservative about what flooding we temporarily enable. We do not want to walk anywhere near the cliff, as the cascade failure is fatal to the network.

Tony
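
The local repair described in the message above can be illustrated with a small sketch. It is not taken from the draft: the function names (`components`, `links_to_enable`), the representation of the FT as a list of node pairs, and the assumption that a node already knows which links are up (from the LSPs it has received plus its own local link state) are all hypothetical choices made for illustration. The node removes failed links from the FT, checks whether the remainder is still connected, and if not, picks only its own working links whose far end lies in a different FT component.

```python
from collections import defaultdict

def components(nodes, edges):
    """Map each node to a component label, using only the given edges."""
    adj = defaultdict(set)
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    comp = {}
    for start in sorted(nodes):
        if start in comp:
            continue
        comp[start] = start
        stack = [start]
        while stack:  # depth-first walk of one component
            n = stack.pop()
            for m in adj[n]:
                if m not in comp:
                    comp[m] = start
                    stack.append(m)
    return comp

def links_to_enable(me, nodes, ft_edges, up_links):
    """Return the local links of `me` that would bridge an FT partition.

    ft_edges: links in the flooding topology (hypothetical representation).
    up_links: links this node currently believes are up (assumed input).
    """
    up = {frozenset(link) for link in up_links}
    live_ft = [e for e in ft_edges if frozenset(e) in up]
    comp = components(nodes, live_ft)
    if len(set(comp.values())) == 1:
        return []  # FT is still connected; nothing to repair
    repairs = []
    for a, b in up_links:
        if me in (a, b):
            other = b if a == me else a
            if comp.get(other) != comp.get(me):
                repairs.append((me, other))  # local link crossing the partition
    return sorted(repairs)

# Example: the FT is the ring A-B-C-D-A; the physical topology also has A-C and B-D.
nodes = {"A", "B", "C", "D"}
ft = [("A", "B"), ("B", "C"), ("C", "D"), ("D", "A")]
# Links A-B and C-D have failed, splitting the FT into {A, D} and {B, C}.
up_after = [("B", "C"), ("D", "A"), ("A", "C"), ("B", "D")]
print(links_to_enable("A", nodes, ft, up_after))
print(links_to_enable("B", nodes, ft, up_after))
```

Every node runs the same check on its own links, so no node has to repair anything more than one hop away. The sketch deliberately enables flooding only on links that cross the partition, in keeping with the concern about enabling too much temporary flooding.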