Re: [Lsr] WG Adoption Call for draft-li-lsr-dynamic-flooding-02 + IPR poll.

Huaimo Chen <huaimo.chen@huawei.com> Wed, 06 March 2019 05:05 UTC

From: Huaimo Chen <huaimo.chen@huawei.com>
To: Tony Li <tony1athome@gmail.com>
CC: Christian Hopps <chopps@chopps.org>, "lsr@ietf.org" <lsr@ietf.org>, "lsr-chairs@ietf.org" <lsr-chairs@ietf.org>, "lsr-ads@ietf.org" <lsr-ads@ietf.org>
Date: Wed, 06 Mar 2019 05:04:54 +0000
Archived-At: <https://mailarchive.ietf.org/arch/msg/lsr/jCAaD4ShqUO7HqELPLbEg8e4TSI>

Hi Tony,

> From: Tony Li [mailto:tony1athome@gmail.com]
> Sent: Wednesday, February 27, 2019 2:07 AM
> To: Huaimo Chen <huaimo.chen@huawei.com>
> Cc: Peter Psenak <ppsenak@cisco.com>; Christian Hopps <chopps@chopps.org>; lsr@ietf.org;
> lsr-chairs@ietf.org; lsr-ads@ietf.org
> Subject: Re: [Lsr] WG Adoption Call for draft-li-lsr-dynamic-flooding-02 + IPR poll.
>
>
> Hi Huaimo,
>
> > > 1)           There is no concrete procedure/method for fault tolerance
> > > to multiple failures. When multiple failures happen and split the
> > > flooding topology, the convergence time will be increased
> > > significantly without fault tolerance. The longer the convergence
> > > time, the more traffic is lost.
> >
> > there is a solution for multiple failures - see section 6.7.11.
> >

> > Section 6.7.11 just briefly mentions that the edges of the split parts will determine
> > and repair the split after the split of the flooding topology happens. However,
> > there are no details or description of how to determine or repair the split.
> > This is not useful for implementers.


> I’m sorry that you don’t find it useful. Determining the split is trivial: when you receive an IIH,
> it has the system ID of the other system in it. If that other system is not currently part of the
> flooding topology, then it is quite clear that it is disconnected from the flooding topology.
> Repairing the split is done by enabling temporary flooding on the new link.

When an adjacency between two nodes is already up, the Hello packets exchanged between them keep carrying the same node/system IDs.
How, then, do you determine that the other system is not currently part of the flooding topology?


> There is an issue here that we have not yet resolved, which is the rate that new links should be
> temporarily added to the flooding topology.  Some believe that adding any new link is the
> correct thing to do as it minimizes the recovery time. Others feel that enabling too many links
> could cause a flooding collapse, so link addition should be highly constrained. We are still
> discussing this and invite the WG’s opinions.

That issue is resolved by the solutions in draft-cc-lsr-flooding-reduction.
One solution is quoted below, where the given distance can be adjusted/configured.
If we want every node to flood on all its links, we set the given
distance to a large number. If we want only the nodes within 2 hops of a failure
to flood on all their links, we set the given distance to 2.
   “In one way, when two or more failures on the current flooding
   topology occur almost in the same time, each of the nodes within a
   given distance (such as 3 hops) to a failure point, floods the link
   state (LS) that it receives to all the links (except for the one from
   which the LS is received) until a new flooding topology is built.”
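
The distance rule quoted above can be sketched roughly as follows. This is only an illustration, not text or code from either draft: the function name, the adjacency-map representation, and the choice to treat the nodes adjacent to a failure as "failure points" are all assumptions made for the example.

```python
from collections import deque


def nodes_within_distance(adjacency, failure_points, max_hops):
    """Return the nodes at most max_hops hops from any failure point.

    adjacency: dict mapping a node to an iterable of its live neighbors
               (hypothetical representation of the area topology).
    failure_points: nodes adjacent to a failure (an assumption of this sketch).
    max_hops: the configurable "given distance".
    """
    distance = {node: 0 for node in failure_points}
    queue = deque(distance)
    while queue:
        node = queue.popleft()
        if distance[node] == max_hops:
            continue  # do not expand beyond the given distance
        for neighbor in adjacency.get(node, ()):
            if neighbor not in distance:
                distance[neighbor] = distance[node] + 1
                queue.append(neighbor)
    return set(distance)


# A five-node chain A-B-C-D-E with a failure next to A.
chain = {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C", "E"], "E": ["D"]}
full_flooders = nodes_within_distance(chain, ["A"], 2)  # {"A", "B", "C"}
```

Nodes in the returned set would flood on all their links (except the receiving one) until a new flooding topology is built; a very large max_hops makes every reachable node do so.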

Another solution is to temporarily add just the minimum number of links to the
flooding topology to repair the split, until a new flooding topology
is built.
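
To make "minimum links" concrete, here is a small sketch under simplifying assumptions: a single node sees the whole surviving flooding topology and the list of live links that are not on it. The function name and data layout are hypothetical, and a union-find structure stands in for whatever procedure an implementation would really use.

```python
def repair_links(nodes, flooding_links, candidate_links):
    """Pick candidate links that reconnect a split flooding topology.

    nodes: all live nodes in the area.
    flooding_links: surviving links of the current flooding topology.
    candidate_links: live links not on the flooding topology.
    Each returned link joins two previously separate parts, so a split
    into k parts is repaired with k - 1 links when enough candidates exist.
    """
    parent = {node: node for node in nodes}

    def find(node):
        while parent[node] != node:
            parent[node] = parent[parent[node]]  # path halving
            node = parent[node]
        return node

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a == root_b:
            return False  # already in the same part
        parent[root_a] = root_b
        return True

    for a, b in flooding_links:
        union(a, b)

    # Keep only candidates that actually merge two separate parts.
    return [(a, b) for a, b in candidate_links if union(a, b)]


# Flooding topology split into {A, B} and {C, D}; one link repairs it.
added = repair_links(
    ["A", "B", "C", "D"],
    [("A", "B"), ("C", "D")],
    [("A", "C"), ("B", "D"), ("B", "C")],
)  # [("A", "C")]
```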

> > > 2)           The extensions to Hello protocols for enabling “temporary
> > > flooding” over a new link are not needed.
> >
> > not if you do flooding on every link that comes up. If you want to be smarter, then you need to
> > selectively enable flooding only under specific conditions and that must be done from both sides of
> > the new link.

> > There are only a limited number of conditions (or cases).  In each condition/case, it is
> > deterministic whether we need to enable “temporary flooding” for a new link when it
> > is up.  Thus there is no need for any extensions to Hello protocols for enabling
> > “temporary flooding” on a new link.


> We know of only two cases: (1) the neighbor is not part of the flooding topology and we feel
> that we can add more temporary flooding. (2) The neighbor is not part of the flooding topology
> and we cannot add more temporary flooding.

> Obviously, in the case where we want to add temporary flooding, that TLV is needed in the IIH.


> > For example, suppose that we have a current flooding topology containing all live
> > nodes in an area, when a new link comes up, we may just have two conditions/cases.
> > One condition/case is that the new link is attached to a new node not on the current
> > flooding topology. In this condition/case, the new link needs to be enabled for
> > “temporary flooding” after it is up.


> Agreed, which is why we need the TLV.

The link can be enabled for “temporary flooding” by the node without using any TLV or Hello with the TLV.

> >The other condition/case is that the new link is attached to nodes on the current
> >flooding topology. In this condition/case, there is no need to enable “temporary
> > flooding” on the link.


>Agreed.

>Note that there are some additional corner cases.  Since the two neighbors may not have the
>exact same information, one may consider the other to be on the flooding topology when in fact
> it is not.  This might happen in the case of a node reboot. The IIH TLV gives us an explicit way
> of signaling, rather than simply guessing and sometimes getting it wrong.

The TLV in the Hello packet only requests that “temporary flooding” be added on the link. All other information is available to the node locally, so the TLV does not help with the corner cases. In the case where a node is rebooted, the case of a new link attached to a new node may apply.
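
The two conditions/cases for a new link can be sketched as a purely local check. The function name and the set-based view of the flooding topology are assumptions for illustration only; each end evaluates the check against its own view, so if the two views differ, the two ends can reach different answers.

```python
def needs_temporary_flooding(neighbor_id, flooding_nodes):
    """Decide locally whether a newly up link needs temporary flooding.

    neighbor_id: system ID learned from the neighbor on the new link.
    flooding_nodes: this node's own view of the nodes on the current
                    flooding topology (hypothetical representation).
    """
    # Case 1: the neighbor is not on the flooding topology -> enable.
    # Case 2: the neighbor is already on the flooding topology -> do not.
    return neighbor_id not in flooding_nodes


view_at_r1 = {"R1", "R2", "R3"}
needs_temporary_flooding("R4", view_at_r1)  # True: R4 is a new node
needs_temporary_flooding("R2", view_at_r1)  # False: R2 is already on it
```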


>> > 3)           The extensions to Hello protocols for requesting/signaling
>> > “temporary flooding” for a connection does not work.
>>>
>>> sorry, but if you see a problem, please provide details, saying above is
>>> simply unproductive.

>> “The nodes … will try to repair the flooding topology locally by enabling
>>temporary flooding towards the nodes that they consider disconnected from the
>>flooding topology ...”

>>The above quoted text is from draft-li-lsr-dynamic-flooding-02, where
>> “enabling temporary flooding towards the nodes” is to request/signal
>> “temporary flooding” for a connection to connect partitioned/disconnected
>>flooding topology into one through the extensions to Hello protocols described
>>in draft-li-lsr-dynamic-flooding-02. Right?

>>The extensions to Hello protocols for requesting/signaling “temporary
>>flooding” for a connection to connect partitioned/disconnected flooding
>>topology into one does not work since the connection may have two or more
>>hops and a Hello packet may get lost.


>All adjacencies are a single hop in both IS-IS and OSPF.  Yes, Hello packets may be lost.
>Fortunately, they are periodically transmitted, thus the next transmission will also contain the
> TLV.  If IIH’s are getting lost at a significant rate, then the adjacency will not (and should not)
>come up.  Thus, the request for temporary flooding will propagate to the neighbor in all cases
>that matter.

Recovery takes too long when a Hello packet is lost. Repairing a split flooding topology needs to be fast.
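
A rough way to quantify the delay: assume Hellos are sent every hello_interval_s seconds and each one is lost independently with probability loss_probability. Both are hypothetical parameters for this sketch, not values taken from either draft. Every lost Hello then postpones the request by one full interval.

```python
def expected_extra_delay(hello_interval_s, loss_probability):
    """Expected extra wait, in seconds, before a Hello carrying the request arrives.

    The number of Hellos lost before the first success is geometric,
    with mean p / (1 - p), and each loss costs one hello interval.
    """
    if not 0 <= loss_probability < 1:
        raise ValueError("loss_probability must be in [0, 1)")
    return hello_interval_s * loss_probability / (1 - loss_probability)


def probability_of_losing_first(lost_hellos, loss_probability):
    """Probability that the first lost_hellos Hellos are all lost."""
    return loss_probability ** lost_hellos


# With a 10 s interval (an assumed value), a single lost Hello already
# delays the repair by 10 s, whatever the average works out to.
average_delay = expected_extra_delay(10, 0.5)  # 10.0
```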

>>It is not convenient for a user/operator to configure on an area leader since the
>>leader is dynamically selected. How do you address this?


>No configuration is required.  The election algorithm selects the area leader.  The rules are in
>the draft.  An implementation may have a default priority and a default algorithm setting, so no
>configuration is mandatory.  If the operator desires a specific node to become area leader, then
>configuration may be required to adjust the priority.  FWIW, we have this already working in
>our implementation.  It Just Works.

My question is not about a user/operator configuring or selecting an area leader. It is about a user/operator configuring other things on the area leader, such as indicating an algorithm or selecting the centralized mode.

>>After the user/operator does some configurations on the (designated) leader, will the
>>backup leader take over the configurations after the designated leader is down?


>There is no need for a backup leader.  If the area leader is partitioned from the topology, then
> leader election is repeated, resulting in a new leader.  Again, no configuration is required.

The above is not about a topology split but about the leader going down. Suppose a user/operator has configured some things on the leader, the leader has taken them and distributed them in some form, and some time later the leader goes down and a new leader is selected. In this case, will the new leader take over and maintain the configurations, or the information derived from the configurations, done on the old leader?
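
For reference, the election and re-election being discussed can be sketched as below. This is a sketch under assumptions: it assumes the highest priority wins and that ties are broken by the highest system ID, and the names and values are hypothetical. It models only the choice of leader and says nothing about what happens to configuration done on the old leader, which is the question above.

```python
def elect_area_leader(candidates):
    """Return the system ID of the elected area leader, or None if no candidates.

    candidates: dict mapping system ID -> advertised priority, live nodes only.
    Assumed rule: highest priority wins; ties go to the highest system ID
    (compared as fixed-width strings in this sketch).
    """
    if not candidates:
        return None
    return max(candidates, key=lambda system_id: (candidates[system_id], system_id))


live = {"0000.0000.0001": 64, "0000.0000.0002": 200, "0000.0000.0003": 200}
old_leader = elect_area_leader(live)   # "0000.0000.0003"

# The leader goes down, so it drops out and the election is repeated.
del live[old_leader]
new_leader = elect_area_leader(live)   # "0000.0000.0002"
```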

Best Regards,
Huaimo

>Tony