Re: [Lsr] Comments regarding convergence regarding draft-cc-lsr-flooding-reduction and draft-li-lsr-dynamic-flooding

tony.li@tony.li Tue, 29 January 2019 20:46 UTC

Return-Path: <tony1athome@gmail.com>
X-Original-To: lsr@ietfa.amsl.com
Delivered-To: lsr@ietfa.amsl.com
Received: from localhost (localhost [127.0.0.1]) by ietfa.amsl.com (Postfix) with ESMTP id BB44F130EAE; Tue, 29 Jan 2019 12:46:19 -0800 (PST)
X-Virus-Scanned: amavisd-new at amsl.com
X-Spam-Flag: NO
X-Spam-Score: -1.898
X-Spam-Level:
X-Spam-Status: No, score=-1.898 tagged_above=-999 required=5 tests=[BAYES_00=-1.9, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, FREEMAIL_FORGED_FROMDOMAIN=0.001, FREEMAIL_FROM=0.001, HEADER_FROM_DIFFERENT_DOMAINS=0.001, RCVD_IN_DNSWL_NONE=-0.0001, SPF_PASS=-0.001] autolearn=ham autolearn_force=no
Authentication-Results: ietfa.amsl.com (amavisd-new); dkim=pass (2048-bit key) header.d=gmail.com
Received: from mail.ietf.org ([4.31.198.44]) by localhost (ietfa.amsl.com [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id NTnuEO5nCo_h; Tue, 29 Jan 2019 12:46:15 -0800 (PST)
Received: from mail-pf1-x433.google.com (mail-pf1-x433.google.com [IPv6:2607:f8b0:4864:20::433]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by ietfa.amsl.com (Postfix) with ESMTPS id D2EA4130F3B; Tue, 29 Jan 2019 12:46:15 -0800 (PST)
Received: by mail-pf1-x433.google.com with SMTP id u6so10210068pfh.11; Tue, 29 Jan 2019 12:46:15 -0800 (PST)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=sender:mime-version:subject:from:in-reply-to:date:cc :content-transfer-encoding:message-id:references:to; bh=oToXRQixm+ZCbQqfMcwFs8lEZBYyrJw0G95rkhk4RVE=; b=CGVmM7r6P2GSyLiIw0vtt232VR+PpkSJ0Xm3x93xddJXVnU3CuTWUMEXWmXPVJXhFe KwXExGmUndSGx5YUJVoX/TwZwB6hn3JJTJqUtYCO3DYAkR/flifd8YSE6w96xgCKv6lj +t4thKkodtDv3wJ9zoYY8vYvS4jPw2tFFf177znyuIpa7uCLqGhwtnX0CLv18yd2abMG mIPGF3mYw02ZS72yie3gOtAlBdMxe1JIjogiLthP0S6JEY8liAL9t0Uz0rHGbLp6NV7t pSJRdcdBhnFfr1xdVqaZt1A6MvIc1TlNr9LncY6OxG/Jb3sJDKDK0JppCryDstDBQrEB rllg==
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:sender:mime-version:subject:from:in-reply-to :date:cc:content-transfer-encoding:message-id:references:to; bh=oToXRQixm+ZCbQqfMcwFs8lEZBYyrJw0G95rkhk4RVE=; b=L+gScbJivCAso8pBmnEZuzn8nS+Go0mSKqMnz0ajX5t9HMJMVa3PDPHpIfXOYSX8+j tk2f6TiiskVAlv4qeoDTamcR5qavq+F5v4kzzl7R7Ydiv/8g0tadh2eURJs+spwvPrVg Rp+bCAJe5DmugwloJFw6BqivA6cEcxZ82f9yxle4DiYzX682T8MsZXWfK0gfLI28X0A2 d6a94rnrRBL8nd3B07n7+Fs0NnKrxODZQV4Vhp+a9wJL2/E7SAtki8Fug5a/kPCCIVPw oos8Jy5bvEmKyMQ85fchtdMRd1PWQ6Rzt8/fWrxmc4GlW/eBsDZvDxUjpQciJ00CeBMw WsPg==
X-Gm-Message-State: AJcUukdNPyonOPPD++YgAdZdvTvTBGeVZlbaw08MOyp69epEpy1bIyxT YOfiOCa4c4grZnFKv0uC/Jc=
X-Google-Smtp-Source: ALg8bN4dgxE7ye9x46IE2yVYnuj8x7vZB+w0WwDhvI3UWfeQkIBf+wdNBRbDKrV5frpBV0ve7HFuEQ==
X-Received: by 2002:a62:f54f:: with SMTP id n76mr27780169pfh.59.1548794775242; Tue, 29 Jan 2019 12:46:15 -0800 (PST)
Received: from [172.22.228.216] ([162.210.130.3]) by smtp.gmail.com with ESMTPSA id g2sm43507118pfi.95.2019.01.29.12.46.14 (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Tue, 29 Jan 2019 12:46:14 -0800 (PST)
Sender: Tony Li <tony1athome@gmail.com>
Content-Type: text/plain; charset="utf-8"
Mime-Version: 1.0 (Mac OS X Mail 12.2 \(3445.102.3\))
From: tony.li@tony.li
In-Reply-To: <5316A0AB3C851246A7CA5758973207D463B3A32F@sjceml521-mbx.china.huawei.com>
Date: Tue, 29 Jan 2019 12:46:13 -0800
Cc: "Van De Velde, Gunter (Nokia - BE/Antwerp)" <gunter.van_de_velde@nokia.com>, "lsr@ietf.org" <lsr@ietf.org>, Yingzhen Qu <yingzhen.ietf@gmail.com>, Christian Hopps <chopps@chopps.org>, "draft-li-lsr-dynamic-flooding@ietf.org" <draft-li-lsr-dynamic-flooding@ietf.org>, Alvaro Retana <aretana.ietf@gmail.com>, "draft-cc-lsr-flooding-reduction@ietf.org" <draft-cc-lsr-flooding-reduction@ietf.org>
Content-Transfer-Encoding: quoted-printable
Message-Id: <0FE21364-D23A-4CFE-8559-A9125EC01E77@tony.li>
References: <DB7PR07MB5148B48096A7AD81A385DCFEE09A0@DB7PR07MB5148.eurprd07.prod.outlook.com> <5316A0AB3C851246A7CA5758973207D463B3A32F@sjceml521-mbx.china.huawei.com>
To: Huaimo Chen <huaimo.chen@huawei.com>
X-Mailer: Apple Mail (2.3445.102.3)
Archived-At: <https://mailarchive.ietf.org/arch/msg/lsr/w4hqsVA1AMnO7NoHF_ej2qR5GeI>
Subject: Re: [Lsr] Comments regarding convergence regarding draft-cc-lsr-flooding-reduction and draft-li-lsr-dynamic-flooding
X-BeenThere: lsr@ietf.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Link State Routing Working Group <lsr.ietf.org>
List-Unsubscribe: <https://www.ietf.org/mailman/options/lsr>, <mailto:lsr-request@ietf.org?subject=unsubscribe>
List-Archive: <https://mailarchive.ietf.org/arch/browse/lsr/>
List-Post: <mailto:lsr@ietf.org>
List-Help: <mailto:lsr-request@ietf.org?subject=help>
List-Subscribe: <https://www.ietf.org/mailman/listinfo/lsr>, <mailto:lsr-request@ietf.org?subject=subscribe>
X-List-Received-Date: Tue, 29 Jan 2019 20:46:20 -0000

Hi all,

I wanted to address some of the points that Huaimo has raised.  I’ve taken the editorial liberty of reformatting some of 
Huaimo’s comments.

> 1.                  Flooding topology encoding
> 
> The encoding in "draft-li-lsr-dynamic-flooding" comprises two parts. One part is the encoding of the flooding topology using the TLVs of the paths constituting the flooding topology; the second part is the encoding of the mapping from the nodes in the area to their indexes. The TLVs use the indexes of the nodes in the paths.

> There are two encodings in "draft-cc-lsr-flooding-reduction", each of which
> can be used to encode the flooding topology. One encoding is called "Links
> Encoding". The other is called "Block Encoding”.


Not distributing the encoding of the indices is an issue that needs to be addressed. Without an explicit encoding, all systems are forced to rely on the complete and area-wide synchronization of the entire LSDB. This seems like it is a significant risk. What happens if there is an expired LSP and one system has expunged it from the database but another has not? This would perturb the indices. If a new node has been added to the topology and its presence has not completed flooding, then the indices would also be perturbed. 

> The former encodes the links between a local node and its adjacent nodes using the compact indexes of these nodes. The whole flooding topology is represented by the links and 
> nodes on it. This is like the normal topology representation in an IGP, in which the topology is represented by the links and nodes in an IGP area.  No mapping encoding and flooding is needed. Every node creates and maintains the mapping from the nodes to their indexes. It is simpler and faster for 
> the node to have the mapping in this way.


The use of two different encodings for the topology is highly unusual.

It is not at all clear about how it should be done. Ambiguity in a specification results in differing interpretations, that leads to complexity and interoperability errors. In this specific case, it would also seem to result in redundancy.  It would be better to have a single encoding.

The use of the “compact index” is not clear.  At 14 bits, it is clearly not large enough to support large areas. While we are generally not exceeding 16k systems today, that may grow. It wasn’t long ago that areas were practically confined to 1k nodes.  That’s exceeded now.  

A better way of encoding the indices is to make them variable length, depending on the number of nodes in the area (credit: Tony P.). In our draft, the explicit encoding of the mapping from system to index makes it very clear how many systems there are, and thus how many bits should be used in this encoding.  Please note that this optimization is not yet reflected in our draft.


> For example, one issue is that different types of messages (i.e., the messages for the mapping, and the messages for the TLVs of the paths in the flooding topology, depending on the mapping) for the flooding topology may be out of order on a receiving router. This may cause incorrect/corrupted flooding topology.


As I’ve explained, the ordering is irrelevant. 

The ordering of the TLVs for the mapping from node to index is self describing. Each TLV carries the starting index for the contents of its TLV.

The ordering of arrival of various LSP segments is also irrelevant as the computation need only be deferred until all segments have arrived.  This is already commonly done as part of SPF scheduling. This is also a necessity for all proposals, as any computation with inconsistent segments is going to yield dubious results.


> 2.                  Fault tolerance to multiple failures
> 
> An efficient method for fault tolerance to multiple failures on the flooding topology is very important to centralized flooding reduction. Without it,
> the network convergence will slow down significantly. In general, the convergence time will be more than doubled comparing to normal convergence when
> the flooding topology is split by multiple failures.


Multiple simultaneous failures are an extremely unlikely event. Routers are not designed to survive multiple concurrent hardware failures. Most topologies run at commercially viable loading levels cannot survive multiple critical link or node failures without manual intervention. What is critical is to survive a single failure efficiently. Our proposal is to compute a biconnected flooding topology. This protects against single failures, retaining a functioning flooding topology in the event of a failure. At that point, the system computes a new bi-connected flooding topology, protecting against further failures.

Carrying arbitrary alternate information to protect against multiple failures negates your point about encoding efficiency. Backup links are not a tractable solution.


> The more time the network convergence takes, the more customers' traffic in
> the network gets lost in general. Thus after the centralized flooding reduction without fault tolerance to multiple failures is deployed in a network, some customers' traffic lose will be more than doubled when multiple failures splitting the flooding topology occur comparing to the same network without the flooding reduction.


If there are multiple failures, in our proposal, this can easily be detected. The flooding topology would clearly be partitioned. Systems on the edge of the partition can explicitly request temporary changes to the flooding topology, immediately restoring the connectivity of the flooding topology if that is possible.  Thus, the convergence time is going to be on the order of two LSP propagation times: one to detect the failure and another to repair it and converge.

Tree topologies are particularly vulnerable to failures since each and every link failure creates a partition of the flooding topology. 

Repairing this in a distributed fashion is also not perceptibly better.  After a partition, it will require at least one LSP propagation time to detect the failure. Then execution of the algorithm, followed by another LSP propagation time before a complete SPF can be run.

> "draft-cc-lsr-flooding-reduction" has a light-weight method for providing fault tolerance to multiple failures on the flooding topology. This method makes sure that the network still converges fast when the multiple failures happen. Thus the failures have almost minimal impact on the network convergence in the network with the flooding reduction deployment. The customers' traffic lose is significantly reduced (or almost minimized) when the failures occur in the network with the flooding reduction.


Your proposal is to revert to almost full flooding.  You bound the flooding to three hops (maybe, this is not defined), but in a LS topology, that is the full topology.  This is not a solution.  If we could support full flooding, we would not have problem in the first place.


> "draft-li-lsr-dynamic-flooding" just mentions that the edges of split parts
> will do something after the split of the flooding topology happens. However, it needs time to determine if a split of the flooding topology occurs and to find backup paths to connect split parts. There is no description on how to determine a split or find backup paths. This is not useful for implementors or users.


It is described. Please see section 6.7.11. After a topology change, all nodes can examine the existing flooding topology and the faults that have occurred.  If it is clear that the flooding topology has partitioned, then nodes that have adjacencies that are not reachable on the flooding topology make a temporary request to their adjacencies to add to the flooding topology.  This creates a temporary repair until the next flooding topology is distributed.


> In addition, the algorithm for computing a flooding topology that can survive any one link failure on the flooding topology (i.e., any one link failure on the flooding topology does not split the flooding topology) will be very complex, at least in terms of time. With a good method for fault tolerance, the algorithm is not required to compute a flooding topology that can survive any one link failure on the flooding topology.


Collapsing the network is not a 'good method'.


> 3.                  A New Node and Link Addition
> 
> Two different procedures for handling a new node and link addition to the topology are described in the two drafts ("draft-li-lsr-dynamic-flooding" and "draft-cc-lsr-flooding-reduction").
> 
> The procedure in "draft-cc-lsr-flooding-reduction": After the new node (say node A) and an existing node (say node B) establishes the adjacency over the link, A and B add the link to the flooding topology temporarily until a new flooding topology is built.


If A and B are already on the flooding topology, then this is redundant and harmful.


> The procedure in "draft-li-lsr-dynamic-flooding": A new TLV with R bit in IIH (IS-IS Hello), and FR-bit in OSPF Hello LLS Extended Options are defined. The new node (say node A) and an existing node (say node B), send each other the TLV with R bit in IS-IS Hello or FR-bit in OSPF Hello to request for adding the link to flooding topology temporarily (or say enabling temporary flooding over the link).


This only happens if the other node does not appear on the flooding topology.


> The procedure in "draft-cc-lsr-flooding-reduction" is simpler and more efficient. We can see that the two procedures try to achieve the same result ("
> temporary flooding" on the link) in different approaches. When a new node a
> nd link is added to the topology/network, it is deterministic that the link
> needs to be added to the flooding topology temporarily by the two end node
> s of the link. There is no need for the end nodes to add/enable "temporary 
> flooding" on the link through (new TLV in IIH or FR-Bit in OSPF Hello LLS) 
> signaling over the link.


Automatically adding to the flooding topology risks collapsing the network, again contradicting the entire goal of reducing the flooding topology.


> During internal review of "draft-li-lsr-dynamic-flooding" we observed significant procedure issues, which are big issues for implementations and users.  For example in one issue, if the first Hello packet containing the TLV with R bit in IS-IS (or FR-bit in OSPF) gets lost, then to add/enable "temporary flooding" on the link will take more than a Hello interval (typically 
> 10 seconds in OSPF). This is too long to accept!


Loss of a hello also implies that the adjacency will be delayed in coming up, so this is the same in both situations.


> 4.                  An Existing Link Addition
> 
> The approach in "draft-cc-lsr-flooding-reduction": The end node sends its adjacent node over the adjacency the link states with changes it received or
> originated during the period of time in which the flooding topology may split.


This requires the implementation to retain state about every LSP on every link, even when it is down.


> The approach in "draft-li-lsr-dynamic-flooding": A full LSDB resynchronization is requested over the existing adjacency/link between two end nodes for
> possible flooding topology split which may cause LSDB out of synchronization.


Resynchronization is a normal part of the protocol process at link up anyway.

Regards,
Tony