[Isis-wg] Some comments on draft-white-openfabric-02
Erik Auerswald <auerswald@fg-networking.de> Wed, 12 April 2017 08:50 UTC
Date: Wed, 12 Apr 2017 10:50:18 +0200
From: Erik Auerswald <auerswald@fg-networking.de>
To: Russ White <7riw77@gmail.com>, rtgwg@ietf.org, isis-wg@ietf.org
Cc: Erik Auerswald <auerswald@fg-networking.de>
Message-ID: <20170412085018.GA29441@fg-networking.de>
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"
Content-Disposition: inline
Content-Transfer-Encoding: 8bit
User-Agent: Mutt/1.5.21 (2010-09-15)
Archived-At: <https://mailarchive.ietf.org/arch/msg/isis-wg/LcF9az82-1HfuD94y7BnOxuzEU0>
X-Mailman-Approved-At: Wed, 12 Apr 2017 08:05:47 -0700
Subject: [Isis-wg] Some comments on draft-white-openfabric-02
Hi all,

I have read draft-white-openfabric-02 and would like to comment on a few
points. I'll start at the top of the draft and continue through the text.

Please keep my e-mail address in replies, because I am not subscribed to
the isis-wg and rtgwg mailing lists.

1. The abstract states "[...]topology information is extracted through
   broad based connections." I do not understand that sentence.

2. Section 1.1., Goals, mentions large scale data centers. Would it be
   appropriate to reference RFC 7938, Use of BGP for Routing in
   Large-Scale Data Centers, here? Said RFC proposes a Clos topology for
   the network, which seems to be similar to the spine and leaf topology
   of openfabric.

3. In section 1.3., Simplification, I noticed a spelling mistake:
   mutliaccess (should be multiaccess).

4. In section 1.5., Sample Network, a spine and leaf network is shown in
   figure 1. The topology shown in that figure is different from the
   5-stage Clos topology shown in RFC 7938, figure 3.

   The 5-stage Clos topology from RFC 7938 represents the network
   topology used by Facebook for the Altoona data center, as publicized
   in
   https://code.facebook.com/posts/360346274145943/introducing-data-center-fabric-the-next-generation-facebook-data-center-network/.

   Another generalization of the 3-stage Clos network to more than
   3 stages, called Beneš network, can be found on Wikipedia:
   https://en.wikipedia.org/wiki/Clos_network#Clos_networks_with_more_than_three_stages

   Both of these 5-stage networks differ from figure 1 of the openfabric
   draft insofar as each T2 switch is connected to a proper subset of
   the T1 switches (openfabric designation) in both the RFC 7938 "Clos"
   topology and the Beneš network. This is crucial for increasing the
   number of input and output ports without using bigger switches.

   Since this is important for later comments, I have adapted figure 3
   from RFC 7938 into the following drawing:

          +----+                  +----+
          |L1.1|                  |L1.2|        (T0)
          +----+                  +----+
            |  \________________ /  |
            |   ________________\/  |
            |  /                 \  |
          +----+                  +----+
          |F1.1|                  |F1.2|        (T1)
          +----+                  +----+
           /  \                    /  \
          /    \                  /    \
     +----+    +----+        +----+    +----+
     |S1.1|    |S1.2|        |S2.1|    |S2.2|   (T2)
     +----+    +----+        +----+    +----+
          \    /                  \    /
           \  /                    \  /
          +----+                  +----+
          |F2.1|                  |F2.2|        (T1)
          +----+                  +----+
            |  \________________ /  |
            |   ________________\/  |
            |  /                 \  |
          +----+                  +----+
          |L2.1|                  |L2.2|        (T0)
          +----+                  +----+

   Legend:
     Lx.y: Leaf switches (a.k.a. Top of Rack (ToR) switches)
     Fx.y: Fabric switches
     Sx.y: Spine switches

   Inter-switch connections:
     Lx.y is connected to Fx.*
     Fx.y is connected to Lx.* and Sy.*
     Sx.y is connected to F*.x

   Figure 2: 5-Stage Clos Topology (adapted from [RFC7938], Figure 3)

   I have used the name "Fabric switch" similar to Facebook's use of
   that name in the above referenced blog post, just to have distinct
   names and single letter abbreviations for each tier.

   A reference to RFC 7938, section 3.2, Clos Network Topology, would
   fit into this section.

5. It might be appropriate to mention the use of timeouts and
   exponential back-off for initial adjacency formation in section 2.
   Something like sequentially trying all discovered neighbors and using
   exponentially increasing random timeouts for subsequent rounds until
   the first adjacency is formed would work. A "Happy Eyeballs"
   (RFC 6555) like approach of trying to form two adjacencies with a
   slight delay in-between might be nice as well.

6. Section 3., Determining Location on the Fabric, relies on the special
   topology from figure 1 of the openfabric draft. In both Beneš
   networks and the topology shown in figure 2 (of this mail), FD == TD
   and TD == 4 also hold for non-T0 switches. One example is S1.1 from
   figure 2. It can be easily seen from that figure that for all
   switches in that topology FD == TD == 4.
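
   The distance claim can be checked mechanically. The following is a
   minimal sketch, assuming that TD can be read as the largest hop count
   from a switch to any other switch and FD as the largest such value
   over the whole fabric; that reading of the draft's terms and the
   helper names build_fabric and eccentricity are assumptions made for
   illustration only. The links are taken from the "Inter-switch
   connections" list of figure 2.

```python
from collections import deque


def build_fabric():
    """Return adjacency sets for the 12 switches of figure 2 (hypothetical helper)."""
    adj = {}

    def link(a, b):
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)

    for x in (1, 2):          # first index of Lx.y / Fx.y
        for y in (1, 2):      # second index of the fabric switch Fx.y
            fabric = f"F{x}.{y}"
            for n in (1, 2):
                link(f"L{x}.{n}", fabric)   # Lx.* is connected to Fx.y
                link(fabric, f"S{y}.{n}")   # Fx.y is connected to Sy.*
    return adj


def eccentricity(adj, src):
    """Largest hop count from src to any other switch, found by BFS."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        for nbr in adj[node]:
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    return max(dist.values())


adj = build_fabric()
ecc = {node: eccentricity(adj, node) for node in sorted(adj)}
diameter = max(ecc.values())   # assumed stand-in for FD
for node, hops in ecc.items():
    # hops is the assumed stand-in for TD of this switch
    print(f"{node}: max distance {hops}, fabric diameter {diameter}")
```

   Under these assumptions every switch, whether leaf, fabric or spine,
   reports the same maximum distance as the fabric diameter, so the
   comparison cannot single out the T0 switches.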
   Thus the algorithms from sections 3.1., Determining T0, and 3.2.,
   Determining T1 and above, do not work for general fabric topologies.

7. The algorithm described in section 4, Flooding Optimization, does not
   work for the 5-stage "Clos" topology (see figure 2). An example for
   this is a change that pertains only to switches S1.1 and F1.1 in
   figure 2 (e.g. a link between these two switches fails). Because the
   T0 switches Lx.y receive the LSPs as DNR, the LSPs do not reach
   switches Fx.2 and S2.y during flooding. The failure recovery
   mechanism of section 4.1., Flooding Failures, would then be needed by
   design to propagate the LSPs, but it is clearly intended as a backup
   mechanism that is not needed for normal operation.

8. Section 5.1., Transit Link Reachability, would benefit from a
   reference to RFC 5837, Extending ICMP for Interface and Next-Hop
   Identification.

9. Section 6., Openfabric and Route Aggregation, should disallow route
   summarization. Otherwise, without intra-tier links, the failure of a
   single link will result in traffic black-holing. See e.g. RFC 7938,
   sections 8.2. and 8.2.1. But intra-tier links are disallowed in
   section 1.5, Sample Network.

   Since the reason for disallowing intra-tier links, topology
   auto-detection, is not yet solved (see comment 6. above), you might
   allow the combination of intra-tier links and route summarization.

   I would prefer disallowing both for openfabric, because the added
   complexity of route summarization and its effects on resiliency in
   the case of failures seem a bad trade-off for the reduced routing
   table size.

Thanks for reading this far. :-)

Best regards,
Erik
-- 
Dipl.-Inform. Erik Auerswald
http://www.fg-networking.de/
auerswald@fg-networking.de
T:+49-631-4149988-0
M:+49-176-64228513

Gesellschaft für Fundamental Generic Networking mbH
Geschäftsführung: Volker Bauer, Jörg Mayer
Gerichtsstand: Amtsgericht Kaiserslautern - HRB: 3630