Re: [Isis-wg] Some comments on draft-white-openfabric-02

Jeff Tantsura <jefftant.ietf@gmail.com> Wed, 12 April 2017 18:33 UTC

Date: Wed, 12 Apr 2017 11:33:54 -0700
Subject: Re: [Isis-wg] Some comments on draft-white-openfabric-02
From: Jeff Tantsura <jefftant.ietf@gmail.com>
To: Erik Auerswald <auerswald@fg-networking.de>, Russ White <7riw77@gmail.com>, rtgwg@ietf.org
Message-ID: <8C49C6EC-C129-452E-AF3B-35C74611F51E@gmail.com>
Thread-Topic: [Isis-wg] Some comments on draft-white-openfabric-02
References: <20170412085018.GA29441@fg-networking.de>
In-Reply-To: <20170412085018.GA29441@fg-networking.de>
Mime-version: 1.0
Content-type: text/plain; charset="UTF-8"
Content-transfer-encoding: quoted-printable
Archived-At: <https://mailarchive.ietf.org/arch/msg/rtgwg/X_9H0N0SpnsSJmoLkhUD-phdTMQ>

Erik,

I have added your email address to the RTGWG list, so you are now allowed to post there.

Cheers,
Jeff
 

On 4/12/17, 01:50, "Isis-wg on behalf of Erik Auerswald" <isis-wg-bounces@ietf.org on behalf of auerswald@fg-networking.de> wrote:

    Hi all,
    
    I have read draft-white-openfabric-02 and would like to comment
    on a few points. I'll start at the top of the draft and continue
    through the text.
    
    Please keep my e-mail address in replies, because I am not subscribed
    to the isis-wg and rtgwg mailing lists.
    
    1.
       The abstract states "[...]topology information is extracted
    through broad based connections." I do not understand that sentence.
    
    2.
       Section 1.1., Goals, mentions large scale data centers. Would
    it be appropriate to reference RFC 7938, Use of BGP for Routing
    in Large-Scale Data Centers, here? Said RFC proposes a Clos topology
    for the network, which seems to be similar to the spine and leaf
    topology of openfabric.
    
    3.
       In section 1.3., Simplification, I noticed a spelling mistake:
    mutliaccess (should be multiaccess).
    
    4.
       In section 1.5., Sample Network, a spine and leaf network is
    shown in figure 1. The topology shown in that figure is different
    from the 5-stage Clos topology shown in RFC 7938, figure 3. The
    5-stage Clos topology from RFC 7938 represents the network topology
    used by Facebook for the Altoona data center, as publicized in
    https://code.facebook.com/posts/360346274145943/introducing-data-center-fabric-the-next-generation-facebook-data-center-network/.
    
    Another generalization of the 3-stage Clos network to more than
    3 stages, called a Beneš network, can be found on Wikipedia:
    https://en.wikipedia.org/wiki/Clos_network#Clos_networks_with_more_than_three_stages
    
    Both of these 5-stage networks differ from figure 1 of the
    openfabric draft insofar as each T2 switch is connected to a
    proper subset of T1 switches (openfabric designation) in both the
    RFC 7938 "Clos" topology and the Beneš network. This is crucial
    for increasing the number of input and output ports without
    using bigger switches.
    
    Since this is important for later comments, I have adapted figure 3
    from RFC 7938 into the following drawing:
    
    
            +----+                  +----+
            |L1.1|                  |L1.2|             (T0)
            +----+                  +----+
             |   \________________  /   |
             |    ________________\/    |
             |   /                 \    |
            +----+                  +----+
            |F1.1|                  |F1.2|             (T1)
            +----+                  +----+
            /    \                  /    \
           /      \                /      \
       +----+    +----+        +----+    +----+
       |S1.1|    |S1.2|        |S2.1|    |S2.2|        (T2)
       +----+    +----+        +----+    +----+
           \      /                \      /
            \    /                  \    /
            +----+                  +----+
            |F2.1|                  |F2.2|             (T1)
            +----+                  +----+
             |   \________________  /   |
             |    ________________\/    |
             |   /                 \    |
            +----+                  +----+
            |L2.1|                  |L2.2|             (T0)
            +----+                  +----+
    
         Legend:
           Lx.y: Leaf switches (a.k.a. Top of Rack (ToR) switches)
           Fx.y: Fabric switches
           Sx.y: Spine switches
    
         Inter-switch connections:
           Lx.y is connected to Fx.*
           Fx.y is connected to Lx.* and Sy.*
           Sx.y is connected to F*.x 
    
       Figure 2: 5-Stage Clos Topology (adapted from [RFC7938], Figure 3)
    
    I have used the name "Fabric switch" similar to Facebook's use
    of that name in the above referenced blog post, just to have
    distinct names and single letter abbreviations for each tier.
    
    A reference to RFC 7938, section 3.2, Clos Network Topology, would
    fit into this section.
    
    5.
       It might be appropriate to mention the use of timeouts and
    exponential back-off for initial adjacency formation in section 2.
    Something like sequentially trying all discovered neighbors and
    using exponentially increasing random timeouts for subsequent
    rounds until the first adjacency is formed. A "Happy Eyeballs"
    (RFC 6555) like approach of trying to form two adjacencies with
    a slight delay in-between might be nice as well.
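
The retry scheme suggested in comment 5 could look roughly like the
following sketch. It is not taken from the draft: the function names,
the `try_adjacency` callback, and the `base` and `cap` values are all
assumptions chosen for illustration.

```python
import random
import time


def backoff_delays(rounds, base=1.0, cap=60.0, rng=None):
    """Return one randomized delay (in seconds) per retry round.

    Round n waits a random time in [0, min(cap, base * 2**n)], so the
    upper bound doubles every round until it hits the cap.
    """
    if rng is None:
        rng = random.Random()
    return [rng.uniform(0, min(cap, base * 2 ** n)) for n in range(rounds)]


def form_first_adjacency(neighbors, try_adjacency, max_rounds=5,
                         sleep=time.sleep, rng=None):
    """Try all discovered neighbors in sequence, backing off between rounds.

    try_adjacency(neighbor) -> bool is a hypothetical callback standing in
    for the real adjacency handshake. Returns the first neighbor with which
    an adjacency formed, or None if every round failed.
    """
    for delay in backoff_delays(max_rounds, rng=rng):
        for neighbor in neighbors:
            if try_adjacency(neighbor):
                return neighbor
        sleep(delay)  # randomized wait before the next full round
    return None
```

Randomizing each wait keeps switches that boot at the same time from
retrying in lockstep. A "Happy Eyeballs" variant would start a second
attempt after a short fixed delay instead of waiting for the first
attempt to fail.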
    
    6.
       Section 3., Determining Location on the Fabric, relies on the
    special topology from figure 1 of the openfabric draft. In both
    Beneš networks and the topology shown in figure 2 (of this mail),
    FD == TD and TD == 4 holds for non-T0 switches. One example is
    S1.1 from figure 2. It can be easily seen from that figure that
    for all switches in that topology FD == TD == 4. Thus the algorithms
    from sections 3.1., Determining T0, and 3.2., Determining T1 and
    above, do not work for general fabric topologies.
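
The claim in comment 6 can be checked mechanically. The sketch below
builds the topology of figure 2 (of this mail) from its inter-switch
connection rules and computes hop counts with a breadth-first search.
It assumes that TD is the distance from a switch to the switch
farthest from it and that FD is the largest such distance in the
fabric; the helper names are made up for this example.

```python
from collections import deque


def build_figure2():
    """Adjacency of figure 2, built from its inter-switch connection rules."""
    links = set()
    for x in (1, 2):
        for y in (1, 2):
            for k in (1, 2):
                links.add((f"L{x}.{y}", f"F{x}.{k}"))  # Lx.y -- Fx.*
                links.add((f"F{x}.{y}", f"S{y}.{k}"))  # Fx.y -- Sy.*
    adj = {}
    for a, b in links:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj


def farthest_distance(adj, start):
    """Hop count from start to the switch farthest from it (BFS)."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in adj[node]:
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return max(dist.values())


adj = build_figure2()
td = {node: farthest_distance(adj, node) for node in sorted(adj)}
fd = max(td.values())
for node, value in td.items():
    print(node, "TD =", value, "FD =", fd)
```

Under these assumptions every one of the twelve switches reports
TD == FD == 4, so the distance comparison alone cannot tell the tiers
apart in this topology.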
    
    7.
       The algorithm described in section 4, Flooding Optimization, does
    not work for the 5-stage "Clos" topology (see figure 2). An example
    for this is a change that affects only switches S1.1 and F1.1 in
    figure 2 (e.g. a link between these two switches fails). Because
    the T0 switches Lx.y receive the LSPs as DNR, the LSPs do not reach
    switches Fx.2 and S2.y during flooding. The failure recovery
    mechanism of section 4.1., Flooding Failures, is needed to propagate
    the LSPs by design, but this is clearly thought of as a backup
    mechanism that is not needed for normal operation.
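
The example in comment 7 can be reproduced with a small simulation.
This is a deliberately simplified model and not the algorithm from the
draft: it assumes only that T0 switches never reflood an LSP (they
received it as DNR) and that all other switches reflood on every link.
The function names are invented for this sketch.

```python
from collections import deque


def build_figure2():
    """Adjacency of figure 2, built from its inter-switch connection rules."""
    links = set()
    for x in (1, 2):
        for y in (1, 2):
            for k in (1, 2):
                links.add((f"L{x}.{y}", f"F{x}.{k}"))  # Lx.y -- Fx.*
                links.add((f"F{x}.{y}", f"S{y}.{k}"))  # Fx.y -- Sy.*
    adj = {}
    for a, b in links:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj


def flood(adj, origins, failed_link=None):
    """Return the switches an LSP reaches when T0 switches never reflood."""
    down = frozenset(failed_link) if failed_link else frozenset()
    reached = set(origins)
    queue = deque(origins)
    while queue:
        node = queue.popleft()
        if node.startswith("L"):  # T0 switch: LSP arrived as DNR, stop here
            continue
        for nxt in adj[node]:
            if frozenset((node, nxt)) == down:
                continue  # the failed link carries nothing
            if nxt not in reached:
                reached.add(nxt)
                queue.append(nxt)
    return reached


adj = build_figure2()
reached = flood(adj, ["S1.1", "F1.1"], failed_link=("S1.1", "F1.1"))
missed = sorted(set(adj) - reached)
print("not reached:", missed)
```

With the link between S1.1 and F1.1 down and both switches originating
new LSPs, the switches left out are F1.2, F2.2, S2.1 and S2.2, that is
Fx.2 and S2.y, because their only paths to the originators lead through
T0 switches.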
    
    8.
       Section 5.1., Transit Link Reachability, would benefit from
    a reference to RFC 5837, Extending ICMP for Interface and Next-Hop
    Identification.
    
    9.
       Section 6., Openfabric and Route Aggregation, should disallow
    route summarization. Otherwise the failure of a single link will
    result in traffic black-holing without intra-tier links. See e.g.
    RFC 7938, sections 8.2. and 8.2.1. But intra-tier links are
    disallowed in section 1.5, Sample Network.
    
    Since the reason for disallowing intra-tier links, topology auto-
    detection, is not yet solved (see comment 6. above), you might
    allow the combination of intra-tier links and route summarization.
    I would prefer disallowing both for openfabric, because the added
    complexity of route summarization and its effects on resiliency
    in the case of failures seem a bad trade-off for the reduced
    routing table size.
    
    Thanks for reading this far. :-)
    
    Best regards,
    Erik
    -- 
    Dipl.-Inform. Erik Auerswald         http://www.fg-networking.de/
    auerswald@fg-networking.de T:+49-631-4149988-0 M:+49-176-64228513
    
    Gesellschaft für Fundamental Generic Networking mbH
    Geschäftsführung: Volker Bauer, Jörg Mayer
    Gerichtsstand: Amtsgericht Kaiserslautern - HRB: 3630
    