Re: [Lsr] Thoughts on the area proxy and flood reflector drafts.

Tony Li <tony1athome@gmail.com> Sat, 06 June 2020 21:15 UTC

From: Tony Li <tony1athome@gmail.com>
Message-Id: <B53A7AAF-171B-4186-8486-069BF9460797@gmail.com>
Content-Type: multipart/alternative; boundary="Apple-Mail=_12920B45-D6AB-4F65-A8DE-5A250DC3FA72"
Mime-Version: 1.0 (Mac OS X Mail 13.4 \(3608.80.23.2.2\))
Date: Sat, 06 Jun 2020 14:15:01 -0700
In-Reply-To: <790B898F-DB03-499E-BAAE-369504539475@chopps.org>
Cc: lsr@ietf.org
To: Christian Hopps <chopps@chopps.org>
References: <790B898F-DB03-499E-BAAE-369504539475@chopps.org>
X-Mailer: Apple Mail (2.3608.80.23.2.2)
Archived-At: <https://mailarchive.ietf.org/arch/msg/lsr/Kozhg0l5dDMItrGxKixt_ODlbDg>
Subject: Re: [Lsr] Thoughts on the area proxy and flood reflector drafts.

Hi Chris,

Thank you for your thoughtful comments.


> A simplified model of this design can be seen as a horseshoe of horseshoes. The major horseshoe is L2 and the minor horseshoes are L1. Each L1 area has 2 L2 routers for redundancy (I'll consider more though), and all L2 routers are full-mesh connected to support that redundancy.
> 
> [Figure: Telco]
> 
> But let's map this to a more DC centric view (I guess?) where each L1 area now has 10 L2 routers instead of 2 (i.e., 10%, but that could be changed later if need be).


The key point here is not a DC centric view, but one of scale. The above worked just fine back when we were terminating T1s and E1s. Now, PE routers easily have terabit capacity. With the 100:1 ratio between the PE and the L1L2 routers (Area Edge routers), the bandwidth requirements drive us to petabit, multi-chassis routers. While these have been great fun to build, operators are not happy with this direction. Forklift upgrades are necessary with every silicon cycle. The complexity of the router has multiplied, the blast radius is off the charts, and the premiums charged for these devices are impressive. As a result, folks are trying to explore a ‘scale-out’ approach. Rather than 2 huge area edge routers, they would much rather be able to scale out the area edge to be many routers.

If you pursue this design direction, the first thing you observe is that you cannot afford to build a full mesh of inter-area circuits across all of the area edge routers.

And if your network is a bit more geographically dispersed, you find that it becomes inefficient to have even a full mesh at the area level. This forces some areas to become transit.

Between these two forces, we are compelled to push transit traffic through the internals of an area.

> [Figure: Natural Design]


> Now for whatever reason some operators do not want to provision high-bandwidth transit links between their L2 routers in their L1 areas. This is critically important b/c otherwise you would simply use the above Natural Design. I'd like to better understand why that isn't just the answer here.


In this design, you’ve shown ten area edge routers. Yes, you could provision a full mesh of links between them. The issue remains one of scalability and uniformity. The number of ports per area edge router has to scale linearly with the size of the area edge, and the overall number of links used for this purpose is O(n^2). Combine this with any non-uniformity in the traffic pattern, and your full mesh ends up being congested and under-utilized simultaneously.
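To put rough numbers on that, here is a minimal sketch of the full-mesh arithmetic. It assumes exactly one link per pair of area edge routers; the function name and the sample sizes other than ten are mine, purely for illustration.

```python
def full_mesh_cost(n):
    """Return (ports_per_router, total_links) for a full mesh of n routers.

    Assumes exactly one link between every pair of routers.
    """
    ports_per_router = n - 1          # one port toward every other router
    total_links = n * (n - 1) // 2    # each pair counted once: O(n^2)
    return ports_per_router, total_links


if __name__ == "__main__":
    for n in (2, 10, 20, 40):
        ports, links = full_mesh_cost(n)
        print(f"{n:3d} edge routers: {ports:3d} mesh ports each, {links:4d} links total")
```

Going from two area edge routers to ten takes the mesh from 1 link to 45, and every further doubling of the edge roughly quadruples the link count.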

All is not lost, however. Charles Clos to the rescue. By structuring each area as a Clos (or Benes) network (which my employer seems to insist on spelling ‘leaf-spine’), you avoid this. I assume I don’t need to go into details on this.  


> Area Proxy
> 
> First I'll look at area proxy. This seems a fairly simple idea, basically it's taking the now L1L2 areas and advertising them externally as a single LSP so the impact is very similar to if they were L1 only. This maps fairly closely to the Telco and Natural Design from above. Each L1 router in the Telco design would have 100 LSPs. The L12 routers would have 100 L1 + 16 L2 LSP. In the Natural Design each L1 router has 100 L1 and each L12 router would have 100 L1 and 80 L2. With Area Proxy each router has 100 L1 and 100 "Inner L2 LSPs" and 80 "Outer LSPs" + 8 "Outer L2 LSPs"


We’ve made some changes in the latest version of the draft.  In the current version, we require that each router in the area be L1L2.  However, only one LSP is advertised externally for each area.  Thus, each router will see 100 L1 LSPs, 100 L2 LSPs and 8 L2 Proxy LSPs.


> The key thing to note here is that if you double the number of areas you only add to the Outside LSP and Proxy count just as it would scale in the Natural Design, so going from 8 to 16 areas here adds 80 more "Outside LSPs" and 8 more L2 Proxy LSPs for a total of 276 L2 LSPs even though you've added 800 more routers to your network.


Doubling the number of areas would give you 16 L2 proxy LSPs, so you end up going from 208 LSPs to 216.  The key point here is that the database now scales linearly with the number of areas.
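As a back-of-the-envelope check, the counts above can be sketched as follows. This is a simplification under stated assumptions: every router in an area is L1L2, each router originates a single LSP fragment per level, pseudonode LSPs are ignored, and there is exactly one proxy LSP per area. The function name is hypothetical.

```python
def inside_router_lsp_count(routers_per_area, areas):
    """LSPs seen by a router inside an area when Area Proxy is in use.

    Simplifying assumptions: every router in the area is L1L2, one LSP
    fragment per router per level, no pseudonode LSPs, and one proxy LSP
    advertised per area (including the router's own area).
    """
    l1_lsps = routers_per_area        # L1 LSPs from the local area
    inner_l2_lsps = routers_per_area  # L2 LSPs from the local area
    proxy_lsps = areas                # one L2 proxy LSP per area
    return l1_lsps + inner_l2_lsps + proxy_lsps


if __name__ == "__main__":
    for areas in (8, 16, 32):
        total = inside_router_lsp_count(100, areas)
        print(f"{areas:2d} areas -> {total} LSPs per inside router")
```

With 100 routers per area this gives 208 LSPs for 8 areas and 216 for 16; only the proxy term grows as areas are added.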

Demo time:

In a lab setup, we have an area of five routers.  We have three L2 routers.  The database on one of the pure L2 routers is just 5 entries (three L2, one proxy LSP, one pseudonode):

rtrmpls8>show isis data

IS-IS Instance: Amun VRF: default
  IS-IS Level 2 Link State Database
    LSPID                 Seq Num   Cksum  Life  IS Flags
    ip6.00-00             5         39442  1077  L2 <>
    ip7.00-00             6         40764  1077  L2 <>
    ip8.00-00             4         17533  1071  L2 <>
    ip8.06-00             1         42260  1071  L2 <>
    Proxy.00-00           1         13553  1087  L2 <>

On an inside node, the L1 database is 7 entries (including 2 pseudonodes), and the L2 database is 12 entries (including 3 pseudonodes):

rtrmpls4>show isis data

IS-IS Instance: Amun VRF: default
  IS-IS Level 1 Link State Database
    LSPID                 Seq Num   Cksum  Life  IS Flags
    ip1.00-00             6         57570  962   L2 <DefaultAtt>
    ip2.00-00             5         12641  963   L2 <DefaultAtt>
    ip2.06-00             1         26206  963   L2 <>
    ip3.00-00             5         30428  964   L2 <DefaultAtt>
    ip4.00-00             5         58405  962   L2 <DefaultAtt>
    ip4.07-00             1         35124  955   L2 <>
    ip5.00-00             5         55336  962   L2 <DefaultAtt>
  IS-IS Level 2 Link State Database
    LSPID                 Seq Num   Cksum  Life  IS Flags
    ip1.00-00             10        890    966   L2 <>
    ip2.00-00             7         4390   965   L2 <>
    ip2.06-00             1         61120  963   L2 <>
    ip3.00-00             7         43115  965   L2 <>
    ip4.00-00             7         64064  962   L2 <>
    ip4.07-00             1         6798   955   L2 <>
    ip5.00-00             9         34969  975   L2 <>
    ip6.00-00             5         39442  967   L2 <>
    ip7.00-00             6         40764  966   L2 <>
    ip8.00-00             4         17533  961   L2 <>
    ip8.06-00             1         42260  961   L2 <>
    Proxy.00-00           1         13553  975   L2 <>
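If you want to tally entries like these mechanically, here is a small sketch that counts LSPs and pseudonode LSPs in the L2 database shown above. It assumes the LSPID is displayed as name.pseudonode-fragment, with a nonzero pseudonode field marking a pseudonode LSP; the helper names are mine and not part of any router CLI.

```python
# L2 database rows copied from the inside-node output above.
L2_DATABASE = """\
ip1.00-00             10        890    966   L2 <>
ip2.00-00             7         4390   965   L2 <>
ip2.06-00             1         61120  963   L2 <>
ip3.00-00             7         43115  965   L2 <>
ip4.00-00             7         64064  962   L2 <>
ip4.07-00             1         6798   955   L2 <>
ip5.00-00             9         34969  975   L2 <>
ip6.00-00             5         39442  967   L2 <>
ip7.00-00             6         40764  966   L2 <>
ip8.00-00             4         17533  961   L2 <>
ip8.06-00             1         42260  961   L2 <>
Proxy.00-00           1         13553  975   L2 <>
"""


def is_pseudonode(lspid):
    """True if the pseudonode field of 'name.PN-FRAG' is nonzero."""
    node_part = lspid.rsplit(".", 1)[1]            # e.g. "06-00"
    pseudonode_id, _fragment = node_part.split("-")
    return int(pseudonode_id, 16) != 0


def summarize(table_text):
    """Return (total_entries, pseudonode_entries) for pasted table rows."""
    lspids = [line.split()[0] for line in table_text.splitlines() if line.strip()]
    pseudonodes = [lspid for lspid in lspids if is_pseudonode(lspid)]
    return len(lspids), len(pseudonodes)


if __name__ == "__main__":
    total, pseudo = summarize(L2_DATABASE)
    print(f"{total} entries, {pseudo} pseudonodes")
```

Run against the rows above, this reports 12 entries with 3 pseudonodes, matching the counts given for the inside node's L2 database.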

Regards,
Tony