CA*net report

Dennis Ferguson <dennis@gw.ccie.utoronto.ca> Wed, 14 March 1990 19:58 UTC

From: Dennis Ferguson <dennis@gw.ccie.utoronto.ca>
To: tewg@devvax.TN.CORNELL.EDU
Subject: CA*net report
Message-Id: <90Mar14.145621est.41307@gw.ccie.utoronto.ca>
Date: Wed, 14 Mar 1990 14:56:17 -0500

I was asked by Scott to send an overview of CA*net topology (or
lack thereof).  This is pieced together from a set of notes I have
been keeping, and I have been a little lazy about editing down the
size, so please don't believe I wrote all this especially to
fill TEWG people's mail boxes.

The first phase topology, when completed, is planned to look as follows:

              Edm            Wpg
         /---- 0            /-0-------------\
    Van /             Sask /          /------\--------\
     0-/                0-/         Tor       Mtl      Fred      Chrl
     |\                              0 ------- 0         0 ------- 0
     | \----------------------------/|         |\
     |                               |         | \            Hal     StJ
     V                               V         V  \----------- 0 ----- 0
   Seattle                        Ithaca   Princeton

For those not up on Canadian geography, the place names are spelled out
following (spelling mistakes and all), west to east:

    1    Vancouver, British Columbia
    2    Edmonton, Alberta
    3    Saskatoon, Saskatchewan
    4    Winnipeg, Manitoba
    5    Toronto, Ontario
    6    Montreal, Quebec
    7    Fredricton, New Brunswick
    8    Halifax, Nova Scotia
    9    Charlottown, Prince Edward Island
   10    St. John's, Newfoundland

A couple of design features are worth noting.  There are 10 nodes, since
Canada has 10 provinces.  There are 9 internal links, since this is the
minimum you can connect 10 nodes with.  All internal links are (extremely
expensive) 56 kbps circuits.  The three southbound circuits may be
somewhat larger than 56 kbps.  The network will operate with a substantial
fraction of the bills being paid by the users of the network (provincial
networks, or "regionals") from day one, so much of the planning was done
in scarcity-of-resources mode.  The two links which are needed for full
redundancy are awaiting a reduction in the tariffs.
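
The link count above follows from a general fact: connecting n nodes takes
at least n - 1 links, and any network with exactly that many is a tree, so
losing any single link partitions it.  A minimal Python sketch of that
reasoning follows.  The chain-shaped link list is purely hypothetical and is
not the CA*net layout; the function names are likewise made up for the
illustration.

```python
def min_links(n_nodes):
    """Fewest links that can connect n_nodes (a spanning tree)."""
    return max(n_nodes - 1, 0)


def is_connected(n_nodes, links):
    """Union-find check that all nodes end up in one component."""
    parent = list(range(n_nodes))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in links:
        parent[find(a)] = find(b)
    return len({find(i) for i in range(n_nodes)}) <= 1


def single_points_of_failure(n_nodes, links):
    """Links whose removal, on its own, partitions the network."""
    return [link for link in links
            if not is_connected(n_nodes, [m for m in links if m != link])]


# Hypothetical 10-node chain, NOT the real CA*net topology.
chain = [(i, i + 1) for i in range(9)]
```

With a minimal set of links every link is a single point of failure, which
is why extra links are needed before the network has any redundancy.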

The routers are being provided by IBM, and for the first round are RT-based
with the NSS software squeezed to fit in a single box.  This equipment
may not be permanent.  The southern end of the three NSFnet links will
be connected directly to the respective NSFnet NSSes.

The architecture internally is somewhat constrained by (or at least,
defined by) the NSS software.  CA*net will comprise a complete autonomous
system by itself, and will exchange routing information with the provincial
networks via EGP (and/or BGP if desired).  Routing is planned to be exchanged with
the NSFnet via BGP exclusively, if only to minimize traffic on the
southbound links through the incremental updates (such things become
a concern if you are planning a network you know will be congested from
the outset).

With respect to the treatment of the multiple NSFnet connections, we
had three or four goals in mind.  These were:

(1) make routing between CA*net and NSFnet as efficient as possible,
    this by minimizing the distance traversed within CA*net (e.g.
    traffic from Montreal should leave CA*net via Princeton and
    should return that way).

(2) when one of the NSFnet-CA*net links breaks, traffic should be
    routed via one of the other two links.

(3) make sure breakage of the Vancouver-Toronto or Toronto-Montreal
    links does not prevent, say, Halifax from reliably communicating
    with NSFnet (this is somewhat the same as (1)).

(4) perhaps, heal breakage of the Vancouver-Toronto and Toronto-Montreal
    links via the NSFnet, so that breakage of one of these doesn't
    affect connectivity between CA*net sites.

Of these, only (2) turned out to be perfectly straightforward.  To
understand the difficulties with (1) and (3), however, requires some
background.  The northern end of the NSFnet links is not directly
connected to the CA*net NSS, but rather to a separate box, to avoid
load difficulties should the southbound link become higher bandwidth.
The node is thus laid out as follows:


                Point-to-point links
       Subnets of 138.59     To NSFnet NSS
             | | |                 |
           +-+-+-+-+           +---+---+
           |       |           |       |
           |  NSS  |           |   ?   |
           |       |           |       |
           +---+---+           +---+---+
               |                   |
           ----+-------------------+---- ...
               Ethernet            |
            Class C network #  +---+---+
                               |Prov-  |
                               | incial| ...
                               |Router |
                               +-------+

The (?) is the issue.  Originally it was planned to make this an NSFnet
split E-PSP (i.e. a topological piece of the NSFnet).  The CA*net NSS
would peer with this via BGP.  The mid-level router(s), however, would
only peer with the CA*net NSS, this to allow CA*net to apply its own
policy to the NSFnet routing information before giving it to its
mid-level networks, and (less important) to avoid having routes cross
the serial link to the US more than once.  There are a couple of gotchas
here relating to BGP and the way NSSes handle routing.  If you read the
fine print in the BGP spec, it is illegal for the CA*net NSS to advertise
routes to the mid-level router which use the interface address of the
NSFnet box as a next hop.  If the provincial router isn't a BGP peer
of (?) it can't forward directly to it, but rather is only allowed to
forward through the NSS (and back out on the same network, complete
with ICMP redirect).

Second, if you multiply the above by three links you end up with the
following AS topology (CA*net == 577, NSFnet == 145):

                  Other ADs
                      |
     +----------------+----------------+
     |                                 |
     |            AS #577              |
     |                                 |
     +---+------------+------------+---+
         |            |            |
         |            |            |
     +---+------------+------------+---+
     |                                 |
     |            AS #145              |
     |                                 |
     +----------------+----------------+
                      |
                  Other ADs

Unfortunately, NSSes route by mapping a network number to an AS number
and then using the AS number as an entry into the routing table.  What
this means is that an NSFnet NSS will route all networks advertised by
AS #577 towards a single link (worse, it will do this even when 577
partitions and the single link it is routing to has declared some of
the networks unreachable).  This means that you can't meet the requirements
(1) and (3) if you do this.  NSSes really want one link per AS.
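
The two-step lookup described above can be modelled roughly as a pair of
tables.  This is only a sketch: the network names, the link name and the
table layout are invented for illustration and say nothing about the real
NSS data structures.

```python
# Hypothetical tables; network names and the link name are illustrative only.
NET_TO_AS = {
    "net-west": 577,
    "net-central": 577,
    "net-east": 577,
}
AS_TO_LINK = {577: "link-A"}  # one routing entry per AS number


def nss_next_link(network):
    """Two-step lookup: network number -> AS number -> outgoing link."""
    return AS_TO_LINK[NET_TO_AS[network]]


def links_in_use(networks):
    """Set of outgoing links chosen across all the given networks."""
    return {nss_next_link(n) for n in networks}
```

Because the second table is keyed by AS number alone, every network in
AS #577 resolves to the same link, whichever of the three links is closest
to it.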

Now, the NSFnet usually handles multiple links by mapping one (some) of the
links to a different AS number (e.g. see the RFC on how MILNET is handled).
This works fine for EGP, but for BGP you now have trouble figuring out what
the AS paths look like after mapping.  And you are now telling lies about
your topology to the whole world, rather than just yourself.

Anyway, it is noted that lots of problems go away if CA*net doesn't talk
to the NSFnet directly.  To this end it is planned instead to make (?)
into its own autonomous domain, running back-to-back BGP.  You hence
end up with an AS topology which looks like:

                  Other ADs
                      |
     +----------------+----------------+
     |                                 |
     |            AS #577              |
     |                                 |
     +---+------------+------------+---+
         |            |            |
         |            |            |
     +---+---+    +---+---+    +---+---+
     |       |    |       |    |       |
     |AS #579|    |AS #6xx|    |AS #578|
     |       |    |       |    |       |
     +---+---+    +---+---+    +---+---+
         |            |            |
         |            |            |
     +---+------------+------------+---+
     |                                 |
     |            AS #145              |
     |                                 |
     +----------------+----------------+
                      |
                  Other ADs

which makes the NSSes on both sides happy since there is now only one
link per AS.  It also means we get around the unfortunate BGP restriction,
since CA*net now controls the policy on the (?) router as well.

In any event, it is then possible to construct gateway preferences
based on network number (or origin AS, if the software supported it).
CA*net will route south based on internal topology.  NSFnet will route
north based on first, second and third choice preferences which are
administratively set to match CA*net's internal topology.  (1), (2)
and (3) all work.
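
As a rough sketch of the preference scheme just described, assume each
network carries an ordered list of gateways and that each gateway reports
the set of networks it can currently reach.  The gateway names come from the
topology diagram, and the first choice for the Montreal entry follows goal
(1); the network names, the remaining orderings and the function name are
hypothetical.

```python
# Ordered gateway preferences per network.  Only the first choice for the
# Montreal entry comes from the text; the rest is a hypothetical example.
PREFERENCES = {
    "net-vancouver": ["Seattle", "Ithaca", "Princeton"],
    "net-toronto": ["Ithaca", "Princeton", "Seattle"],
    "net-montreal": ["Princeton", "Ithaca", "Seattle"],
}


def choose_gateway(network, advertised):
    """Return the most preferred gateway advertising `network`, else None.

    `advertised` maps a gateway name to the set of networks it currently
    reports as reachable; a broken link is simply absent from the mapping.
    """
    for gateway in PREFERENCES[network]:
        if network in advertised.get(gateway, set()):
            return gateway
    return None
```

When a link breaks, or a partition makes one gateway stop advertising a
network, the lookup falls through to the next choice, which is the behaviour
goals (2) and (3) ask for.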

(4) doesn't really work in a BGP-based world since there is an explicit
assumption in the BGP document that ASes don't partition, and routing
loop control is sort of based on this.  In general this is probably a
good thing since trying to repair partitioned ASes is scary.  If one
has a fairly limited topology, however, there is a dishonest hack one
can do to get routing between the AS partitions via a third party.  This
might be tried in the interests of knowing if it works.

That's about it, I guess.  If I get to Pittsburgh I hope to have prettier
graphics.

Dennis