Re: [rbridge] network topology constraints in draft-tissa-trill-cmt-00

Santosh Rajagopalan <sunny.rajagopalan@us.ibm.com> Thu, 05 April 2012 22:48 UTC

To: Donald Eastlake <d3e3e3@gmail.com>
Cc: rbridge@postel.org, rbridge-bounces@postel.org

Hi Donald,
1) The most significant topology concern I have is that with CMT, the CE 
switches that are connected to the rbridges cannot be interconnected - 
this forces all east-west traffic to take an extra hop north. This needs 
to be well documented.

2) You're right that STP isn't precluded, but with no interconnects 
possible between the CE bridges, it's not needed either.

3) The solution for the CE-rbridge link failure isn't presented in the 
draft. My bigger problem is that all the solutions I can think of within 
the framework of this draft are expensive to the point of being 
deal-breakers. Shutting down *all* the south-facing links on an rbridge 
when *one* link goes down isn't workable. Can you explain how having 
multiple RBv nicknames solves this problem?

There is one (expensive) solution I can think of: give each CE switch its 
own RBv. In this case an rbridge will only advertise a virtual nickname 
if the link "assigned" to that RBv is still up. This solution, 
unfortunately, expands the databases and state everywhere, in addition to 
burning through our 16-bit nickname space faster. Is this the solution 
the draft authors will be going towards?
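For a rough sense of scale, here's a back-of-the-envelope sketch. The 
group and switch counts are made-up numbers, and the reserved-range 
arithmetic follows my reading of RFC 6325 (0x0000 plus 0xFFC0-0xFFFF 
reserved):

```python
# Back-of-the-envelope: how fast per-CE-switch RBv nicknames consume the
# 16-bit nickname space. All counts below are hypothetical.

NICKNAME_SPACE = 2**16 - 1 - 64   # minus 0x0000 and the 0xFFC0-0xFFFF reserved range

def nicknames_needed(num_rbridges, num_ce_switches_per_group, num_groups):
    """One real nickname per rbridge, plus one virtual (RBv) nickname
    per CE switch if each CE switch gets its own RBv."""
    real = num_rbridges
    virtual = num_ce_switches_per_group * num_groups
    return real + virtual

# e.g. 100 edge groups of 48 CE switches each, 500 rbridges total
used = nicknames_needed(500, 48, 100)
print(used, f"{used / NICKNAME_SPACE:.1%} of the nickname space")
```

Not fatal at these numbers, but it grows linearly with every CE switch 
added, on top of the per-RBv state in the link-state database.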

--
Sunny



From:   Donald Eastlake <d3e3e3@gmail.com>
To:     Santosh Rajagopalan/Santa Clara/IBM@IBMUS
Cc:     rbridge@postel.org
Date:   04/05/2012 01:15 PM
Subject:        Re: [rbridge] network topology constraints in draft-tissa-trill-cmt-00
Sent by:        rbridge-bounces@postel.org



Hi Sunny,

On Wed, Mar 28, 2012 at 6:00 PM, Santosh Rajagopalan
<sunny.rajagopalan@us.ibm.com> wrote:
> This is a clever draft, but I wanted to point out some network topology
> constraints in the proposal:
>
> 1) It looks like you can only use this proposal if the CE switches are *not*
> interconnected in any fashion outside of the trill network. This is because
> sending packets into the CE network strips off information needed to prevent
> loops. Let me illustrate using the example from the draft: RB1 receives a
> multidestination packet from the trill campus on tree 1, and it also has an
> "affinity" for that tree. So it decaps it and sends the packet into the CE
> network (basically, a copy gets sent to each of CE1..CEn).
>
> Let's assume that the CE network is composed of interconnected switches,
> instead of the isolated switches shown in the picture. This is reasonable,
> because it avoids needing to take the extra hop to the aggregation layer for
> end-systems on the same chassis or rack. This means that a broadcast packet
> would be replicated by the CE network to each of its switches, including
> CE1..CEn. So CE1..CEn just got their first duplicate. Each of these
> switches looks at the attached rbridges as edge ports, so it sends them a
> copy. Now, rbridges RB1..RBk each label the ingressing packet with their
> respective "affinity" tree labels and send them into the TRILL network,
> where it gets to the edge of the trill network, and the cycle repeats. You
> now have a loop.
>
> In addition, if you had interconnectivity between the CE switches, then the
> edge rbridges would be able to exchange LSPs with each other, which will

(I think you mean Hellos.)

> result in one of them being elected the AF. The others will then not encap
> or decap CE packets. So the affinity based approach would conflict with RFC
> 6325. All in all, we need the CE switches to be isolated here.

OK. The technique in this draft is an optimization to spread load more
evenly and improve failover. If there are cases where you can't use
this technique, then, in those cases, you can look for other ways to
accomplish either or both such improvements.
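The blow-up described in point 1 can be sketched with a toy counting 
model (my own illustration; the rbridge count is made up):

```python
# Toy model of the loop in point 1: an interconnected CE cloud acts as
# one broadcast domain, so a frame decapsulated by any edge rbridge is
# re-ingressed by every *other* edge rbridge, then decapsulated again.

def flood_rounds(num_edge_rbridges, rounds):
    """Count the native copies re-ingressed into TRILL per round,
    starting from a single decapsulated frame. With k edge rbridges,
    each decap spawns k-1 re-ingressed frames, so copies never die out."""
    copies = [1]  # round 0: the original decapsulated frame
    for _ in range(rounds):
        copies.append(copies[-1] * (num_edge_rbridges - 1))
    return copies

print(flood_rounds(4, 3))  # grows geometrically: a forwarding loop
```

Even with only two edge rbridges the copy count stays constant rather 
than decaying, which is still a persistent loop.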

> 2) Applying an STP-based solution like the one described in RFC 6325 ("The
> Spanning Tree Solution") to break the connectivity between the CE switches
> won't work here, because this will render certain switches unreachable on
> some trees. In figure 13 ("wiring closet topology") in RFC 6325, if RB1 has
> an affinity for tree k, then packets coming in from the trill cloud on that
> tree will need to get to B2 through RB1, but since STP has blocked B1-B2,
> this won't happen. This just reiterates that no form of interconnect
> whatsoever between the CE switches is permissible, and "The Spanning Tree
> Solution" will not work here.

I don't understand your comment #2 above. Assuming use of the
technique in RFC 6325 A.3.3 so the link between B1 and B2 is blocked,
you imply that multi-destination frames for B2 must somehow be
delivered via RB1->B1->B2, which will not work. But why can't they be
directly delivered via RB1->B2, since I believe we are assuming that
both B1 and B2 are each directly connected to both RB1 and RB2? After
all, with the B1-B2 link blocked, there isn't any way that RB1 and RB2
can tell if that link is still up. If someone just snipped the cable
between B1 and B2, what would change at RB1 or RB2? Yet you agree, I
believe, that if B1 and B2 had never been connected, it would work...
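This can be checked with a few lines of graph search (topology as I read 
figure 13, with the B1-B2 link blocked and therefore simply absent):

```python
# Sketch of the point above: with B1-B2 blocked, B2 is still reachable
# directly from RB1, since both bridges are assumed dual-homed to both
# rbridges. Topology is my reading of RFC 6325 figure 13.

from collections import deque

def reachable(adj, start):
    """Plain BFS over an undirected adjacency dict."""
    seen, queue = {start}, deque([start])
    while queue:
        node = queue.popleft()
        for nbr in adj.get(node, ()):
            if nbr not in seen:
                seen.add(nbr)
                queue.append(nbr)
    return seen

# The STP-blocked B1-B2 link is omitted from the graph entirely.
adj = {
    "RB1": ["B1", "B2"],
    "RB2": ["B1", "B2"],
    "B1":  ["RB1", "RB2"],
    "B2":  ["RB1", "RB2"],
}
print("B2" in reachable(adj, "RB1"))  # B2 is one hop from RB1
```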

> 3) In addition to the above constraint, each CE switch needs to be connected
> to every rbridge, and the consequences of any of the LAG links going down
> are catastrophic. Also, each rbridge needs to have a vLAG to each CE switch
> in the LAN. This is necessary because the entire CE network has been
> "emulated" by the pseudo-rbridge (RBv) in the draft.

This depends on whether one RBv is for all the CEs or you have
multiple RBv nicknames. It seems to me you at least need a different
RBv for each different set of RBridges whose links are aggregated.
(Except, of course, if you get down to one rbridge, you can just use
that rbridge's nickname for the ingress...)

> Let's say a packet arrives from the trill core at a certain rbridge on a
> tree that it has an affinity for. The assumption is that by decapsulating
> the packet and sending it to each of the attached CE links, all the stations
> in the CE network will get the packet. So if there's a certain CE switch
> which isn't connected to this rbridge, it will not get the packet (the
> packet can't get to the CE switch through another CE switch, because of the
> constraint in 1) above).
>
> This means that a) each rbridge needs to have n vLAGs, one for each CE
> switch, and b) each CE switch needs to have k ports in its LAG, one for each
> rbridge. Note that most switches have scalability constraints on the number
> of LAG members and on the number of vLAGs. For small networks this may not
> be a problem; however, it may still be a problem if one of the links on
> the LAG goes down. In that case, that CE switch will get permanently
> blackholed for some trees. (Essentially, the upstream rbridge on the other
> end of the down link no longer has any way of reaching the CE switch on the
> x trees it has an affinity for.)
>
> At the very least, this proposal needs a way for an rbridge to "relinquish"
> its affinity trees when any vLAG link goes down, and a way for those trees to
> either be retired or be picked up by other rbridges. In addition, the
> rbridge will need to bring all of its CE-facing links down, so that the CE
> bridges don't try to use that rbridge to inject packets into the TRILL
> network.

Lots of things can happen. Links can fail or come up. RBridges can
fail or come up. The campus can be re-configured to increase or
decrease the number of trees. I don't think any of the proposals
specifies how to handle all these events.
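One way to make the "relinquish" idea quoted above concrete is a simple 
reassignment rule (a hypothetical sketch, not anything any of the 
proposals specifies):

```python
# Hypothetical sketch: when an rbridge loses a vLAG member, hand its
# affinity trees to surviving rbridges so no tree is left with an
# rbridge that can no longer reach every CE switch.

def reassign_affinities(affinity, failed_rb):
    """affinity: dict mapping tree number -> owning rbridge. Trees owned
    by the failed (or partially connected) rbridge are redistributed
    round-robin across the survivors."""
    survivors = sorted({rb for rb in affinity.values() if rb != failed_rb})
    if not survivors:
        raise RuntimeError("no rbridge left to take over the trees")
    orphaned = sorted(t for t, rb in affinity.items() if rb == failed_rb)
    out = dict(affinity)
    for i, tree in enumerate(orphaned):
        out[tree] = survivors[i % len(survivors)]
    return out

before = {1: "RB1", 2: "RB2", 3: "RB1", 4: "RB3"}
after = reassign_affinities(before, "RB1")
print(after)  # trees 1 and 3 move off RB1
```

Even this toy version shows the cost: every link event forces a 
campus-wide re-advertisement of affinities, which is exactly the churn 
being debated here.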

> 4) Because of the constraint imposed by 1), you cannot interconnect two
> trill clouds using an intermediate CE cloud - the trill clouds will need to
> be merged using p2p trill links. This could be a problem if you plan to
> incrementally upgrade your switches to trill, as opposed to a fork-lift
> upgrade of your whole data center.

As far as I can tell, you mean "bridged LAN" when you say "CE cloud".
Of course you can connect trill clouds with a bridged LAN to form a
single campus; you just can't use this particular technique at the
same time. Considering the bridged LAN as a multi-access transit link,
if you follow typically recommended network design and do not put end
stations on that link, it doesn't make much difference that you can't
use this technique.

I don't think it has anything to do with "p2p trill links". TRILL has
always fully supported multi-access links. There is no problem using
one or more multi-access and/or p2p physical links to connect RBridges
to a bridged LAN as part of a TRILL campus whether or not the topology
is such that that bridged LAN is the only connection between two parts
of the campus.

Thanks,
Donald
=============================
 Donald E. Eastlake 3rd   +1-508-333-2270 (cell)
 155 Beaver Street, Milford, MA 01757 USA
 d3e3e3@gmail.com

> Note that the existing version of RFC 6325 does not have constraints on
> interconnectivity of CE switches or rbridges as described above.
>
> Thoughts?
>
> --
> Sunny Rajagopalan

_______________________________________________
rbridge mailing list
rbridge@postel.org
http://mailman.postel.org/mailman/listinfo/rbridge
