Re: [trill] Thoughts on active-active edge

zhai.hongjun@zte.com.cn Fri, 14 December 2012 03:17 UTC

In-Reply-To: <CAFOuuo5o+=YT3TOVRp1Kxm_M3vL1Ko_1enb5fg2HuKjiUFRrKQ@mail.gmail.com>
To: Radia Perlman <radiaperlman@gmail.com>
Message-ID: <OFA7FD3909.DFC4C7F6-ON48257AD4.0010B0EE-48257AD4.00124CC7@zte.com.cn>
From: zhai.hongjun@zte.com.cn
Date: Fri, 14 Dec 2012 11:17:13 +0800
Cc: Thomas Narten <narten@us.ibm.com>, trill-bounces@ietf.org, Sam Aldrin <aldrin.ietf@gmail.com>, Mingui Zhang <zhangmingui@huawei.com>, "trill@ietf.org" <trill@ietf.org>, "Tissa Senevirathne (tsenevir)" <tsenevir@cisco.com>
Subject: Re: [trill] Thoughts on active-active edge

Hi Radia

Thanks for your answers.


> Why it is necessary to have a different pseudonode nickname if the uplink is to
> different sets of RBridges:

> If hypervisor H1 has uplinks to R1, R2, and R3, and uses pseudonode nickname P1,
> and hypervisor H2 has uplinks to R1 and R2 (or even had uplinks to R1, R2, and R3,
> but its link to R3 fails), then if H1 and H2 use the same nickname, say P1,
> then traffic for H2's MAC addresses might get sent to R3 (since R3 has to claim to
> be connected to P1 because it is, for H1).  But it is no longer attached to H2
> because H2's uplink to R3 failed.

I think you are right if the member RBridges of an RBv do not share their
learned addresses (of locally attached end nodes and remote nodes) within
the RBv. If they do share the learned addresses, RB3 will know that it can
reach H2 via RB1 or RB2. Since H2 has uplinks to RB1 and RB2, those two
RBridges can learn H2's MAC addresses by observing H2's native frames.
RB1 and RB2 can then share H2's MAC addresses within the RBv, so RB3 will
know (from the shared information) that it can reach H2 via RB1 or RB2.
On receipt of traffic for H2's MAC addresses, RB3 will tunnel the traffic
to RB1 or RB2, and that RBridge will egress the traffic to H2.

The traffic tunneling has been specified in
draft-ietf-trill-clear-correct-06.txt (Section 2.4.2.1 on page 9 and
Section 2.4.2.3 on page 10). The tunneling is simple: RB3 replaces the
egress nickname in the TRILL header of the traffic with RB1's or RB2's
nickname, then transmits the re-encapsulated traffic to RB1 or RB2.
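
The decision a member RBridge would make under this proposal can be
sketched as follows. This is only an illustration of the idea, not code
from any draft: the TrillFrame type, the forward_at_member function, and
the two tables (local_macs, shared_macs) are hypothetical names, and
picking the first listed peer is an arbitrary assumption.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class TrillFrame:
    """Only the fields needed for this sketch (hypothetical model)."""
    ingress_nickname: str
    egress_nickname: str
    dest_mac: str


def forward_at_member(frame, my_nickname, local_macs, shared_macs):
    """Decide what a member RBridge does with a frame for the pseudo-nickname.

    local_macs:  set of MACs reachable over this RBridge's own working uplinks.
    shared_macs: dict MAC -> list of member nicknames that reported learning it.
    Returns ("egress", frame), ("tunnel", rewritten_frame) or ("drop", frame).
    """
    if frame.dest_mac in local_macs:
        # Still attached to the destination: decapsulate and deliver.
        return ("egress", frame)
    peers = [n for n in shared_macs.get(frame.dest_mac, []) if n != my_nickname]
    if peers:
        # Rewrite only the egress nickname and send to a peer that can egress.
        return ("tunnel", replace(frame, egress_nickname=peers[0]))
    # No local port and no shared information: the frame cannot be delivered.
    return ("drop", frame)


# H2 lost its uplink to RB3, but RB1 and RB2 have shared H2's MAC address.
h2_mac = "00:00:00:00:00:02"
frame = TrillFrame(ingress_nickname="RB9", egress_nickname="P1", dest_mac=h2_mac)
action, out = forward_at_member(frame, "RB3", set(), {h2_mac: ["RB1", "RB2"]})
print(action, out.egress_nickname)
```

Without the shared table, the same call ends in "drop", which is the
failure case Radia describes.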

Therefore, once MAC sharing among the member RBridges of an RBv and
traffic tunneling are supported, it is not necessary to have one nickname
per hypervisor.

> The safest thing would be for every hypervisor to have a nickname.

> So, how many hypervisors are there likely to be?  How usual would it be for all of
> them to attach to the same set of uplinks, so that we can use the same pseudonode nickname?

I do not know how many hypervisors there are likely to be, but I know
that one RBridge usually has 24 or 48 down-links that can act as uplinks
for several hypervisors. So about 10 or 20 hypervisors can be dual-homed
to the same two such RBridges. If those hypervisors use the same
pseudo-nickname, it not only saves nicknames but also decreases the
number of RPFC entries on the RBridges across the TRILL campus.

> Do we care about the case of one of a hypervisor's uplinks failing, in which case,
> would the RBridges know? If R3 (the one to which the uplink failed) know?  Would R1 and R2 know?

I think a member RBridge SHOULD know about the uplink failure and tell
the other member RBridges about it, if that RBridge is directly connected
to the hypervisor whose uplink fails. Otherwise, we cannot make sure that
multi-destination traffic is properly egressed to the hypervisor.


> So the main sort of configuration that I can think of, off the top of my head,
> is which pseudonodes go with which hypervisors.

If hypervisors do not perform TRILL encapsulation/decapsulation, they do
not need to know which pseudonodes go with them. But if they do perform
the encapsulation/decapsulation, they really need to know.

> I don't know how R1 can know that a particular port is to "H1" so that it can inform R2
> (via LSPs?) that R1 is attached to H1, and R2 can notice that it, indeed, is also attached to H1.

If I have not misunderstood you, I think R1 can learn H1's MAC addresses
by observing H1's native frames. Then it will know which particular port
leads to H1. R2 can do the same learning. If R2 has not learned H1's MAC
addresses, R1 can share the addresses with it via ESADI PDUs.
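
As a rough illustration of how shared learning could double as the
"sanity check" Radia asks for, the sketch below merges per-port MAC
tables from several RBridges and reports the MACs seen on more than one
of them. The function name and the table layout are assumptions, and the
ESADI exchange itself is not modeled; the tables are simply given.

```python
def shared_attachments(learned):
    """Find MAC addresses that were learned at more than one place.

    learned: dict RBridge nickname -> dict port name -> set of learned MACs.
    Returns dict MAC -> sorted list of (nickname, port) pairs, restricted to
    MACs that appear at two or more (nickname, port) locations.
    """
    seen = {}
    for nickname, ports in learned.items():
        for port, macs in ports.items():
            for mac in macs:
                seen.setdefault(mac, []).append((nickname, port))
    return {mac: sorted(where) for mac, where in seen.items() if len(where) > 1}


# R1 and R2 both observed H1's MAC on a local port; only R1 saw another host.
learned = {
    "R1": {"port1": {"h1-mac"}, "port2": {"other-mac"}},
    "R2": {"port7": {"h1-mac"}},
}
print(shared_attachments(learned))
```

A MAC that shows up on ports of both R1 and R2 suggests that both are
attached to the same hypervisor, which they could compare against the
configured pseudonode assignment.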

> And part of the description of the problem would be answering questions like
> how many uplinks would need to be supported.  Two at most?  30?  If a lot,
> then solutions that require a tree for every uplink would be problematic if
> implementations don't want to support that many trees.  Or is it OK to require
> lots of trees?

As far as I know, some vendors say their proprietary MC-LAG technologies
can support at most 8 member devices in theory. But in practical
deployments, two or three member devices in an MC-LAG group are enough.
So I think supporting 8 trees is OK at present. It can meet the practical
requirements and does not create a mass of RPFC entries.

If I am wrong, please correct me.


Best Regards,
Zhai Hongjun
""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""
 Protocol Development Dept.VI, Central R&D Institute, ZTE Corporation
 No. 68, Zijinghua Road, Yuhuatai District, Nanjing, P.R.China, 210012
 
 Zhai Hongjun
 
 Tel: +86-25-52877345
 Email: zhai.hongjun@zte.com.cn
"""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""





Radia Perlman <radiaperlman@gmail.com>
Sent by: trill-bounces@ietf.org
2012-12-14 02:10

To
zhai.hongjun@zte.com.cn
Cc
Thomas Narten <narten@us.ibm.com>, trill-bounces@ietf.org, Sam Aldrin
<aldrin.ietf@gmail.com>, Mingui Zhang <zhangmingui@huawei.com>,
"trill@ietf.org" <trill@ietf.org>, "Tissa Senevirathne (tsenevir)"
<tsenevir@cisco.com>
Subject
Re: [trill] Thoughts on active-active edge






Answering Zhai Hongjun's questions:

Why it is necessary to have a different pseudonode nickname if the uplink
is to different sets of RBridges:

If hypervisor H1 has uplinks to R1, R2, and R3, and uses pseudonode 
nickname P1, and hypervisor H2 has uplinks to R1 and R2 (or even had 
uplinks to R1, R2, and R3, but its link to R3 fails), then if H1 and H2 
use the same nickname, say P1, then traffic for H2's MAC addresses might 
get sent to R3 (since R3 has to claim to be connected to P1 because it is, 
for H1).  But it is no longer attached to H2 because H2's uplink to R3 
failed.

The safest thing would be for every hypervisor to have a nickname.

So, how many hypervisors are there likely to be?  How usual would it be 
for all of them to attach to the same set of uplinks, so that we can use 
the same pseudonode nickname?  Do we care about the case of one of a
hypervisor's uplinks failing, in which case, would the RBridges know?  If 
R3 (the one to which the uplink failed) know?  Would R1 and R2 know?  Even 
if R3 knew, how could it alert R1 and R2 to now use a different pseudonode 
nickname for H2?  Would it be obvious to them which of the hypervisors 
that have uplinks to R1, R2, and R3 they are referring to?

All of this must be configured, I assume, and if the configuration is 
wrong, then who knows what happens...presumably that traffic may or may not
get delivered to a hypervisor.  And even if configured properly, what 
happens when uplinks fail?  Again, presumably, traffic may or may not get 
delivered to the hypervisor whose uplink fails.

-------------
As for VLANs...that actually is not a problem here, since we're not using 
AFs.  The hypervisor determines which uplink to send something to.  And 
which tree is being used for distribution determines which of R1, R2, or 
R3 will decapsulate the packet.

So the main sort of configuration that I can think of, off the top of my 
head, is which pseudonodes go with which hypervisors.  And I do think 
configuration is scary, especially if there's no "sanity check" whereby 
the RBs can compare notes.  I don't know how R1 can know that a particular 
port is to "H1" so that it can inform R2 (via LSPs?) that R1 is attached 
to H1, and R2 can notice that it, indeed, is also attached to H1.

--------
And part of the description of the problem would be answering questions 
like how many uplinks would need to be supported.  Two at most?  30?  If a 
lot, then solutions that require a tree for every uplink would be 
problematic if implementations don't want to support that many trees.  Or 
is it OK to require lots of trees?

So I think there are lots of things that should be written down, as part 
of describing the problem.



Radia





On Thu, Dec 13, 2012 at 4:03 AM, <zhai.hongjun@zte.com.cn> wrote:

Hi Radia 

> This case is scary because the RBridges on the uplink cannot see Hellos from each other,
> so if misconfigured, at the very least I could imagine multiple RBridges decapsulating
> multicast from the campus to the hypervisor.

> Anyway...how many uplinks do we need to support?  Do we care about problems due to misconfiguration?

I don't know what the misconfiguration refers to. Is it the set of VLANs 
for which an Rbridge acts as AF? 


> Are there cases where there are lots of hypervisors, where they attach to different subsets of edge RBs?
> In that case, we might eat up a lot of nicknames, since if one hypervisor is attached to {R1, R2},
> and another is attached to {R1, R2, R3}, they cannot use the same pseudonode nickname.

I don't know why the two sets of RBridges cannot use the same
pseudo-nickname. If the learned MAC addresses can be shared among the
member RBridges of an RBv, and TRILL data frames can be tunneled to
another member RBridge that can egress the frame, I think they can use
the same pseudo-nickname.

If I am wrong, please correct me.


Best Regards,
Zhai Hongjun
""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""
Protocol Development Dept.VI, Central R&D Institute, ZTE Corporation
No. 68, Zijinghua Road, Yuhuatai District, Nanjing, P.R.China, 210012

Zhai Hongjun

Tel: +86-25-52877345
Email: zhai.hongjun@zte.com.cn
"""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""





Radia Perlman <radiaperlman@gmail.com>
Sent by: trill-bounces@ietf.org
2012-12-13 15:13

To
Mingui Zhang <zhangmingui@huawei.com>
Cc
Thomas Narten <narten@us.ibm.com>, Sam Aldrin <aldrin.ietf@gmail.com>,
"Tissa Senevirathne (tsenevir)" <tsenevir@cisco.com>, "trill@ietf.org"
<trill@ietf.org>
Subject
Re: [trill] Thoughts on active-active edge








I think it would be good to have a document that explains the problem...I 
certainly don't believe I know all the cases that need to be solved.  I 
think I understand the hypervisor case...where the hypervisor decides 
which uplink to send things to, and never forwards between the up-links. 

This case is scary because the RBridges on the uplink cannot see Hellos 
from each other, so if misconfigured, at the very least I could imagine 
multiple RBridges decapsulating multicast from the campus to the 
hypervisor. 

Anyway...how many uplinks do we need to support?  Do we care about 
problems due to misconfiguration? 

In cases like this, is it common to also have pt-to-pt links between all 
the RBs attaching to the hypervisor?  If so, then it seems like it would 
be possible for them to coordinate to at least detect misconfiguration, 
and possibly play games with forwarding messages to each other (e.g., if 
one of them is not attached to a tree and needs to encapsulate a 
multidestination frame). 

How many trees does the campus need? 

Are there cases where there are lots of hypervisors, where they attach to 
different subsets of edge RBs?  In that case, we might eat up a lot of 
nicknames, since if one hypervisor is attached to {R1, R2}, and another is 
attached to {R1, R2, R3}, they cannot use the same pseudonode nickname. 

Are there cases other than hypervisors?  I think there are cases of
bridges that have this behavior (a port with a bunch of endnodes, and
several up-links, where the bridge does not forward between the up-links).


If this has been written down anywhere, can anyone point me to it?  If 
not, it seems really prudent to answer these (and I'm sure other) 
questions before arguing about specific solutions. 

Radia 

_______________________________________________
trill mailing list
trill@ietf.org
https://www.ietf.org/mailman/listinfo/trill
