Re: [rbridge] Updated charter

James Carlson <carlsonj@workingcode.com> Sat, 17 July 2010 14:44 UTC

Message-ID: <4C41BBD9.5060606@workingcode.com>
Date: Sat, 17 Jul 2010 10:19:05 -0400
From: James Carlson <carlsonj@workingcode.com>
To: Linda Dunbar <ldunbar@huawei.com>
References: <4BE7A984.8050607@oracle.com> <01a101cb2471$0541bdd0$3a0c7c0a@china.huawei.com> <4C40483E.6020207@workingcode.com> <003201cb24f7$8036d3f0$3a0c7c0a@china.huawei.com> <4C40794B.10304@workingcode.com> <00c201cb2527$633374e0$3a0c7c0a@china.huawei.com>
In-Reply-To: <00c201cb2527$633374e0$3a0c7c0a@china.huawei.com>
Cc: "'Developing a hybrid router/bridge.'" <rbridge@postel.org>, 'Jari Arkko' <jari.arkko@piuha.net>
Subject: Re: [rbridge] Updated charter

On 7/16/10 4:42 PM, Linda Dunbar wrote:
> Section 3 of the draft (ARP Optimization Details) listed 5 steps. The first
> step is "observing native ARP request and reply frame" to learn the frame's
> IP, VLAN. 
> 
> This snooping at line rate is not trivial, requiring the switch port to
> filter all the ARP messages at line rate and store the learned IP
> addresses and VLANs.

It already has to look at all frames on the wire to learn source MAC
addresses and to do normal forwarding.  In some implementations (such as
the one I worked on at Sun), that's 99%+ of the cost.  Having to look a
little more closely at normally low-rate ARPs in particular to learn a
bit more is very little work compared to handling the bulk data traffic
and (more importantly) the significant effects of running in promiscuous
mode all the time.
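
As a rough illustration of how little extra work that is, here is a
minimal sketch of a learning path that already runs for every frame, with
the ARP case added as one extra branch.  The function name, the table
layout, and the default VLAN of 1 for untagged frames are assumptions
made for the example; none of it is taken from the draft.

```python
import struct

ETHERTYPE_ARP = 0x0806
ETHERTYPE_VLAN = 0x8100


def learn(frame: bytes, port: int, mac_table: dict, arp_cache: dict) -> None:
    """Learn the source MAC from every frame; learn IP->MAC only from ARP."""
    src_mac = frame[6:12]
    ethertype = struct.unpack("!H", frame[12:14])[0]
    vlan = 1  # assumption: untagged frames belong to VLAN 1
    offset = 14
    if ethertype == ETHERTYPE_VLAN:
        # 802.1Q tag: 16-bit TCI followed by the real EtherType
        tci, ethertype = struct.unpack("!HH", frame[14:18])
        vlan = tci & 0x0FFF
        offset = 18

    # Normal bridge learning, done for all traffic.
    mac_table[(vlan, src_mac)] = port

    # The extra step: only ARP frames go any further.
    if ethertype != ETHERTYPE_ARP:
        return
    arp = frame[offset:offset + 28]
    if len(arp) < 28:
        return
    htype, ptype, hlen, plen, _oper = struct.unpack("!HHBBH", arp[:8])
    if (htype, ptype, hlen, plen) != (1, 0x0800, 6, 4):
        return  # not Ethernet/IPv4 ARP
    sender_mac = arp[8:14]
    sender_ip = arp[14:18]
    arp_cache[(vlan, sender_ip)] = sender_mac
```

The per-frame cost is the MAC learning; the ARP branch is taken only for
the comparatively rare ARP frames.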

The trade-offs may be different for other implementations.  That's not a
problem.  Just because there's a draft doesn't mean there's someone
forcing you to implement.

(For what it's worth, performance-based arguments are usually viewed by
IETFers as weak, unless the "performance" issue is rather severe, such
as scaling geometrically or worse, because today's hardware is fleeting
but protocols are forever.  In this case, the work required grows with
the traffic, and only for those implementations that bother to optimize.)

> But most ARP requests/replies may not traverse through
> TRILL ports on a switch, which really defeats the purpose of this extra
> work. 

It's an optimization.  It optimizes the cases it's able to, and leaves
the ones it cannot for regular processing.  How could it be
otherwise?

The very same could be said of all unicast traffic.  Not all MAC SAs
will be seen by all RBridges, so not all RBridges will have exact
knowledge of the locations of all nodes in the network.  Given that,
should we not bother to try to optimize the forwarding of unicast
traffic, because the purpose is "defeated?"

(Actually, I expect the ARP optimization to work a bit better, because
the draft specifies keeping the gratuitous ARPs as regular broadcast,
meaning that they'll always be seen -- much unlike the cached unicast case.)

> Let's assume that a switch has 12 ports, with 2 ports facing TRILL
> interface, 10 ports facing servers via standard Ethernet interface. 
> 
> Target hosts of ARP requests from servers can be one of the servers connect
> to the switch. When servers are virtualized, target hosts can even be on the
> same physical server, you could even have ARP messages being hairpinned back
> (IEEE802.1Qbc & 802.1Qbg). 
>
> If you want to add a function to snoop all the ARP requests and replies (ARP
> Proxy) to cache target IP-MAC/VLAN to reduce broadcast messages, it is much
> more effective to add this function on the ports facing servers, or in a
> centralized CPU, so that the switch's local cache (IP<->MAC/VLAN) can have
> all the IP <->MAC/VLAN mapping for all hosts traversing through this switch.
> When a target IP is found in the switch's local cache, the unicast MAC/VLAN
> is used as DA, so that this frame will only exit one port of the switch,
> instead of broadcasting to all ports. If the frame is to exit the TRILL
> port, the RBridge function on the port will encapsulate the frame with
> appropriate egress RBridge address. Most likely the frame will exit ports
> facing other servers connected to the switch. 

It sounds like you're advocating modifying the packet in flight, so that
the Ethernet MAC DA becomes unicast rather than broadcast.

That's possible, and it does have the somewhat interesting property that
it could be modularized, but it's not what the draft talks about.

The advantages of what the draft proposes are that it preserves the
frame's integrity (so that it doesn't presume to know whether the
broadcast/unicast distinction is important to the receiver), it doesn't
require modifications of the original frame (which are terribly
expensive in some implementations -- prepend is cheap, but modify costs
a data copy operation), and, perhaps most importantly, it's *exactly*
analogous to the existing multicast optimization.  When the system knows
(from monitoring other protocols, such as IGMP) what ports or paths a
packet should traverse, it uses them, and when it doesn't know, it sends
it through the normal broadcast-like tree.  This is just the same scheme
applied to ARP/NDP.  (And, really, what's done for unicast as well.)
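
A minimal sketch of that decision, under assumed names (the function, the
cache keyed by VLAN and IP, and the port list are all hypothetical and not
taken from the draft): when the target is known, the untouched frame goes
out one port; when it isn't, the same untouched frame goes down the
normal flooding path.

```python
def forward_arp_request(frame, vlan, target_ip, arp_cache, mac_table,
                        in_port, all_ports):
    """Return (output ports, frame); the frame itself is never modified."""
    target_mac = arp_cache.get((vlan, target_ip))
    out_port = None
    if target_mac is not None:
        out_port = mac_table.get((vlan, target_mac))

    if out_port is not None and out_port != in_port:
        # Known target: send the original broadcast frame along one path.
        return [out_port], frame

    # Unknown target: fall back to regular flooding, excluding the
    # port the frame arrived on.
    return [p for p in all_ports if p != in_port], frame
```

In both branches the receiver sees exactly the frame the sender
transmitted, broadcast DA included.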

As I see it, as an implementer, you're really free to do either one of
these mechanisms (as you propose or as the draft proposes) without
having to coordinate with anyone else, because all of this relies on the
existing unicast infrastructure.  It's not a true protocol issue and
requires no extra signaling.  As such, perhaps it doesn't really need to
be in an IETF draft and could be left to implementations to decide what
to do.  But it's also not something worth fighting over.

At most, perhaps the draft could acknowledge that there's a fourth
option -- mangle^Wset the DA on the packet to the cached value and
forward normally.
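
For comparison, that alternative could be sketched roughly as follows.
The names are hypothetical, and this is only an illustration of the idea
of setting the DA from a cache, not anything specified in the draft.

```python
def rewrite_da(frame: bytes, vlan: int, target_ip: bytes,
               arp_cache: dict) -> bytes:
    """Return a frame whose DA is the cached MAC for target_ip, if known."""
    target_mac = arp_cache.get((vlan, target_ip))
    if target_mac is None:
        return frame  # cache miss: leave the broadcast frame alone
    # Builds a new frame: the payload is copied and the first six
    # bytes (the destination address) are replaced.
    return target_mac + frame[6:]
```

On a hit the result is a new frame rather than the original, so the
receiver no longer sees the broadcast DA the sender used.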

> Therefore, mixing ARP proxy function with TRILL just creates a more
> complicated specification, which doesn't help anyone (system vendors,
> component vendors or software stack vendors). Most importantly, ARP proxy
> and RBridge are two distinct functions.  This is especially bad when ARP
> proxy has been deployed in many places. 

I still don't get the "complicated specification" issue.  This is a
separate draft.  I would expect that as it's an optional feature, it'll
remain a separate document "forever."  This isn't like one of those IEEE
lettered documents that eventually gets folded into the base document.

Where is it written that ARP proxy and RBridge "are" or must be distinct
functions?  You seem to be stating that as an article of faith, but I
don't understand the basis.

And I fail to see how deployment of any sort of ARP Ethernet MAC DA
modification mechanism would make the proposed mechanism "bad."  Do they
fail to interoperate in some way?  Demonstrating some problem -- some
way in which "true ARP proxies" fail to interoperate with RBridges using
the ideas in the draft -- would go a long way towards making the
objections clear.

> For a switch with TRILL ports, ports facing to (virtual) servers, and ports
> facing other switches via Ethernet interfaces, mixing ARP proxy with TRILL
> makes even less sense. Most traffic will be switched between
> Ethernet<->Ethernet and Ethernet <-> TRILL. 

I'm afraid I can't parse that objection.  I can't see how the
applicability of the mechanism proposed in the draft would ever differ
from that of any other optimization mechanism.

If there are indeed special usage cases where optimization is unwise,
then savvy implementers should avoid making those unwise optimizations
for the sake of the product they're building.  If they don't, it's not a
big deal, because the marketplace will eventually solve that problem as
well.

> If it is still not clear, we can discuss more in Maastricht. 

It'll be a decidedly one-sided conversation, as I have no plans to
travel.  ;-}

-- 
James Carlson         42.703N 71.076W         <carlsonj@workingcode.com>