Re: [trill] Suggestion on RFC 6325 section 4.5.2, single parent ECMP link selection.

"Ramkumar Parameswaran (ramkumar)" <ramkumar@cisco.com> Fri, 24 August 2012 02:43 UTC

From: "Ramkumar Parameswaran (ramkumar)" <ramkumar@cisco.com>
To: Donald Eastlake <d3e3e3@gmail.com>
Cc: "trill@ietf.org" <trill@ietf.org>


Hi Donald,

Thanks for the consideration. One comment inline, tagged [**]:
________________________________________
From: trill-bounces@ietf.org [trill-bounces@ietf.org] on behalf of Donald Eastlake [d3e3e3@gmail.com]
Sent: Wednesday, August 22, 2012 11:12 AM
To: Ramkumar Parameswaran (ramkumar)
Cc: trill@ietf.org
Subject: Re: [trill] Suggestion on RFC 6325 section 4.5.2, single parent ECMP link selection.

Hi Ramkumar,

On Wed, Aug 22, 2012 at 11:34 AM, Ramkumar Parameswaran (ramkumar)
<ramkumar@cisco.com> wrote:

> Hi,
>
> With regard to the Parallel Link check in section 4.5.2 of RFC 6325,
> item 3 talks about how to select links in multi-destination trees
> when a node in a multi-destination tree has a single parent
> connected to it by multiple links. The section in the RFC identifies
> how to select a link in the bundle, over which all multi-destination
> traffic may traverse, when the links are of type P2P or LAN links
> with pseudonode suppressed. Specifically, a single link is selected
> and traffic for all multi-destination trees is carried over the
> selected link.

Generally speaking, if the links are equal-speed P2P Ethernet links, I
would recommend just aggregating them with 802.1AX, which resolves
these issues.

> When a node is connected to a single parent across several LAN links
> which advertise a pseudonode (when pseudonode is not suppressed),
> RFC 6325 section 4.5.1 already specifies a way to load balance
> multi-destination traffic across these links by pulling a unique
> link into each multi-destination tree.
>
> This is illustrated in the following example
>
>           |   Node B      | -> upstream, closer to tree root
>           -----------------
>              /  |   |  \
>             |   |   |   |
>             |01 |02 |03 |04
>             |   |   |   |
>              \  |   |  /
>          -----------------
>          |   Node A      |
>

> Consider Node A and Node B being connected to each other on 4
> independent LAN links. Each of the four links above is a LAN link
> with a DRB elected and a pseudonode advertised. The number to the
> right of the link identifies the pseudonode ID operational on the
> link. In this situation, per 4.5.1, since the seven-byte system ID
> is considered in load-balancing links across trees, other factors
> being equal, link 01 will be assigned to tree 1, link 02 will be
> assigned to tree 2, and so on.
>
> However, if the links were P2P links with 01, 02, 03, and 04 being
> the extended circuit id of the node with higher system id (B, say),
> then per 4.5.2, link 04 would be pulled into all multi-destination
> trees, with no load-balancing on any of the other links.

This provision in RFC 6325 didn't come out of nowhere. It was
requested due, if I recall correctly, to limitations of some hardware
in implementing the Reverse Path Forwarding Check for
multi-destination frames.

> For P2P links and pseudonode suppressed LAN links, we were wondering
> whether the following variant can be accommodated, so that
> multi-destination traffic can be spread over more links, with better
> link utilization:

When there are multiple parallel unlabeled links visible to TRILL
between two RBridges, this is reported to the campus as a single
adjacency. If the two RBridges can come to agreement, it is really a
private matter between them how those RBridges handle it. The
provisions in RFC 6325 are just a minimum to assure that the default
behavior will interoperate, taking into account the hardware
limitations that, at the time, indicated only one of such links could
be used for multi-destination frames.

There are lots of possible optimizations that are left out so that
implementations have room to distinguish themselves. For example, the
sending RBridge gets to choose which of the unlabeled parallel links to
send unicast frames on. In doing so, it might want to avoid the link
chosen for multi-destination.

> Modified Approach: Once a dominant link-type is identified using the
> tie-break rule specified in item 3 (P2P or LAN link with suppressed
> pseudo node), if there is more than one link of the dominant type,
> then such links are arranged in ascending order of parent's Extended
> Circuit ID or Pseudonode ID, given serial numbers starting with 0,
> and assigned to trees with a logic similar to the multi-parent ECMP
> case:
>
> Link on Tree T = (T-1) mod N where N is the number of parallel links
>
> The parallel link check in section 4.5.2 must be modified to allow
> only the adjacency that is selected for the tree that the packet
> arrived on.
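
The selection rule quoted above can be sketched in a few lines of
Python. This is only an illustration of the proposal as worded in this
thread, not text from RFC 6325. The function name and the use of plain
integers for the parent's Extended Circuit IDs (or pseudonode IDs) are
assumptions made for the example.

```python
def select_link_for_tree(tree_number, link_ids):
    """Pick the parallel link that carries a given multi-destination tree.

    tree_number: tree number T, counted from 1 as in the proposal.
    link_ids:    IDs of the parallel links of the dominant type, as seen
                 by the parent (hypothetical integer representation).
    """
    if tree_number < 1:
        raise ValueError("tree numbers start at 1")
    if not link_ids:
        raise ValueError("at least one parallel link is required")

    # Arrange links in ascending order of the parent's ID; the position
    # in this list is the serial number, starting with 0.
    ordered = sorted(link_ids)

    # Link on Tree T = (T - 1) mod N, where N is the number of links.
    return ordered[(tree_number - 1) % len(ordered)]


# Four parallel links as in the diagram (IDs 01..04), six trees.
links = [0x03, 0x01, 0x04, 0x02]
assignment = {t: select_link_for_tree(t, links) for t in range(1, 7)}
```

With four links, trees 1 through 4 each get their own link and trees 5
and 6 wrap around to links 01 and 02. Both RBridges arrive at the same
mapping as long as they sort on the same ID.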

I don't see any particular problem with such a change if we could get
assurance that the previously reported hardware limitation was no
longer a factor. However, it is pretty hard to prove a negative. And if
you could, so that different multi-destination frames could be sent
over different links of such a bundle of unlabeled links, then why have
any limitations at all? Just permit the sending RBridge to send over
whichever link its private hash algorithm and weighting indicate...

[**]: Agree in general, but I wanted to point out the following: the
unrestricted approach of sending on any link in the bundle (on a tree)
may have issues with the RPF (Parallel Links) check. Waiving the RPF
check may not be a good option, for whatever reasons the RPF check
exists in networks today. If the RPF check is retained, the hardware may
have to maintain a list of interfaces to cross-check against, and this
would be expensive; table space for a single interface is probably the
norm. Accepting a list of interfaces may also dilute the check.
Alternatively, the load-distribution hash may need to be applied on the
receive side before the RPF check as well, which again may be expensive
in hardware.

When it was discussed in the past, were you considering multiple links
in the same tree, or one link per tree? The approach we are suggesting
needs only one RPF interface (per tree), and makes the P2P and
pseudonode-suppressed LAN cases symmetric with the treatment of regular
LAN. Treating it as optional, tied to a capability bit, is fine.
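
To make the "one RPF interface per tree" point concrete, here is a
rough Python sketch of the receive-side check under our suggested
mapping. It is a software model only and says nothing about how any
particular hardware would implement it. The names build_rpf_table and
parallel_link_check, and the integer link IDs, are hypothetical.

```python
def build_rpf_table(tree_numbers, link_ids):
    """Map each tree to the single link allowed to deliver its frames.

    Uses the proposed rule: links sorted in ascending order of the
    parent's ID, tree T assigned to serial number (T - 1) mod N.
    """
    ordered = sorted(link_ids)
    return {t: ordered[(t - 1) % len(ordered)] for t in tree_numbers}


def parallel_link_check(rpf_table, tree_number, arrival_link):
    """Accept a multi-destination frame only if it arrived on the one
    link selected for its tree; unknown trees are rejected."""
    return rpf_table.get(tree_number) == arrival_link


# Four parallel links (IDs 1..4) and four trees, as in the example.
table = build_rpf_table(range(1, 5), [1, 2, 3, 4])

accepted = parallel_link_check(table, 2, 2)   # tree 2 arriving on link 02
rejected = parallel_link_check(table, 2, 4)   # tree 2 arriving on link 04
```

The table holds exactly one entry per tree, so the lookup cost is the
same as with the single-link rule in RFC 6325; only the value stored
per tree differs.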

thanks,

ramkumar 



In fact, the capability of receiving multi-destination frames over any
link in such an unlabeled bundle could be advertised by a capability
bit, which would make such a relaxation of the Parallel Links Check
backwards compatible. If the capability was not signalled, you would
still have to do what is in RFC 6325. This seems like a reasonable way
to go to me.

Thanks,
Donald
=============================
 Donald E. Eastlake 3rd   +1-508-333-2270 (cell)
 155 Beaver Street, Milford, MA 01757 USA
 d3e3e3@gmail.com

> No change in metric advertisement is needed with regard to the rest of the
> network, and in line with what is mentioned in 6325, the notion of mapping
> continues to remain local to the two adjacent nodes RB1-RB2.
>
> Please let us know if the approach suggested above is feasible or if we may
> have missed something.
>
> Thanks,
> Ramkumar