Re: [trill] Suggestion on RFC 6325 section 4.5.2, single parent ECMP link selection.

Donald Eastlake <d3e3e3@gmail.com> Wed, 22 August 2012 18:12 UTC

In-Reply-To: <CC5A4A25.3C56%ramkumar@cisco.com>
References: <CC5A4A25.3C56%ramkumar@cisco.com>
From: Donald Eastlake <d3e3e3@gmail.com>
Date: Wed, 22 Aug 2012 14:12:26 -0400
Message-ID: <CAF4+nEG5e5PuiPV_j+Uo3qEaayuz9t36y-JznF2cbSnh88hkGg@mail.gmail.com>
To: "Ramkumar Parameswaran (ramkumar)" <ramkumar@cisco.com>
Cc: "trill@ietf.org" <trill@ietf.org>
Subject: Re: [trill] Suggestion on RFC 6325 section 4.5.2, single parent ECMP link selection.

Hi Ramkumar,

On Wed, Aug 22, 2012 at 11:34 AM, Ramkumar Parameswaran (ramkumar)
<ramkumar@cisco.com> wrote:

> Hi,
>
> With regard to the Parallel Link check in section 4.5.2 of RFC 6325,
> item 3 talks about how to select links in multi-destination trees
> when a node in a multi-destination tree has a single parent
> connected to it by multiple links. The section in the RFC identifies
> how to select the link in the bundle that all multi-destination
> traffic will traverse when the links are P2P links or LAN links
> with pseudonode suppressed. Specifically, a single link is selected
> and traffic for all multi-destination trees is carried over the
> selected link.

Generally speaking, if the links are equal speed P2P Ethernet links, I
would recommend just aggregating them with 802.1AX, which resolves
these issues.

> When a node is connected to a single parent across several LAN links
> which advertise a pseudonode (when pseudonode is not suppressed),
> RFC 6325 section 4.5.1 already specifies a way to load balance
> multi-destination traffic across these links by pulling a unique
> link into each multi-destination tree.
>
> This is illustrated in the following example
>
>           |   Node B      | -> upstream, closer to tree root
>           -----------------
>              /  |   |  \
>             |   |   |   |
>             |01 02  |03 |04
>             |   |   |   |
>              \  |   |  /
>          -----------------
>          |   Node A      |
>

> Consider Node A and Node B connected to each other by four
> independent LAN links. Each of the four links above is a LAN link
> with a DRB elected and a pseudonode advertised. The number to the
> right of each link identifies the pseudonode ID operating on that
> link. In this situation, per 4.5.1, since the seven-byte System ID
> is considered when load-balancing links across trees, link 01 will
> be assigned to tree 1, link 02 to tree 2, and so on, all other
> factors being equal.
>
> However, if the links were P2P links, with 01, 02, 03, and 04 being
> the Extended Circuit IDs of the node with the higher System ID (B,
> say), then per 4.5.2, link 04 would be pulled into all
> multi-destination trees, with no load-balancing on any of the other
> links.
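The contrast described in the quoted example can be sketched in Python. This is a minimal sketch under stated assumptions: the function names and Node B's System ID are hypothetical, and both rules are taken from the example in the message (sort candidate parents by 7-byte ID and give tree T entry (T - 1) mod k; in the P2P case, one link, the highest Extended Circuit ID, carries every tree), not transcribed from the RFC text.

```python
from typing import Dict, List


def seven_byte_id(system_id: bytes, pseudonode_id: int) -> bytes:
    """Build a 7-byte IS-IS ID: 6-byte System ID plus 1-byte pseudonode ID."""
    if len(system_id) != 6 or not 0 <= pseudonode_id <= 0xFF:
        raise ValueError("expected a 6-byte System ID and a 1-byte pseudonode ID")
    return system_id + bytes([pseudonode_id])


def ecmp_parent_per_tree(parents: List[bytes], num_trees: int) -> Dict[int, bytes]:
    """Section 4.5.1-style selection as described in the message: sort the
    candidate parents by 7-byte ID and give tree T the ((T-1) mod k)-th one."""
    ordered = sorted(parents)  # fixed-length bytes sort like unsigned integers
    k = len(ordered)
    return {t: ordered[(t - 1) % k] for t in range(1, num_trees + 1)}


def single_link_per_tree(circuit_ids: List[int], num_trees: int) -> Dict[int, int]:
    """P2P case as described in the message: one link (assumed here to be the
    highest Extended Circuit ID of the higher-System-ID RBridge) carries all trees."""
    chosen = max(circuit_ids)
    return {t: chosen for t in range(1, num_trees + 1)}


node_b = bytes.fromhex("0000000000bb")  # hypothetical System ID for Node B
lan_parents = [seven_byte_id(node_b, p) for p in (4, 2, 1, 3)]
lan = ecmp_parent_per_tree(lan_parents, num_trees=4)
p2p = single_link_per_tree([1, 2, 3, 4], num_trees=4)
for t in range(1, 5):
    print(f"tree {t}: LAN via pseudonode {lan[t][-1]:02d}, P2P via circuit {p2p[t]:02d}")
```

With pseudonodes, each link is a distinct parent, so the four trees spread over links 01 to 04; with P2P links, every tree lands on link 04.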

This provision in RFC 6325 didn't come out of nowhere. It was
requested, if I recall correctly, because of a limitation of some
hardware in implementing the Reverse Path Forwarding Check for
multi-destination frames.

> For P2P links and pseudonode suppressed LAN links, we were wondering
> whether the following variant can be accommodated, so that
> multi-destination traffic can be spread over more links, with better
> link utilization:

When there are multiple parallel unlabeled links visible to TRILL
between two RBridges, this is reported to the campus as a single
adjacency. If the two RBridges can come to agreement, it is really a
private matter between them how those RBridges handle it. The
provisions in RFC 6325 are just a minimum to assure that the default
behavior will interoperate, taking into account the hardware
limitations that, at the time, indicated that only one such link
could be used for multi-destination frames.

There are lots of possible optimizations that are left out so that
implementations have room to distinguish themselves. For example, the
sending RBridge gets to choose which of the unlabeled parallel links to
send unicast frames on. In doing so, it might want to avoid the link
chosen for multi-destination.
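One such implementation-private choice can be sketched as follows. This is a hypothetical illustration, not anything specified in RFC 6325: the function name, the port names, and the use of a CRC-32 flow hash are all assumptions. It hashes a unicast flow onto the parallel links other than the one carrying multi-destination traffic, falling back to that link only when it is the sole link.

```python
import zlib
from typing import Sequence


def pick_unicast_link(links: Sequence[str], multidest_link: str,
                      flow_key: bytes) -> str:
    """Hash a unicast flow onto one of the parallel links, preferring links
    other than the one chosen for multi-destination frames (hypothetical policy)."""
    if not links:
        raise ValueError("no parallel links")
    others = sorted(link for link in links if link != multidest_link)
    candidates = others or sorted(links)  # fall back if only that link exists
    # zlib.crc32 is deterministic across runs, unlike Python's salted str hash,
    # so a given flow always maps to the same link.
    return candidates[zlib.crc32(flow_key) % len(candidates)]


ports = ["eth1", "eth2", "eth3", "eth4"]  # hypothetical local port names
print(pick_unicast_link(ports, "eth4", b"src=00:11:22:33:44:55,dst=66:77:88:99:aa:bb"))
```

Keeping each flow on a single link avoids reordering within the flow, while the multi-destination link is left to carry only flooded and multicast traffic whenever another link is available.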

> Modified Approach: Once a dominant link-type is identified using the
> tie-break rule specified in item 3 (P2P or LAN link with suppressed
> pseudo node), if there is more than one link of the dominant type,
> then such links are arranged in ascending order of parent's Extended
> Circuit ID or Pseudonode ID, given serial numbers starting with 0,
> and assigned to trees using logic similar to the multi-parent ECMP
> case:
>
> Link for tree T = link number (T - 1) mod N, where N is the number of parallel links
>
> The parallel link check in section 4.5.2 must be modified to allow
> only the adjacency that is selected for the tree that the packet
> arrived on.
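The proposed mapping and the modified Parallel Links check can be sketched in Python. This is a minimal sketch, assuming the caller has already applied the item 3 tie-break and passes in only links of the dominant type; `ParallelLink`, `link_for_tree`, `passes_parallel_links_check`, and the port names are hypothetical.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class ParallelLink:
    local_port: str         # hypothetical local interface name
    parent_circuit_id: int  # parent's Extended Circuit ID (or pseudonode ID)


def link_for_tree(links: Sequence[ParallelLink], tree: int) -> ParallelLink:
    """Proposed mapping: serial numbers 0..N-1 follow ascending parent
    Extended Circuit ID, and tree T uses serial number (T - 1) mod N."""
    if not links:
        raise ValueError("no links of the dominant type")
    if tree < 1:
        raise ValueError("tree numbers start at 1")
    ordered = sorted(links, key=lambda link: link.parent_circuit_id)
    return ordered[(tree - 1) % len(ordered)]


def passes_parallel_links_check(links: Sequence[ParallelLink], tree: int,
                                arrival_port: str) -> bool:
    """Modified check: accept a multi-destination frame on `tree` only if it
    arrived on the link selected for that tree."""
    return link_for_tree(links, tree).local_port == arrival_port


bundle = [ParallelLink("eth3", 3), ParallelLink("eth1", 1),
          ParallelLink("eth4", 4), ParallelLink("eth2", 2)]
for tree in range(1, 6):
    print(tree, link_for_tree(bundle, tree).local_port)
```

Because both RBridges sort by the same parent Extended Circuit IDs, they derive the same link for each tree without any extra advertisement, which matches the note that the mapping stays local to the two neighbors.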

I don't see any particular problem with such a change if we could get
assurance that the previously reported hardware limitation was no
longer a factor. However, it is pretty hard to prove a negative. And
if you could get that assurance, so that different multi-destination
frames could be sent over different links of such a bundle of
unlabeled links, then why have any limitations at all? Just permit
the sending RBridge to send over whichever link its private hash
algorithm and weighting indicate...

In fact, the capability of receiving multi-destination frames over any
link in such an unlabeled bundle could be advertised by a capability
bit, which would make such a relaxation of the Parallel Links Check
backwards compatible. If the capability was not signalled, you would
still have to do what is in RFC 6325. This seems like a reasonable way
to go to me.
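The backward-compatible capability idea can be sketched from both ends of the link. This is a hypothetical sketch: no such capability bit is defined in RFC 6325, and the flag and function names are invented for illustration. The sender spreads multi-destination frames only when the neighbor has advertised the capability; the receiver relaxes its Parallel Links check only if it advertised the capability itself.

```python
import zlib
from typing import Sequence


def choose_multidest_link(links: Sequence[str], rfc6325_link: str,
                          neighbor_advertises_any_link: bool,
                          flow_key: bytes) -> str:
    """Sender side: hash across the bundle only if the neighbor advertised the
    (hypothetical) capability; otherwise use the RFC 6325-selected link."""
    if not neighbor_advertises_any_link or not links:
        return rfc6325_link
    ordered = sorted(links)
    return ordered[zlib.crc32(flow_key) % len(ordered)]


def accept_multidest(arrival_link: str, links: Sequence[str], rfc6325_link: str,
                     we_advertised_any_link: bool) -> bool:
    """Receiver side: the relaxed check applies only if this RBridge advertised
    the capability; otherwise keep the RFC 6325 Parallel Links check."""
    if we_advertised_any_link:
        return arrival_link in links
    return arrival_link == rfc6325_link


bundle = ["eth1", "eth2", "eth3", "eth4"]  # hypothetical local port names
# A legacy sender that ignores the bit still uses the RFC 6325 link,
# which a receiver that advertised the capability also accepts.
legacy_choice = choose_multidest_link(bundle, "eth4", False, b"flow-a")
print(legacy_choice, accept_multidest(legacy_choice, bundle, "eth4", True))
```

The compatibility comes from the RFC 6325-selected link remaining acceptable in every combination, so an RBridge that does not understand the bit keeps interoperating.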

Thanks,
Donald
=============================
 Donald E. Eastlake 3rd   +1-508-333-2270 (cell)
 155 Beaver Street, Milford, MA 01757 USA
 d3e3e3@gmail.com

> No change in metric advertisement is needed with regard to the rest of the
> network, and, in line with what is mentioned in RFC 6325, the link-to-tree
> mapping remains local to the two adjacent nodes, RB1 and RB2.
>
> Please let us know if the approach suggested above is feasible or if we may
> have missed something.
>
> Thanks,
> Ramkumar