Re: [trill] Suggestion on RFC 6325 section 4.5.2, single parent ECMP link selection.

Donald Eastlake <> Fri, 24 August 2012 19:34 UTC

From: Donald Eastlake <>
Date: Fri, 24 Aug 2012 15:34:35 -0400
To: "Ramkumar Parameswaran (ramkumar)" <>
Cc: "" <>
Subject: Re: [trill] Suggestion on RFC 6325 section 4.5.2, single parent ECMP link selection.

Hi Ramkumar,

On Thu, Aug 23, 2012 at 10:43 PM, Ramkumar Parameswaran (ramkumar)
<> wrote:
> Hi Donald,
> Thanks for the consideration. One comment inline, tagged [**]:
> ________________________________________
>> From: [] on behalf of Donald Eastlake []
>> Sent: Wednesday, August 22, 2012 11:12 AM
>> To: Ramkumar Parameswaran (ramkumar)
>> Cc:
>> Subject: Re: [trill] Suggestion on RFC 6325 section 4.5.2, single
>> parent ECMP link selection.
>> Hi Ramkumar,
>> On Wed, Aug 22, 2012 at 11:34 AM, Ramkumar Parameswaran (ramkumar)
>> <> wrote:
>>> Hi,
>>> ...
>>> For P2P links and pseudonode suppressed LAN links, we were wondering
>>> whether the following variant can be accommodated, so that
>>> multi-destination traffic can be spread over more links, with better
>>> link utilization:
>> When there are multiple parallel unlabeled links visible to TRILL
>> between two RBridges, this is reported to the campus as a single
>> adjacency. If the two RBridges can come to agreement, it is really a
>> private matter between them how those RBridges handle it. The
>> provisions in RFC 6325 are just a minimum to assure that the default
>> behavior will interoperate, taking into account the hardware
>> limitations that, at the time, indicated only one of such links could
>> be used for multi-destination frames.
>> There are lots of possible optimizations that are left out so that
>> implementations have room to distinguish themselves. For example, the
>> sending RBridge gets to choose which of the unlabeled parallel links to
>> send unicast frames on. In doing so, it might want to avoid the link
>> chosen for multi-destination.
>>> Modified Approach: Once a dominant link-type is identified using the
>>> tie-break rule specified in item 3 (P2P or LAN link with suppressed
>>> pseudo node), if there is more than one link of the dominant type,
>>> then such links are arranged in ascending order of parent's Extended
>>> Circuit ID or Pseudonode ID, given serial numbers starting with 0,
>>> and assigned to trees with a logic similar to the multi-parent ECMP
>>> case:
>>> Link on Tree T = (T-1) mod N where N is the number of parallel links
>>> The parallel link check in section 4.5.2 must be modified to allow
>>> only the adjacency that is selected for the tree that the packet
>>> arrived on.
>> I don't see any particular problem with such a change if we could get
>> assurance that the previous reported hardware limitation was no longer
>> a factor. However, it is pretty hard to prove a negative. And if you
>> could, so different multi-destination frames could be sent over
>> different links of such a bundle of unlabeled links, then why have any
>> limitations at all? Just permit the sending RBridge to send over
>> whichever link its private hash algorithm and weighting indicate...

> [**]: Agree in general, but wanted to point out the following -
> the unrestricted approach of sending on any link in the bundle (on a
> tree) may have issues with RPF (Parallel link) check.  Waiving the
> RPF check may not be a good option, for whatever reasons that RPF
> check exists in networks today. If the RPF check is retained, the
> hardware may have to maintain a list of interfaces to cross check
> against and this would be expensive - Table space for a single
> interface is probably the norm.  It may also dilute the check. Or,
> the load-distribution hash may need to be applied on the receive
> side before the RPF check as well, again this may be expensive in
> hardware.

I think you have re-discovered the reason for the current provisions
in the specification. Of course, if the RPF check used a pseudo port
number which was the same for all ports in such a TRILL port group,
then it would not take any more RPF table space... So this sort of
thing is critically dependent on just how the fast path hardware is

> When it was discussed in the past, were you considering multiple
> links in the same tree, or one link per tree?

I don't remember. You can go back and look in the mail archives if you

>  The approach we are suggesting needs only one RPF interface (per
> tree), and makes the P2P and pseudo-node suppressed LAN cases
> symmetric with the treatment of regular LAN.  Treating it as
> optional tied to a capability bit is fine.
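
The per-tree assignment proposed earlier in this thread (Link on Tree T = (T-1) mod N), together with a parallel link check that accepts a frame only on the one link selected for its tree, might be sketched as follows. This is only an illustration: the names `Link`, `select_link`, and `parallel_link_check` are hypothetical, the circuit ID values are made up, and trees are assumed to be numbered from 1. None of this is text from RFC 6325.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Link:
    name: str        # local label, illustrative only
    circuit_id: int  # parent's Extended Circuit ID or Pseudonode ID (assumed comparable)


def select_link(links, tree):
    """Pick the parallel link for a tree: sort ascending by ID, index (T-1) mod N."""
    if tree < 1 or not links:
        raise ValueError("need at least one link and a tree number >= 1")
    ordered = sorted(links, key=lambda link: link.circuit_id)
    return ordered[(tree - 1) % len(ordered)]


def parallel_link_check(links, tree, arrival):
    """Accept a multi-destination frame only on the link selected for its tree."""
    return arrival == select_link(links, tree)


# Three parallel links of the dominant type between RB1 and RB2 (hypothetical IDs).
links = [Link("c", 30), Link("a", 10), Link("b", 20)]

# One expected receive link per tree, so a single RPF entry per tree suffices.
rpf = {tree: select_link(links, tree).name for tree in range(1, 5)}
print(rpf)  # {1: 'a', 2: 'b', 3: 'c', 4: 'a'}
```

Because both RBridges sort on the same IDs, they would arrive at the same mapping without exchanging anything beyond what the adjacency already carries.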

I think in practice this situation would usually be solved by either
using link aggregation or, for LAN links, by enabling pseudo-nodes on
the parallel links. (With pseudo-nodes, you would typically get ECMP
between the fastest links for unicast and different trees on different
ones of the fastest links for multi-destination, with slower links
ignored.) Furthermore, if both RBridges want to cooperate (like they
are from the same manufacturer or something), they can do whatever
fancy thing they want.

As I said, the current provisions are to get something that will
operate and interoperate correctly if such a configuration actually
occurs. Is it worth the complexity to have additional logic, a
configuration bit, etc., to make it somewhat better?

In fact, you would need additional suggestions or provisions to really
improve things for a wide variety of conditions. For example, what
should you do if there are four unlabeled links between RB1 and RB2
that are 10Meg, 100Meg, 1Gig, and 10Gig bits per second respectively?
Auto-configuration can only get you so far before it becomes
unreasonably complex and, to get the best performance, you need
configuration. For example, one factor in such configuration might be
what percentage of traffic overall is multi-destination, something it
would be a pain for TRILL to figure out.
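
To make the mixed-speed case concrete, here is a small sketch of what the (T-1) mod N rule from earlier in the thread would do with those four links. It assumes, purely for illustration, that eight distribution trees are in use and that the links happen to sort by ID in the order listed; neither assumption comes from the thread.

```python
from collections import Counter

# The four link speeds from the example above, in Mbit/s.
speeds_mbps = [10, 100, 1_000, 10_000]
num_trees = 8  # assumed number of distribution trees

# Count how many trees the (T-1) mod N rule places on each link index.
trees_per_link = Counter(
    (tree - 1) % len(speeds_mbps) for tree in range(1, num_trees + 1)
)

total = sum(speeds_mbps)
for index, speed in enumerate(speeds_mbps):
    capacity_share = speed / total
    tree_share = trees_per_link[index] / num_trees
    print(f"{speed:>6} Mbit/s: {tree_share:.0%} of trees, "
          f"{capacity_share:.2%} of capacity")
```

Every link receives the same share of trees regardless of speed, which is the kind of mismatch that would call for weighting or explicit configuration.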

 Donald E. Eastlake 3rd   +1-508-333-2270 (cell)
 155 Beaver Street, Milford, MA 01757 USA

>> thanks,
>> ramkumar
>> In fact, the capability of receiving multi-destination frames over any
>> link in such an unlabeled bundle could be advertised by a capability
>> bit, which would make such a relaxation of the Parallel Links Check
>> backwards compatible. If the capability was not signalled, you would
>> still have to do what is in RFC 6325. This seems like a reasonable way
>> to go to me.
>> Thanks,
>> Donald
>> =============================
>>  Donald E. Eastlake 3rd   +1-508-333-2270 (cell)
>>  155 Beaver Street, Milford, MA 01757 USA
>>> No change in metric advertisement is needed with regard to the rest of the
>>> network, and in line with what is mentioned in 6325, the notion of mapping
>>> continues to remain local to the two adjacent nodes RB1-RB2.
>>> Please let us know if the approach suggested above is feasible or if we may
>>> have missed something.
>>> Thanks,
>>> Ramkumar