Re: [trill] Fwd: Multidestination frames over parallel P2P links

Ayan Banerjee <ayabaner@gmail.com> Tue, 10 November 2015 07:14 UTC

Date: Mon, 09 Nov 2015 23:14:09 -0800
From: Ayan Banerjee <ayabaner@gmail.com>
To: Mingui Zhang <zhangmingui@huawei.com>
Archived-At: <http://mailarchive.ietf.org/arch/msg/trill/8PNIvQ4VFMwcecP5s10ocd0SaIU>
Cc: Donald Eastlake <d3e3e3@gmail.com>, Petr Hroudný <petr.hroudny@gmail.com>, "trill@ietf.org" <trill@ietf.org>
Subject: Re: [trill] Fwd: Multidestination frames over parallel P2P links

I agree. It would be best to have that clarified.

Thanks,
Ayan

On Mon, Nov 9, 2015 at 12:33 AM, Mingui Zhang <zhangmingui@huawei.com>
wrote:

> Hi Donald and Petr,
>
> It is interesting that a similar problem (regarding the same text in
> Section 4.5.2, bullet 3) was raised when I talked with implementers a few
> IETF meetings ago.
>
> I agree with Petr. With the current text, implementers might interpret the
> tie-breaking mechanism as not considering the link cost. I also agree that
> we can specify this to remove the ambiguity in the tie-breaking mechanism,
> so that the two adjacent RBridges always choose the most preferred link from
> the subset of links with the same lowest cost.
>
> Thanks
> Mingui
>
> > -----Original Message-----
> > From: trill [mailto:trill-bounces@ietf.org] On Behalf Of Donald Eastlake
> > Sent: Friday, November 06, 2015 3:32 PM
> > To: Petr Hroudný
> > Cc: trill@ietf.org
> > Subject: Re: [trill] Fwd: Multidestination frames over parallel P2P links
> >
> > Hi Petr,
> >
> > I have gone back and researched the old email as to why these provisions
> > are in RFC 6325. It appears to have primarily been the result of a desire
> > for simplicity.
> > When building a multi-destination traffic distribution tree, a single link
> > is selected from each non-root node towards the root. If there are multiple
> > links between N and potential parent P that have pseudonodes, then they are
> > all distinguished in the global topology -- the least cost such
> > distinguished link is chosen or, if there are multiple equal cost links,
> > one is chosen as specified in RFC 6325.
> >
> > When there are multiple links between N and P that do not have
> > pseudonodes, I think it was assumed they would usually be aggregated and
> > appear to TRILL as a single link with the cost of the least cost member of
> > the aggregation or lower. The corner case where they are not aggregated and
> > have different costs is not handled optimally/clearly.
> >
> > See further comments below:
> >
> > On Tue, Nov 3, 2015 at 12:23 PM, Petr Hroudný <petr.hroudny@gmail.com>
> > wrote:
> > > Hi Donald,
> > >
> > > many thanks for your response and explanation. However, let me present
> > > two scenarios where this behavior might cause problems.
> > >
> > > 1) two P2P links between switches, one primary (10GE), another one
> > > (1GE) just for backup
> > > 2) two parallel P2P links with equal BW, when one of them needs to be
> > > taken down for e.g. fiber maintenance
> > >
> > > The usual approach for 1) would be to assign a higher metric to the
> > > backup link. The usual approach for 2) would be to assign the highest
> > > possible metric to the link undergoing maintenance.
> > >
> > > Both approaches will work correctly for unicast traffic, but might not
> > > work for multi-destination traffic, depending on Extended Circuit ID
> > > assignment.
> > >
> > > It looks like there's some discrepancy between sections 4.5.1 and
> > > 4.5.2 of RFC6325 - the former says:
> > >
> > > "Note that there might be multiple **equal cost** links between N and
> > > potential parent P that have no pseudonodes, because they are either
> > > point-to-point links or pseudonode-suppressed links."
> >
> > Yes, thanks for spotting that. The text continues after your quote to say
> > "... Such links will be treated as a single link for the purpose of tree
> > building ..."
> >
> > > while the latter doesn't take cost into consideration during
> > > tiebreaking.
> >
> >
> > Well, 4.5.2 does not mention cost but says "If the tree-building and
> > tiebreaking for a particular multi-destination frame distribution tree
> > selects a non-pseudonode link between RB1 and RB2, that "RB1-RB2 link"
> > might actually consist of multiple links. ..."
> >
> > If you interpret the words "non-pseudonode link" in 4.5.2 to refer to the
> > "single link" referred to in 4.5.1, then the tie breaking would only apply
> > to the equal lowest cost non-pseudonode links. (The different uses of
> > "link" are somewhat confusing.) At best, 4.5.1 and 4.5.2 considered
> > together seem somewhat ambiguous.
> >
> > > Do you think it would be possible to start a discussion about
> > > modifying section 4.5.2 item 3) to only perform tiebreaking among the
> > > set of adjacencies with equal minimal cost? In my example:
> > >
> > > link #1    cost: 1000      Extended Circuit ID: 11
> > > link #2    cost: 1000      Extended Circuit ID: 12
> > > link #3    cost: 10000     Extended Circuit ID: 13
> > > link #4    cost: 2^24-1    Extended Circuit ID: 14
> > >
> > > it would mean that only links #1 and #2 would be considered for
> > > tie-breaking and, based on Extended Circuit ID, link #2 would be used
> > > for multi-destination traffic.
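
The proposed rule can be sketched in Python as below. This is only an illustration of the suggestion in this thread, not text from RFC 6325. The `Link` type and the `pick_multidest_link` function are hypothetical names, and the sketch assumes every candidate is a P2P adjacency between the same two RBridges, with the Extended Circuit ID taken from the RBridge with the highest System ID.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Link:
    name: str
    cost: int
    ext_circuit_id: int  # as assigned by the RBridge with the highest System ID


def pick_multidest_link(links):
    """Tie-break only among the links that share the lowest cost."""
    if not links:
        raise ValueError("no parallel links to choose from")
    lowest = min(link.cost for link in links)
    candidates = [link for link in links if link.cost == lowest]
    # Among the equal lowest-cost links, prefer the highest Extended Circuit ID.
    return max(candidates, key=lambda link: link.ext_circuit_id)


links = [
    Link("link #1", 1000, 11),
    Link("link #2", 1000, 12),
    Link("link #3", 10000, 13),
    Link("link #4", 2**24 - 1, 14),
]
print(pick_multidest_link(links).name)  # link #2
```

With the four example links, only link #1 and link #2 survive the cost filter, and the higher Extended Circuit ID selects link #2.
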
> >
> > I think that would be a reasonable thing to specify to remove the
> > ambiguity of Sections 4.5.1 and 4.5.2 of RFC 6325.
> >
> > Does anyone else have any comments on this?
> >
> > Thanks,
> > Donald
> > =============================
> >  Donald E. Eastlake 3rd   +1-508-333-2270 (cell)
> >  155 Beaver Street, Milford, MA 01757 USA  d3e3e3@gmail.com
> >
> > >    Thanks, Petr
> > >
> > >
> > > 2015-11-01 15:08 GMT+01:00 Donald Eastlake <d3e3e3@gmail.com>:
> > >>
> > >> Hi Petr,
> > >>
> > >> Sorry for my delay in responding. I was traveling.
> > >>
> > >> On Wed, Oct 28, 2015 at 5:13 PM, Petr Hroudný
> > >> <petr.hroudny@gmail.com>
> > >> wrote:
> > >> > Hi all,
> > >> >
> > >> > I'm looking for the WG opinion about forwarding of multidestination
> > >> > frames over parallel P2P links between TRILL switches.
> > >> >
> > >> > According to RFC6325, section 4.5.2, item 3 a), when multiple
> > >> > parallel P2P links exist between RB1 and RB2:
> > >> >
> > >> > "Most preferred are those established by P2P Hellos.Tiebreaking
> > >> > among those is based on preferring the one with the numerically
> > >> > highest Extended Circuit ID as associated with the adjacency by the
> > >> > RBridge with the highest System ID."
> > >>
> > >> Of course, as stated in RFC 6325, the above only applies to
> > >> multi-destination TRILL Data frames. Known unicast TRILL Data frames
> > >> can be allocated among the parallel links however the implementation
> > >> wants.
> > >>
> > >> > Does it mean that link cost should be completely ignored during the
> > >> > above tiebreaking?
> > >>
> > >> The TRILL specification is primarily focused on correct operation,
> > >> not necessarily optimal performance when there is a peculiar
> > >> configuration. Parallel links would most commonly be of equal
> > > >> bandwidth. If there is a truly extreme difference in bandwidth, say
> > > >> 100 to 1 or more, you would probably be better off just treating the
> > > >> comparatively very slow links as if they were down. Just how extreme
> > > >> the difference would have to be is an implementation choice. But
> > > >> TRILL will act "correctly", even if it might be slowly, in delivering
> > > >> data even if you use all of a set of links of extremely different
> > > >> speeds.
> > >>
> > >> I think a common thing to do would be to use 802.1AX Link aggregation
> > >> to aggregate these links so they will then appear to TRILL as a
> > >> single link. IEEE Std 802.1AX-2008 required that all the links be of
> > >> equal bandwidth. In IEEE Std 802.1AX-2014, that restriction is
> > >> removed and 802.1AX says "Aggregation of links of different data
> > >> rates is not prohibited nor required by this standard. Determining
> > >> how to distribute traffic across links of different data rates is
> > >> beyond the scope of this standard." So you have the same thing there.
> > >> Should you aggregate a 10Gbps link with a 10Mbps link? I would say
> > >> that if you do, you should consider allocating 100% of the traffic to
> > >> the faster link (until it fails).
> > >>
> > > >> Another thing you could do in a TRILL implementation is allocate
> > > >> Extended Circuit IDs so that higher bandwidth links have higher IDs.
> > > >> This would cause the faster link to be selected for multi-destination
> > > >> data.
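
One way to picture this allocation is the short Python sketch below. It is an assumption-laden illustration, not part of any TRILL specification: `assign_ext_circuit_ids`, the link names, and the starting ID are all hypothetical. As noted elsewhere in this thread, only the IDs assigned by the RBridge with the highest System ID take part in the tiebreak.

```python
def assign_ext_circuit_ids(bandwidth_by_link, first_id=1):
    """Return {link name: Extended Circuit ID}, with IDs rising with bandwidth.

    bandwidth_by_link maps a (hypothetical) link name to its bandwidth in Mbps.
    Links of equal bandwidth are ordered by name so the result is deterministic.
    """
    ordered = sorted(
        bandwidth_by_link,
        key=lambda name: (bandwidth_by_link[name], name),
    )
    return {name: first_id + offset for offset, name in enumerate(ordered)}


ids = assign_ext_circuit_ids({"backup-1GE": 1_000, "primary-10GE": 10_000})
print(ids)  # {'backup-1GE': 1, 'primary-10GE': 2}
```

Because the fastest link receives the numerically highest ID, a tiebreak on highest Extended Circuit ID would select it.
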
> > >>
> > >> > Suppose the following setup:
> > >> >
> > > >> > link #1    cost: 1000      Extended Circuit ID: 11
> > > >> > link #2    cost: 1000      Extended Circuit ID: 12
> > > >> > link #3    cost: 10000     Extended Circuit ID: 13
> > > >> > link #4    cost: 2^24-1    Extended Circuit ID: 14
> > >> >
> > >> > Should the switch really prefer link #4 because of the highest
> > >> > Extended Circuit ID, regardless of the fact that the link cost was
> > >> > set to maximum in order to take the link out of service?
> > >>
> > >> Link cost 2^24-1 is special and such a link is only used for data
> > >> allocated to that link by traffic engineering. So you can forget
> > >> about link #4 with that cost, but it could be 2^24-2.
> > >>
> > >> Note that Extended Circuit IDs are not symmetric and you have a
> > >> different ID in each direction, which is why RFC 6325 says to use the
> > >> ID from the end where the TRILL switch has the highest System ID.
> > >>
> > > >> Anyway, assuming link #4 was 2^24-2, the TRILL standard says to use
> > > >> it for multi-destination traffic, but I think a good implementation
> > > >> would just ignore that link.
> > >>
> > >> Thanks,
> > >> Donald
> > >> =============================
> > >>  Donald E. Eastlake 3rd   +1-508-333-2270 (cell)
> > >>  155 Beaver Street, Milford, MA 01757 USA  d3e3e3@gmail.com
> > >>
> > >> >     Thanks, Petr
> >
> > _______________________________________________
> > trill mailing list
> > trill@ietf.org
> > https://www.ietf.org/mailman/listinfo/trill