Re: [mpls] draft-villamizar-mpls-tp-multipath

Yong Lucy <lucyyong@huawei.com> Wed, 28 July 2010 08:48 UTC

Date: Wed, 28 Jul 2010 03:31:21 -0500
From: Yong Lucy <lucyyong@huawei.com>
In-reply-to: <201007271905.o6RJ5Xrl069109@harbor.orleans.occnc.com>
To: curtis@occnc.com
Message-id: <04c701cb2e2f$47176fd0$c7728182@china.huawei.com>
Cc: mpls@ietf.org
Subject: Re: [mpls] draft-villamizar-mpls-tp-multipath

Curtis,

You are right. This is not the thread for large flow classification. Sorry.

I agree that the flow size can't be larger than the component link size
minus epsilon.
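
As a rough sketch of the arithmetic (my illustration only; the 24-bit
hash space is from your mail, the traffic figures are assumed):

    # How much headroom ("epsilon") a large flow must leave on its
    # component link when it shares a hash bucket with small flows.
    HASH_SPACE = 2 ** 24        # number of hash buckets (24-bit hash)
    component_bw = 10e9         # one 10GbE component link, bit/s
    small_flow_total = 80e9     # assumed aggregate small-flow load, bit/s

    # On average one bucket carries 1/HASH_SPACE of the small-flow
    # traffic, so a large flow pinned to a bucket must leave that much.
    epsilon = small_flow_total / HASH_SPACE
    print(epsilon)                   # ~4768 bit/s
    print(component_bw - epsilon)    # max safe large-flow size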

PS: You missed my point about the large flow mark in my previous mail.

Regards,
Lucy

> -----Original Message-----
> From: curtis@occnc.com [mailto:curtis@occnc.com]
> Sent: Tuesday, July 27, 2010 2:06 PM
> To: Yong Lucy
> Cc: curtis@occnc.com; mpls@ietf.org
> Subject: Re: draft-villamizar-mpls-tp-multipath
> 
> 
> In message <03e201cb2da5$15f4a740$c7728182@china.huawei.com>
> Yong Lucy writes:
> >
> > Curtis,
> >
> > Please see inline.
> >
> > > >
> > > > Comment:
> > > >
> > > > 1) Adjusting load over a component link may cause flow reordering.
> > >
> > > Yes.  That is why adjustments should be minimized and why
> > > adjustments should avoid moving the same traffic back and forth if it
> > > sits on some sort of mathematical boundary.
> > >
> > > > 2) If a huge number of micro flows mixes with a few large,
> > > > long-lived flows, the load per entry can be out of balance
> > >
> > > For a hash space size of 24 bits, 1/16,000,000 of the tiny flows will
> > > be mixed in the same hash bucket as a large flow.  This is why large
> > > flows can only be sized at max-component BW minus epsilon, where
> > > epsilon depends on the hash space size.
> > [LY] This is true when only hashing is used.
> > >
> > > Perhaps you haven't figured out how a very large hash space can be
> > > supported with a modest amount of hardware.  It can.  You might want
> > > to look at the expired optimized multipath drafts in data-tracker.
> > > There is a little more detail there.
> > [LY] Hashing is stateless and provides a simple way to support a large
> > hash space. This is not the problem we address.
> 
> This thread is not about your draft.  It is about a draft which
> extends an existing widely implemented technique used with MPLS and
> multipath to support MPLS-TP.
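> 
> As a minimal sketch of that technique (my illustration; the expired
> OMP drafts have the real detail): hash into a large space, then a
> small table maps contiguous hash regions to component links, and
> load adjustment only rewrites table entries:
> 
>     import zlib
> 
>     HASH_BITS = 24
>     N_REGIONS = 64              # modest hardware table
>     REGION_SIZE = 2 ** HASH_BITS // N_REGIONS
>     N_LINKS = 4                 # component links in the LAG
> 
>     # region -> component link; rebalancing rewrites entries here
>     region_to_link = [i % N_LINKS for i in range(N_REGIONS)]
> 
>     def select_link(flow_key: bytes) -> int:
>         # zlib.crc32 stands in for whatever CRC the hardware uses
>         h = zlib.crc32(flow_key) & (2 ** HASH_BITS - 1)
>         return region_to_link[h // REGION_SIZE]
> 
> Moving load means changing a few table entries, so only the flows in
> the affected hash regions can ever reorder.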
> 
> > > > 3) Measuring component link load does not provide the information
> > > > needed for table entry mapping adjustment
> > >
> > > The measurement is much finer than per component link.  Again, look
> > > at the expired optimized multipath drafts in data-tracker.
> > >
> > > This is a solved problem.  The sections that describe how it was
> > > successfully done a decade ago are intended only to provide an
> > > existence proof that there is a simple solution.  There are other
> > > solutions that make use of the same properties of the hash but build
> > > the hardware somewhat differently than described in the OMP drafts.
> > >
> > > > Internet traffic patterns today are different from a decade or
> > > > more ago.  Hashing works well under the following conditions:
> > >
> > > Yes.  I know that.
> > >
> > > > 1) the # of flows is very large and flow IDs are statistically
> > > > random; 2) flow BWs are pretty balanced.
> > >
> > > The number of flows observed within a minute interval 15 years ago was
> > > in the millions.  Today in LAGs of tens of 10GbE, it is likely to be
> > > in the billion range.  I am aware of this.  If 2% of the flows are
> > > large, then 2% of a billion is also a large number, meaning they are
> > > very numerous.  Building core equipment that supports 20 million
> > > hardware entries to handle such a case is not a good suggestion.
> > [LY] If a large flow is by definition one whose rate is above the
> > threshold, it is easy to count how many large flows a link can
> > support. To be honest, we don't see that many large flows in today's
> > network. For a 10x10GE link:
> > If 500 Mbps is the threshold for a large flow, the link cannot take
> > more than 200 large flows.
> 
> 500 Mbps x 200 is 100 Gb/s, the whole 10x10G.
> 
> > If 50 Mbps is the mark, the link cannot take more than 2000 large flows.
> 
> 50 Mbps x 2000 is 100 Gb/s, the whole 10x10G.
> 
> > If 5 Mbps is the mark, the link cannot take more than 20000 large flows.
> 
> 5 Mbps x 20000 is 100 Gb/s, the whole 10x10G.
> 
> So the total is 300G.  Is something wrong with your math here?
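> 
> To spell out the check (my arithmetic only):
> 
>     # Each threshold times its flow count comes to the full 100 Gb/s.
>     for thresh_bps, max_flows in [(500e6, 200), (50e6, 2000), (5e6, 20000)]:
>         print(thresh_bps * max_flows / 1e9, "Gb/s")   # 100.0 each time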
> 
> The large flows come and go.  A flow for an enterprise VPN may be
> small all day and then run a database backup.  They are not going to
> signal any LSP differently before this happens.  In the core the PW
> LSP associated with this flow may be inside another MPLS LSP used to
> improve network scalability.  Most likely the PW will be inside an LDP
> LSP which is inside a TE LSP in the core.
> 
> > [LY] We don't think a flow with 5 Mbps belongs to the large flow
> > class. We need to speak to reality. It seems that we can use
> > different ways to count the ratio of large flows.
> 
> Once again, this is not about your draft.  Start another thread for
> that.  This is about small changes to an existing MPLS technique that
> allow MPLS (without the TP) to be used as a server layer for MPLS-TP.
> There is no signaling of large or small flows, so that topic is
> irrelevant to this draft.
> 
> Curtis