Re: [mpls] draft-villamizar-mpls-tp-multipath

Curtis Villamizar <curtis@occnc.com> Tue, 27 July 2010 14:50 UTC

Return-Path: <curtis@occnc.com>
X-Original-To: mpls@core3.amsl.com
Delivered-To: mpls@core3.amsl.com
Received: from localhost (localhost [127.0.0.1]) by core3.amsl.com (Postfix) with ESMTP id 92FBD3A689F for <mpls@core3.amsl.com>; Tue, 27 Jul 2010 07:50:55 -0700 (PDT)
X-Virus-Scanned: amavisd-new at amsl.com
X-Spam-Flag: NO
X-Spam-Score: -1.098
X-Spam-Level:
X-Spam-Status: No, score=-1.098 tagged_above=-999 required=5 tests=[AWL=-1.099, BAYES_50=0.001]
Received: from mail.ietf.org ([64.170.98.32]) by localhost (core3.amsl.com [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id bisXs4E10RjE for <mpls@core3.amsl.com>; Tue, 27 Jul 2010 07:50:53 -0700 (PDT)
Received: from harbor.orleans.occnc.com (harbor.orleans.occnc.com [173.9.106.135]) by core3.amsl.com (Postfix) with ESMTP id 499BC3A6BA6 for <mpls@ietf.org>; Tue, 27 Jul 2010 07:50:53 -0700 (PDT)
Received: from harbor.orleans.occnc.com (harbor.orleans.occnc.com [173.9.106.135]) by harbor.orleans.occnc.com (8.13.6/8.13.6) with ESMTP id o6REpBup059259; Tue, 27 Jul 2010 10:51:11 -0400 (EDT) (envelope-from curtis@harbor.orleans.occnc.com)
Message-Id: <201007271451.o6REpBup059259@harbor.orleans.occnc.com>
To: Yong Lucy <lucyyong@huawei.com>
From: Curtis Villamizar <curtis@occnc.com>
In-reply-to: Your message of "Thu, 22 Jul 2010 15:19:03 CDT." <011101cb29db$22185df0$380c7c0a@china.huawei.com>
Date: Tue, 27 Jul 2010 10:51:11 -0400
Sender: curtis@occnc.com
Cc: mpls@ietf.org
Subject: Re: [mpls] draft-villamizar-mpls-tp-multipath
X-BeenThere: mpls@ietf.org
X-Mailman-Version: 2.1.9
Precedence: list
Reply-To: curtis@occnc.com
List-Id: Multi-Protocol Label Switching WG <mpls.ietf.org>
List-Unsubscribe: <https://www.ietf.org/mailman/listinfo/mpls>, <mailto:mpls-request@ietf.org?subject=unsubscribe>
List-Archive: <http://www.ietf.org/mail-archive/web/mpls>
List-Post: <mailto:mpls@ietf.org>
List-Help: <mailto:mpls-request@ietf.org?subject=help>
List-Subscribe: <https://www.ietf.org/mailman/listinfo/mpls>, <mailto:mpls-request@ietf.org?subject=subscribe>
X-List-Received-Date: Tue, 27 Jul 2010 14:50:55 -0000

In message <011101cb29db$22185df0$380c7c0a@china.huawei.com>
Yong Lucy writes:
>  
> Hi Curtis,
>  
> Thank you for summarizing the hashing-based multipath solution in this draft.
> I'll share my opinion here.

Thanks for reading the draft.

> Draft Text:
>  
>    An alternate simple multipath technique uses a table
>    generally with a power of two size, and distributes the table entries
>    proportionally among component links according to the capacity of
>    each component link.
>  
>    An adaptive multipath technique is one where the traffic bound to
>    each component link is measured and the load split is adjusted
>    accordingly.   
>  
> End Text
>  
> Comment:
>  
> 1) Adjusting load across component links may cause flow reordering. 

Yes.  That is why adjustments should be minimized and why adjustments
should avoid moving the same traffic back and forth if it sits on some
sort of mathematical boundary.

> 2) If a huge number of micro flows mixes with a few large, long-lived
> flows, the load per entry can be out of balance

For a hash space size of 24 bits, roughly 1 in 16 million of the tiny
flows will be mixed into the same hash bucket as a given large flow.
This is why large flows can only be sized at the maximum component
bandwidth minus epsilon, where epsilon depends on the hash space size.

Perhaps you haven't figured out how a very large hash space can be
supported with a modest amount of hardware.  It can.  You might want
to look at the expired optimized multipath drafts in the datatracker.
There is a little more detail there.
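To illustrate the point above (a hypothetical sketch, not the OMP
hardware design; the function names, the SHA-256 hash, and the
table-size parameter are my own choices): a large hash can be computed
per packet while only a small bucket table, with entries distributed
among component links in proportion to capacity, needs to live in
hardware.

```python
import hashlib

def build_table(capacities, table_bits=10):
    """Distribute 2**table_bits entries among component links in
    proportion to each link's capacity (largest-remainder rounding)."""
    size = 1 << table_bits
    total = sum(capacities)
    # floor of each link's ideal share of table entries
    shares = [cap * size // total for cap in capacities]
    leftover = size - sum(shares)
    # hand the remaining entries to the links with largest remainders
    by_remainder = sorted(range(len(capacities)),
                          key=lambda i: capacities[i] * size % total,
                          reverse=True)
    for i in by_remainder[:leftover]:
        shares[i] += 1
    table = []
    for link, n in enumerate(shares):
        table.extend([link] * n)
    return table

def select_link(table, flow_id, seed=0):
    """Hash the flow ID (with a seed) and index the bucket table.
    A given flow always maps to the same entry, so packets within
    a flow are not reordered."""
    digest = hashlib.sha256(f"{seed}:{flow_id}".encode()).digest()
    bucket = int.from_bytes(digest[:4], "big") % len(table)
    return table[bucket]
```

For example, with capacities of 40, 10, and 10 Gb/s the first link
receives about two thirds of the 1024 table entries.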

> 3) Measuring component link load does not provide the information for
> table-entry mapping adjustment

The measurement is much finer than per component link.  Again, look at
the expired optimized multipath drafts in the datatracker.

This is a solved problem.  The sections that describe how it was
successfully done a decade ago are intended only to provide an
existence proof that a simple solution exists.  There are other
solutions that make use of the same properties of the hash but build
the hardware somewhat differently than described in the OMP drafts.
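The adaptive adjustment can be sketched roughly as follows (a
simplified illustration, not the OMP algorithm; the 5% tolerance and
the lightest-bucket-first move policy are my assumptions): per-bucket
byte counts are measured, and buckets are moved off overloaded links,
lightest first, so that the least traffic is disturbed.

```python
def rebalance(table, bucket_bytes, capacities, tolerance=0.05):
    """One adaptive pass: measure per-bucket traffic, then move a few
    lightly loaded buckets away from links carrying more than their
    capacity-proportional share.  Moving the lightest buckets first
    limits how much traffic is subject to reordering."""
    total = sum(bucket_bytes)
    cap_total = sum(capacities)
    load = [0] * len(capacities)
    for bucket, link in enumerate(table):
        load[link] += bucket_bytes[bucket]
    target = [total * c / cap_total for c in capacities]
    for link in range(len(capacities)):
        while load[link] > target[link] * (1 + tolerance):
            # lightest non-empty bucket currently on this link
            candidates = [b for b, l in enumerate(table)
                          if l == link and bucket_bytes[b] > 0]
            if not candidates:
                break
            b = min(candidates, key=lambda x: bucket_bytes[x])
            # most underloaded link relative to its target
            dest = min(range(len(capacities)),
                       key=lambda j: load[j] - target[j])
            # stop rather than overload the destination (avoids
            # moving the same traffic back and forth)
            if dest == link or (load[dest] + bucket_bytes[b]
                                > target[dest] * (1 + tolerance)):
                break
            table[b] = dest
            load[link] -= bucket_bytes[b]
            load[dest] += bucket_bytes[b]
    return table
```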

> Internet traffic patterns today are different from a decade or more ago.
> Hashing works well under the conditions 

Yes.  I know that.

> 1) the number of flows is very large and flow IDs are statistically
> random; 2) flow bandwidths are fairly balanced. 

The number of flows observed within a one-minute interval 15 years ago
was in the millions.  Today, in LAGs of tens of 10GbE links, it is
likely to be in the billions.  I am aware of this.  If 2% of the flows
are large, then 2% of a billion is also a large number, meaning they
are very numerous.  Building core equipment that supports 20 million
hardware entries to handle such a case is not a good suggestion.

> Otherwise, hashing has trouble producing an even load balance.  Hashing
> plus bucket mapping provides room to deal with the imbalance if each
> bucket's load is measured. 

You clearly don't understand how this works.  There is a finite
probability that too many large flows end up in the same hash bucket,
but with a large hash space this probability becomes extremely small.
The consequence is modestly disruptive, a hash reseed, but the
probability of needing two consecutive reseeds is minuscule.  I've
studied this problem but didn't put anything in the draft.
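A rough back-of-the-envelope version of the argument above (my
illustration, with made-up flow counts; the draft contains no such
analysis): a birthday-style estimate shows large-flow collisions are
near certain in a small table but rare in a 24-bit hash space, and the
chance of two consecutive bad reseeds is the square of an already
small number.

```python
import math

def collision_probability(n_large_flows, n_buckets):
    """Birthday-style estimate of the probability that at least two
    large flows hash into the same bucket."""
    pairs = n_large_flows * (n_large_flows - 1) / 2.0
    return 1.0 - math.exp(-pairs / n_buckets)

# 100 large flows into a 2**10-entry table: collision near certain.
p_small = collision_probability(100, 1 << 10)
# The same 100 flows into a 24-bit hash space: about 0.03%.
p_large = collision_probability(100, 1 << 24)
# Two consecutive independent reseeds both failing: the square.
p_two_reseeds = p_large ** 2
```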

> If the equipment understands flows better, rather than just the flow ID,
> a more proactive approach can be applied to multipath load balancing.
> This is why we suggest large flow classification
> (draft-yong-pwe3-lfc-fat-pw-01)

I don't see any support at all for that.  I didn't see any signs of
support for it this morning in PWE.

I also didn't see any considerations for MPLS-TP in your draft.

This draft is about extending a current practice to support a new
requirement: having MPLS act as a server layer that can carry fully
compliant MPLS-TP client layer LSPs.

> Regards,
>  
> Lucy

Curtis