Re: [Fwd: I-D Action:draft-ietf-6man-flow-ecmp-01.txt]

Thomas Narten <narten@us.ibm.com> Tue, 05 April 2011 17:43 UTC

Message-Id: <201104051745.p35Hjacn018215@cichlid.raleigh.ibm.com>
To: Brian E Carpenter <brian.e.carpenter@gmail.com>
Subject: Re: [Fwd: I-D Action:draft-ietf-6man-flow-ecmp-01.txt]
In-reply-to: <4D52FC43.90203@gmail.com>
References: <4D52FC43.90203@gmail.com>
Comments: In-reply-to Brian E Carpenter <brian.e.carpenter@gmail.com> message dated "Thu, 10 Feb 2011 09:42:43 +1300."
Date: Tue, 05 Apr 2011 13:45:36 -0400
From: Thomas Narten <narten@us.ibm.com>
Cc: 6man <ipv6@ietf.org>

Looking at the revised document, here are some additional comments.

   One lightweight approach to ECMP or LAG is this: if there are N
   equally good paths to choose from, then form a modulo(N) hash
   [RFC2991] from a consistent set of fields in each packet header
   that are certain to have the same values throughout the duration of
   a flow, and use the resulting output hash value to select a
   particular

It would be nice to have a better term than "consistent". The point is,
you want to use fields that stay constant for a given flow.
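
A minimal sketch of the idea, assuming a hypothetical helper
`select_path` and SHA-256 as the hash (neither comes from the draft or
from RFC 2991; any stable hash would do). The only property that
matters is that the input fields do not change during a flow, so every
packet of that flow lands on the same path.

```python
import hashlib
from typing import Tuple


def select_path(flow_fields: Tuple, n_paths: int) -> int:
    """Return a path index in [0, n_paths) for the given per-flow fields.

    flow_fields is an illustrative tuple of header values that stay
    constant for the lifetime of a flow (e.g. addresses, flow label).
    """
    if n_paths <= 0:
        raise ValueError("n_paths must be positive")
    # Serialize the fields deterministically, then hash them.
    key = repr(flow_fields).encode("utf-8")
    digest = hashlib.sha256(key).digest()
    # Reduce the hash modulo N to pick one of the N equal-cost paths.
    return int.from_bytes(digest[:8], "big") % n_paths


fields = ("2001:db8::1", "2001:db8::2", 0x12345)
# Same fields always give the same path, so packets are not reordered.
assert select_path(fields, 4) == select_path(fields, 4)
```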

   distribution, due to the pseudo-random nature of ephemeral ports.
   Ephemeral port numbers are quite well distributed [Lee10] and will

Is "pseudo-random" right here? In fact, do we even need that last part
of the sentence?

   o The flow label in the outer packet SHOULD be set by the sending
      TEP to a pseudo-random 20-bit value in accordance with [RFC3697]
      or its replacement.  The same flow label value MUST be used for

I don't like this pseudo-random requirement here. And the TEP should be
setting the Flow Label in *exactly* the same way as 3697bis
recommends. Tunnels are no different...

      * Note that this rule is a recommendation, to permit individual
         implementers to take an alternative approach if they wish to
         do so.  For example, a simpler solution than a pseudo-random
         value might be adopted if it was known that the load balancer
         would continue to provide uniform distribution of flows with it.
         Such an alternative MUST conform to [RFC3697] or its
         replacement.


This is too wishy-washy. It also suggests that the TEP setting the
Flow Label knows about the algorithm used by the load balancer. That
will rarely (never?) be the case and this document shouldn't suggest
this. 

      the relevant flow label into the outer IPv6 header.  A user flow
      could be identified by the ingress TEP most simply by its
      {destination, source} address 2-tuple (coarse) or by its 5-tuple
      {dest addr, source addr, protocol, dest port, source port} (fine).
      At present, ironically, there would be little advantage for IPv6
      packets in using the {dest addr, source addr, flow label} 3-tuple.

Ambiguous. Advantages compared to what?

Also, the flow classification should simply follow the recommendation
in 3697bis, which says to use the 5-tuple or, at a minimum, the
3-tuple. The

      The choice of n-tuple is an implementation detail in the sending
      TEP.

No, it's not. What may be a detail is the actual algorithm used. But
which fields to use should be a clear recommendation (e.g., taken from
3697bis).
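
To make the distinction concrete, here is a rough sketch in which the
choice of fields is fixed (5-tuple when the transport header is
visible, otherwise the 3-tuple) and only the later hashing step is
left to the implementer. The `InnerPacket` type and the
`classification_key` function are hypothetical names used for
illustration only; they are not taken from the draft or from 3697bis.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class InnerPacket:
    """Illustrative view of the user packet seen by the ingress TEP."""
    src_addr: str
    dst_addr: str
    flow_label: int = 0
    protocol: Optional[int] = None   # None if the transport header is not visible
    src_port: Optional[int] = None
    dst_port: Optional[int] = None


def classification_key(pkt: InnerPacket) -> Tuple:
    """Prefer the 5-tuple; fall back to the 3-tuple when ports are unavailable."""
    if None not in (pkt.protocol, pkt.src_port, pkt.dst_port):
        return (pkt.dst_addr, pkt.src_addr, pkt.protocol, pkt.dst_port, pkt.src_port)
    return (pkt.dst_addr, pkt.src_addr, pkt.flow_label)


tcp = InnerPacket("2001:db8::1", "2001:db8::2", protocol=6, src_port=49152, dst_port=80)
opaque = InnerPacket("2001:db8::1", "2001:db8::2", flow_label=0xABCDE)
assert len(classification_key(tcp)) == 5
assert len(classification_key(opaque)) == 3
```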

      *  This stateless method creates a small probability of two
         different user flows hashing to the same flow label.  Since RFC
         3697 allows a source (the TEP in this case) to define any set
         of packets that it wishes as a single flow, occasionally
         labeling two user flows as a single flow through the tunnel is
         acceptable.

This should be fine. There is no problem with treating packets from 2
different flows the same way. The problem occurs if packets from
within one flow are treated differently.
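
A small sketch of the stateless method under discussion, assuming a
hypothetical `stateless_flow_label` function that hashes the user
flow's 5-tuple down to 20 bits with SHA-256 (the hash choice and the
remapping of zero are my assumptions, not the draft's). Two different
flows may occasionally collide on a label, which is harmless; what the
function guarantees is that one flow always gets one label.

```python
import hashlib

FLOW_LABEL_BITS = 20  # width of the IPv6 Flow Label field


def stateless_flow_label(src: str, dst: str, proto: int, sport: int, dport: int) -> int:
    """Derive a 20-bit flow label from the user flow's 5-tuple, with no stored state."""
    key = f"{dst}|{src}|{proto}|{dport}|{sport}".encode("utf-8")
    digest = hashlib.sha256(key).digest()
    label = int.from_bytes(digest[:4], "big") & ((1 << FLOW_LABEL_BITS) - 1)
    # Assumption: avoid zero, which RFC 3697 uses for packets not part of any flow.
    return label or 1


a = stateless_flow_label("2001:db8::1", "2001:db8::2", 6, 49152, 80)
b = stateless_flow_label("2001:db8::1", "2001:db8::2", 6, 49152, 80)
# Packets of the same user flow always carry the same outer flow label.
assert a == b
```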

   o  At intermediate router(s) that perform load distribution, the hash
      algorithm used to determine the outgoing component-link in an ECMP
      and/or LAG toward the next-hop MUST minimally include the 3-tuple
      {dest addr, source addr, flow label}.  This applies whether the
      traffic is tunneled traffic only, or a mixture of normal traffic
      and tunneled traffic.

Be clearer: this should be the 5-tuple, with the 3-tuple as the next
best. Defer to 3697bis.

Thomas