Re: [Fwd: I-D Action:draft-ietf-6man-flow-ecmp-01.txt]

Brian E Carpenter <brian.e.carpenter@gmail.com> Mon, 02 May 2011 21:39 UTC

Message-ID: <4DBF249E.4020200@gmail.com>
Date: Tue, 03 May 2011 09:39:42 +1200
From: Brian E Carpenter <brian.e.carpenter@gmail.com>
Organization: University of Auckland
User-Agent: Thunderbird 2.0.0.6 (Windows/20070728)
MIME-Version: 1.0
To: Thomas Narten <narten@us.ibm.com>
Subject: Re: [Fwd: I-D Action:draft-ietf-6man-flow-ecmp-01.txt]
References: <4D52FC43.90203@gmail.com> <201104051745.p35Hjacn018215@cichlid.raleigh.ibm.com>
In-Reply-To: <201104051745.p35Hjacn018215@cichlid.raleigh.ibm.com>
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: 7bit
Cc: 6man <ipv6@ietf.org>

On 2011-04-06 05:45, Thomas Narten wrote:
> Looking at the revised document, here are some additional comments.
> 
>    One lightweight approach to ECMP or LAG is this: if there are N
>    equally good paths to choose from, then form a modulo(N) hash
>    [RFC2991] from a consistent set of fields in each packet header
>    that are certain to have the same values throughout the duration of
>    a flow, and use the resulting output hash value to select a
>    particular
> 
> would be nice to have a term better than "consistent". The point is,
> you want to use fields that stay constant for a given flow.

s/consistent/defined/
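
For illustration only, the modulo(N) selection in the quoted text could be sketched as below. This is a minimal sketch, not something the draft or [RFC2991] specifies: the name `select_path`, the choice of CRC-32 as the hash, and the length-prefixing of each field are all assumptions made for the example. The only property that matters is that the same defined field values always yield the same path.

```python
import zlib
from typing import Sequence


def select_path(fields: Sequence[bytes], n_paths: int) -> int:
    """Map a packet's defined header fields to a path index in [0, n_paths).

    `fields` holds the raw bytes of whichever header fields are defined
    to stay constant for the duration of a flow (hypothetical interface).
    """
    if n_paths <= 0:
        raise ValueError("n_paths must be positive")
    crc = 0
    for field in fields:
        # Length-prefix each field so that field boundaries affect the hash.
        crc = zlib.crc32(len(field).to_bytes(2, "big") + field, crc)
    # modulo(N) reduction: every packet of one flow lands on one path.
    return crc % n_paths
```

Every packet carrying the same field values gets the same index, which is the whole point of choosing fields that do not change during a flow.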

> 
>    distribution, due to the pseudo-random nature of ephemeral ports.
>    Ephemeral port numbers are quite well distributed [Lee10] and will
> 
> is "pseudo-random" right here? In fact, do we even need that last part
> of the sentence?

s/pseudo-random/variable/

> 
>    o The flow label in the outer packet SHOULD be set by the sending
>       TEP to a pseudo-random 20-bit value in accordance with [RFC3697]
>       or its replacement.  The same flow label value MUST be used for
> 
> Don't like this pseudo-random requirement here. And, the TEP should be
> setting the Flow Label in *exactly* the same way as 3697bis
> recommends. Tunnels are no different...

That's true in respect of 3697bis; 3697 was non-specific on this. Fixed by updating the
reference; now that the three drafts are bunched together, this makes sense anyway.

> 
>       * Note that this rule is a recommendation, to permit individual
>          implementers to take an alternative approach if they wish to
>          do so.  For example, a simpler solution than a pseudo-random
>          value might be adopted if it was known that the load balancer
>          would
> 
> 	 Carpenter & Amante Expires August 14, 2011 [Page 6]
> 	 
> 	 Internet-Draft Flow Label for tunnel ECMP/LAG February 2011
> 
> 
>          continue to provide uniform distribution of flows with it.
>          Such an alternative MUST conform to [RFC3697] or its
>          replacement.
> 
> 
> This is too wishy washy. It also suggests that the TEP setting the
> Flow Label knows about the algorithm used by the load balancer. That
> will rarely (never?) be the case and this document shouldn't suggest
> this. 

Agree, the "For example..." sentence doesn't really add any value.

> 
>       the relevant flow label into the outer IPv6 header.  A user flow
>       could be identified by the ingress TEP most simply by its
>       {destination, source} address 2-tuple (coarse) or by its 5-tuple
>       {dest addr, source addr, protocol, dest port, source port} (fine).
>       At present, ironically, there would be little advantage for IPv6
>       packets in using the {dest addr, source addr, flow label} 3-tuple.
> 
> Ambiguous. Advantages compared to what?

s/advantage for IPv6 packets/point/

> 
> Also, the Flow classification should simply follow the recommendation
> in 3697bis, which says use  the 5 tuple, or, at a minimum, the 3
> tuple. The 
> 
>       The choice of n-tuple is an implementation detail in the sending
>       TEP.
> 
> No it's not. What may be a detail is the actual algorithm used. But
> which fields to use should be a clear recommendation (e.g., taken from
> 3697bis).

We maintain that the choice between just the addresses (2-tuple) and anything
up to the whole 5-tuple is not something we can make a recommendation on. There
could be a major efficiency impact, depending on the design of the router acting
as TEP (and we might well be talking about line speeds of 10 Gbit/s or more).

s/detail/choice/

However, re-reading the text made us realise that it doesn't flow quite logically
and that, now that we have clarified 3697bis somewhat, you are correct that we can
depend on it more. So the text has been shortened and reorganised, and the above
changes have been made.
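
To make the 2-tuple versus 5-tuple choice concrete, here is a rough sketch of how an ingress TEP might derive the outer flow label statelessly from the inner packet. It is an assumption-laden illustration, not text from the draft: the function name `tunnel_flow_label`, the use of SHA-256, and the field encoding are all hypothetical, and a real router would use whatever its forwarding hardware can compute at line rate.

```python
import hashlib
import ipaddress
from typing import Optional


def tunnel_flow_label(src: str, dst: str,
                      proto: Optional[int] = None,
                      sport: Optional[int] = None,
                      dport: Optional[int] = None) -> int:
    """Derive a 20-bit outer flow label from the inner packet's n-tuple.

    Addresses alone give the coarse 2-tuple; supplying the protocol and
    both ports gives the fine 5-tuple. Partial transport information is
    ignored, so the result falls back to the 2-tuple.
    """
    key = ipaddress.ip_address(dst).packed + ipaddress.ip_address(src).packed
    if proto is not None and sport is not None and dport is not None:
        key += bytes([proto]) + dport.to_bytes(2, "big") + sport.to_bytes(2, "big")
    digest = hashlib.sha256(key).digest()
    # Keep only the low 20 bits, the width of the IPv6 flow label field.
    return int.from_bytes(digest[:4], "big") & 0xFFFFF
```

Because no state is kept, two different user flows can occasionally map to the same label; all packets of one user flow always map to the same label.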

> 
>       *  This stateless method creates a small probability of two
>          different user flows hashing to the same flow label.  Since RFC
>          3697 allows a source (the TEP in this case) to define any set
>          of packets that it wishes as a single flow, occasionally
>          labeling two user flows as a single flow through the tunnel is
>          acceptable.
> 
> This should be fine. There is no problem with treating packets from 2
> different flows the same way. The problem occurs if packets from
> within one flow are treated differently.

We agree.

> 
>    o  At intermediate router(s) that perform load distribution, the hash
>       algorithm used to determine the outgoing component-link in an ECMP
>       and/or LAG toward the next-hop MUST minimally include the 3-tuple
>       {dest addr, source addr, flow label}.  This applies whether the
>       traffic is tunneled traffic only, or a mixture of normal traffic
>       and tunneled traffic.
> 
> Be more clear: should be 5 tuple, next best is 3. Defer to 3697bis.

No, this is a different hash from the one in 3697bis - this is the actual
load balancing hash, which is not mentioned there. And again: it's an
implementation choice. Some vendors may actively prefer to limit it
to the 3-tuple and forget all about transport headers.

Added a MAY for components of the 5-tuple (which is slightly redundant
with the following sub-bullet).
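
As an illustration of the load-balancing hash at an intermediate router, the sketch below always hashes the {dest addr, source addr, flow label} 3-tuple and mixes in transport fields only if the implementation chooses to. The name `ecmp_link`, the `transport` parameter, and SHA-256 are assumptions for the example, not anything the draft mandates.

```python
import hashlib
import ipaddress
from typing import Optional, Tuple


def ecmp_link(src: str, dst: str, flow_label: int, n_links: int,
              transport: Optional[Tuple[int, int, int]] = None) -> int:
    """Pick an outgoing component link in [0, n_links) for an IPv6 packet.

    `transport`, if given, is a hypothetical (protocol, source port,
    destination port) triple that an implementation may add to the hash.
    """
    if n_links <= 0:
        raise ValueError("n_links must be positive")
    # Minimum hash input: the {dest addr, source addr, flow label} 3-tuple.
    key = (
        ipaddress.IPv6Address(dst).packed
        + ipaddress.IPv6Address(src).packed
        + (flow_label & 0xFFFFF).to_bytes(3, "big")
    )
    if transport is not None:
        proto, sport, dport = transport
        key += bytes([proto]) + dport.to_bytes(2, "big") + sport.to_bytes(2, "big")
    digest = hashlib.sha256(key).digest()
    return int.from_bytes(digest[:8], "big") % n_links
```

A vendor that prefers to forget about transport headers simply never passes `transport`; the 3-tuple alone still keeps each tunneled flow on one link.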

  Brian + Shane