Re: [v6ops] I-D Action: draft-jaeggli-v6ops-pmtud-ecmp-problem-00.txt

joel jaeggli <joelja@bogus.com> Wed, 04 June 2014 04:49 UTC

Message-ID: <538EA522.4060507@bogus.com>
Date: Tue, 03 Jun 2014 21:48:34 -0700
From: joel jaeggli <joelja@bogus.com>
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.9; rv:30.0) Gecko/20100101 Thunderbird/30.0
MIME-Version: 1.0
To: Brian E Carpenter <brian.e.carpenter@gmail.com>, IPv6 Operations <v6ops@ietf.org>, draft-jaeggli-v6ops-pmtud-ecmp-problem@tools.ietf.org
References: <20140602072659.7433.89475.idtracker@ietfa.amsl.com> <538E73EE.8050409@gmail.com>
In-Reply-To: <538E73EE.8050409@gmail.com>
Archived-At: http://mailarchive.ietf.org/arch/msg/v6ops/__muolVJmn9hgijSiglfZSytulg
Subject: Re: [v6ops] I-D Action: draft-jaeggli-v6ops-pmtud-ecmp-problem-00.txt

On 6/3/14, 6:18 PM, Brian E Carpenter wrote:
> Hi,
> 
>>    A problem common to the approach of distribution through hashing is
>>    its impact on path MTU discovery.  An ICMPv6 type 2 PTB message
>>    generated on the path between a client and an ECMP load balanced
>>    server will have the anycast address as the destination and will be
>>    statelessly load balanced to one of the anycast servers. 
> 
> This may seem picky, but I think the reader's brain will run
> more smoothly if it's explicit that it's a PTB triggered by
> a packet *from* the server and therefore directed *to* the server.
> "on the path" doesn't quite contain that information.

I think that's a reasonable observation and fairly straightforward to
adjust. To my mind the destination address being the anycast address
does communicate the direction of this PTB message.

>>             Because of this, the results of
>>    the ICMPv6 ECMP hash do not match that of the corresponding TCP or
>>    UDP ECMP hash.
> 
> Again picky, but if there are (say) 2 paths, there's a 50% chance
> that it gets the right path, etc. So "might not match" is probably
> better.

If there are 64 hash buckets, the probability of ending up in the same
one is about 1.6%; if there are 255, it's ~0.4%. So "may not" is
potentially a near certainty. It's entirely possible to have more hash
buckets than next-hops in order to facilitate load balancing without
rehashing when adding and removing devices, in which case the
distribution may vary, and in those cases a packet hashed to the wrong
bucket may well arrive on the correct server/load-balancer.
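
To put numbers on that, here is a minimal sketch in Python. It assumes
the ICMPv6 hash and the TCP/UDP hash behave like two independent,
uniform picks over the buckets, which is an idealization; the function
names and the "nh0".."nh3" next-hop labels are made up for illustration.

```python
from collections import Counter
from fractions import Fraction


def same_bucket_probability(num_buckets: int) -> Fraction:
    """Chance that two independent, uniform hashes pick the same bucket."""
    if num_buckets < 1:
        raise ValueError("need at least one bucket")
    return Fraction(1, num_buckets)


def same_next_hop_probability(bucket_to_next_hop: list[str]) -> Fraction:
    """Chance that two independent, uniform bucket picks map to the same
    next-hop, given a table that assigns each bucket to a next-hop."""
    total = len(bucket_to_next_hop)
    if total < 1:
        raise ValueError("need at least one bucket")
    counts = Counter(bucket_to_next_hop)
    # Sum over next-hops of (share of buckets owned by that next-hop)^2.
    return sum((Fraction(c, total) ** 2 for c in counts.values()), Fraction(0))


if __name__ == "__main__":
    print(float(same_bucket_probability(64)))    # 0.015625
    print(float(same_bucket_probability(255)))
    # 64 buckets spread evenly over 4 hypothetical next-hops.
    table = [f"nh{i % 4}" for i in range(64)]
    print(same_next_hop_probability(table))      # 1/4
```

The last case is the "more buckets than next-hops" situation: the
bucket match stays at 1/64, but the PTB still reaches the right device
a quarter of the time under these assumptions.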

> General comment: I'm not sure what is specific to ECMP
> about this problem. We identified it as a general (but out of
> scope) problem in RFC 7098. We did include this comment there:

So it may be out of scope for your RFC, but I'm pretty sure that doesn't
make the problem go away. I've been dinking around with large-scale
load balancing of this flavor for some time, more than a decade in one
form or another. We found it to be a commercial necessity to address the
problem of how to make PTB work as part of deploying consumer-facing
Internet application services. It's not unique to ICMP; e.g., my
interest in the fragdrop discussion is due to similar problems.

>  o  Note that correct handling of ICMPv6 for Path MTU Discovery
>     requires the layer 3/4 balancer to keep state for the client
>     source address, independently of either the port numbers or the
>     flow label.

So there are two problems with this assertion.

1. State sharing is presently impossible at the scale I'm operating at,
so any solution that presumes it across an entire population of devices
isn't going to work. State sharing between pairs of devices in otherwise
stateless clusters (which describes the internal architecture of a
number of high-end firewall and load-balancer products) doesn't solve
this either.

2. The PTB packet doesn't carry the parts of the flow in its IP and
ICMP headers, so the flow that it was associated with cannot be
reconstructed by a pure L3/L4 device, stateful or otherwise. It can, as
noted in the draft, be derived by inspecting the ICMP payload for the
IP/TCP/UDP header of the outgoing packet, which is pretty deep packet
inspection (normally done by the end system). Finding the ICMP header is
not the same as being able to parse the payload. In any event we were
using rather high-end but otherwise normal router silicon, so we get
to do this the same as anyone else who fronts a load-balancer or server
tier containing ECMP next-hops with a router...
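
For concreteness, this is roughly what "inspecting the ICMP payload"
means, sketched in Python as an end system or software load balancer
might do it. It is an illustration only, not what router silicon does:
it assumes the embedded packet is IPv6 with TCP or UDP directly after
the fixed header (no extension headers), and the function name
flow_from_ptb is made up here.

```python
import ipaddress
import struct

ICMPV6_PACKET_TOO_BIG = 2
ICMPV6_HEADER_LEN = 8      # type, code, checksum, MTU
IPV6_HEADER_LEN = 40
TCP, UDP = 6, 17


def flow_from_ptb(icmp6: bytes):
    """Return (mtu, flow_key) for an ICMPv6 PTB message, or None.

    icmp6 starts at the ICMPv6 header. flow_key is the 5-tuple that
    client -> server packets of the affected flow would carry.
    """
    if len(icmp6) < ICMPV6_HEADER_LEN + IPV6_HEADER_LEN + 4:
        return None
    icmp_type, _code, _cksum, mtu = struct.unpack("!BBHI", icmp6[:8])
    if icmp_type != ICMPV6_PACKET_TOO_BIG:
        return None
    inner = icmp6[ICMPV6_HEADER_LEN:]
    if inner[0] >> 4 != 6:          # embedded packet must be IPv6
        return None
    next_header = inner[6]
    if next_header not in (TCP, UDP):
        return None                 # extension headers not handled here
    inner_src = ipaddress.IPv6Address(inner[8:24])
    inner_dst = ipaddress.IPv6Address(inner[24:40])
    inner_sport, inner_dport = struct.unpack("!HH", inner[40:44])
    # The embedded packet travelled server -> client, so swap both ends
    # to get the key the client -> server direction is hashed on.
    return mtu, (inner_dst, inner_src, next_header, inner_dport, inner_sport)
```

Only with that inner 5-tuple in hand can the PTB be steered to the same
place as the flow it refers to; the outer headers alone don't have it.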

> because, indeed, the PMTU is a function of the address pair only.

The PTB packet has the source address of the device that emitted it.
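
A small Python sketch of why that matters for a hash over the outer
headers. The CRC32-based ecmp_bucket function is a stand-in I picked
for illustration; real routers use their own vendor-specific hashes,
and the addresses and ports are documentation-prefix examples.

```python
import zlib


def ecmp_bucket(src: str, dst: str, proto: int, sport: int, dport: int,
                num_buckets: int) -> int:
    """Stand-in ECMP hash: CRC32 over the outer header fields."""
    key = f"{src}|{dst}|{proto}|{sport}|{dport}".encode()
    return zlib.crc32(key) % num_buckets


ANYCAST = "2001:db8::1"        # hypothetical service address
CLIENT = "2001:db8:ffff::2"    # hypothetical client
ROUTER = "2001:db8:aaaa::3"    # hypothetical device that emitted the PTB

# Client -> server TCP segment: hashed on the client's address and ports.
tcp_key = (CLIENT, ANYCAST, 6, 51000, 443)
# PTB: the source is the emitting router, ICMPv6 (58) has no ports.
ptb_key = (ROUTER, ANYCAST, 58, 0, 0)

if __name__ == "__main__":
    print(ecmp_bucket(*tcp_key, num_buckets=64))
    print(ecmp_bucket(*ptb_key, num_buckets=64))
```

The two keys share only the destination address, so nothing ties the
bucket chosen for the PTB to the bucket chosen for the flow.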

In any event, our goal wasn't to boil the ocean with respect to
parsing the packet or deriving the correct host. If we could get close
enough, the goal was to not break PMTUD, and therefore not break
customers, in a service that supported millions of users.

>     Brian