Re: [v6ops] I-D Action: draft-jaeggli-v6ops-pmtud-ecmp-problem-00.txt
joel jaeggli <joelja@bogus.com> Wed, 04 June 2014 04:49 UTC
Message-ID: <538EA522.4060507@bogus.com>
Date: Tue, 03 Jun 2014 21:48:34 -0700
From: joel jaeggli <joelja@bogus.com>
To: Brian E Carpenter <brian.e.carpenter@gmail.com>, IPv6 Operations <v6ops@ietf.org>, draft-jaeggli-v6ops-pmtud-ecmp-problem@tools.ietf.org
References: <20140602072659.7433.89475.idtracker@ietfa.amsl.com> <538E73EE.8050409@gmail.com>
In-Reply-To: <538E73EE.8050409@gmail.com>
Archived-At: http://mailarchive.ietf.org/arch/msg/v6ops/__muolVJmn9hgijSiglfZSytulg
On 6/3/14, 6:18 PM, Brian E Carpenter wrote:
> Hi,
>
>> A problem common to the approach of distribution through hashing is
>> its impact on path MTU discovery. An ICMPv6 type 2 PTB message
>> generated on the path between a client and an ECMP load balanced
>> server will have the anycast address as the destination and will be
>> statelessly load balanced to one of the anycast servers.
>
> This may seem picky, but I think the reader's brain will run
> more smoothly if it's explicit that it's a PTB triggered by
> a packet *from* the server and therefore directed *to* the server.
> "on the path" doesn't quite contain that information.

I think that's a reasonable observation and fairly straightforward to
adjust. To my mind the destination address being the anycast address
does communicate the direction of this PTB message.

>> Because of this, the results of
>> the ICMPv6 ECMP hash do not match that of the corresponding TCP or
>> UDP ECMP hash.
>
> Again picky, but if there are (say) 2 paths, there's a 50% chance
> that it gets the right path, etc. So "might not match" is probably
> better.

If there are 64 hash buckets, the probability of ending up in the same
one is about 1.6%; if there are 255, it's ~0.4%. So "may not" is
potentially a near certainty. It's entirely possible to have more hash
buckets than next-hops in order to facilitate load balancing without
rehashing when adding and removing devices, in which case the
distribution may vary, and in those cases a packet hashed to the wrong
bucket may well arrive on the correct server/load-balancer.

> General comment: I'm not sure what is specific to ECMP
> about this problem. We identified it as a general (but out of
> scope) problem in RFC 7098. We did include this comment there:

So it may be out of scope for your RFC, but I'm pretty sure that
doesn't make the problem go away. I've been dinking around with
large-scale load balancing of this flavor for some time, more than a
decade in one form or another.
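The bucket arithmetic above can be checked with a small sketch. It assumes, purely for illustration, that the hash of the PTB packet is uniform over the buckets and independent of the hash of the original TCP/UDP flow; real ECMP hash functions and bucket layouts vary by platform. The function names are hypothetical and not taken from the draft.

```python
from fractions import Fraction
from typing import Sequence


def same_bucket_probability(buckets: int) -> Fraction:
    """Chance that an independently, uniformly hashed PTB lands in the
    same bucket as the flow it refers to (modeling assumption)."""
    if buckets <= 0:
        raise ValueError("need at least one bucket")
    return Fraction(1, buckets)


def same_next_hop_probability(buckets_per_next_hop: Sequence[int]) -> Fraction:
    """Chance that the PTB reaches the same next-hop as the flow when
    several buckets map to each next-hop.

    Both the flow and the PTB pick a next-hop with probability equal to
    that next-hop's share of the buckets, so the match probability is
    the sum of the squared shares.
    """
    total = sum(buckets_per_next_hop)
    if total <= 0:
        raise ValueError("need at least one bucket")
    return sum((Fraction(b, total) ** 2 for b in buckets_per_next_hop),
               Fraction(0))
```

Under these assumptions, `float(same_bucket_probability(64))` is 0.015625 and `same_bucket_probability(255)` is about 0.0039, in line with the figures quoted above. With 64 buckets spread evenly over 4 next-hops, `same_next_hop_probability([16, 16, 16, 16])` is 1/4, which is the case where a packet hashed to the wrong bucket still arrives on the correct device.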
We found it to be a commercial necessity to address the problem of how
to make PTB work as part of deploying consumer-facing internet
application services. It's not unique to ICMP; e.g., my interest in the
fragdrop discussion is due to similar problems.

> o Note that correct handling of ICMPv6 for Path MTU Discovery
> requires the layer 3/4 balancer to keep state for the client
> source address, independently of either the port numbers or the
> flow label.

So there are two problems with this assertion.

1. State sharing is presently impossible at the scale I'm operating
at, so any solution that presumes it across an entire population of
devices isn't going to work. State sharing between pairs of devices in
otherwise stateless clusters (which describes the internal architecture
of a number of high-end firewall and load-balancer products) doesn't
solve this either.

2. The PTB packet doesn't carry the fields that identify the flow in
its own IP and ICMP headers, so the flow that it was associated with
cannot be reconstructed by a pure L3/L4 device, stateful or otherwise.
It can, as noted in the draft, be derived by inspecting the ICMP
payload for the IP/TCP/UDP header of the outgoing packet, which is
pretty deep packet inspection (normally done by the end system).
Finding the ICMP header is not the same as being able to parse the
payload. In any event, we were using rather high-end but otherwise
normal router silicon, so we get to do this the same as anyone else who
fronts a load-balancer or server tier containing ECMP next-hops with a
router...

> because, indeed, the PMTU is a function of the address pair only.

The PTB packet has the source address of the device that emitted it.
In any event, our goal wasn't to boil the ocean with respect to parsing
the packet or deriving the correct host. If we could get close enough,
the goal was to not break PMTUD, and therefore customers, in a service
that supported millions of users.
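To make point 2 concrete, here is a minimal sketch of what "inspecting the ICMP payload" involves: reading the header of the invoking packet that is embedded after the 8-byte ICMPv6 Packet Too Big header. It assumes the embedded IPv6 header is followed directly by TCP or UDP (no extension headers) and does no checksum validation. The function names are hypothetical; this is not the method described in the draft, and hardware forwarding paths would do the equivalent lookup in silicon rather than in software.

```python
import ipaddress
import struct

ICMPV6_PACKET_TOO_BIG = 2
TCP, UDP = 6, 17
ICMP_HDR_LEN, IPV6_HDR_LEN = 8, 40


def flow_from_ptb(icmp: bytes):
    """Return (mtu, src, dst, proto, sport, dport) of the invoking
    packet embedded in an ICMPv6 PTB message, or None if it cannot be
    parsed by this simplified sketch."""
    if len(icmp) < ICMP_HDR_LEN + IPV6_HDR_LEN + 4:
        return None
    icmp_type, _code, _cksum, mtu = struct.unpack_from("!BBHI", icmp, 0)
    if icmp_type != ICMPV6_PACKET_TOO_BIG:
        return None
    inner = icmp[ICMP_HDR_LEN:]
    if inner[0] >> 4 != 6:            # embedded packet must be IPv6
        return None
    proto = inner[6]                  # Next Header field
    if proto not in (TCP, UDP):
        return None                   # extension headers not handled here
    src = ipaddress.IPv6Address(inner[8:24])
    dst = ipaddress.IPv6Address(inner[24:40])
    sport, dport = struct.unpack_from("!HH", inner, IPV6_HDR_LEN)
    return mtu, src, dst, proto, sport, dport


def inbound_flow_key(icmp: bytes):
    """Key of the client-to-server flow the PTB belongs to. The embedded
    packet was sent by the server, so addresses and ports are swapped."""
    parsed = flow_from_ptb(icmp)
    if parsed is None:
        return None
    _mtu, src, dst, proto, sport, dport = parsed
    return dst, src, proto, dport, sport
```

Hashing the result of `inbound_flow_key` with the same function used for ordinary client traffic would, under these assumptions, select the same bucket as the original flow. The outer source address of the PTB packet plays no part, since it belongs to the router that emitted the message.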
> Brian