Re: [dhcwg] Load Balancing for DHCPv6

"Bernie Volz (volz)" <volz@cisco.com> Fri, 21 September 2012 14:37 UTC

From: "Bernie Volz (volz)" <volz@cisco.com>
To: Bud Millwood <budmillwood@gmail.com>, Andre Kostur <akostur@incognito.com>
Thread-Topic: [dhcwg] Load Balancing for DHCPv6
Thread-Index: AQHNl/2VaJ+khP2NkUiXQs2wUqwOeJeU1mtw
Date: Fri, 21 Sep 2012 14:36:56 +0000
Message-ID: <489D13FBFA9B3E41812EA89F188F018E0F50321D@xmb-rcd-x04.cisco.com>
References: <CAL10_BqbUrhzYJMSLBGsFDR_kFth2SbdC9AOHyOfyKdhNyzNkw@mail.gmail.com> <CC7E13F1.2549%volz@cisco.com> <CAL10_BpPKCocKM1rzcxwv8LXQHxXUw9rOEfycRWyiRMLTsGsaw@mail.gmail.com> <CAOpJ=k2wKfnVswbjT2Rf43Dnk=xBqWkbr=Az5--8tE=ca3XxHA@mail.gmail.com>
In-Reply-To: <CAOpJ=k2wKfnVswbjT2Rf43Dnk=xBqWkbr=Az5--8tE=ca3XxHA@mail.gmail.com>
Cc: "dhcwg@ietf.org" <dhcwg@ietf.org>, Ted Lemon <Ted.Lemon@nominum.com>
Subject: Re: [dhcwg] Load Balancing for DHCPv6

Bud:

> the server could allocate a lease on the spot in response to the Request (i.e., without having seen a Solicit)?

Correct.

As this address (or prefix) is likely in other-available, the server would not be able to assign it. But with DHCPv6, it can provide an alternative -- it is the Reply that matters. (This is one difference from DHCPv4, where the server would be forced to NAK or just drop the packet.)

---

However, Andre's proposal is that if the server-duid matches but load balancing says that this is not the server that should be responding, the server drops the request. This would force the client to return to Solicit (when the Request times out) or to advance to Rebind (when renewing between T1 and T2). This would work to move the client to the "correct" server, but it does increase the traffic and load because of the retransmissions of the dropped packets.

This would generally only impact Request and Renew packets. I would think that the server should always process Release and Decline since those are limited in their retransmissions (and the client fallback is to stop sending).

So, I think when both failover partners are responsive the load balancing processing would be:
- If server-duid is not present in the request (Solicit, Rebind, Confirm, Information-Request), use load balancing to determine if this server responds.
- If server-duid is present and doesn't match this server, drop the packet (standard RFC 3315).
- If server-duid is present and matches this server, and the message is a Request or Renew, still use load balancing to determine if this server responds.
- Otherwise (e.g., Release or Decline addressed to this server), respond to the client.
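
As a rough illustration only (not taken from any draft or implementation), the rules above might be sketched as follows. All names here (Msg, should_respond, lb_selects_me) are hypothetical; lb_selects_me stands for whatever the load balancing algorithm decides for this client, and the sketch assumes both failover partners are responsive.

```python
from enum import Enum, auto
from typing import Optional


class Msg(Enum):
    """DHCPv6 client message types relevant to the rules above."""
    SOLICIT = auto()
    REQUEST = auto()
    CONFIRM = auto()
    RENEW = auto()
    REBIND = auto()
    RELEASE = auto()
    DECLINE = auto()
    INFORMATION_REQUEST = auto()


def should_respond(msg_type: Msg,
                   server_duid: Optional[bytes],
                   my_duid: bytes,
                   lb_selects_me: bool) -> bool:
    """Return True if this server should answer, False if it should drop.

    lb_selects_me is an assumption: the result of the load balancing
    decision for this client, computed elsewhere.
    """
    # Rule 1: no server-duid in the message, so load balancing decides.
    if server_duid is None:
        return lb_selects_me
    # Rule 2: addressed to a different server, so drop (standard RFC 3315).
    if server_duid != my_duid:
        return False
    # Rule 3: addressed to this server, but Request and Renew are still
    # subject to load balancing.
    if msg_type in (Msg.REQUEST, Msg.RENEW):
        return lb_selects_me
    # Rule 4: everything else addressed to this server (e.g. Release,
    # Decline) is always processed.
    return True
```

For example, should_respond(Msg.RENEW, b"A", b"A", False) is False (the Renew is dropped so the client eventually moves to Rebind), while should_respond(Msg.RELEASE, b"A", b"A", False) is True.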

- Bernie

-----Original Message-----
From: Bud Millwood [mailto:budmillwood@gmail.com] 
Sent: Friday, September 21, 2012 9:33 AM
To: Andre Kostur
Cc: Bernie Volz (volz); dhcwg@ietf.org; Ted Lemon
Subject: Re: [dhcwg] Load Balancing for DHCPv6

Andre:

I think the only reasonable way to move clients back to server A is what you suggest. You are essentially doing the same thing as the failed server A did when it failed - going offline - but you're just doing it administratively, and just for a subset of clients. It causes a degraded performance mode on your network, but you have to balance that against the desire to split the load after server A has come back online.

It's not a failover problem so much as it's a load balancing problem.
For example, suppose you wanted to change the load balancing split to
70/30 instead of 50/50 on a running network - you'd use the same mechanism.

Bernie:

> However, it does complicate Request packets because failover doesn't 
> usually exchange tentative bindings from a Solicit and so you really 
> would only want the server that Solicited to respond to the Request 
> (this could be done by looking to see if you had a tentative binding - 
> but if neither server does, the client has to suffer a Request timeout 
> to get back to the Solicit phase).

Is this a problem because under normal circumstances, the server could allocate a lease on the spot in response to the Request (i.e., without having seen a Solicit)? It seems like a reasonable tradeoff - to disallow a failover partner from short-circuiting the full lease transaction.

- Bud

On Tue, Sep 18, 2012 at 6:42 PM, Andre Kostur <akostur@incognito.com> wrote:
> On Tue, Sep 18, 2012 at 9:24 AM, Bernie Volz (volz) <volz@cisco.com> wrote:
>> First, I think this would be a rather bad change to DHCPv6 operation.
>> Client may be checking the server-identifier option they get back and 
>> so you would have to lie about that in the response which just seems 
>> like a very bad thing to do. (I don't believe RFC 3315 actually ever 
>> states the clients should be checking the server-identifier, but I 
>> could be wrong -- section 15 does say they check for the presence of 
>> the option, but do not check the contents). One could also think of 
>> situations where a relay agent might forward the packet to explicit 
>> destinations based on the server identifier (though again, I doubt 
>> anyone does this) -- much like switches limit traffic based on mac-addresses.
>
> Note that I didn't suggest that server A preemptively answer the Renew 
> coming in from the client (or impersonate server B).  Under my 
> suggestion, the Renew would arrive at both servers A and B.  B would 
> ignore it due to load balancing, and A would also ignore it due to the 
> server ID.  The client continues to retry the Renews for the 
> appropriate time, until it is time to Rebind.  At that point server B 
> is still ignoring the client, but now A can answer the client as the 
> Rebind isn't carrying a Server ID.  (And if we are talking about a 
> Failover pair, then server A would already have a lease record for 
> this client, and can extend that same lease.)
>
> For the relay piece, it would still need to forward the Rebind to both servers.
>
> [snip]
>
>> Anyway, I think this needs to be thought through much more carefully 
>> as to the potential consequences and I think the best is to keep it 
>> simple - once a client has 'bound' to a server, it stays with that 
>> server until it issues a request message which does not contain a server-identifier option.
>
> My scenario doesn't result in a client switching servers except under 
> already defined RFC 3315 behaviour when the client has transitioned 
> into Rebind.
>
> One piece that may have funny consequences is if the pair of DHCP 
> servers aren't in some sort of cooperation mode, and this scenario 
> would result in the client possibly changing IP addresses.  Client has 
> to rebind, and server A does the DHCPv6 equivalent of a NAK and forces
> the client to come back to get a new IP.    Perhaps a server MAY skip
> the load balancing if the packet contains a Server ID (or even 
> limiting it to the server's Server ID)?
>
> --
> Andre Kostur