Re: [dhcwg] AD review of draft-ietf-dhc-dhcpv6-load-balancing

"Bernie Volz (volz)" <volz@cisco.com> Wed, 17 December 2014 22:18 UTC

From: "Bernie Volz (volz)" <volz@cisco.com>
To: Ted Lemon <Ted.Lemon@nominum.com>, dhcwg <dhcwg@ietf.org>
Date: Wed, 17 Dec 2014 22:18:45 +0000
Archived-At: http://mailarchive.ietf.org/arch/msg/dhcwg/cjMlL3-wkciiXnokLut9WW21S50

Ted:

I'd "vote" to remove Section 3.2; personally, I never liked this concept. If I recall correctly, Andre (and perhaps others) wanted it in order to move the client's interaction to the "correct" server. But I always felt that the cost of having the client retransmit RENEWs, and of having the other elements (relays, servers) process those retransmissions, far outweighed any benefit. Obviously, the same retransmissions happen when the addressed server is down, but in that case only one server is dropping the packets.
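To put a rough number on that cost, here is a minimal Python sketch of how many RENEWs a single client sends between T1 and T2 when no server answers. It assumes the RFC 3315 Renew parameters (REN_TIMEOUT = 10 s, REN_MAX_RT = 600 s), T1 and T2 at the recommended 0.5 and 0.8 of a 3600-second preferred lifetime, and it ignores the +/-10% random jitter; the function name is hypothetical.

```python
REN_TIMEOUT = 10   # seconds: initial Renew retransmission timeout (RFC 3315)
REN_MAX_RT = 600   # seconds: maximum Renew retransmission timeout (RFC 3315)


def renew_send_times(t1: float, t2: float) -> list[float]:
    """Seconds after T1 at which a client sends RENEW if nobody answers.

    Simplification (assumption): the random factor of RFC 3315 is ignored,
    so each timeout simply doubles, capped at REN_MAX_RT.
    """
    window = t2 - t1          # Renew retransmissions stop at T2
    times, t, rt = [], 0.0, REN_TIMEOUT
    while t < window:
        times.append(t)
        t += rt
        rt = min(2 * rt, REN_MAX_RT)
    return times


# Example: preferred lifetime 3600 s, T1 = 0.5 * 3600, T2 = 0.8 * 3600
sends = renew_send_times(1800, 2880)
print(len(sends), sends)  # 7 [0.0, 10.0, 30.0, 70.0, 150.0, 310.0, 630.0]
```

Under these assumptions the client sends 7 RENEWs before falling back to REBIND at T2, each of which the relay and servers still have to process even though the addressed server will never answer.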

The server I work on does not do this for v4 (or v6). For packets that include a Server-ID (as mandated by RFC 3315), load balancing doesn't apply and the addressed server answers, period. This is as described in Section 3.1 (without the exception).

But I think your statement "Second, a REBIND would presumably be answered by both servers, so it wouldn't necessarily correct the problem." is not correct. See Section 3.1: a REBIND does not contain a Server ID option, so only the server responsible for this client's hash bucket responds; both servers would NOT answer this request. (This does assume that each server knows when the other is operational, which is why this load-balancing feature is tied to failover, and the draft says as much: "Since the servers were exchanging lease information,". When the failover communication is interrupted, both servers would indeed be likely to respond, as load balancing would not be used.)
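A minimal Python sketch of the Section 3.1 rule as described above (no RENEW exception), for illustration only. The Message type and the inputs client_in_my_bucket and partner_in_communication are hypothetical stand-ins for whatever a real server tracks; how the hash bucket is computed is out of scope here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Message:
    """Hypothetical, minimal view of a received DHCPv6 client message."""
    msg_type: str                 # e.g. "SOLICIT", "REQUEST", "RENEW", "REBIND"
    server_id: Optional[bytes]    # DUID from the Server Identifier option, if any


def should_respond(msg: Message, my_duid: bytes, client_in_my_bucket: bool,
                   partner_in_communication: bool) -> bool:
    if msg.server_id is not None:
        # Server ID present: load balancing does not apply; only the
        # addressed server answers.
        return msg.server_id == my_duid
    if not partner_in_communication:
        # Failover communication interrupted: load balancing is not used,
        # so both servers may answer.
        return True
    # No Server ID (e.g. SOLICIT, REBIND): only the hash-bucket owner answers.
    return client_in_my_bucket


rebind = Message("REBIND", server_id=None)
print(should_respond(rebind, b"server-A", client_in_my_bucket=False,
                     partner_in_communication=True))  # False: partner owns the bucket
```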

Also, at least in a failover situation, even if all clients were serviced by only one of the servers (say they all obtained their leases when only one of the servers was running and are happily renewing them), what's the harm? Failover doesn't give you twice the throughput, since each server must have enough capacity to handle all of the clients on its own (the partner server may be down for an extended period). And eventually clients will SOLICIT and then move to the proper server.

If Section 3.2 is dropped, I also think we should fix Section 3.1 to change the SHOULD into a MUST and drop ", with the exception of RENEWs." Perhaps the MUST should be used anyway?



I'd also like to confirm:

>The main problem is that it's woefully underspecified

Is there anything beyond the points that followed (Sections 2 and 3.2) that needs to be "specified"?

And:

> Section 2 of this document suggests that load balancing is done for _all_ client-sourced messages

   2.  Background and External Requirements

      The requirements for DHCPv6 are substantially the same as for DHCPv4,
      replacing DHCPDISCOVER with SOLICIT, DHCPREQUEST with REQUEST,
      CONFIRM, RENEW, or REBIND (as appropriate), etc.

Do you have any suggestion for how to deal with this? I believe it was intended to indicate that DHCPREQUEST in v4 can mean multiple things, each of which has its own DHCPv6 message. Note that this section addresses the requirements, not the operation (though, interestingly, this section in RFC 3074 doesn't mention DHCPREQUEST). Perhaps this can just be: "The requirements for DHCPv6 are substantially the same as for DHCPv4."?
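To illustrate why the single v4 DHCPREQUEST maps poorly, here is a small Python table of the correspondence in Section 2, annotated with whether each DHCPv6 message carries a Server Identifier option per RFC 3315. Treating "no Server ID" as "the hash bucket decides" is an interpretation of Section 3.1 as discussed above, not text from the draft; the names are hypothetical.

```python
# DHCPv4 -> DHCPv6 correspondence from Section 2 of the draft.
V4_TO_V6 = {
    "DHCPDISCOVER": ["SOLICIT"],
    "DHCPREQUEST": ["REQUEST", "CONFIRM", "RENEW", "REBIND"],
}

# Per RFC 3315: Request and Renew MUST carry a Server Identifier option;
# servers discard Solicit, Confirm, and Rebind messages that carry one.
CARRIES_SERVER_ID = {
    "SOLICIT": False,
    "REQUEST": True,
    "CONFIRM": False,
    "RENEW": True,
    "REBIND": False,
}


def hash_balanced(v6_msg: str) -> bool:
    """True if, under the reading of Section 3.1 above, the hash bucket decides."""
    return not CARRIES_SERVER_ID[v6_msg]


for v6 in V4_TO_V6["DHCPREQUEST"]:
    print(v6, "hash-balanced" if hash_balanced(v6) else "addressed server only")
```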



Note that when RFC 3074 was done, we had two models of load balancing in mind:
- One is that you had a server (or pair of failover partners) that answered DHCP requests for a subset of the client pool. In this model, there was no desire to have a server (or failover partners) handle clients from another hash bucket. Ignoring failover, suppose you had 4 servers: each would take 1/4 of the clients. You could replace those 4 servers with 4 failover pairs, each pair handling its 1/4 of the clients; note that with failover, the two partners could in turn load balance that 1/4 (each servicing 1/8).
- The other is that you had one failover pair servicing your clients and you wanted to load balance between these two failover servers (when they were both running and in communication).
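The two models can be sketched as a mapping from RFC 3074's 256 hash buckets to servers. This Python sketch is illustrative only: bucket() uses the first byte of SHA-256 as a stand-in and is NOT the hash function specified in RFC 3074, and the contiguous bucket ranges are an assumption (a real Hash Bucket Assignment can be any subset of buckets). All function names are hypothetical.

```python
import hashlib

NUM_BUCKETS = 256  # RFC 3074 load balancing uses 256 hash buckets


def bucket(client_id: bytes) -> int:
    # Stand-in hash (assumption): first byte of SHA-256, NOT RFC 3074's
    # hash, so bucket numbers will not match a real implementation.
    return hashlib.sha256(client_id).digest()[0]


def model1_owner(b: int, num_pairs: int = 4) -> tuple[int, int]:
    """Model 1: each failover pair owns 1/num_pairs of the buckets, and the
    two partners split that share (1/8 of all buckets each for 4 pairs).
    Returns (pair index, partner index)."""
    per_pair = NUM_BUCKETS // num_pairs
    pair = b // per_pair
    partner = (b % per_pair) // (per_pair // 2)
    return pair, partner


def model2_owner(b: int) -> int:
    """Model 2: a single failover pair; each partner answers half the buckets."""
    return 0 if b < NUM_BUCKETS // 2 else 1


print(model1_owner(bucket(b"example-client-duid")))
```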

I think the first model is probably of much less interest these days (though it may be interesting for a scalable cloud solution where hash buckets are managed dynamically to distribute the load across various "request" processors).



Thanks for your review!

- Bernie

-----Original Message-----
From: dhcwg [mailto:dhcwg-bounces@ietf.org] On Behalf Of Ted Lemon
Sent: Wednesday, December 17, 2014 3:03 PM
To: dhcwg
Subject: [dhcwg] AD review of draft-ietf-dhc-dhcpv6-load-balancing

This document is not ready for publication as a proposed standard.   The main problem is that it's woefully underspecified, because RFC 3074 was woefully underspecified, and so it gets some key points wrong.

In section 3.2, the text appears to be saying that in a DHCPv6 failover setup, a RENEW with a server identifier should be ignored if the load balancing algorithm doesn't identify the client as belonging to that server.   This is problematic, for two reasons.   First, it leads to the client attempting to renew multiple times, resulting in an increased load on the server.   Second, a REBIND would presumably be answered by both servers, so it wouldn't necessarily correct the problem.

But this leads to the reason I think this is not ready for publication: RFC 3074 never actually specifies under what circumstances load balancing is done: it leaves it up to the implementation.   It kind of implies that load balancing should be done on all requests.   But in fact it only ever mentions DHCPDISCOVER.   And indeed, I just looked at the ISC implementation, and it only does load balancing on DHCPDISCOVER.

Section 2 of this document suggests that load balancing is done for _all_ client-sourced messages, but that would mean that a REBIND message wouldn't get a response from the non-balancing server, which in turn would mean that the client's lease would have to expire before it could rebind to the correct server.

There are a couple of ways this could be addressed.   One way would be to just take out section 3.2.  Another would be to say in section 3.2 that if the server identifier doesn't match the recipient's server identifier, but the load balancing algorithm says that the server should answer for this client, and the server can answer, it should answer, and leave the rest of the text the same.   Of course, there's no way to be sure the _client_ will do the right thing with such a response.
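The second option could look roughly like this for a RENEW, assuming both failover partners are up and in communication. This is a hypothetical Python sketch of one reading of the proposal, not text from the draft; can_answer stands in for whatever lets a server decide it is able to serve the client (for example, that it holds the binding learned through the lease exchange).

```python
def renew_decision_alt(server_id: bytes, my_duid: bytes,
                       client_in_my_bucket: bool, can_answer: bool) -> bool:
    """Hypothetical Section 3.2 variant; a RENEW always carries a Server ID."""
    if server_id == my_duid:
        # Unchanged 3.2 behavior as described in this review: a RENEW
        # addressed to this server is ignored if the client hashes to the
        # partner.
        return client_in_my_bucket
    # Proposed addition: RENEW addressed to the partner, but the load-balancing
    # algorithm assigns the client to this server and it is able to answer.
    return client_in_my_bucket and can_answer
```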

Do we have any data about operational problems in this scenario with DHCPv4?

Anyway, this needs to be resolved before I can last-call the document.   Sorry for not catching this in previous reviews.
