[dhcwg] Comments on draft-ietf-dhc-dhcpv6-load-balancing-00

Tomek Mrugalski <tomasz.mrugalski@gmail.com> Tue, 11 June 2013 19:28 UTC

Message-ID: <51B77A69.7010602@gmail.com>
Date: Tue, 11 Jun 2013 21:28:41 +0200
From: Tomek Mrugalski <tomasz.mrugalski@gmail.com>
To: DHC WG <dhcwg@ietf.org>

As part of my shepherd duties for this document, I reviewed it. Here
are my comments:

Abstract: Please remove the duplicate "(DHCPv6)" in the 6th line.

The abstract should mention that the draft extends an already defined
and well-proven DHCPv4 mechanism to cover DHCPv6.

"The same method is proposed to select the target server of a DHCPv6
relay." Not true! The text does not mention relays at all (only the
abstract does). See my comments below on relays.

Section 1: Load balancing is not really a protocol, as it doesn't send
or receive anything on its own and does not define packets or options.
It defines certain behaviors. I would recommend replacing "this
protocol" with "this extension".

I would like to see a brief text in the Introduction that explains the
differences or relation between load balancing and failover. You may
put an informative reference to draft-ietf-dhc-failover-requirements
(currently with the IESG) or RFC6853 if you find that useful. It would
be good to debunk some common misconceptions (e.g. "I have 2 servers =>
I have high availability, no need for failover!").

It would be helpful to add a paragraph in section 1 that gives a very
short overview of the mechanism specified in RFC3074. Something along
the lines of: "As a convenience for the reader, we mention here that
servers supporting load balancing calculate, for each incoming packet,
a hash of the service transaction id (or STID), which is a
client-specific value (client-id for DHCPv6, and client-id or the
chaddr field for DHCPv4). The calculated value is then assigned to a
specific bucket. Two or more servers are configured to serve certain
buckets."
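
To illustrate the overview suggested above, here is a minimal Python
sketch of the hash-and-bucket idea. It is not the RFC3074 algorithm
verbatim: the mixing table is a stand-in generated from a fixed seed
(RFC3074 defines its own table), and the names MIX_TABLE, stid_hash and
serves are hypothetical.

```python
import random
from typing import Set

# Stand-in 256-entry permutation. RFC3074 specifies its own mixing table;
# this one is generated from a fixed seed purely for illustration.
_rng = random.Random(3074)
MIX_TABLE = list(range(256))
_rng.shuffle(MIX_TABLE)


def stid_hash(stid: bytes) -> int:
    """Pearson-style hash of the STID into a bucket number 0..255."""
    h = 0
    for byte in stid:
        h = MIX_TABLE[h ^ byte]
    return h


def serves(hba: Set[int], stid: bytes) -> bool:
    """True if the server whose bucket set is `hba` handles this STID."""
    return stid_hash(stid) in hba
```

With two servers configured for buckets 0-127 and 128-255, exactly one
of them handles any given client-id, since every hash value falls into
one of the two sets.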

Section 2 is not precise enough. Please explicitly say which packets are
load balanced and say that the others are not. There are more packet
types in DHCPv6 than there are in DHCPv4. A couple of tricky examples to
consider: LEASEQUERY (RFC5007), BOOTREQUESTV6
(draft-ietf-dhc-dhcpv4-over-dhcpv6-00), RELAY-FORW, RECONFIGURE-REQUEST,
etc.

The text should be clear that packets received over TCP (see RFC5007)
are not load balanced (or at least that only packets coming from the
client are load balanced). This is tricky, because some consider a
requestor (see RFC5007) to be a special type of client, so please word
your text carefully.

Section 3.1.1 says that the server should respond. I don't like the
word "respond". RFC3315 has many cases in which invalid packets should
be discarded, so that wording would introduce conflicting requirements.
It is much better to say "process" here.

Section 3.1.2: A question about "A DHCPv6 server receiving a REQUEST or
RENEW with the server's Server ID specified MAY answer the request even
if the request would normally be ignored by load balancing.". Does it
mean that the opposite is also true: the server MAY choose not to
answer if load balancing says so, even when the server-id matches?
That's what the rest of the first paragraph suggests. If that is what
you meant, then I strongly oppose it. Here's why:

The text mentions a case of 2 servers, with the first one going offline
and the second server taking over all clients. The text says that once
the first server recovers and the second server ignores RENEWs,
eventually the first server will be able to respond to clients that
start REBINDing. This will cause the clients to constantly go through
REQUEST -> failed RENEW attempts -> REBIND cycles, even when both
servers are fine. Nothing is really gained, but you have just introduced
REBINDs to a perfectly healthy network (with the extra processing needed
for REBINDs and possible warnings all around, as REBIND is usually a
sign of trouble).
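
The behavior I would prefer can be sketched as follows. This is only an
illustration of my reading, not text from the draft: the function name
and its parameters are hypothetical, and the rule it encodes (a Server
ID in the message always takes precedence over load balancing) is the
assumption being argued for here.

```python
from typing import Optional


def should_process(my_server_id: bytes,
                   msg_server_id: Optional[bytes],
                   bucket_is_mine: bool) -> bool:
    """Decide whether this server processes a client message.

    Assumption (the reviewer's preferred reading, not the draft's text):
    a Server ID option in the message always takes precedence over
    load balancing.
    """
    if msg_server_id is not None:
        # REQUEST/RENEW addressed to a specific server: only that server
        # processes it, whatever the hash bucket says.
        return msg_server_id == my_server_id
    # No Server ID (e.g. SOLICIT, REBIND): fall back to load balancing.
    return bucket_is_mine
```

With this rule, a RENEW sent to the second server is processed by the
second server even after the first one recovers, so a healthy network
never sees the forced REBIND cycle described above.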

Section 3.2 mentions STID, but that acronym is not explained. Please at
least refer to RFC3074.

There are packets where client-id is optional: INFORMATION-REQUEST.
Although my understanding is that it is rare (I'm aware of only one
implementation, Dibbler, that does that, and only if explicitly
configured to do so), such an "anonymous" INFORMATION-REQUEST is
perfectly valid. The text does not explain what to do in such a case. If
you don't have any other preference, I would recommend assigning such
packets to the first bucket.
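
A minimal sketch of that fallback, in Python. The names FIRST_BUCKET and
bucket_for are hypothetical, "first bucket" is assumed to mean bucket
number 0, and the hash used here is a stand-in (the first byte of
SHA-256), not the RFC3074 function.

```python
import hashlib
from typing import Optional

FIRST_BUCKET = 0  # assumption: "first bucket" means bucket number 0


def bucket_for(client_id: Optional[bytes]) -> int:
    """Map a client-id to a bucket 0..255; anonymous packets go to bucket 0."""
    if client_id is None:
        # "Anonymous" INFORMATION-REQUEST: no client-id to hash.
        return FIRST_BUCKET
    # Stand-in hash, NOT the RFC3074 function: first byte of SHA-256.
    return hashlib.sha256(client_id).digest()[0]
```

The point is only that the no-client-id case gets a defined,
deterministic answer, so every server agrees on who handles it.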

The security considerations section is nice, but not sufficient. You
should also mention that by misconfiguring the HBA, administrators can
fail to service a certain group of users, and finding a pattern in such
a group would be difficult. As a practical aspect, debugging off-by-1
issues would be tricky, e.g. the first server handles 1-127 and the
second handles 128-255 (who is serving bucket 0?). Some warning about
it (that the Hash Bucket Assignments of all servers, taken together,
must cover the whole range) would be useful.
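
A check of this kind is easy to sketch. The Python below is a
hypothetical helper (check_coverage and its input format are my own
invention, not part of any server); it assumes buckets are numbered
0-255 and that each server's assignment is given as a set of bucket
numbers.

```python
from typing import Dict, List, Set, Tuple

NUM_BUCKETS = 256  # assumption: buckets are numbered 0..255


def check_coverage(hbas: Dict[str, Set[int]]) -> Tuple[List[int], List[int]]:
    """Return (unserved, contested) bucket lists for a set of servers."""
    owners = [0] * NUM_BUCKETS
    for server, buckets in hbas.items():
        for b in buckets:
            if not 0 <= b < NUM_BUCKETS:
                raise ValueError(f"{server}: bucket {b} out of range")
            owners[b] += 1
    # Buckets nobody serves: clients hashing here get no service at all.
    unserved = [b for b, n in enumerate(owners) if n == 0]
    # Buckets served by more than one server.
    contested = [b for b, n in enumerate(owners) if n > 1]
    return unserved, contested
```

For the off-by-1 example above, check_coverage({"first":
set(range(1, 128)), "second": set(range(128, 256))}) reports bucket 0 as
unserved and no contested buckets.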

Another concern missing from the security considerations is
misconfiguration between relays and servers. If you do load balancing on
relays, then you should not do it again on servers. And if you do, you
just increase your chances of making a configuration error, because the
load balancing configuration on relays must match the load balancing
configuration on servers. It would be good to have some trade-offs
discussed for LB done on servers vs. on relays.
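
To make the mismatch concrete, here is a small Python sketch. It rests
on assumptions the draft does not state: that a relay picks exactly one
target server per bucket, and that a server silently ignores packets
outside its own assignment. The function name and data layout are
hypothetical.

```python
from typing import Dict, List, Set


def black_holed_buckets(relay_map: Dict[int, str],
                        server_hbas: Dict[str, Set[int]]) -> List[int]:
    """Buckets the relay forwards to a server that will not serve them."""
    bad = []
    for bucket in range(256):
        target = relay_map.get(bucket)
        # Either the relay has no target for this bucket, or the chosen
        # server is not configured to serve it.
        if target is None or bucket not in server_hbas.get(target, set()):
            bad.append(bucket)
    return bad
```

If the relay splits the range at 128 but the servers were configured to
split at 127, bucket 127 is forwarded to a server that ignores it, and
those clients get no answer even though both servers are up.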

And speaking of relays, RFC3074 mentions a mechanism for DHCPv4 relays
to do load balancing and send packets to specific servers only. This
draft mentions relays in the abstract only, not in the text. As a
potential implementor, I'm confused: how should DHCPv6 relays do LB?
They work radically differently than in DHCPv4. A separate chapter
explaining that would be very useful.

Finally, I'm not sure that "updates 3074" is needed here. Nothing in
3074 is changed: DHCPv4 load balancing works exactly the same as
before. I would argue that your draft updates 3315, as it changes some
of the behaviors defined there. 3315 says that the server responds in
certain cases, but your draft says that the server does not. Changing
existing behavior constitutes an update in my opinion.

There are some idnits issues. Please run your draft through the idnits
tool (http://tools.ietf.org/tools/idnits/) and make sure it is clean.
The IESG frowns upon drafts that have idnits issues.

Please update the draft and upload -01. Depending on the scope of
changes, we'll see what the next steps will be (a write-up and sending
it to the IESG, or a quick WGLC to confirm the changes if they are
large).

Hope that helps,
Tomek
