Re: [dhcwg] Comments on draft-ietf-dhc-dhcpv6-failover-design-02

Kim Kinnear <kkinnear@cisco.com> Thu, 11 July 2013 20:21 UTC

From: Kim Kinnear <kkinnear@cisco.com>
Date: Thu, 11 Jul 2013 16:20:54 -0400
To: Marcin Siodelski <msiodelski@gmail.com>
Cc: "dhcwg@ietf.org" <dhcwg@ietf.org>, Kim Kinnear <kkinnear@cisco.com>
Subject: Re: [dhcwg] Comments on draft-ietf-dhc-dhcpv6-failover-design-02

Marcin,

Thanks very much for your comments.  I apologize that it has taken
us so long to get back to you regarding them.

My responses to your comments are indented, inline, below.

On Mar 12, 2013, at 3:43 PM, Marcin Siodelski wrote:

> Hi, 
> 
> I have read version -02 of the draft. This is useful work and I
> support moving it forward.
> 
> Some first round of comments...
> 
> 2. Glossary:
> - The "Failover transmission" is defined in the glossary but it is used
> nowhere in the draft. Perhaps you meant "Failover communication" which is
> common in the text.

	Sure, good idea.
> 
> 3. Introduction
> - The document says that the failover protocol is not suitable for lease
> times shorter than 30sec. Then, is it suitable for lease times of 31s?
> Why was that arbitrary number selected? I believe that the document
> should simply mention that there is additional overhead on the server
> related to failover transmission (that would justify use of this term in
> the glossary) and thus it is not efficient for short lease times.

	While the actual time selected is arbitrary, the need for
	a minimum time is not based so much on efficiency as it is
	on the need to handle time skew (Section 8.1) effectively.
	We chose 30 seconds as a value which was long enough to 
	allow a reasonable implementation to handle time skew, but
	not so long that anyone would really care about using
	leases that short.
> 
> 3.1. Perhaps it is worth changing the title from "Additional
> Requirements" to "Design Requirements" to indicate that it is not
> extending the protocol requirements described elsewhere.

	Sure.
> 
> 4. Protocol Overview
> - The hyperlink to "Section 5.1 of DHCPv6 Bulk Leasequery [RFC5460]" is
> broken because clicking on "Section 5.1" takes you to the section 5.1 in
> the local document.

	Thanks, we'll see what we can do about this.
> 
> - Is it worth mentioning that if the primary server can't establish
> communication with the secondary server (because the secondary is down), it
> will retry the connection at an implementation-dependent interval?

	Sure.  Will reference Section 5.1 where this is described.
> 
> 4.1. Failover State Machine Overview
> - It is not clear what state the partner goes to when the
> DISCONNECT occurs. I presume it is PARTNER-DOWN state but the document
> should clarify it.

	For sure, especially since it is not PARTNER-DOWN, but rather
	COMMUNICATIONS-INTERRUPTED.  This is because re-integration
	from COMMUNICATIONS-INTERRUPTED is much easier (more efficient,
	quicker) than re-integration from PARTNER-DOWN.
> 
> - Here is a sentence: "In case of a disagreement between the simplified
> and complete description, please follow Section 9". I think that there
> must be no "disagreement" between those two sections. One provides
> supplementary information for the other, but they should remain
> consistent in the common part.
	
	I agree that there must not be a disagreement, and we will ensure
	that we don't see any.  However, since information is left out
	of the overview, we have to have a statement like this in
	the document (or take out the overview).  Describing the same
	thing in two different places without some way to resolve
	(apparent) differences is a bad idea.  I am sure that some
	readers will perceive a disagreement, even though I will
	not.
> 
> - I don't understand the purpose of: "It frequently returns back to the
> state it was in before shutdown". Perhaps, some context needs to be
> provided for it?

	Will do.  I think this pretty much says that you need to 
	remember the state you were in before you shut down.
> 
> - In the 4th paragraph "in unresponsive" should be replaced with "is
> unresponsive".

	Yes.

> However, I am wondering if it is correct given that
> extensions may introduce "active-active" mode in which this is not true
> anymore. Perhaps, it could be mentioned that "assuming that failover
> pair is in the active-passive mode, the secondary server is unresponsive
> when in NORMAL state". 

	We can't write this draft and take into account all of the
	extensions that will come (though that will certainly
	be one of them).  The extensions will specify the necessary
	changes.
> 
> - A bit of clarification is needed on what "auto-partner-down"
> is (a glossary entry?). Does the server bypass the
> COMMUNICATIONS-INTERRUPTED state and go straight to PARTNER-DOWN, or does it
> go to COMMUNICATIONS-INTERRUPTED state first and then to PARTNER-DOWN? If the
> latter is the case, would it stay in the COMMUNICATIONS-INTERRUPTED
> state for a while, trying to reconnect?

	Good point.  We'll put that in the glossary.  The auto-partner-down
	capability is where the server goes into COMMUNICATIONS-INTERRUPTED
	state and, after waiting a while, automatically goes into
	PARTNER-DOWN state.  It is *always* trying to reconnect, regardless
	of the state it is in.
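	A minimal sketch of the transitions described above, assuming a
	single-threaded server driven by a periodic tick.  The state names
	come from the draft.  The class, the tick() method, and the
	60-second AUTO_PARTNER_DOWN_WAIT value are hypothetical choices
	for illustration, not values the draft specifies.

```python
from dataclasses import dataclass
from typing import Optional

NORMAL = "NORMAL"
COMMUNICATIONS_INTERRUPTED = "COMMUNICATIONS-INTERRUPTED"
PARTNER_DOWN = "PARTNER-DOWN"

# Hypothetical wait before auto-partner-down fires; not a value from the draft.
AUTO_PARTNER_DOWN_WAIT = 60.0  # seconds


@dataclass
class FailoverServer:
    state: str = NORMAL
    auto_partner_down: bool = True
    interrupted_since: Optional[float] = None
    reconnect_attempts: int = 0

    def on_disconnect(self, now: float) -> None:
        # Losing the partner connection moves a NORMAL server to
        # COMMUNICATIONS-INTERRUPTED, never straight to PARTNER-DOWN.
        if self.state == NORMAL:
            self.state = COMMUNICATIONS_INTERRUPTED
            self.interrupted_since = now

    def tick(self, now: float) -> None:
        # The server keeps trying to reconnect no matter what state it is in.
        self.reconnect_attempts += 1
        # With auto-partner-down enabled, remaining in COMMUNICATIONS-INTERRUPTED
        # for the configured wait moves the server on to PARTNER-DOWN.
        if (
            self.auto_partner_down
            and self.state == COMMUNICATIONS_INTERRUPTED
            and self.interrupted_since is not None
            and now - self.interrupted_since >= AUTO_PARTNER_DOWN_WAIT
        ):
            self.state = PARTNER_DOWN


server = FailoverServer()
server.on_disconnect(now=0.0)
server.tick(now=30.0)  # still COMMUNICATIONS-INTERRUPTED
server.tick(now=60.0)  # wait has elapsed: PARTNER-DOWN
print(server.state, server.reconnect_attempts)  # PARTNER-DOWN 2
```

	With auto_partner_down=False, the server stays in
	COMMUNICATIONS-INTERRUPTED until something else, typically operator
	intervention, moves it on.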
> 
> - I don't understand what is meant by "When a server does not have an
> intact lease state database (e.g. due to first time run or catastrophic
> failure) ..."? First of all, is the lease state database the database
> which holds leases, or does it hold some additional information about the
> server state prior to the failure, which can be used to determine that
> the catastrophic failure occurred? Also, the "doesn't have an intact
> lease state database" formulation is misleading because I think that the
> server on its first time DOES have the intact database.

	The presumption is that the information about whether you
	have talked to the partner in the past or not is held in
	the same database as the lease information.  So that if you
	lose one of these, you will lose them both.

	Yes, the database is intact, but it is empty.  This is where	
	the simplified description here is ... simplified.  There are
	two different scenarios:

	a) The other server says "I've talked to you before at time x"
	and the receiving server has no record of having talked to
	the other server.  That is not an "intact" lease-state-database.

	b) The other server says "I've never talked to you before", and
	you agree, but you are a backup, so you go into RECOVER state.

	We will clarify this.
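	A minimal sketch of the two startup scenarios (a) and (b) above.
	The function name and its boolean inputs are hypothetical.  The
	sketch assumes, as the overview wording suggests, that a server
	without an intact lease state database (scenario a) also goes to
	RECOVER.  Every other combination is left to the full rules in
	Section 9.3.2.

```python
from typing import Optional

RECOVER = "RECOVER"


def startup_target_state(
    partner_claims_contact: bool,
    have_record_of_partner: bool,
    is_secondary: bool,
) -> Optional[str]:
    """Return RECOVER for scenarios (a) and (b); None for cases not modeled here."""
    # Scenario (a): the partner says "I've talked to you before at time x",
    # but we have no such record, so our lease state database is not intact.
    # ASSUMPTION: a server without an intact database goes to RECOVER.
    if partner_claims_contact and not have_record_of_partner:
        return RECOVER
    # Scenario (b): neither server has talked to the other before, and we
    # are the backup (secondary), so we go to RECOVER.
    if not partner_claims_contact and not have_record_of_partner and is_secondary:
        return RECOVER
    # Everything else follows Section 9.3.2 and is outside this sketch.
    return None
```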
> 
> - The RECOVER-WAIT to RECOVER-DONE transition could be better described.
> Specifically, what condition triggers it?
	
	For sure.  See Section 9.6.2.  We felt that moving that
	information into this section (or even a stripped-down
	version of it) would overwhelm this overview.
> 
> - Is there any way out of the RESOLUTION-INTERRUPTED state? If the
> communication is reestablished I presume that both servers continue
> resolving conflicts?

	See Section 9.11.2.  Again, this is an overview.
> 
> 4.2. Messages
> - I suggest that the following description of the CONNECT message:
> "The partner is expected to confirm by responding with CONNECTACK" be
> reworded to "The partner MUST confirm by responding with CONNECTACK".

	I was going to say sure, but then I remembered why we didn't
	make this a MUST originally.  See the comment below on 5.1,
	Creating Connections.
> 
> - Perhaps, the same applies to some other types of messages but I did not
> check that thoroughly enough to put this into this review.
> 
> - Minor: it would be easier to search for the specific message
> description if the messages were ordered alphabetically. 

	True.  Always an issue.  Alpha or by importance, or functional
	grouping, or what.  We chose some variant of importance.
> 
> - The POOLRESP is used everywhere throughout the document, while it is
> called "POOLRSP" in the list of messages.

	Thanks, we'll fix that.
> 
> 
> 5.1. Creating Connections
> - "If it has no secondary relationships with the connecting server, it
> SHOULD drop the connection". This sentence suggests that the secondary
> server will not respond to the initiating server if it has no
> relationship with it. A couple of paragraphs earlier, it was said the
> server "is expected" or MUST respond with CONNECTACK. I suggest that it
> is reworded to explain that the server MUST reject the connection by
> sending the CONNECTACK with the reason for rejection instead of
> "dropping".

	Geez.  Get me to make it a MUST, and then complain that 
	the document doesn't follow the MUST?  Nice try :-)

	Seriously, we don't want a connection from another (rogue?)
	server to require any more processing than simply dropping
	the connection.
> 
> 6.1. Proportional Allocation
> - The paragraph which starts with "The initial allocation when..." is
> confusing.

> If I understand correctly, the POOLRESP is used to deliver
> the actual resources to the secondary server. The document says that it
> is to inform a secondary server "how many resources it allocated". This
> doesn't sound like the same thing.

	It isn't the same.  The POOLREQ is supposed to cause the
	primary server to examine the various prefixes and ensure that
	the secondary has what it is supposed to have.  The POOLRESP
	is the message that says to the secondary that the primary has
	done the examination.  The POOLRESP doesn't return anything
	other than the information that the scan has completed
	and how many leases/prefixes will be forthcoming.
> 
> - I am wondering why the POOLREQ/POOLRESP exchange has to be
> initiated by the secondary server at all. In the scenario where the failover
> configuration is handled through the primary server, it is the primary server
> that should notify the secondary that (for example) the pool balance has
> been changed.

	The primary doesn't have to tell the secondary that the balance
	has been changed.  It just sends the changes in BNDUPD packets.

> This implies the primary could initiate the exchange by
> notifying the secondary server that the rebalancing will start now.

	The primary doesn't have to notify the secondary that
	the rebalancing will (or has) started.

> On the other hand in the "Proportional Allocation" scenario the rebalancing
> may be started when the secondary has returned addresses to the primary
> which had been released by the client. However, in such case the primary
> should also be aware of the need to rebalance and could also start the
> rebalancing procedure. So again, I don't understand why the secondary is
> starting the rebalancing procedure here?

	Typically, the primary will do rebalancing on its own
	only for portions of the address space.  The POOLREQ
	allows the secondary to request a scan of the entire address
	space. 
> 
> - Another related question: is it really necessary to do an extra
> POOLREQ/POOLRESP exchange while the actual transfer of addresses is
> later done via BNDUPD/BNDACK? What extra information will
> be carried in the POOLREQ/POOLRESP messages that needs to be delivered
> prior to sending BNDUPD/BNDACK which actually delegate resources by
> modifying state FREE to FREE_BACKUP or the other way around?

	The information that the scan has completed is the real
	information that is transmitted by the POOLRESP.  The 
	POOLREQ transmits the desire for a full scan of the 
	address space.

	In many cases (hopefully in most), the primary will keep
	things balanced.  It will do this balancing in an
	implementation-specific manner, and with an implementation-specific
	frequency.

	However, the POOLREQ/POOLRESP messages are a way for the
	secondary to ensure that a full scan has been performed
	by the primary.
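	A minimal sketch of how a primary might handle a POOLREQ, under
	stated assumptions.  Pools are in-memory lists of FREE and
	FREE_BACKUP addresses, the secondary's target share is a
	hypothetical 10 percent, and only the primary-to-secondary
	direction is modeled.  Following the replies in this thread, the
	scan skips address space that uses independent allocation, the
	resulting state changes would go out as BNDUPDs, and the POOLRESP
	carries only the fact that the scan completed plus the number of
	updates to expect.

```python
from dataclasses import dataclass, field

# Hypothetical share of each pool's free resources the secondary should hold.
SECONDARY_SHARE_PERCENT = 10


@dataclass
class Pool:
    name: str
    proportional: bool  # False means independent allocation
    free: list[str] = field(default_factory=list)         # FREE: usable by primary
    free_backup: list[str] = field(default_factory=list)  # FREE_BACKUP: usable by secondary


def handle_poolreq(pools: list[Pool]) -> tuple[list[tuple[str, str]], int]:
    """Full scan requested by a POOLREQ.

    Returns the (address, new_state) changes that would be sent in BNDUPDs,
    and the count the POOLRESP reports as forthcoming.
    """
    updates: list[tuple[str, str]] = []
    for pool in pools:
        if not pool.proportional:
            continue  # independent allocation is not rebalanced over the wire
        total = len(pool.free) + len(pool.free_backup)
        target = total * SECONDARY_SHARE_PERCENT // 100
        # Move FREE -> FREE_BACKUP until the secondary holds its share.
        while len(pool.free_backup) < target and pool.free:
            addr = pool.free.pop()
            pool.free_backup.append(addr)
            updates.append((addr, "FREE_BACKUP"))
    return updates, len(updates)


shared = Pool("shared", proportional=True,
              free=[f"2001:db8::{i:x}" for i in range(1, 21)])
split = Pool("split", proportional=False,
             free=[f"2001:db8:1::{i:x}" for i in range(1, 11)])
bndupds, forthcoming = handle_poolreq([shared, split])
print(forthcoming)  # 2: the secondary's 10% of 20 free addresses
```

	A second POOLREQ right afterwards finds nothing to move and would
	report zero forthcoming updates, which is the "scan has completed"
	signal on its own.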
> 
> - "...and the secondary server is responsible for being configured in
> such a way". Well, the server is not responsible for anything.
> The administrator might be responsible for configuring the server, and the
> developer is responsible for ensuring a proper implementation, but the
> server is not responsible...

	Good point.  We'll fix this.
> 
> 6.2. Independent Allocation
> - It is mentioned that the resources are allocated/split as a part of
> initial connection establishment. How? Specifically, are resources
> delegated to the secondary server using the same messages as used for
> "Proportional Allocation"?

	No, we need to clarify this.  Independent allocation is
	essentially algorithmic -- each server is configured in such
	a way that it *knows* which IP addresses it has available
	for allocation.  The actual IP addresses available for
	allocation are *not* transmitted over the wire as they
	are for proportional allocation.
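	A minimal sketch of what "algorithmic" independent allocation could
	look like.  It assumes both servers share the same configured prefix
	and a fixed split rule.  Here, hypothetically, the primary takes the
	lower half and the secondary the upper half.  Nothing is exchanged
	over the wire, because each server derives its own share locally.
	The sketch uses Python's standard ipaddress module.

```python
import ipaddress


def independent_share(prefix: str, is_primary: bool) -> ipaddress.IPv6Network:
    """Hypothetical split rule: primary gets the lower half, secondary the upper."""
    network = ipaddress.IPv6Network(prefix)
    lower, upper = network.subnets(prefixlen_diff=1)
    return lower if is_primary else upper


primary_part = independent_share("2001:db8:0:1::/64", is_primary=True)
secondary_part = independent_share("2001:db8:0:1::/64", is_primary=False)
print(primary_part, secondary_part)
# 2001:db8:0:1::/65 2001:db8:0:1:8000::/65
```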

>  If so, what happens if the primary server
> requests reallocation of resources later on? Is this rejected by the
> primary server? Is this request dropped?

	I think you meant "... what happens if the secondary server requests ..."
	Good question.  The primary server scans the address space
	that is currently using proportional allocation, and ignores
	the address space using independent allocation.
> 
> 6.3. Choosing Allocation Algorithm
> - Typo in "indepentent".

	Thanks.
> 
> 7. Information Model
> - Instead of referring to "DHCP RFC" the link to the actual RFC along
> with its number should be provided.

	Sure.
> 
> - Typo in "untile"

	Yes.
> 
> 8. Failover Mechanisms
> - The last sentence is redundant as the previous one says pretty much
> the same.

	Sure.
> 
> 8.4. MCLT Concept
> - The paragraph which starts with "The fundamental relationship
> which..." is a little bit obscure. I suggest that it is reworded to
> something like: "The fundamental relationship on which much of the
> correctness of this protocol depends is that the lease expiration time
> known to the DHCPv6 client MUST NOT be greater, by more than MCLT, than the
> potential expiration time known to a server's partner."

	Sure, thanks.
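	A minimal sketch of the MCLT relationship as worded above.  All
	times are absolute seconds, and the function names are hypothetical.
	The sketch only encodes the inequality "client expiration <=
	partner's potential expiration + MCLT".  How a server learns its
	partner's potential expiration time is outside the sketch.

```python
def max_client_expiry(partner_potential_expiry: int, mclt: int) -> int:
    # Latest expiration time the client may be told, per the MCLT relationship.
    return partner_potential_expiry + mclt


def respects_mclt(client_expiry: int, partner_potential_expiry: int, mclt: int) -> bool:
    # True if the client's expiration stays within MCLT of the partner's view.
    return client_expiry <= max_client_expiry(partner_potential_expiry, mclt)


def clamp_lease_time(now: int, requested: int,
                     partner_potential_expiry: int, mclt: int) -> int:
    # Largest lease time (relative to now) that keeps the relationship intact.
    limit = max_client_expiry(partner_potential_expiry, mclt) - now
    return max(0, min(requested, limit))


# The partner knows a potential expiration one hour out, and MCLT is one hour.
print(clamp_lease_time(now=1000, requested=86400,
                       partner_potential_expiry=4600, mclt=3600))  # 7200
```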
> 
> 
> 8.10. Acknowledging Reception
> - This section is empty. At least a "TBD" would be useful. Otherwise it suggests
> that this part has been accidentally omitted.

	Yes, it was accidentally omitted.  There is text there
	in the .xml file.  Have to figure out why it isn't
	coming out...
> 
> 9.1. State Machine Operation
> - One additional event that may cause a transition from one state to
> another is probably administrator intervention?

	In some implementations, sure.  Not required
	by the design, though.
> 
> 9.3.2. Transition Out of Startup State
> - The part that describes Step 2 is intricate. I read it a couple of
> times and I still don't feel I understand it. Perhaps, it could be split
> into a couple of simpler sentences.

	I'll try...
> 
> 9.4. PARTNER-DOWN State
> - It may be worth giving a hint about how the server may end up in this
> state. Most likely it is because its partner has stopped responding.
> However, in this case they are in the COMMUNICATIONS-INTERRUPTED state
> and there is some other event (what?) that needs to occur to transition
> to the PARTNER-DOWN state.

	Good point.  Great point.  I'll fix this up.

	The event is typically operator intervention, though
	it could be the auto-partner-down thing we discussed 
	earlier.
> 
> 9.11. RESOLUTION-INTERRUPTED
> - In the introduction to this chapter it is pointed out that if the
> servers remained in the POTENTIAL-CONFLICT state and the resolution of
> the conflict was interrupted, they transition to RESOLUTION-INTERRUPTED
> and they are unresponsive to clients

	I don't see the statement about being unresponsive to clients.
	Could you point me at it more specifically?

> (I presume this is because they
> were unresponsive when being in POTENTIAL-CONFLICT state and the
> conflict is still assumed to be present).

	No, not really.  That is the point of RESOLUTION-INTERRUPTED
	state -- they can attempt to do *something*, and not just
	sit there waiting for the partner.

> If you take a look at the
> 9.11.1. it is said that the server MUST respond to all DHCP client
> requests. That looks contrary to the previous statement.

	Indeed, but I can't find the previous statement...
> My
> understanding is that the server can't respond in the
> RESOLUTION-INTERRUPTED state thus it requires administrative
> intervention.

	I don't know where in the document you got that idea
	from, but I'd like to fix it.

	-----------------------------------

	Thanks for the great review!

	Regards -- Kim

> 
> Cheers,
> Marcin
> 
> 
>