Re: [dhcwg] Review comments for draft-ietf-dhc-dhcpv6-failover-protocol-00.txt

Kim Kinnear <> Fri, 04 December 2015 22:32 UTC

Return-Path: <>
Received: from localhost ( []) by (Postfix) with ESMTP id BF6531AC426 for <>; Fri, 4 Dec 2015 14:32:36 -0800 (PST)
X-Virus-Scanned: amavisd-new at
X-Spam-Flag: NO
X-Spam-Score: -14.511
X-Spam-Status: No, score=-14.511 tagged_above=-999 required=5 tests=[BAYES_00=-1.9, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, RCVD_IN_DNSWL_HI=-5, SPF_PASS=-0.001, T_RP_MATCHES_RCVD=-0.01, USER_IN_DEF_DKIM_WL=-7.5] autolearn=ham
Received: from ([]) by localhost ( []) (amavisd-new, port 10024) with ESMTP id oNcdvK5BpVYf for <>; Fri, 4 Dec 2015 14:32:32 -0800 (PST)
Received: from ( []) (using TLSv1.2 with cipher DHE-RSA-SEED-SHA (128/128 bits)) (No client certificate requested) by (Postfix) with ESMTPS id B1B4A1AC42F for <>; Fri, 4 Dec 2015 14:32:32 -0800 (PST)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple;;; l=18925; q=dns/txt; s=iport; t=1449268352; x=1450477952; h=mime-version:subject:from:in-reply-to:date:cc: content-transfer-encoding:message-id:references:to; bh=A+ryLAVXkqmvcVE+VbnJjfgTMksK0BUEFvb3jtewWb0=; b=N6+MHkj547bIBsJ9E0JsKa3JfkUxSZHcXAGJDysFuLlusrddMCSA9e4m qQd76cdij5BjsvwCl0IkPX2OoELY5y03e9xqe55Hi0jDU1hgWUc0nhWVB 8Hd2JoxR4QbxbB3qN116BQoeK9fzaJD5xVCXY4SZK2moYWY7cPcK9HhvR k=;
X-IronPort-AV: E=Sophos;i="5.20,382,1444694400"; d="scan'208";a="50625182"
Received: from ([]) by with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 04 Dec 2015 22:32:31 +0000
Received: from ( []) (authenticated bits=0) by (8.14.5/8.14.5) with ESMTP id tB4MWSV8023202 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Fri, 4 Dec 2015 22:32:29 GMT
Content-Type: text/plain; charset=utf-8
Mime-Version: 1.0 (Mac OS X Mail 9.1 \(3096.5\))
From: Kim Kinnear <>
In-Reply-To: <>
Date: Fri, 4 Dec 2015 17:32:28 -0500
Content-Transfer-Encoding: quoted-printable
Message-Id: <>
References: <>
To: Shawn Routhier <>
X-Mailer: Apple Mail (2.3096.5)
X-Authenticated-User: kkinnear
Archived-At: <>
Cc:, Kim Kinnear <>
Subject: Re: [dhcwg] Review comments for draft-ietf-dhc-dhcpv6-failover-protocol-00.txt
X-Mailman-Version: 2.1.15
Precedence: list
List-Id: <>
List-Unsubscribe: <>, <>
List-Archive: <>
List-Post: <>
List-Help: <>
List-Subscribe: <>, <>
X-List-Received-Date: Fri, 04 Dec 2015 22:32:36 -0000


First, let me thank you for this truly high quality review.  You have
a taken the time to understand what we are trying to say and have
found a number of places where the draft is inconsistent or misleading
or missing something important.  This takes a tremendous amount of
effort, and yet without several reviews of this quality we stood
little chance of pulling this draft together.  Between your review and
Bernie's my confidence in this document has gone up considerably.
Thank you again!!  We really needed this review!

More comments inline...

> On Dec 3, 2015, at 2:29 PM, Shawn Routhier <>; wrote:
> During the last IETF there was a request to review the failover protocol draft.
> I’ve now had a chance to review it and here are my comments.
> regards,
> Shawn
> **
> 1) Glossary
> Page 5, Delegable Prefix
> The use of this term seems to be inconsistent.  In the glossary
> it seems to be something like XXXX:0:1 from which a set of
> prefixes to be delegated can be generated (say a set of /64s).
> In 4.2.2 it seems to be treating the term as the prefix that would
> be delegated for example XXXX:0:1:0/64.

	I'm not in love with the term delegable prefix, for what that
	is worth, but it is the best term I've yet found.

	I was using delegable prefix to refer both to a prefix from
	which another prefix might be delegated, as well as a prefix
	which was itself delegated.  I didn't think this was
	particularly confusing, though as I said above, the entire
	term is a bit awkward for me.

> 2) Page 5, Failover endpoint - This mentions "prefixes associated with
> that relationship".  I assume this means prefixes from which addresses
> or prefixes may be leased but it could be read as simply the prefixes
> that can be delegated.

	Thanks, I'll clear this up.
> 3) Page 6, Resource - It might be useful to mention why temporary addresses
> are not included in failover - either in this paragraph or maybe somewhere
> else.  (Or maybe it was in the requirements document and I missed it?)

	Good idea, except that recently Bernie suggested that perhaps they
	*should* be included, because of the direction of some of the
	privacy drafts (or maybe RFC's). So I think we may well be 
	including temporary addresses.
> 4) page 8,
> 4.2.1 Independent Allocation
> I wonder if handing out addresses from only one partner's pool during
> a PARTNER-DOWN state will have any odd effects.  In v4 the addresses could
> sort of be re-mixed over time.  In v6 that won't happen.  Offhand I can't
> think of any issues aside from the obvious pool imbalance, but it may
> be worth spending some time to think if odd things could happen.

	I will do so.  I think the mixing in v4 was more likely to be
	an issue than the way we are doing it in this draft, and I
	haven't seen any issues in the v4 space over this.
> Paragraph 4 mentions that a partner MUST NOT lease Ipv6 addresses that
> were assigned to its downed partner and later released by a client.
> Should it also mention expired leases? declines?

	The term "released" isn't well used here -- it most certainly
	means expired and declined.  I'll fix this.
> 5) Independent allocation Algorithm
> I am confused, shouldn't the low-order bit be 127?  Or is this text trying
> to state that low-order bit within a prefix is set or cleared? 

	For sure, yes, 127.  Thanks!
> Also while it is most likely that address ranges will be described via
> a prefix it is also possible that they are described via start & end
> addresses.  For example XXXX::100 to XXXX::200 (or if you prefer to
> have a larger range XXXX::100:0:0:0 to XXXX::200:0:0:0).  When you
> mention "prefix" are you intending to restrict the description? or
> are you including something like turning the above ranges into a
> set of prefixes for processing?

	I'm not sure where you are going with this.  I see a prefix as
	either including the entire set of IP addresses which are
	possible with that prefix, or perhaps a prefix has a range or
	perhaps a set of ranges of addresses associated with it.
	Delegated prefixes come with *all* of the addresses possible
	with them.  Configured prefixes get whatever your
	configuration approach specifies.

	I don't see that we need to specify the way that prefixes are
	configured, but perhaps I'm missing something.

> 6) 4.2.2 Proportional Allocation
> What happens if I have a large number of possible prefixes?  For example
> the two servers are configured with a /32 and a prefix length of /64?
> Does the spec need to either limit this in some fashion or supply some
> guidelines or checks?

	I don't see why it would need to say anything about it.  We
	are trying to stay out of the configuration space with this
	document, for a variety of reasons.
> Are we expecting that the configuration will include the length of the
> prefixes a server can delegate?

	I would expect so, and it could be fixed or it could have maximum
	and minimum lengths.
> I don't see any option for the rewind optimization to avoid a secondary
> in communications interrupted from running out of prefixes.

	You've lost me here.  If the server isn't in partner-down
	then the secondary *could* run out of prefixes to delegate,
	but I'd think that is going to be pretty unlikely given the
	way people through around DHCPv6 prefixes, but it is possible.
> 7) 5.3.4 POOLRESP
> This definition isn't quite in line with 4.2.2.  This text has
> "to indicate that it has responded to the secondary's request"
> The use of "has responded to" suggests that the primary has completed
> the response (as in has sent all the BNDUPDs and recevied all the BNDACKs)
> 4.2.2 states that the POOLRESP make be sent before, during or after any
> BINDUPDs that need to be sent.

	Good catch, I'll fix the text to make it clear that it isn't
	associated with completing the request.
> I'm imagining the sub-options encapsulated within the option which
> I think means the length depends on the sub-options.  Are you thinking
> of some other arrangement?

	No, thanks for catching that!
> Does the document actually specify the version number somewhere?  I didn't
> see it but maybe I missed it.  (Not the option code but the actual version
> number.)

	Er, no.  I should ensure that it does.
> I believe this is the actual lifetime from 4.4?  If so it would be useful
> to make that clear - either by changing one of the names to the other
> or simply adding some text.

	This comes up again.  Actually, there are some states other
	than ACTIVE that can have a lifetime.  We have found over
	the years that ABANDONED also needs to have a lifetime, though
	we didn't start out thinking so.  Experience in the real world
	has led us down that path, and expired also has a lifetime in
	our world.  As these might not have a lifetime in some implementations,
	I'm thinking about how to allow sufficient flexibility for
	implementations while still having a standard failover protocol.
	I don't have a good answer to this as yet.
> 11) 6.1.1 Sending a CONNECT message
> From the description of Failover Endpoint (page 5) it appears that one
> server can be in failover relationships with multiple other servers and
> one server could even be in multiple relationships with one other server.
> The text in 6.2 seems to reinforce this as an option.  If this is true then
> doesn't the CONNECT message need to contain information to specify which
> relationship this CONNECT message is for?  (Note: I'm not arguing for
> having multiple endpoints on the same pair of servers, I am wondering
> how one of them will determine which of those endpoints the other is
> trying to reach.)

	What a great catch!  Thanks!  I used to think multiple
	relationships were a great way to let people do whatever they
	wanted, but experience has shown that you get headaches and
	ultimately, less availability, when you allow more than one
	relationship per server.  At least with our v4 failover
	customers.  We moved away from multiple relationships years
	ago and have never looked back, so I don't typically think
	about multiple relationships.  And here is where I really left
	out the key to having multiple relationships -- some way to
	define them.  I'll fix this.  Wow, I can't believe I missed
> 12) 6.1.2 Receiving a CONNECT message
> It sounds like the the two servers can use different versions of the protocol,
> receive time, and unacked bind updates.  The latter two seem okay but the
> text may want to mention that as a possibility.  The first one seems like
> it might be difficult to deal with.

	I was planning to leave the version negotiation until we had
	a second version.  Once there was a second version, then we could
	see if it would make sense to work cross-version or what.  But
	as long as we *have* a version, it should be possible to defer
	that decision.  But I'm open to discussion on that.
> 13) 6.3 Sending a STATE message
> In the bullet list two of them use "this failover endpoint" while the
> other two use "the failover endpoint".  I think they are also referring
> to the failover endpoint sending the STATE message and find "this failover
> endpoint" clearer.

	I'll fix this.
> 14) 6.6 Unreachability detection
> Does the document need to specify the messages are on this connection?
> "...any message is transmitted on this connection...".

	I don't see this as necessary.  If you care, could you motivate
	this a bit more?
> 15) The second paragraph states that if the option OPTION_F_RECEIVE_TIME is
> in the CONNECT or CONNECTACK message it is used.  But the descriptions of
> CONNECT and CONNECTACK messages make it sound like that option is required.
> One of the two sections of text should be clarified (I would vote for
> always requiring the option and disconnecting if it isn't in the CONNECT

	I will make them required, nice catch!
> 16) Figure 1, note 4
> This has the following text:
> ...and the MCLT has passed since the entry in RELEASED, EXPIRED, or RESET states.
> Should "in" be "into"?

	For sure.
> 17) Figure 1, note 7
> This states that Addresses belonging to the primary server can be assigned to
> the secondary server pool (or the reverse) but elsewhere it appears that
> addresses are always handled via Independent Allocation rather than
> Proportional Allocation.

	Leftovers from previous drafts.  I'll fix this.
> 18) 7.3 Times Required for Exchanging Binding Updates
> partner-raw-clt-time
> This mentions time context which is also mentioned later on but is never defined.
> Perhaps adding an entry in the definitions would be useful?

	Will do.
> state-expiration-time
> As mentioned abouve for 5.4.20 I believe this is the actual lifetime as
> handed to the client?  If so some text to clarify that would be useful.
	Above, I discussed that active isn't the only timed state.
> 19) 7.4 Sending Binding Updates
> The double asterisk (**) is defined as "only if the value in the OPTION_F_BINDING_STATUS
> is associated with a timeout".  I don't see a description of which status vaues
> are associated with a timeout.  Does this mean that a timeout happened? or could
> happen?  From the use I think this means status == ACTIVE and therefore there is
> a time at which this entry would expire, but I don't think that is clear from
> the text.

	This is the crux of the "which states are timed states" issue.
	I will drive a stake in the ground when I update the draft and
	say that these states are timed.  This is exactly what I hoped
	people would find -- the things missing altogether from the
> 20) 7.5.2 Procesing Binding Updates, para 4
> This paragraph seems to indicate that a BNDACK doesn't need to include
> all of the OPTION_CLIENT_DATA options from the BNDUPD.  From the other
> paragraphs it sounds like the BNDACK should contain all of them indicating
> either success or failure.  In further reading 7.6 does state that the BNDACK
> contains an option for each option in the BNDUPD so perhaps update the phrase
> "each of which corresponds to one of the OPTION_CLIENT_DATA options"
> to something like:
> "one for each of the OPTION_CLIENT_DATA options"

	Will do.  Of course, Bernie has noted that not all information 
	appears in OPTION_CLIENT_DATA options because of prefix delegation
	and the need to exchange information on prefixes that are not
	associated with any client.  So that needs to be figured out.
> 21) Bullet item 6:
> This states that the partner-lifetime can be updated from a BNDUPD.  In 7.3
> the partner-lifetime is described as "The time that we will send (or have
> sent) the partner...".  I think the text in 7.3 may need to update a bit
> to indicate that the partner can update us as well.

> 22) 7.6 Sending Binding Acks
> It occurs to me that this text and the previous text about IA_NAs and
> IA_PDs doesn't say if multiple options for a single client  MUST or SHOULD
> be in a single OPTION_CLIENT_DATA or if the server is free to put them
> in multiple OPTION_CLIENT_DATA options (or even in separate BNDUPDs).
> I'm not sure if the protocol should require any behavior but it might
> be useful to at least point out the possibility.

	I'm going to defer a definitive response here.  I think I need to
	do a little research on the bulk leasequery draft and see where
	it goes with this.  But I will say *something* more in this area.
> 23) 7.7 Receiving Binding Acks
> Is there any processing for receving a NAK?  or should there be
> text saying if a NAK is recieved don't do anything?

	Interesting question.  I will have to think on this too.  I
	don't see anything in the draft that we do on a NAK, but I
	will give it some more thought.  In either case, I'll add text
	with the answer to your question!
> I also don't understand why the partner-lifetime is set to 0.  Given
> the acked-partner-lifetime is set to it, retaining partner-lifetime doesn't
> seem necessary but it doesn't seem necessary to 0 it either.  Is this
> to provide a way to know if / when the partner will need to be updated if
> it isn't 0?

	In some cases, partner-lifetime is used as a flag that you need
	to send something to the partner.  Once you have done that and
	they got it, it makes sense to clear it.
> 24) 7.9 BNDUPD/ BNDACK Data Flow, Figure 4
> At the bottom of the figure is the state expiration line in BNDACK.
> I assume that there is nothing under storage on receiving side as
> that server doesn't update the state-expiration-time.  Perhaps
> explicitly mentioning that would be clearer?  Something like 
> "no change" or "no update"?

> 25) 8.1 State Machine Operation
> The last paragraph before the diagram mentions "+" and "-".  It would
> be best to specify what each of them mean (I assume "-" means no
> communication and "+" means communication.  I would also add a third
> to mean both.  Some of the boxes include "-|+" while others simply
> use "+-", having a single character would probably be clearer.

	Maybe "*"?
> 26) 8.3.2
> This section makes use of PREVIOUS-STATE and CURRENT-STATE without
> a definition of them.  I believe they are intended to be the variables
> for those items? 

	I'll discuss these locals/variables.
> There seems to be some inconsistency in the use of PREVIOUS-STATE &
> CURRENT-STATE and previous state & current state.  See step 2 para 2,
> step 5 para2, step 5 para 4

	Nice catch, I'll fix this.
> 27) In Step 6
> Should the server proceed to transition from PREVIOUS-STATE to the
> next state due to communications interrupted?  Or perhaps the text
> from step 2 should be moved to here?  I don't see what having step
> 2 is doing.  In step 5 there is a mention of taking the communications
> OK transition, so isn't step 6 the only other case?

	Hmmm.  I think the point of this discussion is that step
	2 sets the value of PREVIOUS-STATE, but the transition to
	that state is in (well, could be in) step 6.
> 28) 8.7.2 Transition out of RECOVER-DONE State
> The text doesn't seem to match the diagram.  The text only mentions
> transitioning when the partner is NORMAL or RECOVER-DONE.  The diagram
> also (seems to) has (have) transitions for: RECOVER & RECOVER-WAIT to

	Wow, yes.  I will fix the text.
> 29) 8.8.1 Operation in NORMAL State
> This note is for the NORMAL box in the diagram.  The box has (responsive)
> which would generally indicate that both servers are responsive while
> only the primary is.

	Good catch.  I'll fix that.
> Para 2 mentions starting a timer if configured for auto-partner down.
> I assume this is the "Safe Period Timer" from the diagram?  If so
> mentioning that name in the text would be clearer.  (Something like
> "...then a timer (the Safe Period Timer from the Figure 5) is started...")

	Thanks. I thought I had expunged "Safe Period".  I'll
	fix this.
> 31) 9.3 Adding RRs to the DNS
> There is a use case for doing DDNS updates on all lease renewals.
> If there is a concern that the DNS might be damaged in some way
> then having the DHCP server update it on each renew will tend
> to heal it of breakages.

	For sure, I'll mention that.
> 32) 9.4 Deleting RRs from the DNS
> Is there any concern about the server making the resource FREE* crashing
> after receiving the BNDACK but before doing the DDNS update?  I think
> the end result of that would be the AAAA and PTR records still in the
> DNS until the address is reused (for the PTR) and the client gets
> a new address (for the AAAA) or possibly the server recovers and finishes
> the cleanup.

	That is an issue, and I think it depends on the implementation
	and how it records whether a DNS removal need to be done.  I need
	to think about this one a bit more before I have a good answer 
	for you.

	Thanks again for a great review!


> _______________________________________________
> dhcwg mailing list