Re: [dhcwg] WGLC comments on draft-ietf-dhc-dhcpv6-failover-protocol-01

Tomek,

Thank you for reviewing it.   My comments are embedded below...

> On May 27, 2016, at 2:17 PM, Tomek Mrugalski <tomasz.mrugalski@gmail.com> wrote:
> 
> On 28.04.2016 21:25, Bernie Volz (volz) wrote:
>> This message starts the DHC Working Group Last Call to advance
>> draft-ietf-dhc-dhcpv6-failover-protocol-01, DHCPv6 Failover Protocol.
>> This document’s intended status is Proposed Standard. At present, there
>> is no IPR file against this document.
>> 
>> Please send your comments by May 22, 2016. If you do not feel this
>> document should advance, please state your reasons why.
> It definitely took much longer than I initially anticipated, but here it
> goes. While my name is still on this I-D, I have not been involved
> in this draft for at least a year. All the praise for this work should
> go to Kim. As such, please consider this review as if it was provided by
> non-author.
> 
> I did read the draft in its entirety. It is long and complex, but very
> well written with concepts, implementation details and reasons for
> design choices being well explained.
> 
> Here are my comments. Many of them are editorial in nature, so please
> forgive I did not segregate them out.
> 
> Abstract:
> to provide for DHCPv6 failover => to provide DHCPv6 failover

	Sure.
> 
> Section 1:
> Add "Active-active mode may be defined in the future."

	I specifically removed this.  I don't see the value of saying
	this.  If it happens, it happens.  If it doesn't, it doesn't.
	How does saying this help anything?  If you feel strongly, we
	can put it back in. 

	My basic thought in the later stages of working on this draft
	was -- if it isn't necessary, it should not be in the draft.
	I've *tried* to keep to that approach, in order to keep this
	draft small enough to actually move through the process.

	Also, everything that isn't required that is in the draft is
	just something *else* to argue about with people who review
	it.  And a lot of people are going to review it.  At least
	we hope so, anyway.
> 
> Section 3:
> To make it consistent, we should change all instances of Binding status
> (3 ocurrences) to binding-status (31 occurrences).

	Sure.  Nice catch.
> 
> server when it is in the BACKUP state. =>
> server when it is in the FREE-BACKUP state.
> (there's no BACKUP state)

	Good catch.
> 
> and BACKUP on the secondary server =>
> and FREE-BACKUP on the secondary server

	Yes.
> 
> Section 4.2.1
> used for allocating individual DHCP IPv6 addresses =>
> used for allocating individual IPv6 addresses

	Yes.
> 
> When an IPv6 address goes PENDING-FREE Section 7.2 it is owned =>
> When an IPv6 address goes PENDING-FREE (see Section 7.2) it is owned

	Yes.
> 
> Section 4.2.1 says:
>  An operational partner MUST NOT lease IPv6 addresses that were
>  assigned to its downed partner and later expired or were released or
>  declined by a client unless it is in PARTNER-DOWN state.  When it is
>  in PARTNER-DOWN state, a server SHOULD allocate from its own pool.  It
>  SHOULD NOT use its partner's pool.
> 
> This text is inconsistent. Either we use SHOULD/SHOULD NOT everywhere
> or MUST/MUST NOT.

	I could not disagree more -- it is perfectly appropriate
	to use MUST/MUST NOT for one situation and SHOULD/SHOULD NOT
	for a *different* situation.

	However, I think that below you are suggesting that we
	never let one partner do more than extend the other partner's
	leases, and I'm ok with that.

	To that end, I will change this to say:

 An operational partner MUST NOT lease IPv6 addresses that were
 assigned to its downed partner and later expired or were released or
 declined by a client.  When it is in PARTNER-DOWN state, a server
 MUST allocate new leases from its own pool.  It MUST NOT use its
 partner's pool.

> Also, the last sentence should be clarified to "It
> SHOULD NOT use its partner's pool to allocate new leases.".

	Yes, essentially.
> 
> I like the approach to allocate odd addresses on one partner and even
> on another. This implies exactly 50/50 split between them (we previously
> talked about configuration parameter for that), but that's
> ok. If someone is running out of IPv6 addresses, he has bigger problems
> than misbehaving failover.

	I agree with this change, and have made the text change above.
> 
> Section 4.3:
> "will have no record at all of the lease to the DHCP client." =>
> "will have no record at all of the lease being assigned to the DHCP
> client."

	Sure.
> 
> Section 5.1
> The text does not define how to set transaction-id. There's reference
> to Section 5.1 of RFC5460, which in turn says that the meaning of
> the fields are defined in RFC3315. That RFC says that clients set them
> to random value. There are no clients in failover protocol. I suppose
> we need a text that whoever initiates the exchange, must set the
> transaction-id. How about setting it to increasing value. This is better
> than using the usual random, which has a slim, but non-zero chance of
> repeat itself. Random could work most of the time, but every couple
> months the PRNG would spit the same value sufficiently close to each
> other and then failover could skip an update.

	Good point, I'll create some words for this.
> 
> Q: section 5.2 says that msg-type is different than the one in DHCPv6.
> Why? We currently have 23 types defined. Do we really expect that many
> more to be defined? My thoughts about this are explained below when
> commenting Section 11.

	See my comment when you discuss Section 11, below.
> 
> Section 5.3.9
> I think CONNECTACK is not the right name for this message, as it
> can also be used to reject connection. ACK in DHCP world definitely
> means a positive acknowledgement. So better name would be
> CONNECT-REPLY or maybe CONNECT-RESPONSE?

	Sigh.  I don't entirely agree that in the DHCP world that
	ACK always means a positive response.  In the bulked DHCPv4
	failover world, an ACK can have some positive and some negative
	responses, and I carried that over here.  But it sounds like
	this is important to you.  If we are going to change it,
	I would like to keep message types a single thing, no hyphens.
	I understand that RFC 3315 has hyphened-messages.  I would 
	still like to keep these to have no hyphens for a variety
	of reasons.

	CONNECTREPLY
> 
> The same is true for BNDACK as BNDACK may actually contain NAK
> response. BND-REPLY or BINDING-REPLY seems better suited.

	BNDREPLY
> 
> Generic comment for Sections 5.4.*. Most of the options do not have
> proper names and they're referred to by the constant values. Most
> (all?) DHCPv6 options have proper names that are used to refer them.
> Do you think it would be useful to include them?

	Well, no.  Why create an additional level of translation?
> 
> Also, for each option there's an explanatory text that says what the
> content of the option is. It would be very useful to have one sentence
> for each option that explains when and where this option may appear.

	I think that has two problems:

	1. It duplicates information already in the draft in the
	messages themselves.  The descriptions of the messages should
	say when an option should or should not appear -- that is
	where the context of how they are used is clear.  If we have
	that in two places it can be inconsistent, and then which one
	does the user follow?

	2. It creates a need to "update" the this draft if some other
	draft in the future decides to reuse one of these options.
	Which isn't a huge deal, but why make life harder for future
	folks?

	So, really, you can't expect to understand how these options
	are used from their definitions.  Either the definitions come
	first, and you have to suspend your desire to know everything
	about these options are used until later in the draft, or we
	have to put the options definitions at the end of the draft
	and you have to keep referring to that as you read the meat of
	the draft.  It is a circular definition problem.

	Seems to me that the most recent approach to this with
	drafts/RFCs in the DHC WG is to define the options up front,
	and then describe how and when they are used, which I have
	done here.  I have written drafts both ways over the years,
	and neither is wonderful, but both work.

	But with the "option definitions first" approach, you simply
	cannot expect to understand how and when they are used when
	reading section 5.4.

	I don't particularly want to put the options all at the end,
	since I think that is not the current approach in our WG, and
	so I think people have to wait to find out how the options are
	used until the message that use them describe how they are
	used.
> 
> Section 5.4.3
> There should be a text similar to: "Three encapsulated options are
> defined at this time, but additional options may be defined in the
> future."

	Again, what is the value of "... but additional options may
	be defined in the future."  That's true of anything here,
	why would you say that for this section and not for everything
	else?  

	That is just confusing.  I think this is a bad idea.
> 
> Sections 5.4.4 - 5.4.6
> Extra text should be added: "This option may appear only in
> OPTION_F_DNS_REMOVAL_INFO"

	I also think this is a bad idea.  If someone wants to reuse
	one of these options in some other draft, why are you not
	allowing them to do so?

> 
> Section 5.4.7
> I'm unclear whether this is the greatest lifetime for all leases
> ever processed by the server? Or is it supposed to be an encapsulated
> option and it describes a single lease only?

	See Sections 7.3, 7.4, 7.5.3, 7.5.5, 7.8.

	It, along with many other options, are used inside of an IA_XX
	option.  You have to look at the sections of the document where
	this is used to see what value actually goes into it.

	I will add: "The greatest lifetime that this
	server has ever acked to its partner in a BNDACK for this lease
	or prefix."

> Section 5.4.8
> Getting the value of this option may be tricky. It depends on many
> things, but one of the limiting factors in Linux are socket buffers.
> They're limited by the number of bytes rather than number of packets.

	Hmmm.  This has little to do with Linux.  This is how many
	packets the DHCP server is willing to have *read* from the
	socket and be working on at once.  If the receiving server has
	a fixed number of buffers for failover, then if you send more
	than it can hold, it might well drop the packets.  The goal
	here is to *not* use TCP backpressure to stall the sender,
	so that the connection stays clear for important messages
	like DISCONNECT or STATE changes.  So we want to read from
	the socket and keep the connection unblocked.

	This number tells the sender that the receiver doesn't want to
	see more than this number outstanding at one time.

> 
> Section 5.4.12
> When is this option sent? After connection with partner is
> reestablished?

	See Section 6.3.

> 
> Section 5.4.13
> "associated with this IPv6 address." =>
> "associated with this IPv6 address or prefix."

	Sure.
> 
> Section 5.4.14
> The figure there has extra 32bits field called protocol-version. I
> think that's an editing artifact and should be removed.

	Good catch.  I had to look at it several times to see that!
> 
> Also, do you think it would be reasonable to add a text that in
> general the major version must match or the servers won't be able to
> interoperate? This is just a proposal for consideration. In Kea we're
> using database versioning with major/minor numbers. Minor numbers are
> for backward compatible changes, like little tweaks. Once we change
> something significant, we bump up the major number. It's only an idea. I
> don't insist on it in any way.

	I don't want to restrict people's ability to build cool
	software.  If George wants to build a server that will work
	with three different major versions, should we stop him?
	This is entirely an implementation decision, in my view,
	so no, I don't think we should say this.
> 
> Section 5.4.15
> I would recommend renaming this option to OPTION_F_KEEPALIVE_TIME.
> Keep-alive is immediately understood, while receive time may mean
> many things.

	I didn't want people to somehow assume that this was related
	to the TCP keepalive capability.  In particular, I didn't want
	them to say "why not just use TCP keep alive and you won't
	need this option", and then I have to explain it in detail.

	But if you want to change it, sure.

	OPTION_F_KEEPALIVE_TIME
> 
> Section 5.5
> Description of AddressInUse status says:
>  One client on one server has leases that are in conflict with the
>  leases that the client has on another server.  Alternatively, the
>  address could be associated with a different IA_ID on each server.
> That's IAID. IA_ID looks like a new type of IA container, which confused
> me at first.

	Good catch.
> 
> Section 6.1.2
> The text says that MCLT received in the CONNECT message should be
> compared with the configured MCLT value. Does it make sense at all
> to allow MCLT configuration on the secondary side? Removing this
> ability and accepting whatever primary sends seems the way to go
> here. One less parameter to misconfigure. Anyway, if you mistrust the
> values primary may send, why establish the relationship at all?

	It is vital that the secondary and primary agree on the
	MCLT.  We tend to ensure that they are configured identically,
	and consider it an warning if they are not.  You are suggesting
	that the secondary would have to remember whatever value the
	primary most recently told the secondary during the CONNECT.
	That's reasonable, though different than our approach has
	been for the last decade or so.

	Ok, I'll change it.
> 
> Section 7.3
> liftime => lifetime
> 
> The time when the this server... =>
> The time when this server...

	Thanks. 
> 
> Section 7.4
> The following text:
> 
>  Every BNDUPD message contains information about either a single
>  client binding in an OPTION_CLIENT_DATA option, or a single prefix
>  lease in an OPTION_IAPREFIX option.
> 
> is a bit confusing and took me a while to understand what it says. How
> about this rewording:
> 
>  Every BNDUPD message contains information about either a single
>  client binding in an OPTION_CLIENT_DATA option that include IAADDR
>  or IAPREFIX options associated with that client, or a single prefix
>  lease in an OPTION_IAPREFIX option for prefixes that are currently not
>  associated with any clients.

	Great, sold!
> 
> This section also mentions that OPTION_VSS MUST appear in BNDUDP
> message if certain conditions are met. I disagree with this. Support
> for RFC6607 is not mandatory and there may be valid, working failover
> deployments that don't support VSS. Hence I don't agree with the MUST here.

	Yes, I agree that RFC6607 is not mandatory.  

	The condition where the MUST is invoked for the OPTION_VSS is
	the condition where you *are* using RFC 6607.  If you are
	*not* using RFC 6607, then you MUST NOT use the OPTION_VSS.
	The MUST doesn't make you support RFC 6607, it says that if
	you *are* using RFC 6607, you MUST use OPTION_VSS.

	I see this as both correct and required, so I am not
	inclined to remove this.
> 
> I think we should to reword the list slightly. The sentence before the
> list says "MUST contain at least the data shown below in its
> client-options section", but then OPTION_F_RECONFIGURE_DATA has lower
> case "if any". I think it would be better to rephrase this list as:
> 
> The OPTION_CLIENT_DATA option:
> 
> - MUST contain option 1
> - MAY contain option 2
> - MUST NOT contain option 3
> ...

	Yes, I can see that the original MUST is a little overbearing.
	I think this can be handled by listing the data and avoiding
	2119 language except in specific circumstances.  The discussion
	above on RFC 6607 is one of those circumstances where I think
	RFC 2119 language is required.
> 
> The list also mentions IA_TA. At the first glance it does not make
> much sense to use failover to provide lease stability for addresses
> that are temporary by nature. Do you think failover should cover
> temporary addresses at all? There's a tiny bit of usefulness to
> prevent duplicates in already assigned temporary addresses. But is it
> worth the effort?

	We originally thought the same way, but then realized that 
	IA_TA leases *can* be renewed.  Certainly we would expect that
	they would rarely be renewed, but they can.  Given that, it
	seemed reasonable to allow them to be renewed from the other
	server if the primary server went away.   If your server
	doesn't allow renewals for IA_TA leases, then supporting them
	in failover would have little value. 

	This is one of those cases where I am always tempted to put in
	more words justifying and explaining why we did something, but
	when we did that for v4 the draft was 121 pages.  So I've tried
	to just say what to do for the protocol.  

	I will put in one sentence to clarify this, saying that they
	can be renewed and that is why we support them, as this is likely
	to be a common question.
> 
> The following text is a bit confusing:
> 
>  OPTION_STATUS_CODE containing an error code, or
>  containing a success code if a message is required.
>  If the information in the corresponding OPTION_IA_NA,
>  OPTION_IA_TA, or OPTION_IA_PD option in the BNDUPD was
>  accepted, and no status message was required (which is
>  the usual case), no OPTION_STATUS_CODE option appears.
> 
> The first sentence suggests that there are cases when success status
> code sometimes is necessary, but the second one says it does
> not appear. There's no normative language, I admit, but nevertheless
> this should be clarified (something like status code may appear,
> but it's perfectly ok to skip it for success status).

	The whole silly way that the OPTION_STATUS_CODE is
	defined is confusing, and we are just paying the continuing
	price for it here.

	I will reword this to be clearer.
> 
> Section 7.7
> 
> partner.. (remove extra dot).

	Yes.
> 
> The phrase NAK is used in section 7.7 only. Other sections refer
> to this condition using the word reject or rejected. Maybe we could make
> the naming consistent? Or at least add a text explaining that NAK means
> a rejected update?

	Sure.
> 
> Section 7.7 ends with a list of 3 bullets. It's unclear what they
> relate to. Is this a continuation of the previous list? Then they
> should be 3, 4, and 5. If not, then some leading sentence should
> be added explaining what this list describes.

	They are what do with the information in the
	OPTION_IAPREFIX, concerning a single prefix lease.

	I'll add more words to make this clear.
> 
> Figure 4 - extra empty lines between counters could be useful here.
> It's confusing whether the (uncorrected) comment refers to the line
> above or below.

	I'll fix this.

> Section 8.5.2
> "communications fails". This one is genuinely confusing for
> non-native speaker, I must admit that. Shouldn't this be either
> "communication fails" or "communications fail"?   Merriam-Webster
> dictionaly says that Communication is plural, but may be used as
> singular or plural. Communications is plural, but there's an example
> that says "Communications is...". Source:
> http://www.merriam-webster.com/dictionary/communications?show=0&t=1292594132

	Sorry about that.  I think "communication fails" would be
	better than what is there.
> 
> Section 8.6.2
> "This is to allow any IPv6 addresses..." =>
> "This is to allow any IPv6 addresses or prefixes..."

	Sure.
> 
> Section 8.7.1
> The text says that in the RECOVER-DONE state the server may respond to
> RENEW requests and extend existing leases. It MUST NOT allocate any
> additional leases when in RECOVER-DONE state. Ok, so how should the
> server behave if a Renew comes in and there's no lease for it. Respond
> with NoAddrsAvail or not repond at all? On one hand, not responding
> seems safer, because the client will likely retransmit and there's
> non-trivial chance that the server will be back in Normal state by
> then and everyone will be happy. On the other hand, responding with
> NoAddrsAvail status has the benefit of potentially providing other
> configuration options. If you don't have any preference, I think going
> with no response is the way to go. Regardless of the choice, the
> behavior should be clarified.

	I agree.  Also, "change the state of lease that appear" ->
	"change the state of a lease that appears".
> 
> Section 11
> "These values should start at 20, in order to reduce the
> chance of collisions with existing DHCPv4 Messages using the same
> ports as DHCPv6 failover.". I don't like that approach for several
> reasons. First, the arbitrarily chosen 20 is already outdated (we have
> standard DHCP message types up to 23 defined now). No matter what
> values you start from, there will likely be a conflict some time
> in the future. But this is a minor thing. Second, more important
> issue is the confusion it brings. When you ask what message type 1 is,
> every DHCP engineer with a clue will instantly tell it's Solicit.
> With this proposal, the answer will be "in what context: dhcp proper
> or dhcp failover"? This has more implications than it looks at
> the first glance. Many logging routines in the code would have to
> be updated to pass extra info whether the message should be
> interpreted as DHCP or DHCP failover message. Also, think about
> cases where you use tools like wireshark to inspect existing traffic.
> They would have to be extended with extra logic to do conditional
> parsing based on the TCP port number. To make it more convoluted,
> we already have cases of TCP-based traffic in bulk and active
> leasequery. We could have more if draft-volz-dhc-relay-server-security
> evolves towards TLS solution.
> 
> Oh, and the failover solution *is* using some messages from the
> standard DHCP. STARTTLS is used when establishing connection. It happens
> to have message code 23, which directly conflicts with proposal to start
> the codes from 20.
> 
> So what's the problem with using existing message type registry
> anyway? With the proposed 12 messages, we'd have 35 message types
> defined in DHCPv6. There's still 220 more available until we run out
> of types. And seriously, if we ever define 200 message types, it
> would mean something somewhere went horribly wrong.
> 
> I would very much prefer to use regular message types. Failover
> is complicated enough without extra message type tricks.

	I could spend a lot of time justifying the way that it
	is currently written -- and there are what I think are
	good reason for it, but since I am going to agree to
	change it, I won't bother.

	I will change this to be the way you want it.
> 
> --------------------
> 
> Partners send CONTACT messages as a mechanism to keep the connection
> alive. Do you think it would make sense to get rid of this message and
> send STATE messages instead? The benefit is that the server would have
> more up-to-date configuration that its partner is in specific
> state. There would be one less message type, so it would make the spec
> marginally simpler. I think this could be useful to detect cases when
> one way of communication fails, but the other is still functional.  On
> the other hand, CONTACT message is probably empty, so sending STATE
> with all the options would generate more traffic even during normal,
> healthy operation.

	I very much want to keep the CONTACT message for a variety of
	reasons:

	1. We have various capabilities to log messages in
	log files for debugging.  We have one log setting that will
	log *all* failover messages, and a different one that will log
	all failover messages *except* CONTACT messages.  So you only
	see the "good stuff".  I think this is a useful capability for
	anyone implementing failover, and while I'm sure we could put
	some status bits in a STATE message to say "this isn't really
	a state message, this is really a contact message but we sent
	a state message because we got rid of the contact message", it
	is a lot easier to simply have two different messages.

	2. CONTACT messages are shorter.

	3. STATE messages MUST be sent anytime anything changes
	anyway, and they are useful as sentinels that something *did*
	change.  If we send them all the time, the don't carry that
	meaning.  If a server would have more up to date information from
	receiving a STATE message used as a CONTACT message -- information
	that, today, it didn't get as a STATE message, then something is
	broken.

	4. I don't see how this helps you detect a one-way
	communication failure better than having CONTACT and STATE
	messages.

	5. I very much like the fact that when you see CONTACT
	messages you know immediately that there isn't other traffic
	going on.

	6. Our implementation of v4 failover did exactly what
	you propose with state messages, and it cost processing time
	since you have to interpret every STATE message and save the
	contents if different, and decide if the contents *are* different.
	With a CONTACT message, you just ignore it very early on.  It
	was also confusing, because you didn't know if this message
	came out because it was supposed to because something changed,
	or it came out because the timer expired.  Hard to debug.

	In case it isn't obvious, I *really* don't want to merge
	the CONTACT message and the STATE message.  Been there, done
	that, and really didn't like it.  Really. 

	That's it -- I'll make the changes I've agree to.  If
	there are some that I didn't agree to that you want to discuss,
	let me know.

	Thanks for the detailed review -- Kim

> 
> --------------------
> 
> Ok, that's it. Overall, I think Kim did a fantastic job with this draft.
> There are some nits to be improved, but only a handful of technical
> comments to be addressed. For an almost 90 pages long draft this is
> constitutes a great shape. Thank you for this work, Kim.
> 
> Tomek
> 
> _______________________________________________
> dhcwg mailing list
> dhcwg@ietf.org
> https://www.ietf.org/mailman/listinfo/dhcwg