Re: [dhcwg] Review comments for draft-ietf-dhc-dhcpv6-failover-protocol-00.txt

Shawn Routhier <sar@isc.org> Tue, 08 December 2015 05:54 UTC

Return-Path: <sar@isc.org>
X-Original-To: dhcwg@ietfa.amsl.com
Delivered-To: dhcwg@ietfa.amsl.com
Received: from localhost (ietfa.amsl.com [127.0.0.1]) by ietfa.amsl.com (Postfix) with ESMTP id AD3DD1B2DFE for <dhcwg@ietfa.amsl.com>; Mon, 7 Dec 2015 21:54:50 -0800 (PST)
X-Virus-Scanned: amavisd-new at amsl.com
X-Spam-Flag: NO
X-Spam-Score: 0.789
X-Spam-Level:
X-Spam-Status: No, score=0.789 tagged_above=-999 required=5 tests=[BAYES_50=0.8, SPF_PASS=-0.001, T_RP_MATCHES_RCVD=-0.01] autolearn=ham
Received: from mail.ietf.org ([4.31.198.44]) by localhost (ietfa.amsl.com [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id EdwxM5sD7R4E for <dhcwg@ietfa.amsl.com>; Mon, 7 Dec 2015 21:54:47 -0800 (PST)
Received: from mx.ams1.isc.org (mx.ams1.isc.org [IPv6:2001:500:60::65]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by ietfa.amsl.com (Postfix) with ESMTPS id 87CBA1B2D40 for <dhcwg@ietf.org>; Mon, 7 Dec 2015 21:54:47 -0800 (PST)
Received: from zmx1.isc.org (zmx1.isc.org [149.20.0.20]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mx.ams1.isc.org (Postfix) with ESMTPS id 18C581FCBAE; Tue, 8 Dec 2015 05:54:43 +0000 (UTC)
Received: from zmx1.isc.org (localhost [127.0.0.1]) by zmx1.isc.org (Postfix) with ESMTPS id 5B78916003C; Tue, 8 Dec 2015 05:57:04 +0000 (UTC)
Received: from localhost (localhost [127.0.0.1]) by zmx1.isc.org (Postfix) with ESMTP id 4D96F16004F; Tue, 8 Dec 2015 05:57:04 +0000 (UTC)
Received: from zmx1.isc.org ([127.0.0.1]) by localhost (zmx1.isc.org [127.0.0.1]) (amavisd-new, port 10026) with ESMTP id lXXubK6nh1hA; Tue, 8 Dec 2015 05:57:04 +0000 (UTC)
Received: from shawns-mbp.attlocal.net (107-138-42-57.lightspeed.sntcca.sbcglobal.net [107.138.42.57]) by zmx1.isc.org (Postfix) with ESMTPSA id 0B4B016003C; Tue, 8 Dec 2015 05:57:04 +0000 (UTC)
Content-Type: text/plain; charset="utf-8"
Mime-Version: 1.0 (Mac OS X Mail 8.2 \(2104\))
From: Shawn Routhier <sar@isc.org>
In-Reply-To: <E3E44720-8B86-4101-863E-A919A3F22EB2@cisco.com>
Date: Mon, 07 Dec 2015 21:54:41 -0800
Content-Transfer-Encoding: quoted-printable
Message-Id: <53B64972-B0EB-4092-A8F1-604BC298B9BE@isc.org>
References: <0A4CA8D4-3D24-40D1-B664-C9B99538FF52@isc.org> <E3E44720-8B86-4101-863E-A919A3F22EB2@cisco.com>
To: Kim Kinnear <kkinnear@cisco.com>
X-Mailer: Apple Mail (2.2104)
Archived-At: <http://mailarchive.ietf.org/arch/msg/dhcwg/yjlEt6AoRoG-AXZsXDISDJYkGao>
Cc: dhcwg@ietf.org
Subject: Re: [dhcwg] Review comments for draft-ietf-dhc-dhcpv6-failover-protocol-00.txt
X-BeenThere: dhcwg@ietf.org
X-Mailman-Version: 2.1.15
Precedence: list
List-Id: <dhcwg.ietf.org>
List-Unsubscribe: <https://www.ietf.org/mailman/options/dhcwg>, <mailto:dhcwg-request@ietf.org?subject=unsubscribe>
List-Archive: <https://mailarchive.ietf.org/arch/browse/dhcwg/>
List-Post: <mailto:dhcwg@ietf.org>
List-Help: <mailto:dhcwg-request@ietf.org?subject=help>
List-Subscribe: <https://www.ietf.org/mailman/listinfo/dhcwg>, <mailto:dhcwg-request@ietf.org?subject=subscribe>
X-List-Received-Date: Tue, 08 Dec 2015 05:54:50 -0000

> On Dec 4, 2015, at 2:32 PM, Kim Kinnear <kkinnear@cisco.com> wrote:
> 
> Shawn,
> 
> First, let me thank you for this truly high quality review.  

You’re welcome.  I should also point out that, while I had a number of comments
the draft seems to be in pretty good shape to start with.

> More comments inline…

I’ve snipped out things that seemed clear.

> 
>> On Dec 3, 2015, at 2:29 PM, Shawn Routhier <sar@isc.org> wrote:
>> 
>> During the last IETF there was a request to review the failover protocol draft.
>> I’ve now had a chance to review it and here are my comments.
>> 
>> regards,
>> Shawn
>> 
>> **
>> 
>> 1) Glossary
>> Page 5, Delegable Prefix
>> The use of this term seems to be inconsistent.  In the glossary
>> it seems to be something like XXXX:0:1 from which a set of
>> prefixes to be delegated can be generated (say a set of /64s).
>> In 4.2.2 it seems to be treating the term as the prefix that would
>> be delegated for example XXXX:0:1:0/64.
> 
> 	I'm not in love with the term delegable prefix, for what that
> 	is worth, but it is the best term I've yet found.
> 
> 	I was using delegable prefix to refer both to a prefix from
> 	which another prefix might be delegated, as well as a prefix
> 	which was itself delegated.  I didn't think this was
> 	particularly confusing, though as I said above, the entire
> 	term is a bit awkward for me.

Where it might get confusing is in 4.2.2 or in the BNDUPD.  I think in those
areas the text is referring to a single prefix and the intention is to have the
servers pass a number of single prefixes back and forth 1 per BNDUPD.
By using an unqualified delegable prefix in the text it could be read to mean
that the primary can send a number of prefixes by aggregating them.  For example
the primary could send 1 update
XXXX:0:1:0/60
or it could send 16 updates
XXXX:0:1:0/64, XXXX:0:1:1/64, … XXXX:0:1:F/64

> 
>> 
>> 3) Page 6, Resource - It might be useful to mention why temporary addresses
>> are not included in failover - either in this paragraph or maybe somewhere
>> else.  (Or maybe it was in the requirements document and I missed it?)
> 
> 	Good idea, except that recently Bernie suggested that perhaps they
> 	*should* be included, because of the direction of some of the
> 	privacy drafts (or maybe RFC's). So I think we may well be 
> 	including temporary addresses.

Either including them or a note as to why they aren’ t included works for me.

>> 
>> 5) 4.2.1.1 Independent allocation Algorithm
>> I am confused, shouldn't the low-order bit be 127?  Or is this text trying
>> to state that low-order bit within a prefix is set or cleared? 
> 
> 	For sure, yes, 127.  Thanks!
>> 
>> 
>> Also while it is most likely that address ranges will be described via
>> a prefix it is also possible that they are described via start & end
>> addresses.  For example XXXX::100 to XXXX::200 (or if you prefer to
>> have a larger range XXXX::100:0:0:0 to XXXX::200:0:0:0).  When you
>> mention "prefix" are you intending to restrict the description? or
>> are you including something like turning the above ranges into a
>> set of prefixes for processing?
> 
> 	I'm not sure where you are going with this.  I see a prefix as
> 	either including the entire set of IP addresses which are
> 	possible with that prefix, or perhaps a prefix has a range or
> 	perhaps a set of ranges of addresses associated with it.
> 	Delegated prefixes come with *all* of the addresses possible
> 	with them.  Configured prefixes get whatever your
> 	configuration approach specifies.
> 
> 	I don't see that we need to specify the way that prefixes are
> 	configured, but perhaps I'm missing something.

It may be a nit - what I was thinking about is that the term “prefix” tends to mean
a single block of addresses as in XXXX::/64.  So there could be some confusion
that the protocol requires configuration based on that style.  From your reply that
isn’t your intent so using something like 
  “For each address that can be allocated, the primary server MUST allocate…”
Which avoids any discussion about how the addresses are arranged.  

> 
>> 
>> 6) 4.2.2 Proportional Allocation
>> What happens if I have a large number of possible prefixes?  For example
>> the two servers are configured with a /32 and a prefix length of /64?
>> Does the spec need to either limit this in some fashion or supply some
>> guidelines or checks?
> 
> 	I don't see why it would need to say anything about it.  We
> 	are trying to stay out of the configuration space with this
> 	document, for a variety of reasons.

That should be okay, it will simply be up to the product documentation to
point out that having a really large number of prefixes might suck up a lot
of memory.

>> 
>> Are we expecting that the configuration will include the length of the
>> prefixes a server can delegate?
> 
> 	I would expect so, and it could be fixed or it could have maximum
> 	and minimum lengths.
>> 
>> I don't see any option for the rewind optimization to avoid a secondary
>> in communications interrupted from running out of prefixes.
> 
> 	You've lost me here.  If the server isn't in partner-down
> 	then the secondary *could* run out of prefixes to delegate,
> 	but I'd think that is going to be pretty unlikely given the
> 	way people through around DHCPv6 prefixes, but it is possible.

There is a rewind optimization in some versions of the v4 failover protocol. 
 It basically allows a secondary server that is in communications-interrupted
to “rewind” a lease to whatever the primary last knew.  So if the lease was
in backup the secondary could hand the lease out, then it could expire and
the secondary could reuse it.  This assumes that the secondary is in
communications-interrupted for a while.  

The v6 failover seems to be going more towards the auto-partner down style
where a secondary will move to partner-down more quickly, so this may not
be a useful optimization.  After thinking about it more, the right thing is probably
to leave the rewind optimization out and see how the protocol goes and how
it gets used.  If it becomes useful in the future it can be added then and we can
avoid extra complications that might not be necessary now.

>> 
>> 10) 5.4.20 OPTION_F_STATE_EXPIRATION_TIME
>> I believe this is the actual lifetime from 4.4?  If so it would be useful
>> to make that clear - either by changing one of the names to the other
>> or simply adding some text.
> 
> 	This comes up again.  Actually, there are some states other
> 	than ACTIVE that can have a lifetime.  We have found over
> 	the years that ABANDONED also needs to have a lifetime, though
> 	we didn't start out thinking so.  Experience in the real world
> 	has led us down that path, and expired also has a lifetime in
> 	our world.  As these might not have a lifetime in some implementations,
> 	I'm thinking about how to allow sufficient flexibility for
> 	implementations while still having a standard failover protocol.
> 	I don't have a good answer to this as yet.

Yes I can see that abandoned at least might have a lifetime.  By expired
having a lifetime do you mean that the previous lease information (for
example client id) is still associated with the address?

I haven’t thought about what it might mean for the protocol to handle
that but adding a bit of text to 5.4.20 would to explain that  bit would
probably be useful.

>> 
>> 11) 6.1.1 Sending a CONNECT message
>> From the description of Failover Endpoint (page 5) it appears that one
>> server can be in failover relationships with multiple other servers and
>> one server could even be in multiple relationships with one other server.
>> The text in 6.2 seems to reinforce this as an option.  If this is true then
>> doesn't the CONNECT message need to contain information to specify which
>> relationship this CONNECT message is for?  (Note: I'm not arguing for
>> having multiple endpoints on the same pair of servers, I am wondering
>> how one of them will determine which of those endpoints the other is
>> trying to reach.)
> 
> 	What a great catch!  Thanks!  I used to think multiple
> 	relationships were a great way to let people do whatever they
> 	wanted, but experience has shown that you get headaches and
> 	ultimately, less availability, when you allow more than one
> 	relationship per server.  At least with our v4 failover
> 	customers.  We moved away from multiple relationships years
> 	ago and have never looked back, so I don't typically think
> 	about multiple relationships.  And here is where I really left
> 	out the key to having multiple relationships -- some way to
> 	define them.  I'll fix this.  Wow, I can't believe I missed
> 	this!

I understand that some of our customers and the customers of our customers do 
use the multiple relationship feature.  Unfortunately I have no way of knowing how 
many (or what percentage) of our users do so, so I can’t really say how required it
might be.

>> 
>> 12) 6.1.2 Receiving a CONNECT message
>> It sounds like the the two servers can use different versions of the protocol,
>> receive time, and unacked bind updates.  The latter two seem okay but the
>> text may want to mention that as a possibility.  The first one seems like
>> it might be difficult to deal with.
> 
> 	I was planning to leave the version negotiation until we had
> 	a second version.  Once there was a second version, then we could
> 	see if it would make sense to work cross-version or what.  But
> 	as long as we *have* a version, it should be possible to defer
> 	that decision.  But I'm open to discussion on that.

You’re probably correct that we can defer the decision.  I think we would
probably want to be somewhat compatible so version X and version X+1 
can run for long enough to sync at least but that’s a discussion for later.
(At least some of our users upgrade the product versions by stopping one
server and restarting it with the new version and then doing the same for
the second.)
>> 
>> 14) 6.6 Unreachability detection
>> Does the document need to specify the messages are on this connection?
>> "...any message is transmitted on this connection...".
> 
> 	I don't see this as necessary.  If you care, could you motivate
> 	this a bit more?

The way the text is currently written the requirement that the message be on
this connection is not explicit.  It could be read that “every time any message 
is transmitted” means any message on any connection.  From the context this
doesn’t make much sense but adding the extra words makes it clear.  I don’t
feel strongly about this, I’m just trying to remove any items that may cause
people to trip.

>> 
>> 
>> 22) 7.6 Sending Binding Acks
>> It occurs to me that this text and the previous text about IA_NAs and
>> IA_PDs doesn't say if multiple options for a single client  MUST or SHOULD
>> be in a single OPTION_CLIENT_DATA or if the server is free to put them
>> in multiple OPTION_CLIENT_DATA options (or even in separate BNDUPDs).
>> I'm not sure if the protocol should require any behavior but it might
>> be useful to at least point out the possibility.
> 
> 	I'm going to defer a definitive response here.  I think I need to
> 	do a little research on the bulk leasequery draft and see where
> 	it goes with this.  But I will say *something* more in this area.

That’s fine, I’m not sure what I think the “correct answer” is either.

>> 
>> 23) 7.7 Receiving Binding Acks
>> Is there any processing for receving a NAK?  or should there be
>> text saying if a NAK is recieved don't do anything?
> 
> 	Interesting question.  I will have to think on this too.  I
> 	don't see anything in the draft that we do on a NAK, but I
> 	will give it some more thought.  In either case, I'll add text
> 	with the answer to your question!
>> 
>> I also don't understand why the partner-lifetime is set to 0.  Given
>> the acked-partner-lifetime is set to it, retaining partner-lifetime doesn't
>> seem necessary but it doesn't seem necessary to 0 it either.  Is this
>> to provide a way to know if / when the partner will need to be updated if
>> it isn't 0?
> 
> 	In some cases, partner-lifetime is used as a flag that you need
> 	to send something to the partner.  Once you have done that and
> 	they got it, it makes sense to clear it.

Perhaps a brief (1 line at most) note?

>> 
>> 25) 8.1 State Machine Operation
>> The last paragraph before the diagram mentions "+" and "-".  It would
>> be best to specify what each of them mean (I assume "-" means no
>> communication and "+" means communication.  I would also add a third
>> to mean both.  Some of the boxes include "-|+" while others simply
>> use "+-", having a single character would probably be clearer.
> 
> 	Maybe "*”?

that would work for me.  
> 
>> 
>> 27) In Step 6 (8.3.2)
>> Should the server proceed to transition from PREVIOUS-STATE to the
>> next state due to communications interrupted?  Or perhaps the text
>> from step 2 should be moved to here?  I don't see what having step
>> 2 is doing.  In step 5 there is a mention of taking the communications
>> OK transition, so isn't step 6 the only other case?
> 
> 	Hmmm.  I think the point of this discussion is that step
> 	2 sets the value of PREVIOUS-STATE, but the transition to
> 	that state is in (well, could be in) step 6.

I think I was mis-reading step 2 & 5
> 
> 
> 	Kim

Shawn