[dhcwg] Changes to create draft-ietf-dhc-failover-12.txt

Kim Kinnear <kkinnear@cisco.com> Mon, 03 March 2003 19:22 UTC

Folks,

Here is mail detailing the changes made to
draft-ietf-dhc-failover-11.txt to yield
draft-ietf-dhc-failover-12.txt.  In most cases considerable
changes were made to the indicated sections, so that it was not
possible to say "this word changed to that word", but rather the
entire section needs to be re-read to grasp the essence of the
change.

-------------------------------------------------------------------

During the last IETF in Atlanta, a discussion was held with several
folks about problems in the DHCP failover protocol and its description.
In attendance were:

Scanner Luce	scanner@nominum.com
Bernie Volz	Bernie.Volz@am1.ericsson.se
Mark Stapp	mjs@cisco.com
Kim Kinnear	kkinnear@cisco.com

We primarily discussed issues encountered by Scanner during his
implementation of the failover protocol, and first raised at a
meeting with several people during the Summer 2002 IETF.

Thanks to Bernie Volz for comments and additions to an earlier
version of these notes.  Some of his additions I've simply placed
into the text, but I've left one comment explicit since we may
want to discuss it further.

The action plan for failover at this point is:

 (x)	a. Circulate these notes to the DHCP list. 

 (x)	b. Accept comments.

 (x)	c. Update the failover draft by the end of February.

-->	c-1. Circulate email concerning changes made.

	d. Consider whether we need another WG last call based
	on these changes.

This email is step C-1.

Changes made to the failover draft:
-----------------------------------

While the discussion ranged over several topics, the action items
boiled down to the following:

1.  Connection establishment changes:

	a.  There MUST be one endpoint failover relationship
	(i.e., between two servers).

	b.  There SHOULD be one relationship per partner, but
	this is not a requirement.

	We determined there was little value in having the same
	two servers be involved in two relationships (in
	general).  So, having one server be primary for some
	pools and secondary for others where the partner has
	opposite roles is really not necessary and makes little
	sense.  Especially now that load balancing exists and the
	primary and secondary are almost equal (the primary just
	breaks ties).

	c.  There SHOULD be only one port in use for failover
	traffic.

	d.  The TCP connection from the secondary server to the
	primary server is dropped by the secondary server
	or is dropped by the primary server in the event that both
	servers end up connecting at the same time.

	We need a strategy to handle the case where two
	connections are made at the same time (primary ->
	secondary; secondary -> primary).  The rule is simply
	that the connection that the primary initiated is the one
	that is kept.  So, the primary drops the connection it
	ACCEPTed from the secondary and the secondary drops the
	connection on which it CONNECTed to the primary.
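
As an illustration only, the rule above can be sketched from the point
of view of either server.  This is not text from the draft; the
function name and the role strings are hypothetical, and the check is
meant to apply only when both connections exist at once.

```python
def should_drop_on_collision(local_role: str, locally_initiated: bool) -> bool:
    """Decide whether this server drops a connection after a simultaneous connect.

    local_role is "primary" or "secondary" (hypothetical labels).
    locally_initiated is True if this server CONNECTed, False if it ACCEPTed.
    """
    if local_role not in ("primary", "secondary"):
        raise ValueError("unknown role: " + local_role)
    # A connection was initiated by the primary if we are the primary and
    # we connected, or we are the secondary and we accepted.
    initiated_by_primary = (local_role == "primary") == locally_initiated
    # Keep the primary-initiated connection, drop the other one.
    return not initiated_by_primary
```

Both servers reach the same answer without exchanging any extra
messages, since each one knows its own role and whether it connected
or accepted.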

  Modifications were made to implement these changes to:

	Definition of failover endpoint.

	Section 5.1.1 Failover endpoints

	Section 8. Connection Management
	
	Section 8.1 Connection granularity

	Section 8.2 Creating the TCP connection

2.  Remove paragraph 4 from Section 7.1.3 (the BNDUPD conflict
section).  This paragraph turns out to add confusion rather than
remove it.

  Modifications were made to implement this change by:

	Removing paragraph 4 from Section 7.1.3.

3.  Consider adding pseudo-code for the MCLT logic (which Scanner
has offered to contribute, since he felt this would be helpful).
I'll include anything I get in this regard.

  No modifications were made in this case, because I received
  no pseudo-code to add.

4.  Review sequence diagrams for accuracy.

  These were all reviewed, but no changes were made as no problems
  were found.  If someone has problems with these sequence diagrams,
  please send me specific information about the problem, and I'll
  be glad to fix it.  

5.  The failover partners can run in two different modes -- time
sync mode, or time skew mode.  In time sync mode, the servers
still send the time, and the receiving server can itself be in one
of two modes -- time correction mode (to handle time drift) or
time rejection mode (which will reject packets that carry a time
that is "too wrong").

Bernie added:

	One point is that whichever mode one is in, one must
	allow some small drift in time when doing time based
	comparisons.  For example, lease expirations may easily
	be off by 1 second just because of the time that elapsed
	between when a packet was sent and received (a very small
	fraction of time must elapse for this to potentially
	happen).  I think that was more the issue that we need to
	make clear - time checks need to be somewhat soft rather
	than absolute?  But I agree that servers may choose to
	require the time to be "in sync" or can choose to do time
	corrections.  

	Note that we have found problems in not accommodating
	time skew; if the servers require the time to be in sync
	and continue running (but refuse to communicate) bad
	things sometimes happen (especially if the servers assume
	partner-down state).  This can happen because step 6 in
	section 9.3.2 allows two actions.  We may want to
	consider exactly what it means for the partner to be
	"down"; for example, if the partner is up but a connection
	cannot be established because the time skew is out of
	range, then it may be better to have one server wait
	(typically the one that was down the longest?).
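
A minimal sketch of the two receiver modes and of a "soft" time
comparison follows.  It is not taken from the draft: the mode names
"correct" and "reject", the max_skew limit, and the one-second default
tolerance are all assumptions made for illustration.

```python
from typing import Optional


def to_local_time(partner_time: float, skew: float,
                  mode: str, max_skew: float) -> Optional[float]:
    """Translate a time sent by the partner into local time.

    skew is the assumed (local clock - partner clock) difference in seconds.
    Returns None when the packet should be rejected.
    """
    if mode == "correct":
        # Time correction mode: compensate for the measured difference.
        return partner_time + skew
    if mode == "reject":
        # Time rejection mode: refuse times that are "too wrong".
        return partner_time if abs(skew) <= max_skew else None
    raise ValueError("unknown mode: " + mode)


def times_match(a: float, b: float, tolerance: float = 1.0) -> bool:
    """Soft comparison: treat two times as equal if within tolerance seconds."""
    return abs(a - b) <= tolerance
```

With a check like times_match, a lease expiration that differs by a
fraction of a second because of transit delay is not treated as a
conflict.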

  Modifications were made to implement this change by changing:

	Section 5.10 Time synchronization between servers

6.  The client-last-transaction-time should not be remembered if
the packet is dropped due to the server being in the wrong
failover state to respond to DHCP client packets.  This was
implicit in the previous versions of the draft, but not clearly
stated.
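
For illustration, the intended ordering might look like the sketch
below.  The Binding class and the may_respond flag are hypothetical;
which failover states allow a server to respond is defined by the
draft and is deliberately not reproduced here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Binding:
    # None means no client transaction has been recorded yet.
    client_last_transaction_time: Optional[float] = None


def handle_client_packet(binding: Binding, received_at: float,
                         may_respond: bool) -> bool:
    """Record the transaction time only if the packet is actually processed.

    may_respond stands in for the failover state check (an assumption).
    Returns True if the packet was processed, False if it was dropped.
    """
    if not may_respond:
        # Dropped because of failover state: leave the stored time untouched.
        return False
    binding.client_last_transaction_time = received_at
    return True
```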

  Modifications were made to implement this change by changing:

	Section 9.2 Server State Transitions

7. Contact information was changed, and some dates were updated
to 2003.
