Re: [dhcwg] Mirja Kühlewind's No Objection on draft-ietf-dhc-dhcpv6-failover-protocol-04: (with COMMENT)

kkinnear <kkinnear@cisco.com> Thu, 02 February 2017 16:26 UTC

Mirja,

Thanks for your review.

I'll respond to your questions directly, indented, below...

> On Feb 2, 2017, at 9:44 AM, Mirja Kuehlewind <ietf@kuehlewind.net> wrote:
> 
> Mirja Kühlewind has entered the following ballot position for
> draft-ietf-dhc-dhcpv6-failover-protocol-04: No Objection
> 
> When responding, please keep the subject line intact and reply to all
> email addresses included in the To and CC lines. (Feel free to cut this
> introductory paragraph, however.)
> 
> 
> Please refer to https://www.ietf.org/iesg/statement/discuss-criteria.html
> for more information about IESG DISCUSS and COMMENT positions.
> 
> 
> The document, along with other ballot positions, can be found here:
> https://datatracker.ietf.org/doc/draft-ietf-dhc-dhcpv6-failover-protocol/
> 
> 
> 
> ----------------------------------------------------------------------
> COMMENT:
> ----------------------------------------------------------------------
> 
> A few questions that are not fully clear to me and maybe need some
> additional explanation in the draft (or maybe it's just me...):
> 
> - It's not fully clear to me when a TCP connection is opened or closed.
> Are the two servers supposed to have one long-lived connection? And if
> that connection is terminated for any reason, should the primary server
> try to re-open immediately? And if a (new) connection is (re-)open do I
> always need to send a CONNECT first, or only if I didn't have any
> connection with this server before? And if the secondary server goes down
> and comes up in RECOVER state (sec 8.5.1.), should it open a TCP
> connection to the primary server, or will always the primary server be
> the one that opens the connection (and if so when will it do it)?


	I would have thought that this was spelled out, but as I look
	at the draft I see that it isn't all that explicit.  While I
	could answer your questions directly, let me instead offer a
	new paragraph which tries to answer these questions for not
	just you but other readers of the document:


6.  Connection Management

   Communication between failover partners takes place over a
   long-lived TCP connection.  This connection is always initiated by
   the primary server, and if the long-lived connection is lost it is
   the responsibility of the primary server to attempt to reconnect to
   the secondary server.  The detailed process documented in
   Section 6.1, used by the primary server when initiating a
   connection and by the secondary server when responding to a
   connection attempt, is followed each time a connection is
   established, regardless of any previous connection between the
   failover partners.

6.1.  Creating Connections
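
	As a rough illustration of the connection management
	paragraph above (not text for the draft), the primary's side
	could be sketched as the loop below.  The names
	run_primary, open_connection, exchange, retry_delay and
	max_sessions are hypothetical, the pause before reconnecting
	is an assumption (the paragraph does not specify one), and
	the string "CONNECT" stands in for the real message.

```python
import time


def run_primary(open_connection, exchange, retry_delay=5.0,
                sleep=time.sleep, max_sessions=None):
    """Keep the failover connection up from the primary's side.

    open_connection: hypothetical callable returning a connected object
                     with send() and close(); raises OSError on failure.
    exchange:        hypothetical callable that runs the failover exchange
                     until the connection is lost (raises OSError then).
    max_sessions:    only here so the loop can be stopped in a test.
    """
    sessions = 0
    while max_sessions is None or sessions < max_sessions:
        try:
            # Only the primary initiates; the secondary just listens.
            conn = open_connection()
        except OSError:
            sleep(retry_delay)  # assumed back-off before the next attempt
            continue
        sessions += 1
        try:
            # The same setup is repeated on every new connection,
            # regardless of any earlier connection to this partner.
            conn.send("CONNECT")
            exchange(conn)
        except OSError:
            pass  # connection lost: fall through and reconnect
        finally:
            conn.close()
        sleep(retry_delay)
    return sessions
```

	The secondary never appears in this loop as an initiator,
	which is the point of the paragraph: after any loss, it waits
	for the primary to come back and start over.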


> 
> - Also not really clear to me is why OPTION_F_MAX_UNACKED_BNDUPD  is
> needed and how the server should know the right value. I guess you would
> want to calculate this based on the send buffer, however, not all message
> have the same size and as such I don't know how to calculate that. And is
> that really needed? If messages will not be accepted by the receiver-side
> server, the receive window will be zero and the socket on the sending
> side will be blocked; no additional message can be send. What will be
> different if the sender knows in advance when it could potentially happen
> (but also might not if the other end processes the messages quickly and
> there is no excessive loss).
> 


	The intent here is to keep the TCP connection unblocked, so
	that information can flow in both directions.  If one
	direction is maxed out, it shouldn't keep information from
	flowing in the other direction.  At a TCP level it won't, but
	at an application level it will.  Much of the failover
	information flow involves one server sending a BNDUPD and the
	partner then sending a BNDREPLY.  If one server sends more
	BNDUPD's than the other server can absorb, the TCP connection
	will block.  This will mean that any BNDREPLY's from the
	server that sent the BNDUPD's will also be blocked.  Ideally,
	the BNDUPD->BNDREPLY flow from each server to the other would
	be independent, and the OPTION_F_MAX_UNACKED_BNDUPD count is
	designed to help that be true.

	Additionally, there are messages other than BNDUPD/BNDREPLY
	(e.g. STATE, DISCONNECT, UPDDONE) that are important to
	transmit from one server to the other and not have backed up
	behind a blocked TCP connection that has been overloaded with
	BNDUPD's for the partner to process.

	We could have created a separate TCP connection for these
	control messages, but the overhead of doing that (and
	specifying that) was great enough that it seemed like using
	the application-level flow control of the
	OPTION_F_MAX_UNACKED_BNDUPD was a good tradeoff.
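
	To make the application-level flow control concrete, here is
	a minimal sketch of a sender that never has more than the
	allowed number of BNDUPD's outstanding.  It assumes the
	limit is the count the partner supplied in
	OPTION_F_MAX_UNACKED_BNDUPD; the class UpdateSender, its
	methods and the transmit callable are hypothetical names
	used only for illustration.

```python
from collections import deque


class UpdateSender:
    """Hypothetical sender-side limiter for outstanding BNDUPD messages."""

    def __init__(self, max_unacked, transmit):
        # Assumed to be the partner's OPTION_F_MAX_UNACKED_BNDUPD value.
        self.max_unacked = max_unacked
        # Hypothetical callable that writes one message to the connection.
        self.transmit = transmit
        self.unacked = 0          # BNDUPDs sent but not yet answered
        self.pending = deque()    # BNDUPDs held back locally

    def send_bndupd(self, update):
        if self.unacked < self.max_unacked:
            self.unacked += 1
            self.transmit(("BNDUPD", update))
        else:
            # Hold the update here rather than filling the TCP send buffer.
            self.pending.append(update)

    def send_control(self, kind, body=None):
        # STATE, DISCONNECT, UPDDONE and BNDREPLY are not counted against
        # the limit, so they are not stuck behind queued BNDUPDs.
        self.transmit((kind, body))

    def on_bndreply(self):
        # One outstanding BNDUPD was answered; release the next one, if any.
        if self.unacked > 0:
            self.unacked -= 1
        if self.pending:
            self.unacked += 1
            self.transmit(("BNDUPD", self.pending.popleft()))
```

	Because excess BNDUPD's wait in the local queue instead of in
	the socket, a STATE or BNDREPLY handed to send_control() goes
	out on the connection right away.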


Thanks again for your review!

Kim