Re: [dhcwg] Mirja Kühlewind's No Objection on draft-ietf-dhc-dhcpv6-failover-protocol-04: (with COMMENT)

kkinnear <kkinnear@cisco.com> Thu, 02 February 2017 17:43 UTC

From: kkinnear <kkinnear@cisco.com>
To: "Mirja Kuehlewind (IETF)" <ietf@kuehlewind.net>
Cc: Bernie Volz <volz@cisco.com>, dhc-chairs@ietf.org, The IESG <iesg@ietf.org>, draft-ietf-dhc-dhcpv6-failover-protocol@ietf.org, dhcwg@ietf.org
Date: Thu, 2 Feb 2017 12:43:44 -0500
Message-Id: <0EB0E49E-B201-441B-834B-47A050EDA757@cisco.com>
Archived-At: <https://mailarchive.ietf.org/arch/msg/dhcwg/ln10Pqn6bwgfzVlaDx-ViDEBlac>

Mirja,

More comments, below...


> On Feb 2, 2017, at 12:07 PM, Mirja Kuehlewind (IETF) <ietf@kuehlewind.net> wrote:
> 
> [... removed already handled issue -- Kim]
> 
>> 6.1.  Creating Connections
>> 
>> 
>>> 
>>> - Also not really clear to me is why OPTION_F_MAX_UNACKED_BNDUPD is
>>> needed and how the server should know the right value. I guess you would
>>> want to calculate this based on the send buffer; however, not all messages
>>> have the same size, and as such I don't know how to calculate that. And is
>>> that really needed? If messages will not be accepted by the receiver-side
>>> server, the receive window will be zero and the socket on the sending
>>> side will be blocked; no additional messages can be sent. What will be
>>> different if the sender knows in advance that it could potentially happen
>>> (though it also might not, if the other end processes the messages quickly
>>> and there is no excessive loss)?
>>> 
>> 
>> 
>> 	The intent here is to keep the TCP connection unblocked, so
>> 	that information can flow in both directions.  If one
>> 	direction is maxed out, it shouldn't keep information from
>> 	flowing in the other direction.  At a TCP level it won't, but
>> 	at an application level it will.  Much of the failover
>> 	information flow involves one server sending a BNDUPD and then
>> 	the partner sending a BNDREPLY.  If one server sends more
>> 	BNDUPD's than the other server can absorb, the TCP connection
>> 	will block.  This will mean that any BNDREPLY's from the
>> 	server that sent the BNDUPD's will also be blocked.  Ideally,
>> 	the BNDUPD->BNDREPLY flow from each server to the other would
>> 	be independent, and the OPTION_F_MAX_UNACKED_BNDUPD count is
>> 	designed to help make that true.
> 
> So you mean this is purely an application parameter saying I will not process more than X messages at once (before sending out a BNDREPLY). So this is rather independent of any socket buffer configuration, except that the buffer needs to be large enough to handle at least X (max-size) messages, which may be a good thing to note as well.

	This is an application parameter saying that I can accept up
	to X messages at once without blocking the TCP connection.
	That isn't in conflict with what you said, but is focused a
	bit differently.  It is independent of any socket buffer
	configuration -- this is application level flow control.
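
	To make that concrete, here is a rough sketch of the
	sender-side bookkeeping in Python.  This is illustrative
	only -- the class, the connection object, and its send()
	method are invented for this example; the draft does not
	specify any particular API:

import collections

class BndupdSender:
    """Caps the number of BNDUPD's outstanding toward the partner."""

    def __init__(self, connection, partner_max_unacked):
        # 'connection' is any object with a send(message) method; it
        # stands in for the established failover TCP connection.
        self.connection = connection
        self.max_unacked = partner_max_unacked  # partner's OPTION_F_MAX_UNACKED_BNDUPD
        self.unacked = 0                        # BNDUPD's sent, no BNDREPLY yet
        self.pending = collections.deque()      # BNDUPD's waiting for window space

    def queue_bndupd(self, bndupd):
        # Hold the update locally rather than writing it to the
        # socket when the partner's window is full.
        self.pending.append(bndupd)
        self._drain()

    def on_bndreply(self, bndreply):
        # Each BNDREPLY from the partner opens one slot in the window.
        self.unacked -= 1
        self._drain()

    def _drain(self):
        # Never put more on the wire than the partner said it can
        # absorb, so the connection stays writable for STATE,
        # DISCONNECT, UPDDONE, and our own BNDREPLY's.
        while self.pending and self.unacked < self.max_unacked:
            self.connection.send(self.pending.popleft())
            self.unacked += 1

	The point is exactly what you said: the cap lives entirely
	at the application layer, above whatever the socket buffers
	are doing.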

> 
> However, this basically means that on the sender side you anyway need a way to cache BNDUPD messages that you are not allowed to send out yet. Why don’t you just set this value implicitly always to 1 and say you can’t send another BNDUPD while a BNDREPLY is still outstanding…? I would guess it’s anyway rather unlikely that you need to send more than one message at once, no?

	Servers frequently need to send far more than one BNDUPD at once.
	The most extreme typical case is when one server is updating a
	partner which has been down with information about what has
	been happening while the partner was down.  This will generate
	thousands to tens of thousands of BNDUPD's.  When one server
	has lost its stable storage completely and needs to
	essentially be initialized by the other server, millions of
	BNDUPD's may need to flow across the link.

	Doing them one at a time, while technically correct, typically
	leaves a lot of performance on the table and could easily
	extend the time before the servers synchronize from seconds to
	tens of minutes (and possibly hours).  Many DHCP servers are
	multi-threaded and can process multiple BNDUPD's at the same
	time (though they may batch up the writes to the disk).  Thus,
	we would expect that most servers implementing this protocol
	would set this value to something substantial.
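
	To put rough numbers on that (the figures here are made up,
	but representative): with a 40 ms round trip between
	partners, sending 100,000 BNDUPD's strictly one at a time
	costs at least 100,000 x 0.04 s, or roughly 67 minutes, in
	round trips alone, even if processing were free.  With
	1,000 BNDUPD's allowed to be unacknowledged at once, the
	round-trip cost drops to about 100 x 0.04 s = 4 seconds,
	and the transfer is bounded by processing and bandwidth
	instead.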

> 
>> 
>> 	Additionally, there are messages other than BNDUPD/BNDREPLY
>> 	(e.g., STATE, DISCONNECT, UPDDONE) that are important to
>> 	transmit from one server to the other, and that should not
>> 	be backed up behind a TCP connection blocked by more
>> 	BNDUPD's than the partner can process.
>> 
>> 	We could have created a separate TCP connection for these
>> 	control messages, but the overhead of doing that (and
>> 	specifying that) was great enough that it seemed like using
>> 	the application-level flow control of the
>> 	OPTION_F_MAX_UNACKED_BNDUPD was a good tradeoff.
> 
> I would actually say that the overhead is rather low. Maybe one should discuss this option at least as one potential implementation possibility. The only hard requirement is that the receiver side would be able to process messages coming from different connections from the same endpoint, which I assume would be easy given that you anyway have to handle different connections from different endpoints, no?

	Having different implementation possibilities in something as
	basic as connection management in a protocol already this
	complex is something we have tried hard to avoid, and we could
	only justify it if it were necessary to solve a very pressing
	problem.

Thanks -- Kim