Re: [dhcwg] Mirja Kühlewind's No Objection on draft-ietf-dhc-dhcpv6-failover-protocol-04: (with COMMENT)

"Mirja Kuehlewind (IETF)" <ietf@kuehlewind.net> Mon, 27 February 2017 19:16 UTC

To: kkinnear <kkinnear@cisco.com>
Cc: dhc-chairs@ietf.org, dhcwg@ietf.org, Bernie Volz <volz@cisco.com>, draft-ietf-dhc-dhcpv6-failover-protocol@ietf.org, The IESG <iesg@ietf.org>

Hi Kim,

sorry for my late reply, and thanks for the explanation; it makes sense to me. I think slightly more explanation in the draft would be good, to make clear that the TCP blocking itself is not the problem, but rather that one kind of application-layer message can block another kind, which can lead to total blocking, given that a single TCP connection is used for all kinds of messages in order to reduce the complexity of connection management. Currently it reads a little as if there were a problem with TCP, which is not really the case.
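
Purely to illustrate what I mean, a toy sketch (all names and numbers invented; nothing here is from the draft):

    # Toy illustration: one failover connection means one send queue, so
    # a backlog of bulk BNDUPDs delays any BNDREPLY queued behind it,
    # even though TCP itself behaves exactly as designed.
    from collections import deque

    send_queue = deque("BNDUPD" for _ in range(10_000))  # bulk backlog
    send_queue.append("BNDREPLY")  # owed to the partner, stuck at the back

    # If the partner cannot absorb BNDUPDs fast enough, TCP flow control
    # stalls the sender, and the BNDREPLY (like STATE, DISCONNECT, ...)
    # cannot leave until the backlog ahead of it drains.  If both servers
    # hit this at the same time, nothing moves at all.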

Thanks!
Mirja


> On Feb 2, 2017, at 6:43 PM, kkinnear <kkinnear@cisco.com> wrote:
> 
> Mirja,
> 
> More comments, below...
> 
> 
>> On Feb 2, 2017, at 12:07 PM, Mirja Kuehlewind (IETF) <ietf@kuehlewind.net> wrote:
>> 
>> [... removed already handled issue -- Kim]
>> 
>>> 6.1.  Creating Connections
>>> 
>>> 
>>>> 
>>>> - It is also not really clear to me why OPTION_F_MAX_UNACKED_BNDUPD is
>>>> needed and how the server should know the right value. I guess you would
>>>> want to calculate this based on the send buffer; however, not all
>>>> messages have the same size, and as such I don't know how to calculate
>>>> that. And is that really needed? If messages will not be accepted by the
>>>> receiver-side server, the receive window will be zero and the socket on
>>>> the sending side will be blocked; no additional messages can be sent.
>>>> What would be different if the sender knew in advance that this could
>>>> happen (which it also might not, if the other end processes the messages
>>>> quickly and there is no excessive loss)?
>>>> 
>>> 
>>> 
>>> 	The intent here is to keep the TCP connection unblocked, so
>>> 	that information can flow in both directions.  If one
>>> 	direction is maxed out, it shouldn't keep information from
>>> 	flowing in the other direction.  At a TCP level it won't, but
>>> 	at an application level it will.  Much of the failover
>>> 	information flow involves one server sending a BNDUPD and then
>>> 	the partner sending a BNDREPLY.  If one server sends more
>>> 	BNDUPD's than the other server can absorb, the TCP connection
>>> 	will block.  This will mean that any BNDREPLY's from the
>>> 	server that sent the BNDUPD's will also be blocked.  Ideally,
>>> 	the BNDUPD->BNDREPLY flow from each server to the other would
>>> 	be independent, and the OPTION_F_MAX_UNACKED_BNDUPD count is
>>> 	designed to help that be true.
>> 
>> So you mean this is purely an application parameter saying "I will not process more than X messages at once (before sending out a BNDREPLY)". So this is rather independent of any socket buffer configuration, except that the buffer needs to be large enough to hold at least X (max-size) messages, which may be a good thing to note as well.
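>> 
>> (Purely as a numeric illustration, with invented numbers, and assuming 16-bit message framing as in bulk leasequery, i.e. at most ~64 KiB per message: with X = 100 that is up to 100 * 64 KiB, roughly 6.4 MB, of buffering on the receive path.)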
> 
> 	This is an application parameter saying that I can accept up
> 	to X messages at once without blocking the TCP connection.
> 	That isn't in conflict with what you said, but is focused a
> 	bit differently.  It is independent of any socket buffer
> 	configuration -- this is application level flow control.
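> 
> 	To make the mechanism concrete, here is a minimal sender-side
> 	sketch of the window (invented names; not from the draft):
> 
> 	    # Sketch: application-level flow control driven by the
> 	    # partner's advertised OPTION_F_MAX_UNACKED_BNDUPD value.  The
> 	    # class, queue handling, and "conn" interface are all invented.
> 	    from collections import deque
> 
> 	    class FailoverSender:
> 	        def __init__(self, conn, max_unacked):
> 	            self.conn = conn                # failover TCP connection
> 	            self.max_unacked = max_unacked  # partner's advertised limit
> 	            self.unacked = 0                # BNDUPDs awaiting BNDREPLY
> 	            self.pending = deque()          # BNDUPDs waiting for window
> 
> 	        def queue_bndupd(self, msg):
> 	            self.pending.append(msg)
> 	            self._drain()
> 
> 	        def on_bndreply(self):
> 	            self.unacked -= 1               # window space freed
> 	            self._drain()
> 
> 	        def send_control(self, msg):
> 	            # STATE, DISCONNECT, UPDDONE, etc. are not gated by the
> 	            # window; because the window keeps the pipe from filling
> 	            # with BNDUPDs, they can be sent promptly.
> 	            self.conn.send(msg)
> 
> 	        def _drain(self):
> 	            while self.pending and self.unacked < self.max_unacked:
> 	                self.conn.send(self.pending.popleft())
> 	                self.unacked += 1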
> 
>> 
>> However, this basically means that the sender side anyway needs a way to cache BNDUPD messages that it is not yet allowed to send out. Why not just implicitly set this value to 1 and say you can't send another BNDUPD while a BNDREPLY is still outstanding…? I would guess it's rather unlikely that you need to send more than one message at once anyway, no?
> 
> 	Servers frequently need to send far more than one BNDUPD at once.
> 	The most extreme typical case is when one server is updating a
> 	partner which has been down with information about what has
> 	been happening while the partner was down.  This will generate
> 	thousands to tens of thousands of BNDUPD's.  When one server
> 	has lost its stable storage completely and needs to
> 	essentially be initialized by the other server, millions of
> 	BNDUPD's may need to flow across the link.
> 
> 	Doing them one at a time, while technically correct, typically
> 	leaves a lot of performance on the table and could easily
> 	extend the time before the servers synchronize from seconds to
> 	tens of minutes (and possibly hours).  Many DHCP servers are
> 	multi-threaded and can process multiple BNDUPD's at the same
> 	time (though they may batch up the writes to the disk).  Thus,
> 	we would expect that most servers implementing this protocol
> 	would set this value to something substantial.
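> 
> 	As a rough back-of-envelope illustration (RTT and counts invented):
> 
> 	    # Resync time if every BNDUPD waits for its BNDREPLY versus
> 	    # keeping a window of them in flight; processing time ignored.
> 	    rtt = 0.05                     # seconds per round trip (assumed)
> 	    n = 1_000_000                  # BNDUPDs to transfer (assumed)
> 	    for window in (1, 100, 10_000):
> 	        batches = -(-n // window)  # ceiling division
> 	        print(f"window={window}: ~{batches * rtt:.0f} s")
> 	    # window=1 -> ~50000 s (about 14 hours); window=100 -> ~500 s;
> 	    # window=10000 -> ~5 s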
> 
>> 
>>> 
>>> 	Additionally, there are messages other than BNDUPD/BNDREPLY
>>> 	(e.g. STATE, DISCONNECT, UPDDONE) that are important to
>>> 	transmit from one server to the other and not have backed up
>>> 	behind a blocked TCP connection that has been overloaded with
>>> 	BNDUPD's for the partner to process.
>>> 
>>> 	We could have created a separate TCP connection for these
>>> 	control messages, but the overhead of doing that (and
>>> 	specifying that) was great enough that it seemed like using
>>> 	the application-level flow control of the
>>> 	OPTION_F_MAX_UNACKED_BNDUPD was a good tradeoff.
>> 
>> I would actually say that the overhead is rather low. Maybe one should at least discuss this option as one potential implementation possibility. The only hard requirement is that the receiver side would have to be able to process messages coming from different connections from the same endpoint, which I assume would be easy given that you already have to handle different connections from different endpoints, no?
> 
> 	Having different implementation possibilities in something as
> 	basic as connection management in a protocol already this
> 	complex is something we have tried hard to avoid, and we could
> 	only justify it if it were necessary to solve a very pressing
> 	problem.
> 
> Thanks -- Kim
> 
>