Re: [Dots] Mirja Kühlewind's Discuss on draft-ietf-dots-requirements-18: (with DISCUSS and COMMENT)

"Mirja Kuehlewind (IETF)" <ietf@kuehlewind.net> Thu, 21 February 2019 13:46 UTC

Archived-At: <https://mailarchive.ietf.org/arch/msg/dots/SVpSWAFIzqtEytz6fBCOxt4jLYQ>

Hi Med,

Please see below.

> On 21.02.2019, at 13:17, mohamed.boucadair@orange.com wrote:
> 
>>>> 2) Also on this text in SIG-004:
>>>> "The heartbeat interval during active mitigation could be
>>>>     negotiable, but MUST be frequent enough to maintain any on-path
>>>>     NAT or Firewall bindings during mitigation.  When TCP is used as
>>>>     transport, the DOTS signal channel heartbeat messages need to be
>>>>     frequent enough to maintain the TCP connection state."
>>>> 
>>>> As Joe commented already, different heartbeats at different layers can be
>>>> used at the same time for different purposes. You can use heartbeats at the
>>>> application layer to check service availability while, e.g., using a
>>>> higher-frequency heartbeat at the transport layer to maintain firewall and
>>>> NAT state.
>>> 
>>> [Med] Please note that the text you quoted is about "during active
>>> mitigation". When no attack is ongoing, we do have the following behavior,
>>> which covers your comment:
>>> 
>>>     When DOTS agents are exchanging heartbeats and no
>>>     mitigation request is active, either agent MAY request changes to
>>>     the heartbeat rate.  For example, a DOTS server might want to
>>>                          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>>>     reduce heartbeat frequency or cease heartbeat exchanges when an
>>>     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>>>     active DOTS client has not requested mitigation, in order to
>>>     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>>>     control load.
>>> 
>>>> The advantage of such an approach is that there is less application-layer
>>>> overhead/load, e.g. in scenarios where it might be expensive to wake up the
>>>> application or where a server is already highly loaded. Also note that the
>>>> time-out values of NATs and firewalls on the path are usually unknown;
>>>> therefore an application can never rely on heartbeats (no matter at which
>>>> level) and must be prepared to try to reconnect on the application layer if
>>>> the connection fails. Usually, the main reason for using heartbeats to
>>>> maintain NAT or firewall state (vs. reconnecting every time) in TCP is that
>>>> the application is time-sensitive and a full TCP handshake takes too long
>>>> for the desired service. I'm not sure that is the case for DOTS; however, I
>>>> understand it may be beneficial to have established state if an attack is
>>>> ongoing.
>>> 
>>> [Med] This is important to avoid new handshakes when the client has to
>>> request a mitigation.
>> 
>> This is okay but could be spelled out more explicitly as a requirement,
>> rather than talking about the details of sending heartbeats.
>>> 
>>>> 
>>>> For UDP I guess it's more complicated in your case. Time-outs are usually
>>>> very short; however, state is created with the first packet of a flow (as
>>>> there is no handshake in UDP). As you don't see blocking when state has
>>>> expired, because new state is created immediately, it's more or less
>>>> impossible to measure the configured time-out values. Only if the firewall
>>>> is under attack would it start blocking UDP traffic that it has no state
>>>> for yet. So I understand why it is desirable for you to maintain UDP state;
>>>> however, I don't understand how you can know that your frequency is high
>>>> enough to actually keep the state open. Note that TCP time-outs are usually
>>>> on the order of hours, while UDP time-outs are usually in the range of tens
>>>> of seconds and might expire even more quickly if a system is under attack.
>>>> If that is a scenario that is important for you, and assuming that not all
>>>> time-out values on the path can be known, I guess it would be advisable to
>>>> use TCP instead.
>>>> 
>>>> In any case this cannot be a MUST requirement (as timers are usually not
>>>> known). I would recommend stating something like:
>>>> 
>>>> "MAY be frequent enough to maintain NAT or firewall state, if timer values
>>>> are known, or if TCP is used, SHOULD use in addition TCP heartbeats to
>>>> maintain the TCP connection state and reconnect immediately if a failure is
>>>> detected."
>>>> 
>>> 
>>> [Med] The original wording is accurate and reflects the requirement of the
>>> WG. How this will be enforced is part of the solution/specification space.
>> 
>> My whole point here is that
>> 
>> "MUST be frequent enough to maintain any on-path NAT or Firewall bindings
>> during mitigation."
>> 
>> cannot be a MUST requirement as the network time-out values are not known by
>> the endpoints. Therefore it is impossible to fulfill this requirement.
> 
> [Med] Two comments here:
> * The requirement can be fulfilled by relying on RFC8085 recommendations. This is discussed in the spec documents.

RFC8085 provides recommended values and limits; however, this does not guarantee that the proposed values actually match the time-out values as deployed on the path.
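
To make this concrete, here is a minimal sketch of how an endpoint might pick a heartbeat interval. It is only an illustration, not anything taken from the draft or the spec documents: the function name, the safety factor, and the returned flag are hypothetical, and the 15-second floor is my reading of the keep-alive guidance in RFC8085. The point it shows is that, without a known time-out, the endpoint can pick a value but cannot claim the binding will be kept alive.

```python
from typing import Optional, Tuple

# Assumption: RFC8085 advises against sending UDP keep-alives more often than
# once every 15 seconds. This is a floor on the interval, not a guarantee
# that any middlebox on the path keeps its state that long.
MIN_KEEPALIVE_INTERVAL_S = 15.0


def pick_heartbeat_interval(known_timeout_s: Optional[float],
                            safety_factor: float = 0.5) -> Tuple[float, bool]:
    """Return (interval_seconds, binding_guaranteed).

    known_timeout_s is the smallest middlebox time-out on the path, or None
    when it is unknown (hypothetical input; the usual case in practice).
    """
    if known_timeout_s is None:
        # Nothing to compare against: use the floor and admit that the
        # binding may still expire between heartbeats.
        return MIN_KEEPALIVE_INTERVAL_S, False
    # Stay well inside the known time-out, but never go below the floor.
    interval = max(known_timeout_s * safety_factor, MIN_KEEPALIVE_INTERVAL_S)
    return interval, interval < known_timeout_s
```

With an unknown time-out the second value is always False, which is why a MUST on maintaining the binding cannot be verified by the endpoint.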

> * There are deployments in which timers can be discovered (e.g., PCP (RFC6887)).

This does not work in all cases, and the draft does not seem to require the use of anything like this. If the requirement is that the time-out values MUST be known in the deployed scenario, then that should be spelled out; however, I assume that is not your intention because that would limit deployment heavily.
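
For the TCP case, the fallback I suggested earlier in this thread (transport-level keep-alives plus an immediate reconnect when a failure is detected) could look roughly like the sketch below. Everything here is an assumption for illustration: the helper names, the timer values, and the `connect` callback are hypothetical and are not part of any DOTS specification. The per-socket tuning options are platform-specific, so the code only sets those that the local `socket` module exposes.

```python
import socket


def enable_tcp_keepalive(sock, idle_s=60, interval_s=10, probes=3):
    """Turn on TCP keep-alives; the timer values are illustrative guesses."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # TCP_KEEPIDLE, TCP_KEEPINTVL and TCP_KEEPCNT are not available on every
    # platform, so set only the ones this Python build defines.
    for name, value in (("TCP_KEEPIDLE", idle_s),
                        ("TCP_KEEPINTVL", interval_s),
                        ("TCP_KEEPCNT", probes)):
        option = getattr(socket, name, None)
        if option is not None:
            sock.setsockopt(socket.IPPROTO_TCP, option, value)


def send_with_reconnect(sock, payload, connect):
    """Send payload; if the connection has failed, reconnect once and retry.

    connect is a hypothetical caller-supplied function returning a freshly
    connected socket. Returns the socket that carried the payload.
    """
    try:
        sock.sendall(payload)
        return sock
    except OSError:
        # State on the path (or at the peer) is gone: rebuild it right away.
        sock.close()
        new_sock = connect()
        new_sock.sendall(payload)
        return new_sock
```

The keep-alives reduce the chance that state expires, and the reconnect path covers the case where it expires anyway because the time-out on the path was shorter than assumed.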

Mirja