Re: [Dots] Improve controllability and predictability of keepalives
<mohamed.boucadair@orange.com> Mon, 18 November 2019 06:36 UTC
From: mohamed.boucadair@orange.com
To: Carsten Bormann <cabo@tzi.org>
CC: "dots@ietf.org" <dots@ietf.org>
Date: Mon, 18 Nov 2019 06:36:01 +0000
Message-ID: <787AE7BB302AE849A7480A190F8B9330313D89A8@OPEXCAUBMA2.corporate.adroot.infra.ftgroup>
References: <787AE7BB302AE849A7480A190F8B9330313624A5@OPEXCAUBMA2.corporate.adroot.infra.ftgroup> <18CFEB6A-81B8-44A4-B749-DB1689E0B442@tzi.org> <787AE7BB302AE849A7480A190F8B933031362BF5@OPEXCAUBMA2.corporate.adroot.infra.ftgroup> <787AE7BB302AE849A7480A190F8B9330313D3421@OPEXCAUBMA2.corporate.adroot.infra.ftgroup> <8E353427-9380-44CF-B151-63DB0106BFCA@tzi.org>
In-Reply-To: <8E353427-9380-44CF-B151-63DB0106BFCA@tzi.org>
Archived-At: <https://mailarchive.ietf.org/arch/msg/dots/vCLVeQJYuDL78ON85IQHUqfMfuc>
Subject: Re: [Dots] Improve controllability and predictability of keepalives

Hi Carsten,

Please see inline.

Cheers,
Med

> -----Original Message-----
> From: Carsten Bormann [mailto:cabo@tzi.org]
> Sent: Saturday, 16 November 2019 15:34
> To: BOUCADAIR Mohamed TGI/OLN
> Cc: dots@ietf.org
> Subject: Re: Improve controllability and predictability of keepalives
>
> Thank you.
> I still have a few questions, but I think this looks very good.
>
> 4.4 says:
>    Mitigation requests MUST NOT be delayed
>    because of other congestion control checks.
> What might those be?

[Med] A probing rate that is too low, for example, as indicated in the
sentence right after that excerpt. There is a risk that mitigation
requests are delayed (or that heartbeats are not sent, which may lead to
session liveness checks).

> (You don’t want to be in a situation where you send a ton of mitigation
> requests that are mostly dropped at the tail of a queue.)
>

[Med] Yes, we don't want that. The authoritative rule is as follows:

   Requests marked by the DOTS client as Non-confirmable messages are
   sent at regular intervals until a response is received from the
   DOTS server.  If the DOTS client cannot maintain an RTT estimate,
   it MUST NOT send more than one Non-confirmable request every 3
   seconds, and SHOULD use an even less aggressive rate whenever
   possible (case 2 in Section 3.1.3 of [RFC8085]).

> Before Figure 27, it says:
>
>    The PUT request used for DOTS heartbeat MUST NOT have a
>    'cuid', 'cdid,' or 'mid' Uri-Path.  Such PUT requests MUST NOT be
>    relayed by DOTS gateways.
>
> Why?  (And what should the DOTS gateways do instead?)

[Med] Session loss detection and recovery is local to an agent and its
immediate peer. Heartbeats are thus not relayed. We can update as
follows:

   "This procedure occurs between a DOTS agent and its immediate peer
   DOTS agent.  As such, this GET request MUST NOT be relayed by a
   DOTS gateway.  The PUT request used for DOTS heartbeat MUST NOT
   have a 'cuid', 'cdid,' or 'mid' Uri-Path."
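
For illustration only, the pacing rule cited earlier in this reply (no more than one Non-confirmable request every 3 seconds when no RTT estimate is available) could be sketched as follows. The class name `NonConfirmablePacer`, its method `may_send`, and the use of caller-supplied timestamps are assumptions made for this example; they do not come from the draft or from any CoAP library.

```python
class NonConfirmablePacer:
    """Hypothetical helper: spaces Non-confirmable requests at least
    `min_interval` seconds apart (3 s when no RTT estimate is kept)."""

    def __init__(self, min_interval=3.0):
        self.min_interval = min_interval
        self.last_sent = None  # time of the last permitted request, if any

    def may_send(self, now):
        """Return True and record `now` if a request may be sent at `now`."""
        if self.last_sent is not None and now - self.last_sent < self.min_interval:
            return False
        self.last_sent = now
        return True


pacer = NonConfirmablePacer()
# Timestamps are in seconds and supplied by the caller for determinism.
decisions = [pacer.may_send(t) for t in (0.0, 1.0, 2.9, 3.0, 5.0, 6.0)]
print(decisions)  # [True, False, False, True, False, True]
```

A real client would also stop retransmitting once a response arrives and would be free to use a longer spacing, as the SHOULD in the rule suggests.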
>
> Regards, Carsten
>
>
> > On Nov 14, 2019, at 18:17, mohamed.boucadair@orange.com wrote:
> >
> > Hi Carsten, all,
> >
> > As promised, we updated the draft to take into account your inputs. The
> > candidate version is available at (see section 4.7 in particular):
> >
> > https://github.com/boucadair/draft-ietf-dots-signal-channel/blob/master/draft-ietf-dots-signal-channel-39.txt
> >
> > The main changes are:
> > * Use PUT to send heartbeat requests
> > * Use 2.04 instead of 2.03
> > * DOTS agents can negotiate a probing-rate
> > * Provide some guidelines for setting the probing-rate.
> >
> > Do you have any further comment on the new heartbeat mechanism? Thank
> > you.
> >
> > Cheers,
> > Med
> >
> >> -----Original Message-----
> >> From: Dots [mailto:dots-bounces@ietf.org] On behalf of
> >> mohamed.boucadair@orange.com
> >> Sent: Tuesday, 12 November 2019 17:24
> >> To: Carsten Bormann
> >> Cc: dots@ietf.org
> >> Subject: Re: [Dots] Improve controllability and predictability of keepalives
> >>
> >> Re-,
> >>
> >> Please see inline.
> >>
> >> Cheers,
> >> Med
> >>
> >>> -----Original Message-----
> >>> From: Carsten Bormann [mailto:cabo@tzi.org]
> >>> Sent: Tuesday, 12 November 2019 14:12
> >>> To: BOUCADAIR Mohamed TGI/OLN
> >>> Cc: dots@ietf.org
> >>> Subject: Re: Improve controllability and predictability of keepalives
> >>>
> >>> Hi Med,
> >>>
> >>> what the text below doesn’t say is what kind of information you want to
> >>> derive from the heartbeats.
> >>
> >> [Med] How to interpret HBs by endpoints is discussed in Section 4.7.
> >>
> >>> The way they currently (draft -39) are
> >>> defined, the client uses a GET (*). GET is not supposed to influence
> >>> application state, so the server will not learn anything from that
> >>> heartbeat. Is the intention that only the client needs to react to
> >>> heartbeat failures?
> >>
> >> [Med] No, the server needs to react to heartbeat failures.
> >> The cases that are discussed in the spec are as follows:
> >>
> >> * If the DOTS server receives traffic from the peer DOTS client but
> >> the maximum 'missing-hb-allowed' threshold is reached, the DOTS
> >> server MUST NOT consider the DOTS signal channel session
> >> disconnected. The DOTS server MUST keep on using the current DOTS
> >> signal channel session so that the DOTS client can send mitigation
> >> requests over the current DOTS signal channel session. In this
> >> case, the DOTS server can identify that the DOTS client is under
> >> attack and the inbound link to the DOTS client (domain) is
> >> saturated.
> >>
> >> * If the DOTS server does not
> >> receive a mitigation request from the DOTS client, it implies the
> >> DOTS client has not detected the attack or, if an attack mitigation
> >> is in progress, it implies the applied DDoS mitigation actions are
> >> not yet effective to handle the DDoS attack volume.
> >>
> >> * If the DOTS server does not receive any traffic from the peer DOTS
> >> client during the time span required to exhaust the maximum
> >> 'missing-hb-allowed' threshold, the DOTS server concludes the
> >> session is disconnected. The DOTS server can then trigger
> >> pre-configured mitigation requests for this DOTS client (if any).
> >>
> >>>
> >>> RFC 7252 defines PROBING_RATE as 1 B/s. If you get a response within the
> >>> heartbeat interval to the non-confirmable requests, that is not relevant.
> >>> If you don’t, your heartbeat interval "MUST be chosen in
> >>> such a way that an endpoint does not exceed an average data rate of
> >>> PROBING_RATE in sending to another endpoint that does not respond."
> >>> If your interval is intended to be 15 s, that would mean your requests must
> >>> be ≤ 15 B, or you need to define PROBING_RATE differently for your
> >>> application.
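
As a rough illustration of the arithmetic in the preceding paragraph, the relationship between PROBING_RATE, the heartbeat interval, and the request size can be written out as below. The function names are made up for this sketch, and the 64-byte request size is only an assumed example value (the minimum Ethernet frame size), not a figure taken from the draft.

```python
def max_request_size(probing_rate, interval):
    """Largest request (bytes) that keeps the average rate towards an
    unresponsive peer at or below `probing_rate` (bytes/second) when one
    request is sent every `interval` seconds."""
    return probing_rate * interval


def required_probing_rate(request_size, interval):
    """PROBING_RATE (bytes/second) needed to send `request_size` bytes
    every `interval` seconds without being slowed down."""
    return request_size / interval


# Default PROBING_RATE of 1 B/s with a 15 s heartbeat interval.
print(max_request_size(1, 15))                   # 15
# Assumed 64-byte request sent every 15 s.
print(round(required_probing_rate(64, 15), 2))   # 4.27
```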
> >>> It seems right now you are not trying to be particularly frugal with the
> >>> heartbeat message, which is probably OK since most of your networks will be
> >>> Ethernet and that will expand the frame size to 64 B anyway. But that
> >>> means that you need to define PROBING_RATE to be ~ 5 B/s if you don’t want
> >>> to be slowed down in probing.
> >>
> >> [Med] Good point. Will update accordingly.
> >>
> >>>
> >>> Regards, Carsten
> >>>
> >>> (*) And expects a 2.03, which would mean that the server confirms the ETag
> >>> given in the request. But there is no ETag in that request, and I really
> >>> don’t see why a 2.05 with an empty payload wouldn’t also work. But maybe
> >>> you want to move to POST anyway (so there can be application semantics,
> >>> like taking note of the heartbeat, on the server), and 2.04 would fit that
> >>> very well (see RFC 7252 Section 5.8.2).
> >>
> >> [Med] Will consider the use of POST instead of GET.
> >>
> >>>
> >>>> On Nov 12, 2019, at 10:18, <mohamed.boucadair@orange.com>
> >>>> <mohamed.boucadair@orange.com> wrote:
> >>>>
> >>>> Hi Carsten,
> >>>>
> >>>> You indicated the following in an offline message (I’m adding dots
> >>>> mailing list as you were OK; see below):
> >>>>
> >>>>> I would have expected using requests sent in non-confirmable messages,
> >>>>> requiring more work on the application side (interpreting losses) but also
> >>>>> delivering a more regular, predictable, less noisy signal.
> >>>>> A little bit of specification text is then needed to ensure that those
> >>>>> requests/responses meet the requirements of RFC 8085 and the related
> >>>>> specifications in RFC 7252, and that both sides are in a good position to
> >>>>> interpret the signal they get.
> >>>> …
> >>>>> If not, I would like to discuss the issue on the DOTS mailing list, and see
> >>>>> whether a small set of changes to the keepalive mechanisms employed can
> >>>>> improve its controllability and predictability.
> >>>>
> >>>> Assuming that non-confirmable application HBs are used, which changes do
> >>>> you think are needed to enhance the DOTS mechanism to meet 8085/7252
> >>>> requirements?
> >>>>
> >>>> As a reminder we do have the following for setting the hb parameters:
> >>>>
> >>>> ======
> >>>>    Note: heartbeat-interval should be tweaked to also assist DOTS
> >>>>    messages for NAT traversal (SIG-011 of [RFC8612]). According to
> >>>>    [RFC8085], keepalive messages must not be sent more frequently
> >>>>    than once every 15 seconds and should use longer intervals when
> >>>>    possible. Furthermore, [RFC4787] recommends NATs to use a state
> >>>>    timeout of 2 minutes or longer, but experience shows that sending
> >>>>    packets every 15 to 30 seconds is necessary to prevent the
> >>>>    majority of middleboxes from losing state for UDP flows. From
> >>>>    that standpoint, the RECOMMENDED minimum heartbeat-interval is 15
> >>>>    seconds and the RECOMMENDED maximum heartbeat-interval is 240
> >>>>    seconds. The recommended value of 30 seconds is selected to
> >>>>    anticipate the expiry of NAT state.
> >>>>
> >>>>    A heartbeat-interval of 30 seconds may be considered as too chatty
> >>>>    in some deployments. For such deployments, DOTS agents may
> >>>>    negotiate longer heartbeat-interval values to prevent any network
> >>>>    overload with too frequent keepalives.
> >>>>
> >>>>    Different heartbeat intervals can be defined for
> >>>>    'mitigating-config' and 'idle-config' to reduce being too chatty
> >>>>    during idle times.
> >>>>    If there is an on-path translator between the DOTS client
> >>>>    (standalone or part of a DOTS gateway) and the DOTS server, the
> >>>>    'mitigating-config' heartbeat-interval has to be smaller than the
> >>>>    translator session timeout. It is recommended that the
> >>>>    'idle-config' heartbeat-interval is also smaller than the
> >>>>    translator session timeout to prevent translator traversal
> >>>>    issues, or disabled entirely. Means to discover the lifetime
> >>>>    assigned by a translator are out of scope.
> >>>> =======
> >>>>
> >>>> Thank you.
> >>>>
> >>>> Cheers,
> >>>> Med
> >>
> >> _______________________________________________
> >> Dots mailing list
> >> Dots@ietf.org
> >> https://www.ietf.org/mailman/listinfo/dots
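
As a minimal sketch of the heartbeat-interval guidance quoted in the note above (RECOMMENDED minimum of 15 seconds, RECOMMENDED maximum of 240 seconds, and an interval smaller than any on-path translator session timeout), a configuration check might look like the following. The function name, the constant names, and the `translator_timeout` parameter are assumptions for this example; since discovering the translator lifetime is out of scope, the value is simply taken as a caller-supplied input.

```python
RECOMMENDED_MIN_INTERVAL = 15    # seconds, per the quoted note
RECOMMENDED_MAX_INTERVAL = 240   # seconds, per the quoted note


def check_heartbeat_interval(interval, translator_timeout=None):
    """Return a list of findings for a proposed heartbeat-interval.

    `translator_timeout` is a hypothetical, caller-supplied session
    timeout (seconds) of an on-path translator; None means unknown.
    """
    findings = []
    if interval < RECOMMENDED_MIN_INTERVAL:
        findings.append("below recommended minimum")
    if interval > RECOMMENDED_MAX_INTERVAL:
        findings.append("above recommended maximum")
    if translator_timeout is not None and interval >= translator_timeout:
        findings.append("not smaller than translator session timeout")
    return findings


print(check_heartbeat_interval(30, translator_timeout=120))   # []
print(check_heartbeat_interval(10))                           # ['below recommended minimum']
print(check_heartbeat_interval(240, translator_timeout=120))  # ['not smaller than translator session timeout']
```

The same check could be applied separately to the 'mitigating-config' and 'idle-config' values, since the note allows them to differ.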