Re: [Dots] DOTS server crash/restart with Observed mitigations/Configuration

"Konda, Tirumaleswar Reddy" <TirumaleswarReddy_Konda@McAfee.com> Thu, 17 May 2018 16:06 UTC

From: "Konda, Tirumaleswar Reddy" <TirumaleswarReddy_Konda@McAfee.com>
To: Jon Shallow <supjps-ietf@jpshallow.com>, "mohamed.boucadair@orange.com" <mohamed.boucadair@orange.com>, "dots@ietf.org" <dots@ietf.org>
Date: Thu, 17 May 2018 16:05:11 +0000
Message-ID: <BN6PR16MB1425C81FAB6CB8E153B5E589EA910@BN6PR16MB1425.namprd16.prod.outlook.com>
References: <025d01d3ecff$ef69f2b0$ce3dd810$@jpshallow.com> <787AE7BB302AE849A7480A190F8B93302DF1BF79@OPEXCLILMA3.corporate.adroot.infra.ftgroup> <02ad01d3ed13$42a0c3b0$c7e24b10$@jpshallow.com> <787AE7BB302AE849A7480A190F8B93302DF1C138@OPEXCLILMA3.corporate.adroot.infra.ftgroup> <BN6PR16MB14258C3B7F3C5A876225950CEA920@BN6PR16MB1425.namprd16.prod.outlook.com> <039101d3edb3$dd8de830$98a9b890$@jpshallow.com> <BN6PR16MB1425FD80B4F2E66ABE5A790FEA910@BN6PR16MB1425.namprd16.prod.outlook.com> <082d01d3edf3$1b1df3e0$5159dba0$@jpshallow.com>
In-Reply-To: <082d01d3edf3$1b1df3e0$5159dba0$@jpshallow.com>
Archived-At: <https://mailarchive.ietf.org/arch/msg/dots/5PQZMDeoTJ4nZ5885frbzzeDFSw>

[Jon1] Agreed, the DOTS server cannot send anything unsolicited because it has lost the cryptographic state. However, if the DOTS server can include, in a response to a client request, something that states “I restarted recently @ such a time”, the client will know whether the missing heartbeats were down to just a flaky network or to a server restart, in which case a re-subscribe is needed for all the resources that the client wants to monitor. I do not want the DOTS client to keep re-subscribing, compounding the network flakiness when under attack.

[R] The first two sentences contradict each other. How can the server send any data on the old session?
And I don’t understand the value of sending “I restarted recently @ such a time” in the new (D)TLS session. The client has to resend the list of active mitigations it wishes to observe in the new session anyway, irrespective of why the session was disconnected. A session can be disconnected for various other reasons (e.g. a NAT reload).
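
To illustrate the point, here is a minimal Python sketch of a client that re-issues its Observe requests on every new (D)TLS session, whatever caused the old one to end. The class, its fields, and the resource name are hypothetical and do not come from the signal channel draft; the sketch only records which requests would be sent.

```python
from dataclasses import dataclass, field


@dataclass
class DotsClientSketch:
    """Hypothetical client-side bookkeeping, not an API from the DOTS drafts."""
    observed: set = field(default_factory=set)   # resources the client wants to Observe
    session_id: int = 0                          # incremented for every new (D)TLS session
    sent: list = field(default_factory=list)     # (session_id, resource) requests issued

    def observe(self, resource: str) -> None:
        # Remember the resource and issue the Observe request on the current session.
        self.observed.add(resource)
        self.sent.append((self.session_id, resource))

    def on_new_session(self, reason: str) -> None:
        # The reason (server restart, NAT reload, ...) is deliberately ignored:
        # every new session re-issues all Observe requests.
        self.session_id += 1
        for resource in sorted(self.observed):
            self.sent.append((self.session_id, resource))
```

In this model a restart indication from the server would not change what the client does after reconnecting.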

-Tiru

From: Jon Shallow [mailto:supjps-ietf@jpshallow.com]
Sent: Thursday, May 17, 2018 8:54 PM
To: Konda, Tirumaleswar Reddy <TirumaleswarReddy_Konda@McAfee.com>; mohamed.boucadair@orange.com; dots@ietf.org
Subject: RE: [Dots] DOTS server crash/restart with Observed mitigations/Configuration


Hi Tiru,

See inline [Jon1]

Regards

Jon

From: Dots [mailto:dots-bounces@ietf.org] On Behalf Of Konda, Tirumaleswar Reddy
Sent: 17 May 2018 16:03
To: Jon Shallow; mohamed.boucadair@orange.com; dots@ietf.org
Subject: Re: [Dots] DOTS server crash/restart with Observed mitigations/Configuration

Hi Jon,

This scenario is already discussed in Section 4.7. When DDoS mitigation is in progress and the ‘missing-hb-allowed’ threshold is reached, the DOTS client will continue to use the existing (D)TLS session to send heartbeat requests to the server and at the same time try to establish a new (D)TLS session with the DOTS server.
[Jon1] I fully understand that
Further, if the server restarts, (D)TLS state is lost on the server and the server does not have the cryptographic state to send ‘re-subscribe’ on the old session.
[Jon1] Agreed, the DOTS server cannot send anything unsolicited because it has lost the cryptographic state. However, if the DOTS server can include, in a response to a client request, something that states “I restarted recently @ such a time”, the client will know whether the missing heartbeats were down to just a flaky network or to a server restart, in which case a re-subscribe is needed for all the resources that the client wants to monitor. I do not want the DOTS client to keep re-subscribing, compounding the network flakiness when under attack.
~Jon1

-Tiru

From: Jon Shallow [mailto:supjps-ietf@jpshallow.com]
Sent: Thursday, May 17, 2018 1:21 PM
To: Konda, Tirumaleswar Reddy <TirumaleswarReddy_Konda@McAfee.com>; mohamed.boucadair@orange.com; dots@ietf.org
Subject: RE: [Dots] DOTS server crash/restart with Observed mitigations/Configuration


Hi Tiru,

It is a corner recovery case. The DOTS client can detect (using heartbeats) that the DOTS server has gone away and, once it detects that the server is back, re-subscribe to all the resources it wants to Observe.

However, when mitigating, there is a likelihood that heartbeats will time out and I am not sure that it is wise to “re-subscribe” when the connections are re-established as the network path may still be flaky and adding extra traffic may be problematic.  My suggestion of an epoch time being reported by the DOTS server was a way of determining whether “re-subscribe” was required or not.

Regards

Jon

From: Dots [mailto:dots-bounces@ietf.org] On Behalf Of Konda, Tirumaleswar Reddy
Sent: 16 May 2018 16:05
To: mohamed.boucadair@orange.com; Jon Shallow; dots@ietf.org
Subject: Re: [Dots] DOTS server crash/restart with Observed mitigations/Configuration

I don’t get the problem. If the server restarts, (D)TLS state will be lost and the server will reject heartbeat requests from the client (either silently or by sending a fatal alert to the client). The DOTS client will have to follow the steps in https://tools.ietf.org/html/draft-ietf-dots-signal-channel-19#section-4.7 to re-establish the (D)TLS session with the DOTS server.

-Tiru

From: Dots [mailto:dots-bounces@ietf.org] On Behalf Of mohamed.boucadair@orange.com
Sent: Wednesday, May 16, 2018 7:15 PM
To: Jon Shallow <supjps-ietf@jpshallow.com>; dots@ietf.org
Subject: Re: [Dots] DOTS server crash/restart with Observed mitigations/Configuration


Re-,

Please see inline.

Cheers,
Med

From: Jon Shallow [mailto:supjps-ietf@jpshallow.com]
Sent: Wednesday, 16 May 2018 14:42
To: BOUCADAIR Mohamed IMT/OLN; dots@ietf.org
Subject: RE: [Dots] DOTS server crash/restart with Observed mitigations/Configuration

Hi Med,

See inline.

Regards

Jon

From: Dots [mailto:dots-bounces@ietf.org] On Behalf Of mohamed.boucadair@orange.com
Sent: 16 May 2018 12:55
To: Jon Shallow; dots@ietf.org
Subject: Re: [Dots] DOTS server crash/restart with Observed mitigations/Configuration

Hi Jon,

Please see inline.

Cheers,
Med

From: Dots [mailto:dots-bounces@ietf.org] On Behalf Of Jon Shallow
Sent: Wednesday, 16 May 2018 12:24
To: dots@ietf.org
Subject: [Dots] DOTS server crash/restart with Observed mitigations/Configuration

Hi there,

For the signal channel, it is easy to maintain a persistent list of active mitigations.  Should the DOTS server (unexpectedly) stop and restart, then it is easy to rebuild the current active mitigations.

[Med] The list of active mitigations is the “core” state data to be maintained by a server. If we assume that list can be reliably rebuilt, then there is no extension needed to the signal channel to handle this failure case.

[Jon] Agreed.  This was just me building up the background to the scenario.

However, to rebuild the list of active Observations of the current mitigations / configuration changes is considerably more difficult as the DOTS server would also need to maintain a list of client IPs/source ports etc. along with Tokens and Queries and (D)TLS state so that unsolicited Observe responses can continue to be sent back to the clients.  I don’t really think that this is a practical ask.

[Med] First, this can be considered deployment-specific. As such, in some deployments there is no need to support automatic means to detect state loss at the server side. Second, is it really problematic for the DDoS mitigation if that list is lost by a server? I guess the answer is no if we assume the list of mitigations themselves is maintained.

[Jon] My concern is that the DOTS clients would not see the change in mitigation state, or would not see that there is a new signal channel configuration (if they were being Observed). The DOTS client/server would be out of sync with what is happening and the DOTS client would not necessarily know that it has to take some action to get back to the state before the DOTS server outage.

[Med] The DOTS client has other means to know the mitigation state. The DOTS client can observe the status locally (and report it to the server) or send a request to retrieve the status from the server. Being out of sync in Observe is not that critical for mitigating DDoS attacks.

As the DOTS server restarted, existing (D)TLS sessions used by the DOTS client will fail. So the heartbeat mechanism will eventually time out, and I guess that the DOTS client should detect this and re-do all of the Observe requests (along with making sure that all the active mitigations are as expected).

[Med] Including clients in networks under active attacks? Isn’t this overloading the server while dealing with ongoing attacks?

[Jon] Yes, it would – hence I did not really want to have to do this whenever a heartbeat failed, but only do it when the DOTS client “knows” that the server has had an outage.

However, in peacetime heartbeats may not be in use.
However, at mitigation time heartbeats may fail.

Would it make sense to add to the signal channel an extra parameter that the DOTS server can send to the client in any response in the form of
“last-restarted” : seconds-since-epoch
which the client can use to detect the restart.
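
A minimal Python sketch of how a client might act on such a parameter follows. Only the “last-restarted” name comes from the proposal above; the class, the method, and the idea of handing the decoded response to the client as a dict are assumptions made for illustration.

```python
from typing import Optional


class RestartDetector:
    """Hypothetical helper: remembers the last value reported by the server."""

    def __init__(self) -> None:
        self.last_seen: Optional[int] = None

    def needs_resubscribe(self, response: dict) -> bool:
        restarted = response.get("last-restarted")
        if restarted is None:
            # The server did not report a restart time; nothing can be inferred.
            return False
        previous = self.last_seen
        self.last_seen = restarted
        # A changed value means the server restarted since the last response.
        # The first response only establishes a baseline.
        return previous is not None and restarted != previous
```

With this, a heartbeat timeout followed by a response carrying an unchanged value would not trigger a re-subscribe, while a changed value would.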

[Med] We used to have an epoch-based approach for a stateful service (see https://tools.ietf.org/html/rfc6887#section-8.5). The details are not restricted to just signalling the epoch value; they also cover the logic to handle it at the client side. Further, there are some aspects to take into account when configuring redundant servers and an anycast address is used. If the same initial epoch value is configured, the client may not detect that a backup server is used and hence may not reinstall state... A tweak is to configure distinct epoch values (e.g., +/- 24h difference between servers in a redundancy group).
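
For reference, the client-side check described in RFC 6887 Section 8.5 can be sketched in Python roughly as below. The function name and argument order are illustrative only, and the RFC remains the normative text. All times are in seconds: the client times are local clock readings, the server times are the Epoch Time values received in responses.

```python
def epoch_is_valid(prev_client_time: int, curr_client_time: int,
                   prev_server_time: int, curr_server_time: int) -> bool:
    """Return False when the server's epoch suggests that state was lost."""
    # The server epoch went backwards by more than one second: obviously invalid.
    if curr_server_time + 1 < prev_server_time:
        return False
    client_delta = curr_client_time - prev_client_time
    server_delta = curr_server_time - prev_server_time
    # Both clocks should have advanced by about the same amount,
    # allowing 2 seconds plus 1/16 of drift in either direction.
    if client_delta + 2 < server_delta - server_delta // 16:
        return False
    if server_delta + 2 < client_delta - client_delta // 16:
        return False
    return True
```

For example, `epoch_is_valid(1000, 1100, 5000, 5100)` passes, and it would pass in the same way if the second response came from a backup server configured with the same initial epoch, which is the anycast concern raised above.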

The epoch is justified for 6887 because we wanted to recover state loss, while in the DOTS case we don’t have the same model given that, as you rightfully said above, the current active mitigations can be rebuilt at the server side without involving clients.

[Jon] So this method sets epoch time back to 0 whenever there is a restart,
[Med] Yes. The epoch is reset each time the state is lost at the server side.

rather than reporting the number of seconds since Jan 1 1970. That I can handle. However, with anycast (though redundant servers are currently out of spec), it is possible, but unlikely, that 2 or more servers restart and update the start time at exactly the same time. This could be handled by the anycast servers communicating, seeing that they have the same start time, and one of them backing off (in time terms).

[Jon] I’m just interested in how to recover Observe loss.
[Med] I’m not convinced this is really needed given that active mitigations are not concerned with this failure scenario.

I appreciate that in peacetime there may not be any DOTS client requests (or heartbeats) for a long time...

I’m open to suggestions.

Regards

Jon