Re: [Dots] DOTS server crash/restart with Observed mitigations/Configuration

"Konda, Tirumaleswar Reddy" <TirumaleswarReddy_Konda@McAfee.com> Thu, 17 May 2018 15:04 UTC

From: "Konda, Tirumaleswar Reddy" <TirumaleswarReddy_Konda@McAfee.com>
To: Jon Shallow <supjps-ietf@jpshallow.com>, "mohamed.boucadair@orange.com" <mohamed.boucadair@orange.com>, "dots@ietf.org" <dots@ietf.org>
Thread-Topic: [Dots] DOTS server crash/restart with Observed mitigations/Configuration
Date: Thu, 17 May 2018 15:02:55 +0000
Message-ID: <BN6PR16MB1425FD80B4F2E66ABE5A790FEA910@BN6PR16MB1425.namprd16.prod.outlook.com>
References: <025d01d3ecff$ef69f2b0$ce3dd810$@jpshallow.com> <787AE7BB302AE849A7480A190F8B93302DF1BF79@OPEXCLILMA3.corporate.adroot.infra.ftgroup> <02ad01d3ed13$42a0c3b0$c7e24b10$@jpshallow.com> <787AE7BB302AE849A7480A190F8B93302DF1C138@OPEXCLILMA3.corporate.adroot.infra.ftgroup> <BN6PR16MB14258C3B7F3C5A876225950CEA920@BN6PR16MB1425.namprd16.prod.outlook.com> <039101d3edb3$dd8de830$98a9b890$@jpshallow.com>
In-Reply-To: <039101d3edb3$dd8de830$98a9b890$@jpshallow.com>
Archived-At: <https://mailarchive.ietf.org/arch/msg/dots/ACU_f4ch-Bb-Fjr5-fS2Rn-KwOI>
Subject: Re: [Dots] DOTS server crash/restart with Observed mitigations/Configuration

Hi Jon,

This scenario is already discussed in Section 4.7 of the signal channel draft. When DDoS mitigation is in progress and the ‘missing-hb-allowed’ threshold is reached, the DOTS client will continue to use the existing (D)TLS session to send heartbeat requests to the server and, at the same time, try to establish a new (D)TLS session with the DOTS server.
Further, if the server restarts, its (D)TLS state is lost, so the server does not have the cryptographic state needed to send a ‘re-subscribe’ on the old session.
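A minimal sketch of the client-side heartbeat handling described above, assuming a simple counter of unanswered heartbeats compared against the ‘missing-hb-allowed’ value. The class name, the action strings and the peace-time branch are hypothetical illustrations, not text taken from Section 4.7.

```python
from dataclasses import dataclass


@dataclass
class HeartbeatMonitor:
    """Tracks unanswered heartbeats against a 'missing-hb-allowed' threshold."""
    missing_hb_allowed: int  # hypothetical config value mirroring 'missing-hb-allowed'
    missed: int = 0

    def on_heartbeat_response(self) -> None:
        # Any response from the server resets the miss counter.
        self.missed = 0

    def on_heartbeat_timeout(self, mitigation_active: bool) -> list[str]:
        """Return the (hypothetical) actions the client should take next."""
        self.missed += 1
        if self.missed < self.missing_hb_allowed:
            return ["keep-session"]
        if mitigation_active:
            # During mitigation: keep heartbeating on the old session while a
            # new (D)TLS session is attempted in parallel.
            return ["keep-session", "open-new-session"]
        # Assumption: in peace time the old session is simply replaced.
        return ["close-session", "open-new-session"]
```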

-Tiru

From: Jon Shallow [mailto:supjps-ietf@jpshallow.com]
Sent: Thursday, May 17, 2018 1:21 PM
To: Konda, Tirumaleswar Reddy <TirumaleswarReddy_Konda@McAfee.com>; mohamed.boucadair@orange.com; dots@ietf.org
Subject: RE: [Dots] DOTS server crash/restart with Observed mitigations/Configuration

Hi Tiru,

It is a corner case for recovery. The DOTS client can detect that the DOTS server has gone away (using heartbeats) and, once it detects that the DOTS server is back, re-subscribe to all the resources that it wants to Observe.

However, when mitigating, there is a likelihood that heartbeats will time out and I am not sure that it is wise to “re-subscribe” when the connections are re-established as the network path may still be flaky and adding extra traffic may be problematic.  My suggestion of an epoch time being reported by the DOTS server was a way of determining whether “re-subscribe” was required or not.

Regards

Jon

From: Dots [mailto:dots-bounces@ietf.org] On Behalf Of Konda, Tirumaleswar Reddy
Sent: 16 May 2018 16:05
To: mohamed.boucadair@orange.com; Jon Shallow; dots@ietf.org
Subject: Re: [Dots] DOTS server crash/restart with Observed mitigations/Configuration

I don’t get the problem. If the server restarts, its (D)TLS state will be lost and the server will reject heartbeat requests from the client (either silently or by sending a fatal alert to the client). The DOTS client will have to follow the steps in https://tools.ietf.org/html/draft-ietf-dots-signal-channel-19#section-4.7 to re-establish a (D)TLS session with the DOTS server.

-Tiru

From: Dots [mailto:dots-bounces@ietf.org] On Behalf Of mohamed.boucadair@orange.com
Sent: Wednesday, May 16, 2018 7:15 PM
To: Jon Shallow <supjps-ietf@jpshallow.com>; dots@ietf.org
Subject: Re: [Dots] DOTS server crash/restart with Observed mitigations/Configuration

Re-,

Please see inline.

Cheers,
Med

From: Jon Shallow [mailto:supjps-ietf@jpshallow.com]
Sent: Wednesday, May 16, 2018 14:42
To: BOUCADAIR Mohamed IMT/OLN; dots@ietf.org
Subject: RE: [Dots] DOTS server crash/restart with Observed mitigations/Configuration

Hi Med,

See inline.

Regards

Jon

From: Dots [mailto:dots-bounces@ietf.org] On Behalf Of mohamed.boucadair@orange.com
Sent: 16 May 2018 12:55
To: Jon Shallow; dots@ietf.org
Subject: Re: [Dots] DOTS server crash/restart with Observed mitigations/Configuration

Hi Jon,

Please see inline.

Cheers,
Med

From: Dots [mailto:dots-bounces@ietf.org] On Behalf Of Jon Shallow
Sent: Wednesday, May 16, 2018 12:24
To: dots@ietf.org
Subject: [Dots] DOTS server crash/restart with Observed mitigations/Configuration

Hi there,

For the signal channel, it is easy to maintain a persistent list of active mitigations. Should the DOTS server (unexpectedly) stop and restart, it is easy to rebuild the list of current active mitigations.
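A minimal sketch of keeping that list persistent, assuming mitigations are keyed by the client identifier (‘cuid’) and mitigation identifier (‘mid’) and stored as JSON in a local file. The class name and file layout are hypothetical; a real server might use a database instead.

```python
import json
import os
import tempfile


class MitigationStore:
    """Hypothetical persistent store of active mitigations, keyed by cuid/mid."""

    def __init__(self, path: str) -> None:
        self.path = path
        self.active: dict[str, dict] = {}
        if os.path.exists(path):
            # Rebuild the active list after an (unexpected) restart.
            with open(path, encoding="utf-8") as f:
                self.active = json.load(f)

    def _flush(self) -> None:
        # Write to a temporary file, then rename it over the old one, so a
        # crash mid-write cannot leave a truncated list behind.
        directory = os.path.dirname(os.path.abspath(self.path))
        fd, tmp = tempfile.mkstemp(dir=directory)
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            json.dump(self.active, f)
        os.replace(tmp, self.path)

    def add(self, cuid: str, mid: int, scope: dict) -> None:
        self.active[f"{cuid}/{mid}"] = scope
        self._flush()

    def withdraw(self, cuid: str, mid: int) -> None:
        self.active.pop(f"{cuid}/{mid}", None)
        self._flush()
```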

[Med] The list of active mitigations is the “core” state data to be maintained by a server. If we assume that list can be reliably rebuilt, then there is no extension needed to the signal channel to handle this failure case.

[Jon] Agreed.  This was just me building up the background to the scenario.

However, rebuilding the list of active Observations of the current mitigations / configuration changes is considerably more difficult, as the DOTS server would also need to maintain a list of client IP addresses, source ports, etc., along with Tokens, Queries and (D)TLS state, so that unsolicited Observe responses can continue to be sent back to the clients. I don’t really think that this is a practical ask.

[Med] First, this can be considered deployment-specific. As such, in some deployments there is no need to support automatic means to detect state loss at the server side. Second, is it really problematic for the DDoS mitigation if that list is lost by a server? I guess the answer is no, if we assume the list of mitigations itself is maintained.

[Jon] My concern is that the DOTS clients would not see the change in mitigation state, or would not see that there is a new signal channel configuration (if they were being Observed). The DOTS client/server would be out of sync with what is happening and the DOTS client would not necessarily know that it has to take some action to get back to the state before the DOTS server outage.

[Med] The DOTS client has other means to know the mitigation state. The DOTS client can observe the status locally (and report it to the server) or send a request to retrieve the status from the server. Being out of sync on Observe is not that critical for mitigating DDoS attacks.

As the DOTS server has restarted, existing (D)TLS sessions used by the DOTS client will fail. So, the heartbeat mechanism will eventually time out, and I guess that the DOTS client should detect this and redo all of the Observe requests (along with making sure that all the active mitigations are as expected).
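A minimal sketch of the client-side bookkeeping such a re-subscription needs, assuming the client keeps its own registry of Observed resources. The `send` callback stands in for issuing a CoAP GET with the Observe option set to 0 on the new session; it, the class name and the resource labels are hypothetical. Fresh tokens are generated on the assumption that tokens from the lost session should not be reused.

```python
import secrets
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical transport hook: (resource label, CoAP token) -> None
SendFn = Callable[[str, bytes], None]


@dataclass
class ObserveRegistry:
    """Client-side record of the resources the DOTS client wants to Observe."""
    # resource label -> CoAP token currently used for that observation
    observed: dict[str, bytes] = field(default_factory=dict)

    def observe(self, resource: str, send: SendFn) -> bytes:
        token = secrets.token_bytes(8)  # CoAP tokens are 0 to 8 bytes long
        self.observed[resource] = token
        send(resource, token)  # hypothetical: GET with Observe = 0
        return token

    def resubscribe_all(self, send: SendFn) -> int:
        """Re-issue every Observe registration on a newly established session."""
        for resource in list(self.observed):
            self.observe(resource, send)
        return len(self.observed)
```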

[Med] Including clients in networks under active attacks? Isn’t this overloading the server while dealing with ongoing attacks?

[Jon] Yes, it would – hence I did not really want to have to do this whenever a heartbeat failed, but only do it when the DOTS client “knows” that the server has had an outage.

However, in peace time, heartbeats may not be used.
However, at mitigation time, heartbeats may fail.

Would it make sense to add to the signal channel an extra parameter that the DOTS server can send to the client in any response, in the form of
“last-restarted” : seconds-since-epoch
which the client can use to detect the restart?
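A minimal sketch of how a client might use the proposed ‘last-restarted’ value, assuming it arrives as an integer in server responses. Only a change in the value, not its first appearance, is treated as evidence of a restart; the class name is hypothetical.

```python
from typing import Optional


class RestartDetector:
    """Client-side tracking of the proposed 'last-restarted' parameter."""

    def __init__(self) -> None:
        self.known: Optional[int] = None  # last value seen from this server

    def server_restarted(self, last_restarted: int) -> bool:
        """Return True if this response shows a restart since the last one seen."""
        previous, self.known = self.known, last_restarted
        # The first value seen is only a baseline, not evidence of a restart.
        return previous is not None and last_restarted != previous
```

Gating the re-subscription on this check, rather than on every heartbeat failure, is what would avoid adding traffic on a flaky path during mitigation.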

[Med] We used an epoch-based approach for a stateful service before (see https://tools.ietf.org/html/rfc6887#section-8.5). The details are not restricted to signalling the epoch value; they also cover the logic to handle it at the client side. Further, there are some aspects to be taken into account when configuring redundant servers that share an anycast address. If the same initial epoch value is configured, the client may not detect that a backup server is being used and hence may not reinstall state. A tweak is to configure distinct epoch values (e.g., a +/- 24h difference between the servers of a redundancy group).
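For comparison, a sketch of the client-side epoch validation in RFC 6887, Section 8.5, as understood here: the client compares how far its own clock and the server's epoch have advanced between two responses, and treats a large mismatch, or an epoch that goes backwards by more than one second, as a sign of server state loss. The integer arithmetic and the 2-second and 1/16 tolerances follow that section; the class and method names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class EpochValidator:
    """Client-side epoch check modeled on RFC 6887, Section 8.5."""
    prev_client_time: Optional[int] = None
    prev_server_time: Optional[int] = None

    def is_valid(self, curr_client_time: int, curr_server_time: int) -> bool:
        """Return False if the server appears to have lost state."""
        if self.prev_server_time is None:
            valid = True  # first response from this server is accepted as-is
        elif curr_server_time < self.prev_server_time - 1:
            valid = False  # epoch went backwards by more than one second
        else:
            client_delta = curr_client_time - self.prev_client_time
            server_delta = curr_server_time - self.prev_server_time
            # Allow 2 seconds of slack plus roughly 1/16 relative clock drift.
            valid = not (
                client_delta + 2 < server_delta - server_delta // 16
                or server_delta + 2 < client_delta - client_delta // 16
            )
        # Record the current values for the next comparison.
        self.prev_client_time = curr_client_time
        self.prev_server_time = curr_server_time
        return valid
```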

The epoch is justified for 6887 because we wanted to recover state loss, while in the DOTS case we don’t have the same model given that, as you rightfully said above, the current active mitigations can be rebuilt at the server side without involving clients.

[Jon] So this method sets epoch time back to 0 whenever there is a restart,
[Med] Yes. The epoch is reset each time the state is lost at the server side.

rather than reporting the number of seconds since Jan 1 1970. That I can handle. With anycast (though redundant servers are currently out of scope of the specification), two or more servers restarting and resetting their start time at exactly the same moment is possible, but unlikely. This could be handled by the anycast servers communicating and, on seeing that they have the same start time, one of them backing off (in time terms).
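A server-side counterpart, assuming the epoch is the number of seconds since the server last lost state plus a per-server offset configured across an anycast redundancy group (the "+/- 24h" tweak mentioned above). The offset values, the class name and the injectable clock are hypothetical.

```python
import time
from typing import Callable


class ServerEpoch:
    """Seconds since this server last lost state, plus a per-server offset."""

    def __init__(self, offset: int = 0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        # offset: hypothetical per-server value (e.g. 0, 86400, 172800 within
        # an anycast redundancy group) so that servers restarting at the same
        # moment still report different epochs.
        self.offset = offset
        self.clock = clock
        self.state_lost_at = clock()

    def on_state_lost(self) -> None:
        # Called on restart or any other loss of state: the epoch starts again.
        self.state_lost_at = self.clock()

    def current(self) -> int:
        return self.offset + int(self.clock() - self.state_lost_at)
```

Because the offsets differ by a day, a client moved from one server to another would see the epoch jump by roughly 86400 seconds, which a check such as the one in RFC 6887, Section 8.5, would flag.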

[Jon] I’m just interested in how to recover Observe loss.
[Med] I’m not convinced this is really needed given that active mitigations are not concerned with this failure scenario.

I appreciate that in peace time there may not be any DOTS client requests (or heartbeats) for a long time...

I’m open to suggestions.

Regards

Jon
