Re: [Dots] DOTS server crash/restart with Observed mitigations/Configuration

"Jon Shallow" <supjps-ietf@jpshallow.com> Thu, 17 May 2018 15:24 UTC

From: Jon Shallow <supjps-ietf@jpshallow.com>
To: "'Konda, Tirumaleswar Reddy'" <TirumaleswarReddy_Konda@mcafee.com>, mohamed.boucadair@orange.com, dots@ietf.org
References: <025d01d3ecff$ef69f2b0$ce3dd810$@jpshallow.com> <787AE7BB302AE849A7480A190F8B93302DF1BF79@OPEXCLILMA3.corporate.adroot.infra.ftgroup> <02ad01d3ed13$42a0c3b0$c7e24b10$@jpshallow.com> <787AE7BB302AE849A7480A190F8B93302DF1C138@OPEXCLILMA3.corporate.adroot.infra.ftgroup> <BN6PR16MB14258C3B7F3C5A876225950CEA920@BN6PR16MB1425.namprd16.prod.outlook.com> <039101d3edb3$dd8de830$98a9b890$@jpshallow.com> <BN6PR16MB1425FD80B4F2E66ABE5A790FEA910@BN6PR16MB1425.namprd16.prod.outlook.com>
In-Reply-To: <BN6PR16MB1425FD80B4F2E66ABE5A790FEA910@BN6PR16MB1425.namprd16.prod.outlook.com>
Date: Thu, 17 May 2018 16:24:11 +0100
Message-ID: <082d01d3edf3$1b1df3e0$5159dba0$@jpshallow.com>
Content-Language: en-gb
Archived-At: <https://mailarchive.ietf.org/arch/msg/dots/bllCYa-eKmrUh2rQpqiH3PyKzmc>
Subject: Re: [Dots] DOTS server crash/restart with Observed mitigations/Configuration

Hi Tiru,

 

See inline [Jon1]

 

Regards

 

Jon

 

From: Dots [mailto:dots-bounces@ietf.org] On Behalf Of Konda, Tirumaleswar Reddy
Sent: 17 May 2018 16:03
To: Jon Shallow; mohamed.boucadair@orange.com; dots@ietf.org
Subject: Re: [Dots] DOTS server crash/restart with Observed mitigations/Configuration

 

Hi Jon,

 

This scenario is already discussed in Section 4.7. When the DDoS mitigation is in progress and after the ‘missing-hb-allowed’ threshold is reached, the DOTS client will continue to use the existing (D)TLS session to send heartbeat requests to the server and at the same time try to establish a new (D)TLS session with DOTS server. 
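A minimal sketch of the client-side heartbeat handling described above. It assumes heartbeat outcomes are fed in one at a time and that the session counts as disconnected once the number of consecutive unanswered heartbeats reaches 'missing-hb-allowed'. The class, method and action names are hypothetical; only the 'missing-hb-allowed' parameter comes from the signal channel draft.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    NONE = "none"
    # Peace time: give up on the old session and establish a new one.
    RECONNECT = "reconnect"
    # Mitigation in progress: keep sending heartbeats on the existing
    # (D)TLS session while a new session is attempted in parallel.
    KEEP_AND_RECONNECT = "keep-and-reconnect"


@dataclass
class HeartbeatMonitor:
    # Mirrors the 'missing-hb-allowed' signal channel parameter.
    missing_hb_allowed: int
    # Consecutive heartbeats that received no response.
    missed: int = 0

    def on_heartbeat_result(self, acknowledged: bool,
                            mitigation_active: bool) -> Action:
        if acknowledged:
            self.missed = 0
            return Action.NONE
        self.missed += 1
        if self.missed < self.missing_hb_allowed:
            return Action.NONE
        # Threshold reached: treat the session as disconnected.
        if mitigation_active:
            return Action.KEEP_AND_RECONNECT
        return Action.RECONNECT
```

Note that nothing in this logic tells the client *why* the heartbeats were lost, which is the gap discussed in the reply below.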

[Jon1] I fully understand that

Further, if the server restarts, (D)TLS state is lost on the server and the server does not have the cryptographic state to send ‘re-subscribe’ on the old session. 

[Jon1] Agreed, the DOTS server cannot send anything unsolicited due to the loss of cryptographic state. However, if the DOTS server can include in its response to a client request something that states “I restarted recently @ such a time”, the client will know whether the missing heartbeats were just down to a flaky network or were due to a server restart, in which case a re-subscribe is needed for all the resources that the client wants to monitor. I do not want the DOTS client to keep on doing re-subscribes, compounding the network flakiness when under attack.

~Jon1

 

-Tiru

 

From: Jon Shallow [mailto:supjps-ietf@jpshallow.com] 
Sent: Thursday, May 17, 2018 1:21 PM
To: Konda, Tirumaleswar Reddy <TirumaleswarReddy_Konda@McAfee.com>; mohamed.boucadair@orange.com; dots@ietf.org
Subject: RE: [Dots] DOTS server crash/restart with Observed mitigations/Configuration

 


Hi Tiru,

 

It is a corner recovery case. The DOTS client can detect that the DOTS server has gone away (using heartbeats) and, once it detects that the DOTS server is back, re-subscribe to all the resources that it wants to Observe.

 

However, when mitigating, there is a likelihood that heartbeats will time out and I am not sure that it is wise to “re-subscribe” when the connections are re-established as the network path may still be flaky and adding extra traffic may be problematic.  My suggestion of an epoch time being reported by the DOTS server was a way of determining whether “re-subscribe” was required or not.

 

Regards

 

Jon

 

From: Dots [mailto:dots-bounces@ietf.org] On Behalf Of Konda, Tirumaleswar Reddy
Sent: 16 May 2018 16:05
To: mohamed.boucadair@orange.com; Jon Shallow; dots@ietf.org
Subject: Re: [Dots] DOTS server crash/restart with Observed mitigations/Configuration

 

I don’t get the problem. If the server restarts, (D)TLS state will be lost and server will reject heartbeat requests from the client (either silently or send a fatal alert to the client). DOTS client will have to follow the steps in https://tools.ietf.org/html/draft-ietf-dots-signal-channel-19#section-4.7 to re-establish (D)TLS session with the DOTS server. 

 

-Tiru

 

From: Dots [mailto:dots-bounces@ietf.org] On Behalf Of mohamed.boucadair@orange.com
Sent: Wednesday, May 16, 2018 7:15 PM
To: Jon Shallow <supjps-ietf@jpshallow.com>; dots@ietf.org
Subject: Re: [Dots] DOTS server crash/restart with Observed mitigations/Configuration

 


Hi again,

 

Please see inline. 

 

Cheers,

Med

 

From: Jon Shallow [mailto:supjps-ietf@jpshallow.com] 
Sent: Wednesday, 16 May 2018 14:42
To: BOUCADAIR Mohamed IMT/OLN; dots@ietf.org
Subject: RE: [Dots] DOTS server crash/restart with Observed mitigations/Configuration

 

Hi Med,

 

See inline.

 

Regards

 

Jon

 

From: Dots [mailto:dots-bounces@ietf.org] On Behalf Of mohamed.boucadair@orange.com
Sent: 16 May 2018 12:55
To: Jon Shallow; dots@ietf.org
Subject: Re: [Dots] DOTS server crash/restart with Observed mitigations/Configuration

 

Hi Jon, 

 

Please see inline. 

 

Cheers,

Med

 

From: Dots [mailto:dots-bounces@ietf.org] On Behalf Of Jon Shallow
Sent: Wednesday, 16 May 2018 12:24
To: dots@ietf.org
Subject: [Dots] DOTS server crash/restart with Observed mitigations/Configuration

 

Hi there,

 

For the signal channel, it is easy to maintain a persistent list of active mitigations.  Should the DOTS server (unexpectedly) stop and restart, then it is easy to rebuild the current active mitigations.
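One way the persistent list of active mitigations could look, as a minimal sketch: a JSON file that is rewritten atomically (temporary file plus rename) on every change and reloaded on start-up. The class name, the file layout and the "cuid/mid" style key are illustrative assumptions, not anything defined by the signal channel draft.

```python
import json
import os
import tempfile


class MitigationStore:
    """Hypothetical on-disk store of active mitigations, keyed by a
    client/mitigation identifier string."""

    def __init__(self, path: str):
        self.path = path
        self.active = {}
        # After a crash/restart, rebuild the active list from disk.
        if os.path.exists(path):
            with open(path, "r", encoding="utf-8") as f:
                self.active = json.load(f)

    def _flush(self) -> None:
        # Write to a temporary file in the same directory, then atomically
        # replace the old file so a crash never leaves a half-written list.
        directory = os.path.dirname(os.path.abspath(self.path))
        fd, tmp = tempfile.mkstemp(dir=directory)
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            json.dump(self.active, f)
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp, self.path)

    def add(self, key: str, mitigation: dict) -> None:
        self.active[key] = mitigation
        self._flush()

    def remove(self, key: str) -> None:
        self.active.pop(key, None)
        self._flush()
```

Creating a second `MitigationStore` on the same path stands in for the restarted server: it sees exactly the mitigations that were active before the outage.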

 

[Med] The list of active mitigations is the “core” state data to be maintained by a server. If we assume that list can be reliably rebuilt, then there is no extension needed to the signal channel to handle this failure case.

 

[Jon] Agreed.  This was just me building up the background to the scenario.

 

However, to rebuild the list of active Observations of the current mitigations / configuration changes is considerably more difficult as the DOTS server would also need to maintain a list of client IPs/source ports etc. along with Tokens and Queries and (D)TLS state so that unsolicited Observe responses can continue to be sent back to the clients.  I don’t really think that this is a practical ask.
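To make the point concrete, here is a hypothetical record of the per-Observe state listed above. The addressing, token and resource can be written to disk, but the live (D)TLS session is treated as opaque and not persistable, so a restored record cannot be used to send notifications. All names and the example values are illustrative assumptions.

```python
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class ObserveRegistration:
    """Hypothetical per-Observe state a DOTS server would need to keep."""
    client_ip: str
    client_port: int
    token: bytes   # CoAP token that every notification must echo
    uri_path: str  # observed resource (illustrative value only)
    # Live (D)TLS session: keys, epochs, record sequence numbers.
    # Treated here as opaque and not persistable.
    dtls_session: Optional[Any] = field(default=None, repr=False)

    def to_record(self) -> dict:
        # Everything except the (D)TLS session can be written to disk.
        return {"client_ip": self.client_ip, "client_port": self.client_port,
                "token": self.token.hex(), "uri_path": self.uri_path}

    @classmethod
    def from_record(cls, record: dict) -> "ObserveRegistration":
        return cls(record["client_ip"], record["client_port"],
                   bytes.fromhex(record["token"]), record["uri_path"])

    def can_notify(self) -> bool:
        # Without the original session the server cannot protect an
        # unsolicited notification in a way the client would accept.
        return self.dtls_session is not None
```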

 

[Med] First, this can be considered deployment-specific: in some deployments, there is no need to support automatic means to detect state loss at the server side. Second, is it really problematic for DDoS mitigation if that list is lost by a server? I guess the answer is no, if we assume the list of mitigations themselves is maintained. 

 

[Jon] My concern is that the DOTS clients would not see the change in mitigation state, or would not see that there is a new signal channel configuration (if they were being Observed). The DOTS client/server would be out of sync with what is happening and the DOTS client would not necessarily know that it has to take some action to get back to the state before the DOTS server outage.

 

[Med] The DOTS client has other means to know the mitigation state. The DOTS client can observe the status locally (and report it to the server) or send a request to retrieve the status from the server. Being out of sync in Observe is not that critical for mitigating DDoS attacks.  

 

As the DOTS server restarted, existing (D)TLS sessions used by the DOTS client will fail. So, the heartbeat mechanism will eventually time out, and I guess that the DOTS client should detect this and redo all of the Observe requests (along with making sure that all the active mitigations are as expected).
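The recovery step proposed here could be sketched as a pure planning function: compare the mitigation identifiers the client expects with those the server reports after reconnection, then re-register every observed resource. The function name, the action labels and the resource strings in the usage are hypothetical.

```python
from typing import Iterable, List, Set, Tuple


def recovery_actions(expected_mids: Set[int],
                     reported_mids: Iterable[int],
                     observed_uris: Iterable[str]) -> List[Tuple[str, str]]:
    """Hypothetical client-side plan after reconnecting to a restarted server."""
    reported = set(reported_mids)
    actions: List[Tuple[str, str]] = []
    # Mitigations the client still needs but the server no longer reports.
    for mid in sorted(expected_mids - reported):
        actions.append(("re-request-mitigation", str(mid)))
    # Mitigations the server reports that the client does not know about.
    for mid in sorted(reported - expected_mids):
        actions.append(("investigate-unknown-mitigation", str(mid)))
    # Observe state is assumed lost with the server, so every observed
    # resource has to be registered again.
    for uri in observed_uris:
        actions.append(("re-observe", uri))
    return actions
```

The final loop is exactly the extra traffic the thread worries about when the path is already congested by an attack.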

 

[Med] Including clients in networks under active attacks? Isn’t this overloading the server while dealing with ongoing attacks?  

 

[Jon] Yes, it would – hence I did not really want to have to do this whenever a heartbeat failed, but only do it when the DOTS client “knows” that the server has had an outage.

 

However, in peace time, heartbeats may not be used.

However, at mitigation time, heartbeats may fail.

 

Would it make sense to add to the signal channel an extra parameter that the DOTS server can send to the client in any response in the form of 

“last-restarted” : seconds-since-epoch

which the client can use to detect the restart.
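A minimal client-side sketch of how the proposed “last-restarted” value could be used, assuming the server includes it (as seconds since the Unix epoch) in responses and keeps it constant until its next restart. The class and method names are hypothetical; the parameter itself is only a proposal in this thread.

```python
from typing import Optional


class RestartDetector:
    """Tracks the proposed "last-restarted" value reported by a DOTS server."""

    def __init__(self) -> None:
        self.known_restart: Optional[int] = None

    def on_response(self, last_restarted: Optional[int]) -> bool:
        """Return True if the client should re-issue its Observe requests."""
        if last_restarted is None:
            return False  # parameter absent: no information either way
        if self.known_restart is None:
            self.known_restart = last_restarted
            return False  # first value seen: nothing to compare against
        if last_restarted != self.known_restart:
            self.known_restart = last_restarted
            return True   # server restarted since the client last heard from it
        return False      # same value: only the path was flaky, keep Observes
```

With this check, heartbeat timeouts alone never trigger a re-subscribe; only a changed value does. As raised in the reply below, two anycast instances reporting the same value would defeat it.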

 

[Med] We used to have an epoch-based approach for a stateful service (see https://tools.ietf.org/html/rfc6887#section-8.5). The details are not restricted to just signalling the epoch value but extend to the logic to handle it at the client side. Further, there are some aspects to be taken into account when configuring redundant servers when an anycast address is used. If the same initial epoch value is configured, the client may not detect that a backup server is used and hence may not reinstall state... A tweak is to configure distinct epoch values (e.g., a +/- 24h difference between servers in a redundancy group). 

 

The epoch is justified for RFC 6887 because we wanted to recover from state loss, while in the DOTS case we don’t have the same model given that, as you rightfully said above, the current active mitigations can be rebuilt at the server side without involving clients.  
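For reference, a sketch of the client-side epoch validation along the lines of RFC 6887 Section 8.5, as I understand it: the server epoch may not go backwards by more than one second, and the server and client elapsed times must agree within a tolerance of 2 seconds plus 1/16 of the elapsed time. Applying it to a DOTS server, and the class and method names, are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class EpochValidator:
    """Client-side epoch check modelled on RFC 6887 Section 8.5."""
    prev_client_time: Optional[float] = None
    prev_server_time: Optional[int] = None

    def state_lost(self, curr_client_time: float, curr_server_time: int) -> bool:
        lost = False
        if self.prev_server_time is not None:
            if curr_server_time < self.prev_server_time - 1:
                # Server epoch went backwards by more than one second.
                lost = True
            else:
                client_delta = curr_client_time - self.prev_client_time
                server_delta = curr_server_time - self.prev_server_time
                # Elapsed times must agree within 2 s plus 1/16 of the delta.
                if (client_delta + 2 < server_delta - server_delta / 16
                        or server_delta + 2 < client_delta - client_delta / 16):
                    lost = True
        # Remember this pair for the next comparison.
        self.prev_client_time = curr_client_time
        self.prev_server_time = curr_server_time
        return lost
```

The second check also catches a restart that the client did not see directly (the server epoch grew far less than the client's clock), and a per-server offset such as the +/- 24h mentioned above makes a switch to another anycast instance show up as an implausible forward jump.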

 

[Jon] So this method sets epoch time back to 0 whenever there is a restart, 

[Med] Yes. The epoch is reset each time the state is lost at the server side. 

 

rather than reporting the number of seconds since Jan 1 1970. That I can handle. However, with anycast (though redundant servers are currently out of spec), two or more servers restarting and updating the start time at exactly the same time is possible, but unlikely. This could be handled by the anycast servers communicating and, if they see they have the same start time, one of them backing off (in time terms).

 

[Jon] I’m just interested in how to recover Observe loss.

[Med] I’m not convinced this is really needed given that active mitigations are not concerned with this failure scenario.  

 

I appreciate in peace time there may not be any DOTS client requests (or heartbeats) for a long time...

 

I’m open to suggestions.

 

Regards

 

Jon
