Re: [Dots] Mirja Kühlewind's Discuss on draft-ietf-dots-requirements-18: (with DISCUSS and COMMENT)

Hi Mirja, 

Please see inline.

Cheers,
Med

> -----Original Message-----
> From: Dots [mailto:dots-bounces@ietf.org] On Behalf Of Mirja Kühlewind
> Sent: Wednesday, 20 February 2019 18:54
> To: The IESG
> Cc: dots-chairs@ietf.org; frank.xialiang@huawei.com;
> draft-ietf-dots-requirements@ietf.org; dots@ietf.org
> Subject: [Dots] Mirja Kühlewind's Discuss on draft-ietf-dots-requirements-18:
> (with DISCUSS and COMMENT)
> 
> Mirja Kühlewind has entered the following ballot position for
> draft-ietf-dots-requirements-18: Discuss
> 
> When responding, please keep the subject line intact and reply to all
> email addresses included in the To and CC lines. (Feel free to cut this
> introductory paragraph, however.)
> 
> 
> Please refer to https://www.ietf.org/iesg/statement/discuss-criteria.html
> for more information about IESG DISCUSS and COMMENT positions.
> 
> 
> The document, along with other ballot positions, can be found here:
> https://datatracker.ietf.org/doc/draft-ietf-dots-requirements/
> 
> 
> 
> ----------------------------------------------------------------------
> DISCUSS:
> ----------------------------------------------------------------------
> 
> Thanks for addressing the TSV-ART comments (and thanks Joe for the review)!
> In-line with Joe's comment, please see some additional comments below.
> 
> 1) One minor edit is still required for SIG-002: for PLPMTUD the correct
> reference is RFC4821, however,

[Med] Actually, the document refers to draft-ietf-intarea-frag-fragile for PMTUD matters. That document cites the appropriate documents: RFC8201, RFC4821, draft-ietf-tsvwg-datagram-plpmtud, etc.

> as commented by Joe, RFC1191 is less reliable

[Med] RFC1191 is cited to justify why a PMTU of 576 bytes was chosen.

> and
> therefore usually not recommended. I would recommend to re-add a reference to
> RFC4821 and no reference to RFC1191 (or only with a warning that RFC4821 is
> preferred due to ICMP blocking). Further, the correct reference for datagram
> PLPMTUD is draft-ietf-tsvwg-datagram-plpmtud.

[Med] This is already cited in draft-ietf-intarea-frag-fragile. No need to be redundant, IMO. 

> 2) Also on this text in SIG-004:
> "The heartbeat interval during active mitigation could be
>       negotiable, but MUST be frequent enough to maintain any on-path
>       NAT or Firewall bindings during mitigation.  When TCP is used as
>       transport, the DOTS signal channel heartbeat messages need to be
>       frequent enough to maintain the TCP connection state."
> 
> As Joe commented already, different heartbeats at different layers can be
> used at the same time for different purposes. You can use heartbeats at the
> application layer to check service availability while e.g. using a more
> frequent heartbeat at the transport layer to maintain firewall and NAT state.

[Med] Please note that the text you quoted is about "during active mitigation". When no attack is ongoing, we do have the following behavior which covers your comment: 

      When DOTS agents are exchanging heartbeats and no
      mitigation request is active, either agent MAY request changes to
      the heartbeat rate.  For example, a DOTS server might want to
                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
      reduce heartbeat frequency or cease heartbeat exchanges when an
      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
      active DOTS client has not requested mitigation, in order to
      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
      control load.

> The advantage to such an approach is that there is less application layer
> overhead/load e.g. in scenarios where it might be expensive to wake up the
> application or a server is already highly loaded. Also note that the time-out
> values of NATs and firewalls on the path are usually unknown, therefore an
> application can never rely on heartbeats (no matter at which level) and must
> be prepared to try to reconnect on the application layer if the connection
> fails. Usually, the main reason for using heartbeats to maintain NAT or
> firewall state (vs. reconnect every time) in TCP is if the application is
> time-sensitive and a full TCP handshake takes too long for the desired
> service. I'm not sure that is the case for DOTS, however, I understand it may
> be beneficial to have established state if an attack is on-going.

[Med] This is important to avoid new handshakes when the client has to request a mitigation. 
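
For illustration only, the layering Mirja describes (application-layer heartbeats for liveness, transport-layer probes for NAT/firewall state) could look roughly like the following on a TCP socket. This is a minimal Python sketch, not anything the draft specifies: the helper name enable_tcp_keepalive and the 30 s / 10 s / 3-probe values are assumptions, and the per-socket TCP_KEEP* options are only set where the platform exposes them.

```python
import socket


def enable_tcp_keepalive(sock, idle_s=30, interval_s=10, probes=3):
    """Enable transport-layer keepalive probes on a TCP socket.

    Hypothetical helper for illustration; the timer values are arbitrary
    assumptions and are not taken from the DOTS requirements draft.
    Returns a dict of the options that were actually applied.
    """
    # SO_KEEPALIVE is widely available and switches keepalive probing on.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    applied = {"SO_KEEPALIVE": 1}

    # The fine-grained timers are platform-specific, so only set the
    # ones that this Python build exposes.
    tunables = (
        ("TCP_KEEPIDLE", idle_s),       # idle time before the first probe
        ("TCP_KEEPINTVL", interval_s),  # gap between unanswered probes
        ("TCP_KEEPCNT", probes),        # failed probes before giving up
    )
    for name, value in tunables:
        option = getattr(socket, name, None)
        if option is not None:
            sock.setsockopt(socket.IPPROTO_TCP, option, value)
            applied[name] = value
    return applied
```

With something like this in place, application-layer heartbeats can stay infrequent while the kernel keeps the connection state refreshed. As noted in the thread, on-path time-outs are usually unknown, so the application still has to be prepared to reconnect.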

> 
> For UDP I guess it's more complicated in your case. Time-outs are usually
> very short, however, state is created with the first packet of a flow (as
> there is no handshake in UDP). As you don't see blocking if state is expired
> as new state is created immediately, it's kind of impossible to measure the
> configured time-out values. Only if the firewall is under attack it would
> start blocking UDP traffic that it has no state for yet. So I understand why
> it is desirable to maintain UDP state for you, however, I don't understand
> how you can know that your frequency is high enough to actually keep the
> state open. Note that TCP time-outs are usually in the order of hours, while
> UDP time-outs are usually in the range of tens of seconds, and might expire
> even quicker if a system is under attack. If that is a scenario that is
> important for you, and assuming that not all time-out values on the path can
> be known, I guess it would be recommendable to use TCP instead.
> 
> In any case this cannot be a MUST requirement (as timers are usually not
> known). I would recommend to state something like:
> 
> "MAY be frequent enough to maintain NAT or firewall state, if timer values
> are known, or if TCP is used, SHOULD use in addition TCP heartbeats to
> maintain the TCP connection state and reconnect immediately if a failure is
> detected."
> 

[Med] The original wording is accurate and reflects the requirement of the WG. How this will be enforced is part of the solution/specification space.
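
To make the point of contention concrete: an interval can only be derived from binding time-outs when those time-outs are known. A minimal Python sketch, assuming a hypothetical helper pick_heartbeat_interval and an arbitrary safety factor of 0.5 (neither comes from the draft or from any DOTS specification):

```python
from typing import Iterable, Optional


def pick_heartbeat_interval(known_timeouts_s: Iterable[float],
                            safety_factor: float = 0.5) -> Optional[float]:
    """Return a heartbeat interval below the shortest known binding time-out.

    Hypothetical helper for illustration. Returns None when no on-path
    time-out is known, in which case no interval can be guaranteed to
    keep NAT or firewall bindings alive.
    """
    timeouts = [t for t in known_timeouts_s if t > 0]
    if not timeouts:
        return None
    # Stay well inside the shortest binding lifetime on the path.
    return min(timeouts) * safety_factor
```

For example, with known time-outs of 30 s and 120 s this yields 15 s. With no known values it returns None, which corresponds to the case Mirja raises: the agent cannot know its frequency is sufficient and has to be ready to re-establish the session.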

> And also for this part it is different for TCP and UDP:
> 
> "Because heartbeat loss is much more likely during volumetric attack, DOTS
>       agents SHOULD avoid signal channel termination when mitigation is
>       active and heartbeats are not received by either DOTS agent for an
>       extended period."
> 
> If TCP would be used and no ACKs are received, TCP would try to retransmit a
> few times and at some point terminate the connection. However, UDP is a
> connection-less protocol, there is nothing to terminate.

[Med] The text is about "signal channel termination". The concept of DOTS session is defined here: https://tools.ietf.org/html/draft-ietf-dots-architecture-11#section-3.1 

> 
> Also note that for reliable transports, it is sufficient if one end-host
> sends heartbeats as the other end is required to acknowledge the reception on
> the transport layer (and if no ack is received the connection is terminated
> on the transport layer).
> 
> So I guess what you want to say above is that if a connection-less protocol
> is used, heartbeats should continuously be sent even if no heartbeats are
> received from the other end. However, I think you still need to define a
> termination criterion, as you for sure don't want to keep sending heartbeats
> forever.

[Med] Agree. One condition is already cited in the above text: "when mitigation is active". One termination criterion would be that the mitigation is no longer active. How termination is achieved is part of the solution space.
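
The criterion discussed here can be sketched as a small decision function. This is a Python illustration under stated assumptions: the name should_terminate_session and the max_missed threshold are hypothetical, and the draft deliberately leaves the actual mechanism to the protocol specification.

```python
def should_terminate_session(mitigation_active: bool,
                             missed_heartbeats: int,
                             max_missed: int = 3) -> bool:
    """Decide whether a signal channel session may be torn down.

    Hypothetical sketch: max_missed is an arbitrary illustrative
    threshold, not a value from the DOTS requirements.
    """
    # While mitigation is active, heartbeat loss is expected during a
    # volumetric attack, so the session is kept regardless of loss.
    if mitigation_active:
        return False
    # Once mitigation is no longer active, sustained heartbeat loss
    # becomes a valid reason to terminate.
    return missed_heartbeats >= max_missed
```

Under this sketch, should_terminate_session(True, 100) stays False, while should_terminate_session(False, 3) becomes True once mitigation has ended.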

> 
> Also the next part:
> 
> "      *  To handle possible DOTS server restart or crash, the DOTS
>          clients MAY attempt to establish a new signal channel session,
>          but MUST continue to send heartbeats on the current session so
>          that the DOTS server knows the session is still alive.  If the
>          new session is successfully established, the DOTS client can
>          terminate the current session."
> 
> There is nothing like connection re-establishing in UDP, you just keep
> sending traffic.

[Med] The text is about "signal channel session".

> While in TCP, as explained above, the connection will be terminated at
> the transport layer and there is no way to keep sending heartbeats on the
> "old" session. Or do you have something like DTLS in mind in this case?

[Med] Yes.

> 
> 3) In SIG-006 you say:
> "      Due to the higher likelihood of packet loss during a DDoS attack,
>       DOTS servers MUST regularly send mitigation status to authorized
>       DOTS clients which have requested and been granted mitigation,
>       regardless of client requests for mitigation status."
> 
> Please note that this is only true if an unreliable transport is used. If a
> reliable transport is used, data is received at the application level without
> loss (but maybe some delay) or the connection is terminated (if loss is too
> high to retransmit successfully).
> 

[Med] The requirement as worded is OK. 

> 
> ----------------------------------------------------------------------
> COMMENT:
> ----------------------------------------------------------------------
> 
> One editorial comment on SEC-002:
> 
> "A security mechanism at the network layer (e.g.,
>       TLS) is thus adequate to provide hop-by-hop security.  In other
>       words, end-to-end security is not required for DOTS protocols."
> 
> TLS is transport layer security (not network layer) and therefore known as
> providing end-to-end security while the term hop-by-hop is used for e.g.
> IPsec.
> 
> I would recommend to change the wording here in order to avoid confusion,
> e.g.
> 
> "A security mechanism at the transport layer (e.g.,
>       TLS) is thus adequate to provide security between different DOTS
>       agents.  In other words, a direct security association between the
>       server and client, excluding any proxy, is not required for DOTS
>       protocols."
> 

[Med] I disagree with the last part of the proposed wording. The DOTS architecture involves gateways, hence the hop-by-hop security model.  

> And finally one general comment:
> 
> I understand that having WG consensus for this document is important to
> proceed the work of the group, however, I don't see the value in archiving
> this document in the IETF RFC series as a stand-alone document. If the group
> thinks documenting these requirements for consumption outside the group's
> work at a later point in time is valuable, I would rather recommend to add
> the respective requirements to the appendix of the respective protocol specs.
> 
> 