Re: [Dots] Mirja Kühlewind's Discuss on draft-ietf-dots-requirements-18: (with DISCUSS and COMMENT)

Hi Med,

Please see below.

> On 21.02.2019, at 08:43, mohamed.boucadair@orange.com wrote:
> 
> Hi Mirja, 
> 
> Please see inline.
> 
> Cheers,
> Med
> 
>> -----Original Message-----
>> From: Dots [mailto:dots-bounces@ietf.org] On Behalf Of Mirja Kühlewind
>> Sent: Wednesday, 20 February 2019 18:54
>> To: The IESG
>> Cc: dots-chairs@ietf.org; frank.xialiang@huawei.com;
>> draft-ietf-dots-requirements@ietf.org; dots@ietf.org
>> Subject: [Dots] Mirja Kühlewind's Discuss on draft-ietf-dots-requirements-18:
>> (with DISCUSS and COMMENT)
>> 
>> Mirja Kühlewind has entered the following ballot position for
>> draft-ietf-dots-requirements-18: Discuss
>> 
>> When responding, please keep the subject line intact and reply to all
>> email addresses included in the To and CC lines. (Feel free to cut this
>> introductory paragraph, however.)
>> 
>> 
>> Please refer to https://www.ietf.org/iesg/statement/discuss-criteria.html
>> for more information about IESG DISCUSS and COMMENT positions.
>> 
>> 
>> The document, along with other ballot positions, can be found here:
>> https://datatracker.ietf.org/doc/draft-ietf-dots-requirements/
>> 
>> 
>> 
>> ----------------------------------------------------------------------
>> DISCUSS:
>> ----------------------------------------------------------------------
>> 
>> Thanks for addressing the TSV-ART comments (and thanks Joe for the review)!
>> In-line with Joe's comment, please see some additional comments below.
>> 
>> 1) One minor edit is still required for SIG-002: for PLPMTUD the correct
>> reference is RFC4821, however,
> 
> [Med] Actually, the document is referring to draft-ietf-intarea-frag-fragile for PMTUD matters. That document cites the appropriate documents: rfc8201, rfc4821, draft-ietf-tsvwg-datagram-plpmtud, etc. 
> 
>> as commented by Joe, RFC1191 is less reliable
> 
> [Med] RFC1191 is cited to justify why PMTU of 576 bytes was chosen.
> 
>> and
>> therefore usually not recommended. I would recommend to re-add a reference to
>> RFC4821 and no reference to RFC1191 (or only with a warning that RFC4821 is
>> preferred due to ICMP blocking). Further, the correct reference for datagram
>> PLPMTUD is draft-ietf-tsvwg-datagram-plpmtud.
> 
> [Med] This is already cited in draft-ietf-intarea-frag-fragile. No need to be redundant, IMO. 

Actually, yes, this is probably more of an editorial comment from my side: citing RFC4821 and draft-ietf-tsvwg-datagram-plpmtud directly could be good. But I will not hold my DISCUSS on this.

> 
>> 2) Also on this text in SIG-004:
>> "The heartbeat interval during active mitigation could be
>>      negotiable, but MUST be frequent enough to maintain any on-path
>>      NAT or Firewall bindings during mitigation.  When TCP is used as
>>      transport, the DOTS signal channel heartbeat messages need to be
>>      frequent enough to maintain the TCP connection state."
>> 
>> As Joe commented already, different heartbeats at different layers can be
>> used at the same time for different purposes. You can use heartbeats at the
>> application layer to check service availability while, e.g., using a
>> higher-frequency heartbeat at the transport layer to maintain firewall and
>> NAT state.
> 
> [Med] Please note that the text you quoted is about "during active mitigation". When no attack is ongoing, we do have the following behavior which covers your comment: 
> 
>      When DOTS agents are exchanging heartbeats and no
>      mitigation request is active, either agent MAY request changes to
>      the heartbeat rate.  For example, a DOTS server might want to
>                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>      reduce heartbeat frequency or cease heartbeat exchanges when an
>      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>      active DOTS client has not requested mitigation, in order to
>      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>      control load.
> 
>> The advantage of such an approach is that there is less application layer
>> overhead/load, e.g. in scenarios where it might be expensive to wake up the
>> application or a server is already highly loaded. Also note that the
>> time-out values of NATs and firewalls on the path are usually unknown;
>> therefore an application can never rely on heartbeats (no matter at which
>> level) and must be prepared to try to reconnect on the application layer if
>> the connection fails. Usually, the main reason for using heartbeats to
>> maintain NAT or firewall state (vs. reconnecting every time) in TCP is that
>> the application is time-sensitive and a full TCP handshake takes too long
>> for the desired service. I'm not sure that is the case for DOTS; however, I
>> understand it may be beneficial to have established state if an attack is
>> ongoing.
> 
> [Med] This is important to avoid new handshakes when the client has to request a mitigation. 

This is okay but could be spelled out more explicitly as a requirement, rather than talking about the details of sending heartbeats.
> 
>> 
>> For UDP I guess it's more complicated in your case. Time-outs are usually
>> very short; however, state is created with the first packet of a flow (as
>> there is no handshake in UDP). As you don't see blocking if state has
>> expired, because new state is created immediately, it's kind of impossible
>> to measure the configured time-out values. Only if the firewall is under
>> attack would it start blocking UDP traffic that it has no state for yet. So
>> I understand why it is desirable for you to maintain UDP state; however, I
>> don't understand how you can know that your frequency is high enough to
>> actually keep the state open. Note that TCP time-outs are usually on the
>> order of hours, while UDP time-outs are usually in the range of tens of
>> seconds, and might expire even quicker if a system is under attack. If that
>> is a scenario that is important for you, and assuming that not all time-out
>> values on the path can be known, I guess it would be advisable to use TCP
>> instead.
>> 
>> In any case this cannot be a MUST requirement (as timers are usually not
>> known). I would recommend stating something like:
>> 
>> "MAY be frequent enough to maintain NAT or firewall state, if timer values
>> are known, or if TCP is used, SHOULD use in addition TCP heartbeats to
>> maintain the TCP connection state and reconnect immediately if a failure is
>> detected."
>> 
> 
> [Med] The original wording is accurate and reflects the requirement of the WG. How this will be enforced is part of the solution/specification space.

The point I am holding my DISCUSS on is that

"MUST be frequent enough to maintain any on-path NAT or Firewall bindings during mitigation."

cannot be a MUST requirement, as the network time-out values are not known by the endpoints. It is therefore impossible to fulfill this requirement.
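
To make the transport-layer alternative concrete, here is a minimal sketch of enabling TCP keepalives on a socket, with the application left responsible for reconnecting. The helper name and the timer values are illustrative assumptions of mine, not anything taken from the draft, and no choice of values can guarantee that an unknown on-path time-out is met.

```python
import socket

def enable_tcp_keepalive(sock, idle_s=30, interval_s=10, probes=3):
    """Turn on TCP keepalive probes for sock (hypothetical helper).

    The timer values are illustrative only: the endpoint cannot know the
    NAT/firewall time-outs on the path, so this is best effort.
    """
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # The per-socket tuning options are platform-specific; set them only
    # where the socket module exposes them (e.g. on Linux).
    for name, value in (("TCP_KEEPIDLE", idle_s),
                        ("TCP_KEEPINTVL", interval_s),
                        ("TCP_KEEPCNT", probes)):
        opt = getattr(socket, name, None)
        if opt is not None:
            sock.setsockopt(socket.IPPROTO_TCP, opt, value)
    return sock
```

Even with this in place, the application still has to be prepared to reconnect when the connection fails, which is why I think a MAY/SHOULD formulation fits better than a MUST.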

> 
>> And also for this part it is different for TCP and UDP:
>> 
>> "Because heartbeat loss is much more likely during volumetric attack, DOTS
>>      agents SHOULD avoid signal channel termination when mitigation is
>>      active and heartbeats are not received by either DOTS agent for an
>>      extended period."
>> 
>> If TCP were used and no ACKs were received, TCP would try to retransmit a
>> few times and at some point terminate the connection. However, UDP is a
>> connection-less protocol; there is nothing to terminate.
> 
> [Med] The text is about "signal channel termination". The concept of DOTS session is defined here: https://tools.ietf.org/html/draft-ietf-dots-architecture-11#section-3.1 

Okay, I was actually misinterpreting this. I still think this goes too far into technical detail for a requirements document, but on re-reading I think the requirement is okay, if it is really needed at this level.

> 
>> 
>> Also note that for reliable transports, it is sufficient if one end-host
>> sends heartbeats, as the other end is required to acknowledge the reception
>> on the transport layer (and if no ack is received the connection is
>> terminated on the transport layer).
>> 
>> So I guess what you want to say above is that if a connection-less protocol
>> is used, heartbeats should continuously be sent even if no heartbeats are
>> received from the other end. However, I think you still need to define a
>> termination criterion, as you for sure don't want to keep sending
>> heartbeats forever.
> 
> [Med] Agree. One condition is already cited in the above text: "when mitigation is active". A termination criterion would be that the mitigation is no longer active. How termination is achieved is part of the solution space. 
> 
One clarification question: If mitigation is active and you lose the heartbeat, is it always the case that the mitigation ends after a well-defined time, or could the mitigation go on "forever"?
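
To make the question concrete, here is a minimal sketch of the stop condition I have in mind for a connection-less transport. It assumes, hypothetically, that each mitigation carries a lifetime known to the client; the function and parameter names are my own illustration and are not taken from the draft.

```python
def keep_sending_heartbeats(now_s, mitigation_start_s, mitigation_lifetime_s):
    """Return True while the client should keep sending heartbeats.

    Hypothetical rule: heartbeats continue, even if none are received from
    the peer, only while the mitigation is still within its lifetime. A
    lifetime of None models a mitigation that never expires.
    """
    if mitigation_lifetime_s is None:
        return True  # no bound: the sender would never stop
    return now_s < mitigation_start_s + mitigation_lifetime_s
```

With a finite lifetime the sender stops on its own once the lifetime has passed; with None it never does, which is the case I am asking about.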

>> 
>> Also the next part:
>> 
>> "      *  To handle possible DOTS server restart or crash, the DOTS
>>         clients MAY attempt to establish a new signal channel session,
>>         but MUST continue to send heartbeats on the current session so
>>         that the DOTS server knows the session is still alive.  If the
>>         new session is successfully established, the DOTS client can
>>         terminate the current session."
>> 
>> There is nothing like connection re-establishment in UDP; you just keep
>> sending traffic.
> 
> [Med] The text is about "signal channel session".

Yes, I misinterpreted that. That should be okay.

> 
>> While in TCP, as explained above, the connection will be terminated at
>> the transport layer and there is no way to keep sending heartbeats on the
>> "old" session. Or do you have something like DTLS in mind in this case?
> 
> [Med] Yes.
> 
>> 
>> 3) In SIG-006 you say:
>> "      Due to the higher likelihood of packet loss during a DDoS attack,
>>      DOTS servers MUST regularly send mitigation status to authorized
>>      DOTS clients which have requested and been granted mitigation,
>>      regardless of client requests for mitigation status."
>> 
>> Please note that this is only true if an unreliable transport is used. If a
>> reliable transport is used, data is received at the application level without
>> loss (but maybe some delay) or the connection is terminated (if loss is too
>> high to retransmit successfully).
>> 
> 
> [Med] The requirement as worded is OK. 

I disagree because, as I said, this is not true if a reliable transport is used. Maybe you can adapt this sentence slightly to clarify that you presumably had a scenario in mind where an unreliable transport is used.

> 
>> 
>> ----------------------------------------------------------------------
>> COMMENT:
>> ----------------------------------------------------------------------
>> 
>> One editorial comment on SEC-002:
>> 
>> "A security mechanism at the network layer (e.g.,
>>      TLS) is thus adequate to provide hop-by-hop security.  In other
>>      words, end-to-end security is not required for DOTS protocols."
>> 
>> TLS is transport layer security (not network layer) and is therefore known
>> for providing end-to-end security, while the term hop-by-hop is used for
>> e.g. IPsec.
>> 
>> I would recommend changing the wording here in order to avoid confusion,
>> e.g.
>> 
>> "A security mechanism at the transport layer (e.g.,
>>      TLS) is thus adequate to provide security between different DOTS
>>      agents.  In other words, a direct security association between the
>>      server and client, excluding any proxy, is not required for DOTS
>>      protocols."
>> 
> 
> [Med] I disagree with the last part of the proposed wording. The DOTS architecture involves gateways, hence the hop-by-hop security model.

This is not a technical comment. The technical content is correct. However, as I said above, many people in the community associate the term hop-by-hop with something like IPsec, while application layer gateways are rather considered endpoints. All I'm requesting is to avoid the terms end-to-end and hop-by-hop in this context, as they might be confusing to others.

Mirja

>  
> 
>> And finally one general comment:
>> 
>> I understand that having WG consensus for this document is important to
>> proceed with the work of the group; however, I don't see the value in
>> archiving this document in the IETF RFC series as a stand-alone document.
>> If the group thinks documenting these requirements for consumption outside
>> the group's work at a later point in time is valuable, I would rather
>> recommend adding the respective requirements to the appendix of the
>> respective protocol specs.
>> 
>> 