Re: [Dots] Mirja Kühlewind's Discuss on draft-ietf-dots-signal-channel-31: (with DISCUSS and COMMENT)

Mirja Kuehlewind <ietf@kuehlewind.net> Mon, 06 May 2019 13:52 UTC


Hi Med,

Please see inline.

> On 2. May 2019, at 12:00, <mohamed.boucadair@orange.com> <mohamed.boucadair@orange.com> wrote:
> 
> Hi Mirja, 
> 
> Thank you for the detailed review. 
> 
> Please see inline.
> 
> Cheers,
> Med
> 
>> -----Original Message-----
>> From: Mirja Kühlewind via Datatracker [mailto:noreply@ietf.org]
>> Sent: Wednesday, 1 May 2019 16:43
>> To: The IESG
>> Cc: draft-ietf-dots-signal-channel@ietf.org; Liang Xia;
>> dots-chairs@ietf.org; frank.xialiang@huawei.com; dots@ietf.org
>> Subject: Mirja Kühlewind's Discuss on draft-ietf-dots-signal-channel-31: (with
>> DISCUSS and COMMENT)
>> 
>> Mirja Kühlewind has entered the following ballot position for
>> draft-ietf-dots-signal-channel-31: Discuss
>> 
>> When responding, please keep the subject line intact and reply to all
>> email addresses included in the To and CC lines. (Feel free to cut this
>> introductory paragraph, however.)
>> 
>> 
>> Please refer to https://www.ietf.org/iesg/statement/discuss-criteria.html
>> for more information about IESG DISCUSS and COMMENT positions.
>> 
>> 
>> The document, along with other ballot positions, can be found here:
>> https://datatracker.ietf.org/doc/draft-ietf-dots-signal-channel/
>> 
>> 
>> 
>> ----------------------------------------------------------------------
>> DISCUSS:
>> ----------------------------------------------------------------------
>> 
>> 1) Port usage (see section 3):
>> The port request for DOTS was reviewed by the port expert team. Some members
>> of the team were concerned about the assignment of a separate port number for
>> DOTS, as CoAP is used and already has a designated port number. I believe
>> that CoAP is used as a transport in this case and DOTS provides a separate
>> service compared to what CoAP is usually used for; however, it is not clear
>> why DOTS needs a designated port. Section 3 says that the port can either be
>> preconfigured or dynamically detected, therefore it is not clear why a fixed
>> port is needed (see also section 7.1. of RFC7605). In the port review process
>> the authors argued that a port is needed to differentiate the DOTS service in
>> the network. However, this is not an endorsed usage for port numbers (see
>> section 6.2. of RFC7605). Further, I believe assigning a fixed port might
>> actually add an attack vector for DOTS, either by DDoSing the respective port
>> at the DOTS server, or any attempt to block DOTS traffic on the network from
>> the DOTS client to the DOTS server.
> 
> [Med] FWIW, below the reasons why we believe that a DOTS port is needed: 

I’ve read the communication with the experts. As I wrote in my discuss point above, I agree with you that DOTS is a separate service and should not use the CoAP port. However, I have a different question: why do you need a dedicated fixed port? Please see further below.

> 
> ========
> (1) 
> 
> We are familiar with RFC7605. We confirm that the port we are asking for is:
> 
>   o  ... not intended to differentiate
>      performance variations within the same service, e.g., high-speed
>      versus ordinary speed.  Performance variations can be supported
>      within a single assigned port number in context of separate
>      pairwise endpoint associations.
> 
> The example about differentiated policies is not related to ** performance ** but about being able to deliver a ** basic/normal ** DOTS service. Such DOTS service takes place in attack times...during which links are saturated by nature.
> 
> The purpose of DOTS is as such different from the typical usage of CoAP by IoT devices, DOTS signal channel is a new service.

Yes.

> 
> (2)
> 
> The WG already considered in previous versions of the specification to use the existing CoAP port number (in compliance with RFC7605), but that design was abandoned by the WG for the reasons documented in the draft. 


Okay. 

> 
> (3)
> 
> The built-in discovery of services and resources discussed in RFC 7252 requires request and response between the peers, but DOTS protocol is designed to work in congested networks (because of DDoS attack) and the DOTS client during a volumetric DDoS attack may not be able to discover services and resources. 
> 
> During a DDoS attack, the incoming link is likely to be saturated and the DOTS client can only send requests but not receive responses from the server. Assigning a DOTS service port avoids the need for resource discovery and additional RTT involved before the mitigation request can be sent to the DOTS server. 

My understanding is that you need to communicate with the DOTS server at least once anyway before you can request mitigation. So discovery should have been done previously and not when under attack. Further, you also need to pre-configure or discover the server address.

> 
> (4)
> 
> During a DDoS attack, the traffic from the endpoints and IoT devices can be rate-limited or blocked, but DOTS protocol traffic needs to be allowed. Assigning a DOTS service port helps middleboxes to identify and not rate-limit the DOTS protocol traffic during DDoS attack.

This is not a great approach, and it is the additional attack vector I’ve been talking about. If everything but the DOTS port is blocked or rate-limited, then an attacker is probably tempted to run their attack using the designated DOTS port.

> 
> (5)
> 
> Given the rise of DDoS attacks targeting CoAP-capable IoT objects (usually, low-cost, not maintained,..), policies to filter "legacy" CoAP messages (at the CPE + access/transit networks) will have implications on the delivery of DOTS service which is a key component for the emergence of protective networking architectures. Having means to differentiate traditional CoAP service from DOTS are important from a deployment standpoint.

This is not a good approach either. The port number could at most be used as a hint, but you really need to apply different means to identify DOTS traffic, as anybody, including an attacker, could just use the DOTS port.

> ====
> 
>> 
>> 2) Section 4.3 says:
>> "In reference to Figure 4, the DOTS client sends two TCP SYNs and two
>>   DTLS ClientHello messages at the same time over IPv6 and IPv4."
>> However, RFC8305 explicitly states that connection attempts SHOULD NOT be
>> made simultaneously (see sec 5).
>> 
> 
> [Med] This one is discussed in another thread. 

My discuss point was actually different. To avoid overloading the network, probing should NOT be done simultaneously. However, I’ve seen that you have now also changed the mechanism to only use simultaneous probing when under attack. That may be fine; however, it is not fully clear why Happy Eyeballs would be needed at all during an attack, as the assumption is that you always have an active session (starting before the attack). I would further like to see it made even clearer that probing MUST otherwise be performed sequentially (as described in RFC8305).
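
To make the sequential behaviour concrete, here is a minimal sketch of a staggered probe schedule. The order is the one given in the draft text quoted just below (UDP over IPv6, UDP over IPv4, TCP over IPv6, TCP over IPv4), and the 250 ms spacing is the default Connection Attempt Delay recommended in RFC8305. The function and variable names are illustrative assumptions, not anything from the draft.

```python
# Sketch only: staggered (not simultaneous) probing.
# PROBE_ORDER follows the order stated in the draft text; names are hypothetical.
PROBE_ORDER = [
    ("udp", "ipv6"),
    ("udp", "ipv4"),
    ("tcp", "ipv6"),
    ("tcp", "ipv4"),
]


def probe_schedule(attempt_delay=0.250):
    """Return (start_offset_seconds, transport, family) for each probe."""
    return [
        (index * attempt_delay, transport, family)
        for index, (transport, family) in enumerate(PROBE_ORDER)
    ]


for offset, transport, family in probe_schedule():
    print(f"t+{offset:.3f}s: try {transport} over {family}")
```

A real client would of course stop working through the schedule as soon as one attempt succeeds.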
 
> 
>> Further Figure 4 shows a different order of requests than recommended in the
>> text (text says: "UDP over IPv6, UDP over IPv4, TCP over IPv6, and finally
>> TCP over IPv4").
> 
> [Med] Actually, the text says:
> 
>  "In reference to Figure 4, the DOTS client sends two TCP SYNs and two
>   DTLS ClientHello messages at the same time over IPv6 and IPv4. "
> 
> We will add annotations to the figure. 
> 
>> Also why are the UDP connection attempts repeated? I guess that is
>> meant to be the retransmission of the DTLS Hello?
> 
> [Med] Yes. 
> 
>> However, usually you should
>> receive the TCP SYNACK before you retransmit 
> 
> [Med] What is retransmitted is DTLS. 
> 
>> or in the best case even before
>> you start the next connection attempt. Therefore that should be not displayed
>> like this in the figure or needs further explanation.
> 
> [Med] We will add annotations to the figure. 
> 
>> 
>> 3) Why are these statements SHOULDs and not MUSTs (see section 4.4)?
>>  "DOTS agents SHOULD follow the data transmission guidelines discussed
>>   in Section 3.1.3 of [RFC8085] and control transmission behavior by
>>   not sending more than one UDP datagram per round-trip time (RTT) to
>>   the peer DOTS agent on average."
> 
> [Med] because we are aligned with 8085 which says:
> 
> " SHOULD still control their transmission behavior by not
>   sending on average more than one UDP datagram per RTT to a
>   destination “

RFC8085 uses SHOULD because it’s a generic guideline document for all protocols that use UDP. If there is no good reason to do otherwise in a specific protocol spec, such as DOTS, you need to pick suitable values and define them with a MUST.

> 
>> and
>>  "If the DOTS client cannot maintain an RTT estimate, it
>>   SHOULD NOT send more than one Non-confirmable request every 3
>>   seconds"
> 
> [Med] because 8085 says:
> 
> "Such applications SHOULD NOT send more than one UDP
>       datagram every 3 seconds”

Same as above.
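
As an illustration of the two quoted limits, here is a minimal sketch of a sender-side pacer: at most one datagram per RTT, falling back to one every 3 seconds when no RTT estimate is available. The class and method names are hypothetical. Note that RFC8085 speaks of an average, whereas this sketch enforces strict spacing, which is the more conservative reading.

```python
import time


class DatagramPacer:
    """Allow at most one datagram per RTT, or per 3 s without an RTT estimate."""

    FALLBACK_INTERVAL = 3.0  # seconds, from RFC8085 Section 3.1.3

    def __init__(self, clock=time.monotonic):
        self._clock = clock      # injectable so the pacer can be tested
        self._last_sent = None   # time of the last permitted send
        self.rtt = None          # set by the caller once an estimate exists

    def min_interval(self):
        """Minimum spacing between datagrams, in seconds."""
        return self.rtt if self.rtt is not None else self.FALLBACK_INTERVAL

    def may_send(self):
        """Return True (and record the send) if a datagram may go out now."""
        now = self._clock()
        if self._last_sent is not None and now - self._last_sent < self.min_interval():
            return False
        self._last_sent = now
        return True
```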
> 
> 
>> as well as in section 4.4.2.1:
>>  "If the DOTS server cannot maintain an RTT
>>   estimate, it SHOULD NOT send more than one asynchronous notification
>>   every 3 seconds"
> 
> [Med] Idem as above.

Same as above.

> 
>> and again in section 4.4.2.2:
>> "The frequency of polling the DOTS server to get the
>>   mitigation status SHOULD follow the transmission guidelines in
>>   Section 3.1.3 of [RFC8085].
> 
> [Med] This is to accommodate the case where a local policy is provided to the DOTS agent. 

You need to specify this further. A reference to RFC8085 alone is never enough.

> 
>> However, most of the communication patterns used by DOTS rely on a
>> request/reply pattern, and CoAP specifies for this case that only one request
>> can be outstanding at a time (until the reply is received or the message is
>> assumed to be lost) (see section 4.7 and 4.2 of RFC7252), which therefore
>> will be used in this case. Only migration updates are sent without reply, and
>> here a MUST would be more appropriate.
>> 
>> Please also note that if there can only be one request outstanding (before a
>> reply is received) it is also not possible that requests are received out of
>> order (see e.g. 4.4.3: "If UDP is used as transport, CoAP requests may arrive
>> out-of-order.").

However, again as I said, the cases above should not resend messages based on a 3-second timer anyway, because most of them follow a request/reply pattern and CoAP specifies that there should only ever be one request outstanding.


>> 
>> 4) draft-ietf-core-hop-limit is used in section 10:
>> "The presence of DOTS gateways may lead to infinite forwarding loops,
>>   which is undesirable.  To prevent and detect such loops, this
>>   document uses the Hop-Limit Option."
>> This sounds like it should be required (and normative language should be
>> used)
>> and therefore draft-ietf-core-hop-limit should also be a normative reference.
>> Also draft-ietf-core-comi should probably another normative reference.
> 
> [Med] These two items are already covered in the reply to Alexey's review. 

Okay, please move draft-ietf-core-hop-limit to normative.

I don’t see any discussion about draft-ietf-core-comi…?

> 
>> 
>> 5) Section 4.5.2: You give recommendations for min and max in a note; however,
>> these values should be specified normatively, ideally with a MUST.
> 
> [Med] We can use RECOMMENDED.

Yes, please use normative language here as well. Also, why is a maximum specified at all?

> 
>> 
>> 6) Section 4.7: "the DOTS
>>   agent sends a heartbeat over the signal channel to maintain its half
>>   of the channel.  The DOTS agent similarly expects a heartbeat from
>>   its peer DOTS agent"
>> and
>> "DOTS servers MAY trigger their heartbeat requests immediately after
>>   receiving heartbeat probes from peer DOTS clients."
>> Actually heartbeats should only be sent in one direction (as the other end
>> will send an ack) and the protocol should clearly specify which endpoint is
>> responsible for triggering the ping.
>> 
> 
> [Med] The current behavior is aligned with "SIG-004  Channel Health Monitoring" of https://tools.ietf.org/html/draft-ietf-dots-requirements-22. 

Usually one end sends the heartbeats (ping) and the other end sends an ack (pong). There is no need for both ends to send heartbeats independently, as the heartbeat and retransmission rates are known and therefore both ends can independently infer that the connection has failed if no messages are received anymore. That is also how I read SIG-004.

> 
>> 7) sec 7.3:"To avoid DOTS signal message fragmentation and the subsequent
>>   decreased probability of message delivery, DOTS agents MUST ensure
>>   that the DTLS record MUST fit within a single datagram."
>> This should be handled by the DTLS record layer and not by DOTS that works on
>> top of DTLS (actually CoAP), therefore it seems strange to have a normative
>> requirement here in the DOTS spec. Also note that the calculation provided is
>> not valid for early data (0-RTT) as the hello messages could be transmitted
>> in the same datagram.
>> 
> 
> [Med] Will check this. 
> 
>> 8) Also sec 7.3: "If the path
>>   MTU is not known to the DOTS server, an IP MTU of 1280 bytes SHOULD
>>   be assumed."
>>  Actually this is only true for IPv6. The later note mentions that the
>>  situation is different from IPV4, however, it should probably be made clear
>>  from the beginning that 1280 can only be assumed for IPv6.
> 
> [Med] Actually, we are echoing RFC7252: 
> 
>   If the Path MTU is not known for a destination, an IP MTU of 1280
>   bytes SHOULD be assumed; if nothing is known about the size of the
>   headers, good upper bounds are 1152 bytes for the message size and
>   1024 bytes for the payload size.


The difference is that you previously say:

"To avoid DOTS signal message fragmentation and the subsequent
   decreased probability of message delivery, DOTS agents MUST ensure
   that the DTLS record MUST fit within a single datagram.”

If this is a hard requirement (using MUST), you also MUST limit your message for IPv4 to 576 bytes. 

RFC7252, however says this instead:
"A CoAP message, appropriately encapsulated, SHOULD fit within a
   single IP packet (i.e., avoid IP fragmentation) and (by fitting into
   one UDP payload) obviously needs to fit within a single IP datagram.”

Using SHOULD.
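
To put numbers on the difference, here is a small sketch of the space left for a DTLS record in a single unfragmented datagram when the path MTU is unknown. It assumes 1280 bytes for IPv6 and 576 bytes for IPv4 as discussed above, a fixed 40-byte IPv6 header, a 20-byte IPv4 header without options, and an 8-byte UDP header. DTLS record overhead is deliberately not subtracted because it depends on the cipher suite; the names are illustrative only.

```python
# Sketch only: UDP payload budget per IP version when the path MTU is unknown.
ASSUMED_IP_MTU = {6: 1280, 4: 576}  # bytes, per the discussion above
IP_HEADER = {6: 40, 4: 20}          # fixed IPv6 header; IPv4 header without options
UDP_HEADER = 8


def max_udp_payload(ip_version):
    """Bytes available for the whole DTLS record in one datagram."""
    return ASSUMED_IP_MTU[ip_version] - IP_HEADER[ip_version] - UDP_HEADER


print(max_udp_payload(6))  # 1232
print(max_udp_payload(4))  # 548
```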
 
> 
>> 
>> 9) sec 9.6: What's the registration policy for the newly created registries?
>> 
> 
> [Med] The text says the following: 
> 
>   New codes can be assigned via Standards Action [RFC8126].

Okay. I missed that as it was below the table. Maybe move the text above the table. But this is of course purely editorial.
> 
> 
>> 10) The document should provide more explicit guidance about when a
>> client should start a session and what should be done (from the client side)
>> if a session is detected as inactive (other than during migration which is
>> discussed a bit in 4.7). Is the assumption to basically have a permanently
>> active session, or to connect for migration and configuration requests
>> separately at a time?
> 
> [Med] In order to avoid cryptographic handshake for new mitigation requests, the session is assumed to be established and maintained. 

If the assumption is that you always have an active session, then I actually don’t understand how you can have a Happy Eyeballs procedure during an attack. Shouldn’t there already be an active session, or at least a previous session that left pre-cached information? Also, if the connection is detected to be inactive, which peer should try to re-establish the connection? I guess only the client would, right? That needs to be further specified!


> 
>> 
>> 
>> ----------------------------------------------------------------------
>> COMMENT:
>> ----------------------------------------------------------------------
>> 
>> 1) I really recommend to add subsections to section 4.4.1.
>> 
>> 2) section 4.4.1: "The lifetime of the
>>         deactivated mitigation request will be updated to (retry-timer
>>         + 45 seconds), so the DOTS client can refresh the deactivated
>>         mitigation request after retry-timer seconds before expiry of
>>         lifetime and check if the conflict is resolved."
>> This wording is not fully clear to me. If the lifetime of a deactivated
>> request is updated, isn't it active again?
> 
> [Med] A request can be updated but the status may be flagged as active or deactivated. 

Please specify explicitly what the status should be after adding the 45 seconds.
> 
>> And if it is active and another
>> request is sent, isn't that request rejected again.
> 
> [Med] Likely

Then I don’t understand how this is supposed to work.
> 
>> Can you please further
>> clarify.
>> 
>> 3) section 4.4.2: "lifetime:  The remaining lifetime of the mitigation
>>      request, in seconds."
>> Shouldn't lifetime rather be a timestamp, because there is some unknown
>> transmission delay between the time when the reply is generated and the reply
>> is received, and as such a lifetime in seconds is quite meaningless for the
>> client.
> 
> [Med] We prefer lifetime because otherwise time synchronization will be needed. The use of lifetime is aligned with other similar usage: e.g., RFC6887, RFC8512, etc. 

I guess it also depends on what kind of time span we are talking about. If it is only a few seconds, transmission delay might have a big impact. However, it would probably be good in any case to advise that a refresh of a mitigation request should not be sent “last minute”, as there is some fuzziness because of transmission delays.
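
A minimal sketch of what such advice could look like on the client side: compute the refresh time from the local clock reading at receipt and subtract a safety margin, so that the unknown delay between the server generating the reply and the client receiving it cannot push the refresh past the real expiry. The margin value and the function name are hypothetical and not taken from the draft.

```python
import time


def refresh_deadline(lifetime, received_at, margin):
    """Local monotonic time at which to refresh a mitigation request.

    lifetime    -- remaining lifetime in seconds, as reported by the server
    received_at -- local monotonic clock reading when the reply arrived
    margin      -- safety margin in seconds (hypothetical, not from the draft)
    """
    # Never schedule the refresh before the reply was received.
    return received_at + max(lifetime - margin, 0)


# Example: a one-hour lifetime refreshed one minute early.
deadline = refresh_deadline(3600, time.monotonic(), margin=60)
```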

> 
>> 
>> 4) section 4.4.2.1: " For DOTS server
>>   application, the message type MUST always be set to Non-confirmable
>>   even if the underlying COAP library elects a notification to be sent
>>   in a Confirmable message."
>> I'm not sure I understand this sentence. Can you please further explain?
> 
> [Med] What is meant is to relax this behavior from RFC 7641:
> 
>   A server that transmits notifications mostly in non-confirmable
>   messages MUST send a notification in a confirmable message instead of
>   a non-confirmable message at least every 24 hours. 
> 
> DOTS application will override that behavior. 

Okay. It would be good to provide a pointer to RFC7641 and further clarify this in the document.

> 
>> 
>> 5) section 4.4.4: "For example, if there is a financial
>>   relationship between the DOTS client and server domains, the DOTS
>>   client stops incurring cost at this point."
>> I find this sentence a bit problematic given the active-but-terminating
>> period is defined by the server. Wouldn't that mean the server can make me
>> pay for an undefined period of time?
> 
> [Med] That is only an example if agreed between the client and server. Such considerations are out of scope. 

What I’m saying is that this is not a very good example for the mechanism you provide. So either, if that is a valid example, you should probably reconsider your mechanism, or I’d advise you not to use this as an example.

> 
>> Also the max of 300 sec doesn't seem to be a
>> MUST...?
>> 
> 
> [Med] Other values can be used in specific deployments/agreements. 

The text says: 

“.. MAY exponentially
   increase (the base of the exponent is 2) the active-but-terminating
   period up to a maximum of 300 seconds (5 minutes).”

This sounds like an explicit maximum; to clarify this, I would recommend using normative language. If it is not meant to be an actual MUST, then I would recommend rephrasing this sentence.
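
Read literally, the quoted sentence describes a clamped doubling, which can be sketched as below. The 300-second cap and the base of 2 come from the quoted text; the initial value of 30 seconds in the example and the function name are assumptions for illustration only.

```python
MAX_PERIOD = 300  # seconds, the maximum stated in the quoted text


def active_but_terminating_period(initial, n_doublings, cap=MAX_PERIOD):
    """Period in seconds after n doublings (base 2), clamped to the cap."""
    return min(initial * 2 ** n_doublings, cap)


# Hypothetical initial period of 30 seconds.
print([active_but_terminating_period(30, n) for n in range(5)])
# [30, 60, 120, 240, 300]
```

If other deployments may exceed 300 seconds, the cap is a default and not a maximum, and the text should say so.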



> 
>> 6) In Section 4.5 you talk about CoAP Ping/Pong. However, these terms
>> are not used in RFC7252. Maybe clarify that empty confirmable messages are
>> used and provide a pointer to section 4.2. of RFC7252 right here (instead of
>> only later).
> 
> [Med] Will do. Thanks.
> 
>> 
>> 7) High-level question: Given this doc specifies a YANG model, why are
>> configurations not retrieved and changed using NETCONF or RESTCONF?
>> 
> 
> [Med] The draft says the following: 
> 
>   This YANG module (ietf-dots-signal-channel) defines the DOTS client
>   interaction with the DOTS server as seen by the DOTS client.  A DOTS
>   server is allowed to update the non-configurable 'ro' entities in the
>   responses.  This YANG module is not intended to be used via NETCONF/
>   RESTCONF for DOTS server management purposes; such module is out of
>   the scope of this document.  It serves only to provide a data model
>   and encoding, but not a management data model.

Yes, I understand why you want to use CoAP when under attack. However, the configuration part (Figure 19/20) is not done during the attack, therefore NETCONF/RESTCONF could be used (or it could even be part of the DOTS data channel).

Mirja


> 
> Thank you again for the detailed review. 
> 
> Cheers,
> Med 
>