[Dots] AD evaluation of draft-ietf-dots-signal-call-home-09
Benjamin Kaduk <kaduk@mit.edu> Tue, 13 October 2020 23:08 UTC
Date: Tue, 13 Oct 2020 16:08:19 -0700
From: Benjamin Kaduk <kaduk@mit.edu>
To: draft-ietf-dots-signal-call-home.all@ietf.org
Cc: dots@ietf.org
Message-ID: <20201013230819.GA50845@kduck.mit.edu>
MIME-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Disposition: inline
Archived-At: <https://mailarchive.ietf.org/arch/msg/dots/1PFJNzY2ewKOGt7bct-AUtC0Jhk>
Subject: [Dots] AD evaluation of draft-ietf-dots-signal-call-home-09
Hi all,

Nothing super-earth-shattering in here, but there's enough that we'll
need a revised I-D and some more WG input before I'm ready to start the
IETF LC.

-Ben

YANG lint says:

   ietf-dots-call-home@2020-07-07.yang:11: warning: imported module "ietf-dots-signal-channel" not used

AFAICT that's a yanglint bug, since we do augment nodes from the signal
channel.

I see that this document has a few IPR notices against it; in
particular, the one at https://datatracker.ietf.org/ipr/3318/ does not
list any licensing terms ("Unwilling to Commit to the Provisions of a),
b), or c) Above", where a/b/c are "no license needed for
implementation"/"RAND with no fee"/"RAND with possible fee"). I believe
we are working to see if the situation can be clarified and the IPR
disclosure updated, but I don't know that we need to delay any progress
until that has occurred. That said, personally, I'm not very
comfortable about a protocol that is potentially encumbered by IPR with
unknown license terms. However, I am willing to at least advance the
document to IETF LC if there is clear WG support for doing so, even
knowing that use of the protocol would potentially be subject to
arbitrary terms from the IPR holder. I would really like to hear from
the WG members on this point.

I mention it in the section-by-section comments, but we introduce the
possibility of "wait for administrator approval" in the
mitigation-request processing pipeline, which could exceed a normal CoAP
request timeout. I think we need to have some substantive text
discussing the expected behavior in this case.

Section 1.1

   Some of the DDoS attacks like spoofed RST or FIN packets, Slowloris,
   and Transport Layer Security (TLS) re-negotiation are difficult to
   detect on a home network device without adversely affecting its

(side note) TLS renegotiation as an attack is just keeping a TLS
connection open and repeatedly re-negotiating to cause the server to
burn CPU on signing operations?
I don't think I've heard of that one before.

Section 1.2

   'DOTS signal channel Call Home' (or DOTS Call Home, for short) refers
   to a DOTS signal channel established at the initiative of a DOTS
   server. That is, the DOTS server initiates a secure connection to a
   DOTS client, and uses that connection to receive the attack traffic
   information (e.g., attack sources) from the DOTS client. More
   details are provided in Section 3.

I think this introductory section needs to be crystal clear for the
reader about the relationship between the "conventional" signal channel
DOTS client and the corresponding call home DOTS client for a given
mitigation process. Some people will expect both things with "client"
in the name to be on the same device, and some people will expect the
(call home) client to be the device that makes mitigation requests, but
both cannot be right. The phrase "role reversal" might be useful in
describing these interpretations, as might a forward reference to
section 1.4. (Yes, I understand that there is no need for either DOTS
call home peer to be colocated with any base signal channel element, but
I think that the default scenario in many readers' minds will be the
simple role-reversal setup.)

Section 1.3

It's a little unfortunate that we are using the "DMS" acronym in the
figure here, when we only mention the terminology of RFC 8612 in section
2. (That said, I'm willing to leave it as-is for now and see if anyone
complains.)

Section 3.1

   The DOTS signal channel Call Home preserves all but one of the DOTS
   client/server roles in the DOTS protocol stack, as compared to DOTS
   client-initiated DOTS signal channel protocol

I suppose one could quibble about whether "party allowed to behave
passively with respect to heartbeats" qualifies as a "client/server
role" ... I think I'm okay leaving this as-is for now, though.

   For example, a home network element (e.g., home router) co-located
   with a Call Home DOTS server is the (D)TLS server.

However, Figure 7 has a box labelled "Call Home DOTS server" and also
"(D)TLS client" (not server). At first I thought this was trying to
make an analogy to a (normal) signal channel DOTS server as being the
(D)TLS server, but that would not be in a home network element. So
maybe this is just a typo?

   calling home, the DOTS server initially assumes the role of the
   (D)TLS client, but the network element's role as a DOTS server

We may want to continue using "DOTS Call Home server" in these two
lines.

   remains the same. Furthermore, existing certificate chains and
   mutual authentication mechanisms between the DOTS agents are
   unaffected by the Call Home function. This Call Home function

This may merit a bit more text, or at least a bit more thought, since we
are now asking the certificate validation to be used for a different
logical purpose. While it's true that the entity acting as a (D)TLS
server is likely to be the same device as a regular DOTS server (or at
least operated by the same DMS provider), and so there is a fairly
strong analogue there in terms of the (D)TLS server certificate
validation procedures, the (D)TLS client is something of a different
entity than a traditional DOTS client. It's true that RFC 8782 leaves
the specifics of the mutual authentication a bit under-specified (and
essentially just says that it has to happen somehow), but one might
imagine that the number of DOTS clients is fairly small and tied to
specific legal contracts, so manual provisioning (or provisioning during
onboarding) of client certificate information is reasonable. For the
call home case, we should expect a lot more call home DOTS servers
(i.e., (D)TLS clients) and thus should probably have a better story for
automating the mutual authentication check.
Defining an extendedKeyUsage value that indicates authorization to act
as a call home server would be one typical way to do so (it's perhaps
unfortunate that we didn't define an EKU for DOTS client usage), but if
we're not going to do that we should at least put some words in about
how the mutual authentication requirement remains but a different ACL
may be needed for call home than for traditional DOTS sessions.

   enables the DOTS server co-located with a network element (possibly

(Still Call Home server, right?)

Section 3.2.1

   If the Call Home DOTS server does not receive any traffic from the
   peer Call Home DOTS client during the time span required to exhaust
   the maximum 'missing-hb-allowed' threshold, the Call Home DOTS server
   concludes the session is disconnected. Then, the Call Home DOTS
   server MUST try to resume the (D)TLS session.

Why resume specifically, as opposed to the broader "initiate a new
(D)TLS connection" that could encompass both resumption and a full
handshake?

Section 3.2.2

   If a Call Home DOTS client wants to redirect a Call Home DOTS server
   to another Call Home DOTS client, it MUST send a Non-confirmable PUT
   request to the predefined resource ".well-known/dots/redirect" with
   the new Call Home DOTS client FQDN or IP address in the body of the
   PUT similar to what is described in Section 4.6 of
   [I-D.ietf-dots-rfc8782-bis]. [...]

I suggest that we mention the actual element in the YANG module that
contains the structure that will be used as the PUT body, as this text
in isolation feels like it's attempting to define the protocol by way of
example and analogy, which is not a great pattern for protocol design.

Section 3.3.1

   In addition, the DOTS client MUST validate that attacker prefixes are
   within the scope of the DOTS server domain.

What does "within the scope" mean in the context of the base signal
channel?

   (i.e., the Call Home scenario depicted in Figure 7).
   The 'target-uri' or 'target-fqdn' parameters can be included in a
   mitigation request for diagnostic purposes to notify the Call Home
   DOTS server domain administrator, but SHOULD NOT be used to determine
   the target IP addresses. Note that 'target-prefix' becomes a
   mandatory attribute in the mitigation request signaling the attack
   information because 'target-uri' and 'target-fqdn' are optional
   attributes and 'alias-name' will not be conveyed in a mitigation
   request.

I think we have to use normative language that 'target-prefix' is
mandatory for call home, since the "don't rely on target-fqdn or
target-uri" is not a MUST. (Actually, I think they would have to be
"MUST NOT send", not just "MUST NOT rely on for identification", in
order for us to be able to get away with the current wording that states
it like a fact.)

Also, we might want to explain why 'alias-name' cannot be used (and we
don't need normative language to ensure it): they are created using the
data channel but there is no call home data channel (yet, at least).

   In order to help attack source identification by a Call Home DOTS
   server, the Call Home DOTS client SHOULD include in its mitigation
   request additional information such as 'source-port-range' or
   'source-icmp-type-range'. The Call Home DOTS client may not include
   such information if 'source-prefix' conveys an IPv6 address/prefix.

I'm not sure what the "may not" is intending to convey, here. Are these
mandatory for IPv4 prefixes?

   The Call Home DOTS server MUST check that the 'source-prefix' is
   within the scope of the Call Home DOTS server domain. Note that in a

(nit) this "MUST" seems redundant with the text I quoted previously.

   The Call Home DOTS server MUST check that the 'source-prefix' is
   within the scope of the Call Home DOTS server domain. Note that in a
   DOTS Call Home scenario, the Call Home DOTS server considers, by
   default, that any routeable IP prefix enclosed in 'target-prefix' is
   within the scope of the Call Home DOTS client. [...]

We say "by default" -- how would some other behavior be activated?

   with or without DOTS server domain administrator consent. If the
   attack traffic is blocked, the Call Home DOTS server informs the Call
   Home DOTS client that the attack is being mitigated.

This is just a normal 2.xx response code (and body) to the mitigation
request? It might be worth clarifying.

   If the attack traffic information is identified by the Call Home DOTS
   server or the Call Home DOTS server domain administrator as
   legitimate traffic, the mitigation request is rejected, and 4.09
   (Conflict) is returned to the Call Home DOTS client. The conflict-

There may be quite some delay involved if the administrator needs to
decide. Should we say more about (e.g.) using 5.03 and Max-Age in this
case?

   Once the request is validated by the Call Home DOTS server,
   appropriate actions are enforced to block the attack traffic within
   the source network. The Call Home DOTS client is informed about the
   progress of the attack mitigation following the rules in
   [I-D.ietf-dots-rfc8782-bis]. For example, if the Call Home DOTS
   server is embedded in a CPE, it can program the packet processor to
   punt all the traffic from the compromised device to the target to

I think the sentence about "informed about the progress" might be
misplaced at this location within the paragraph -- the example given
seems to just be talking about the "appropriate actions" that are taken
for blocking traffic, not any mitigation-status updates.

Section 3.3.2

   If a Carrier Grade NAT (CGN, including NAT64) is located between the
   DOTS client domain and DOTS server domain, communicating an external
   IP address in a mitigation request is likely to be discarded by the
   Call Home DOTS server because the external IP address is not visible
   locally to the Call Home DOTS server (see Figure 10). The Call Home
   DOTS server is only aware of the internal IP addresses/prefixes bound
   to its domain.
   Thus, the Call Home DOTS client MUST NOT include the external IP
   address and/or port number identifying the suspect attack source, but
   MUST include the internal IP address and/or port number.

We're likely to get similar complaints about "how will they know there's
a NAT" that we did for the base signal channel. I don't have any great
suggestions for trying to forestall such comments, though I do note that
8782 has some explicit text about "[t]his document does not make any
recommendations about possible translator discovery mechanisms".

Also, it's amusing that for the base signal channel we said to *not* use
internal addresses, but for call home we say you have to use internal
addresses.

In the base signal channel we also said that we did not give
recommendations on how to discover possible translator mechanisms...

   To that aim, the Call Home DOTS client SHOULD rely on mechanisms,
   such as [RFC8512] or [RFC8513], to retrieve the internal IP address

... yet here we seem to be making such recommendations!

   If a MAP Border Relay [RFC7597] or lwAFTR [RFC7596] is enabled in the
   provider's domain to service its customers, the identification of an
   attack source bound to an IPv4 address/prefix MUST also rely on
   source port numbers because the same IPv4 address is assigned to
   multiple customers. The port information is required to
   unambiguously identify the source of an attack.

[same question about how to know that they are in use]

   If a translator is enabled on the boundaries of the domain hosting
   the Call Home DOTS server (e.g., a CPE with NAT enabled as shown in
   Figures 11 and 12), the Call Home DOTS server uses the attack traffic
   information conveyed in a mitigation request to find the internal
   source IP address of the compromised device and blocks the traffic

In a similar vein, I expect to get some questions about how the call
home DOTS server finds the internal source IP address from the attack
traffic information conveyed.
I don't have a specific change to propose at this time, since I don't
know, myself, but we should at least have some answer to give in
response to such questions.

The text is also a little unclear on why we provide both Figures 11 and
12 -- while both cases are valid, we don't seem to have any discussion
that highlights differences between the cases. So perhaps we should say
that the behavior of the call home DOTS server is the same whether or
not the call home DOTS server is integrated with the CPE/NAT (if true)?

Section 3.3.3

I think the YANG module might benefit from being moved up a level or two
in the section hierarchy.

Section 3.3.3.1

Should we give some indication that 'signal' is the import prefix for
"ietf-dots-signal-channel" before going into the tree diagram? (I do
not know what the convention is in this regard.)

Section 3.3.3.2

I suggest reiterating the note from 8782 about needing to check the
mapping output provided by YANG-to-CBOR in light of the situations where
differing CBOR/JSON types can arise (e.g., enumerations and 64-bit
quantities).

I guess it's implicit that we reuse the CBOR map keys 8 and 9 for
lower-port and upper-port in the source-port-range array?

Section 3.3.3.3

   This module uses the common YANG types defined in [RFC6991] and the
   data structure defined in [RFC8791].

(nit) I think we need another word here, maybe "data structure
extension" or "data structure statement"?

       list source-icmp-type-range {
         key "lower-type";
         description
           "ICMP type range. When only lower-type is
            present, it represents a single ICMP type.";

It seems that the interpretation of the source-icmp-type-range list is
dependent on the IP address family in use. Presumably one is supposed
to infer this from the source-prefix (though we don't say so), but the
source-prefix is optional when these fields are used in the base signal
channel.
It is not entirely clear whether it is safe to rely on the target-prefix
for address-family determination, though (I do not recall any reason why
DOTS signal channel doesn't work in the presence of a NAPT function).
Should the icmp attributes only be allowed if the source-prefix is
present?

     sx:augment-structure "/signal:dots-signal/signal:message-type/"
                        + "signal:redirected-signal" {
       description
         "The alternate Call Home DOTS client.";

Is there something we can/should do for the redirected-signal
augmentation to indicate that the alt-server and alt-server-record nodes
are removed/useless?

       leaf alt-ch-client {
         type string;
         description
           "FQDN of an alternate Call Home DOTS client.";

Can we discuss a bit what the implications of redirection are for (D)TLS
mutual authentication of the post-redirection channel? It seems that in
most cases the alt-ch-client FQDN will be needed to perform certificate
validation. Perhaps, though, there would be a case where a call home
server has a set of preconfigured call home clients (each with IP
address(es) and credentials), so a redirection by IP address to a
different client in that set would still be functional. So, I am not
sure that we need to make alt-ch-client a mandatory field (whether in
the YANG or in the prose), but we probably do need to have some text
earlier in the document covering the implications for authenticating the
redirected-to peer. I note that in the base signal channel, alt-server
is mandatory, so we would have some precedent for just making
alt-ch-client mandatory as well.

Section 4.2

Table 2 doesn't seem consistent with Table 1 -- Table 1 lists a couple
parameters that admit multiple CBOR types, but Table 2 only lists a
single CBOR Major Type for them.
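
To make the source-icmp-type-range question from Section 3.3.3.3
concrete, here is a minimal sketch (Python, purely illustrative and not
taken from the draft) of what "infer the ICMP flavor from source-prefix"
could look like. The helper name icmp_family is hypothetical, and
rejecting a request when no single address family can be inferred is an
assumption about what the draft might choose to specify, not something
it currently says.

```python
import ipaddress


def icmp_family(source_prefixes):
    """Return 4 or 6 for the ICMP flavor implied by 'source-prefix'.

    Hypothetical helper: raises ValueError when no single address family
    can be inferred (no prefixes at all, or a mix of IPv4 and IPv6).
    """
    # strict=False lets host addresses such as 192.0.2.1/24 parse as well.
    versions = {
        ipaddress.ip_network(prefix, strict=False).version
        for prefix in source_prefixes
    }
    if len(versions) != 1:
        raise ValueError("cannot infer ICMP address family from source-prefix")
    return versions.pop()
```

With a rule like this, a request carrying source-icmp-type-range but no
source-prefix (or prefixes from both families) would be treated as
ambiguous rather than silently interpreted as ICMPv4.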

Section 4.3

We don't have any visible note about removing TBA9 (and should probably
add some text about 4 only being the *requested* value as well, though
I'm pretty sure we'd know if there were other Standards Actions in the
works that would be potentially requesting a conflicting value!).

Section 5

We should probably say something about how the call home channel is a
potential vector for attack, with mitigation requests potentially
causing the indicated device to be partially blocked or booted off the
network entirely; mutual authentication of a trusted call home server,
"a healthy dose of skepticism about the indicated attack" (or rather,
local inspection of the indicated traffic), and involvement of the local
administrator can mitigate the new risks that are opened up. This is
related to what we currently have in the last paragraph, but we don't
specifically discuss it as a new risk/attack vector due to this
protocol.

We might also consider referencing the security considerations of RFC
8071 (NETCONF/RESTCONF call home) since the "considerations not
associated with server authentication" are likely similar.

There may also be some considerations when the indicated attack
source-prefix is in a private or local-use address range -- the "in
scope" check at the call home server doesn't mean much.

   Common precautions mitigating DoS attacks are recommended, such as
   temporarily blacklisting the source address after a set number of
   unsuccessful authentication attempts.

[I note that we used the term "drop-list" in RFC 8783.]

Section 6

We should probably say something about how the use of DPI and similar
IPS technologies will have privacy considerations of their own, but the
particular considerations depend on the device or technology in question
(thus, out of scope for this specific document).

   Concretely, the protocol does not leak any new information that can
   be used to ease surveillance.
   In particular, the Call Home DOTS server is not required to share
   information that is local to its network (e.g., internal identifiers
   of an attack source) with the Call Home DOTS client.

I guess it's true that we don't require the call-home server to share
internal addresses with the call-home client, but we do require the
call-home client to (only) use internal addresses in the mitigation
request. Presumably the client has to learn those addresses somehow,
and one could argue that this protocol is requiring that to happen even
if it doesn't convey them in-band in that direction. So I think we need
to say more about the privacy considerations of using internal addresses
-- it's not safe to claim that it's out of scope.

Also, this paragraph seems to be the main part that is directly
privacy-relevant in this section -- would the other paragraphs be a
better fit in the Security Considerations section? I guess the last
paragraph does touch on privacy with the "not meant to track the
activity of users", so it could stay here.

   Triggers to send a DOTS mitigation request to a Call Home DOTS server
   are deployment-specific. For example, a Call Home DOTS client may
   rely on the output of some DDoS detection systems deployed within the
   DOTS client domain to detect potential outbound DDoS attacks or on
   abuse claims received from remote victim networks. Such DDoS
   detection and mitigation techniques are not meant to track the
   activity of users, but to protect the Internet and avoid altering the
   IP reputation of the DOTS client domain.

Perhaps we could say a little more about what steps this mechanism takes
to avoid identifying users? E.g., the indicated data refer only to the
source and target addresses where suspected attack flows are present;
while it does permit expressing flow-level granularity there is no
in-band protocol element that would correlate one suspected-attack flow
with another suspected-attack flow as being associated with the same
user.
It is, however, possible that a faulty attack classification algorithm
could consistently identify the legitimate behaviors of a particular
user as being suspected attacks, in which case that user's traffic would
consistently be flagged, but that user's traffic is still grouped in
with the entire anonymity set of "all suspected attack traffic".

Section 9.2

If the toolchain supports it, we should probably refer to RFC 4632 as
BCP 122 in the text when we reference it.

We may get someone asking for RFC 8612 to be normative since we expect
the reader to be familiar with its terminology. It is only an
informational document and is not on the downref registry, so in some
sense it is "safer" to preemptively move it to being a normative
reference that will automatically get called out as a downref during the
IETF LC announcement; on the other hand, the IESG does have leeway to
just approve it without another IETF LC if it does need to change from
informative to normative as a result of IETF LC or IESG comments.

I might suggest using a different slug for the [Sec] reference; the
current short form is potentially misleading.

Appendix A

   The other approach is signaling the role of each DOTS agent (e.g., by
   using the DOTS data channel). For example, the DOTS agent in the
   home network first initiates a DOTS data channel to the peer DOTS
   agent in the ISP environment, at this time the DOTS agent in the home
   network is the DOTS client and the peer DOTS agent in the ISP
   environment is the DOTS server. After that, the DOTS agent in the
   home network retrieves the DOTS Call Home capability of the peer DOTS
   agent. If the peer supports the DOTS Call Home, the DOTS agent needs
   to subscribe to the peer to use this extension. [...]

I don't remember how such a capability negotiation on the data channel
would work. Is it supposed to just be keying off of the peer's
supported YANG modules or something like that? Might be worth
clarifying.
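
As a postscript to the "within the scope" comments on Section 3.3.1 and
the private-range remark under Section 5, the following is a rough
sketch (Python, illustrative only) of one possible reading of the scope
check: 'source-prefix' must be contained in one of the prefixes bound to
the Call Home DOTS server domain. The function names in_scope and
is_private_or_local, and the idea of flagging private or link-local
sources separately, are my assumptions and not text from the draft.

```python
import ipaddress


def in_scope(source_prefix, domain_prefixes):
    """True if source_prefix lies inside any prefix of the server domain.

    Hypothetical reading of "within the scope"; the draft does not define
    the check in these terms.
    """
    src = ipaddress.ip_network(source_prefix, strict=False)
    for prefix in domain_prefixes:
        dom = ipaddress.ip_network(prefix, strict=False)
        # subnet_of() raises TypeError across address families, so compare
        # versions first.
        if src.version == dom.version and src.subnet_of(dom):
            return True
    return False


def is_private_or_local(source_prefix):
    """True for private or link-local prefixes, where passing the scope
    check says little about which device is actually meant."""
    net = ipaddress.ip_network(source_prefix, strict=False)
    return net.is_private or net.is_link_local
```

For example, in_scope("192.168.1.10/32", ["192.168.1.0/24"]) holds for
essentially every home network using that range, which is the reason the
"in scope" check alone doesn't mean much for such addresses.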