Re: [Dots] AD evaluation of draft-ietf-dots-signal-call-home-09

Benjamin Kaduk <kaduk@mit.edu> Wed, 14 October 2020 04:31 UTC

Date: Tue, 13 Oct 2020 21:31:42 -0700
From: Benjamin Kaduk <kaduk@mit.edu>
To: draft-ietf-dots-signal-call-home.all@ietf.org
Cc: dots@ietf.org
Message-ID: <20201014043142.GG50845@kduck.mit.edu>
References: <20201013230819.GA50845@kduck.mit.edu>
MIME-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Disposition: inline
In-Reply-To: <20201013230819.GA50845@kduck.mit.edu>
Archived-At: <https://mailarchive.ietf.org/arch/msg/dots/FhRTudDNPL0GF6CkM534-EvTrhA>
Subject: Re: [Dots] AD evaluation of draft-ietf-dots-signal-call-home-09

Oops, forgot to note that I also put a bunch of (hopefully!) editorial
suggestions up at https://github.com/boucadair/dots-call-home/pull/1 rather
than littering them here.

-Ben

On Tue, Oct 13, 2020 at 04:08:19PM -0700, Benjamin Kaduk wrote:
> Hi all,
> 
> Nothing super-earth-shattering in here, but there's enough that we'll need
> a revised I-D and some more WG input before I'm ready to start the IETF LC.
> 
> -Ben
> 
> 
> YANG lint says:
> 
> ietf-dots-call-home@2020-07-07.yang:11: warning: imported module
> "ietf-dots-signal-channel" not used
> 
> AFAICT that's a yanglint bug, since we do augment nodes from the signal
> channel.
> 
> I see that this document has a few IPR notices against it; in
> particular, the one at https://datatracker.ietf.org/ipr/3318/ does not
> list any licensing terms ("Unwilling to Commit to the Provisions of a),
> b), or c) Above", where a/b/c are "no license needed for
> implementation"/"RAND with no fee"/"RAND with possible fee").
> I believe we are working to see if the situation can be clarified and
> the IPR disclosure updated, but I don't know that we need to delay any
> progress until that has occurred.  That said, personally, I'm not very
> comfortable about a protocol that potentially is encumbered by IPR with
> unknown license terms.  However, I am willing to at least advance the
> document to IETF LC if there is clear WG support for doing so, even
> knowing that use of the protocol would potentially be subject to
> arbitrary terms from the IPR holder.  I would really like to hear from
> the WG members on this point.
> 
> I mention it in the section-by-section comments, but we introduce the
> possibility of "wait for administrator approval" in the
> mitigation-request processing pipeline, which could exceed a normal CoAP
> request timeout.  I think we need to have some substantive text
> discussing the expected behavior in this case.
> 
> Section 1.1
> 
>    Some of the DDoS attacks like spoofed RST or FIN packets, Slowloris,
>    and Transport Layer Security (TLS) re-negotiation are difficult to
>    detect on a home network device without adversely affecting its
> 
> (side note) TLS renegotiation as an attack is just keeping a TLS
> connection open and repeatedly re-negotiating to cause the server to
> burn CPU on signing operations?  I don't think I've heard of that one
> before.
> 
> Section 1.2
> 
>    'DOTS signal channel Call Home' (or DOTS Call Home, for short) refers
>    to a DOTS signal channel established at the initiative of a DOTS
>    server.  That is, the DOTS server initiates a secure connection to a
>    DOTS client, and uses that connection to receive the attack traffic
>    information (e.g., attack sources) from the DOTS client.  More
>    details are provided in Section 3.
> 
> I think this introductory section needs to be crystal clear for the
> reader about the relationship between the "conventional" signal channel
> DOTS client and the corresponding call home DOTS client for a given
> mitigation process.  Some people will expect both things with "client"
> in the name to be on the same device, and some people will expect the
> (call home) client to be the device that makes mitigation requests, but
> both cannot be right.  The phrase "role reversal" might be useful in
> describing these interpretations, as might a forward reference to
> section 1.4.  (Yes, I understand that there is no need for either DOTS
> call home peer to be colocated with any base signal channel element, but
> I think that the default scenario in many readers' minds will be the
> simple role-reversal setup.)
> 
> Section 1.3
> 
> It's a little unfortunate that we are using the "DMS" acronym in the
> figure here, when we only mention the terminology of RFC 8612 in section
> 2.  (That said, I'm willing to leave it as-is for now and see if anyone
> complains.)
> 
> Section 3.1
> 
>    The DOTS signal channel Call Home preserves all but one of the DOTS
>    client/server roles in the DOTS protocol stack, as compared to DOTS
>    client-initiated DOTS signal channel protocol
> 
> I suppose one could quibble about whether "party allowed to behave
> passively with respect to heartbeats" qualifies as a "client/server
> role" ... I think I'm okay leaving this as-is for now, though.
> 
>    For example, a home network element (e.g., home router) co-located
>    with a Call Home DOTS server is the (D)TLS server.  However, when
> 
> Figure 7 has a box labelled "Call Home DOTS server" and also "(D)TLS
> client" (not server).  At first I thought this was trying to make an
> analogy to a (normal) signal channel DOTS server as being the (D)TLS
> server, but that would not be in a home network element.  So maybe this
> is just a typo?
> 
>    calling home, the DOTS server initially assumes the role of the
>    (D)TLS client, but the network element's role as a DOTS server
> 
> We may want to continue using "DOTS Call Home server" in these two lines.
> 
>    remains the same.  Furthermore, existing certificate chains and
>    mutual authentication mechanisms between the DOTS agents are
>    unaffected by the Call Home function.  This Call Home function
> 
> This may merit a bit more text, or at least a bit more thought, since we
> are now asking the certificate validation to be used for a different
> logical purpose.  While it's true that the entity acting as a (D)TLS
> server is likely to be the same device as a regular DOTS server (or at
> least operated by the same DMS provider), and so there is a fairly
> strong analogue there in terms of the (D)TLS server certificate
> validation procedures, the (D)TLS client is something of a different
> entity than a traditional DOTS client.  It's true that RFC 8782 leaves
> the specifics of the mutual authentication a bit under-specified (and
> essentially just says that it has to happen somehow), but one might
> imagine that the number of DOTS clients is fairly small and tied to
> specific legal contracts, so manual provisioning (or provisioning during
> onboarding) of client certificate information is reasonable.  For the
> call home case, we should expect a lot more call home DOTS servers
> (i.e., (D)TLS clients) and thus should probably have a better story for
> automating the mutual authentication check.  Defining an
> extendedKeyUsage value that indicates authorization to act as a call home
> server would be one typical way to do so (it's perhaps unfortunate that
> we didn't define an EKU for DOTS client usage), but if we're not going
> to do that we should at least put some words in about how the mutual
> authentication requirement remains but a different ACL may be needed for
> call home than for traditional DOTS sessions.
> 
>    enables the DOTS server co-located with a network element (possibly
> 
> (Still Call Home server, right?)
> 
> Section 3.2.1
> 
>    If the Call Home DOTS server does not receive any traffic from the
>    peer Call Home DOTS client during the time span required to exhaust
>    the maximum 'missing-hb-allowed' threshold, the Call Home DOTS server
>    concludes the session is disconnected.  Then, the Call Home DOTS
>    server MUST try to resume the (D)TLS session.
> 
> Why resume specifically, as opposed to the broader "initiate a new (D)TLS
> connection" that could encompass both resumption and a full handshake?
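
To make the distinction concrete, here is a rough Python sketch of
"conclude the session is disconnected, then initiate a new (D)TLS
connection", where resumption is attempted first and a full handshake
is the fallback.  Only 'missing-hb-allowed' comes from the draft; the
class, the function, and the two callbacks are hypothetical names, and
treating the session as lost once the miss count reaches the threshold
is an assumption about how the threshold is applied.

```python
from typing import Callable, Optional


class HeartbeatMonitor:
    """Counts consecutive missed heartbeats against 'missing-hb-allowed'."""

    def __init__(self, missing_hb_allowed: int) -> None:
        self.missing_hb_allowed = missing_hb_allowed
        self.missed = 0

    def on_traffic(self) -> None:
        # Any traffic from the peer resets the counter.
        self.missed = 0

    def on_heartbeat_timeout(self) -> bool:
        """Record one miss; return True once the session counts as lost."""
        self.missed += 1
        return self.missed >= self.missing_hb_allowed


def reconnect(try_resume: Callable[[], Optional[str]],
              full_handshake: Callable[[], str]) -> str:
    """Initiate a new connection: resumption if possible, else full handshake."""
    session = try_resume()  # hypothetical callback; None means "cannot resume"
    if session is not None:
        return session
    return full_handshake()  # hypothetical callback for a fresh handshake
```

With wording along the lines of "initiate a new (D)TLS connection",
both branches of reconnect() would be conformant, not just the first.
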
> 
> Section 3.2.2
> 
>    If a Call Home DOTS client wants to redirect a Call Home DOTS server
>    to another Call Home DOTS client, it MUST send a Non-confirmable PUT
>    request to the predefined resource ".well-known/dots/redirect" with
>    the new Call Home DOTS client FQDN or IP address in the body of the
>    PUT similar to what is described in Section 4.6 of
>    [I-D.ietf-dots-rfc8782-bis].  [...]
> 
> I suggest that we mention the actual element in the YANG module that
> contains the structure that will be used as the PUT body, as this text
> in isolation feels like it's attempting to define the protocol by way of
> example and analogy, which is not a great pattern for protocol design.
> 
> Section 3.3.1
> 
>                                                                      In
>       addition, the DOTS client MUST validate that attacker prefixes are
>       within the scope of the DOTS server domain.
> 
> What does "within the scope" mean in the context of the base signal
> channel?
> 
>    (i.e., the Call Home scenario depicted in Figure 7).  The 'target-
>    uri' or 'target-fqdn' parameters can be included in a mitigation
>    request for diagnostic purposes to notify the Call Home DOTS server
>    domain administrator, but SHOULD NOT be used to determine the target
>    IP addresses.  Note that 'target-prefix' becomes a mandatory
>    attribute in the mitigation request signaling the attack information
>    because 'target-uri' and 'target-fqdn' are optional attributes and
>    'alias-name' will not be conveyed in a mitigation request.
> 
> I think we have to use normative language that 'target-prefix' is
> mandatory for call home, since the "don't rely on target-fqdn or
> target-uri" rule is not a MUST.  (Actually, I think they would have to be
> "MUST NOT send", not just "MUST NOT rely on for identification", in
> order for us to be able to get away with the current wording that states
> it like a fact.)
> 
> Also, we might want to explain why 'alias-name' cannot be used (and we
> don't need normative language to ensure it): they are created using the
> data channel but there is no call home data channel (yet, at least).
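
As an illustration of the rule set discussed here, a receiver-side
check might look like the Python sketch below.  It assumes the
mitigation scope has already been decoded into a dict keyed by the
attribute names from the draft; the function names are hypothetical,
and rejecting 'alias-name' outright is one possible policy rather than
something the draft states.

```python
from typing import Any, Dict, List


def check_call_home_scope(scope: Dict[str, Any]) -> List[str]:
    """Return a list of problems; an empty list means the scope is acceptable."""
    problems = []
    if not scope.get("target-prefix"):
        problems.append("target-prefix is required for Call Home")
    if "alias-name" in scope:
        # Aliases are created over the data channel, which Call Home lacks.
        problems.append("alias-name cannot be used (no Call Home data channel)")
    return problems


def targets_for_mitigation(scope: Dict[str, Any]) -> List[str]:
    """Targets come from target-prefix only; target-uri/target-fqdn are
    treated as diagnostic information and never resolved here."""
    return list(scope.get("target-prefix", []))
```
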
> 
>    In order to help attack source identification by a Call Home DOTS
>    server, the Call Home DOTS client SHOULD include in its mitigation
>    request additional information such as 'source-port-range' or
>    'source-icmp-type-range'.  The Call Home DOTS client may not include
>    such information if 'source-prefix' conveys an IPv6 address/prefix.
> 
> I'm not sure what the "may not" is intending to convey, here.  Are these
> mandatory for IPv4 prefixes?
> 
>    The Call Home DOTS server MUST check that the 'source-prefix' is
>    within the scope of the Call Home DOTS server domain.  Note that in a
> 
> (nit) this "MUST" seems redundant with the text I quoted previously.
> 
>    The Call Home DOTS server MUST check that the 'source-prefix' is
>    within the scope of the Call Home DOTS server domain.  Note that in a
>    DOTS Call Home scenario, the Call Home DOTS server considers, by
>    default, that any routeable IP prefix enclosed in 'target-prefix' is
>    within the scope of the Call Home DOTS client.  [...]
> 
> We say "by default" -- how would some other behavior be activated?
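
One plausible reading of the "within the scope" check on
'source-prefix' is plain prefix containment against the prefixes the
Call Home DOTS server domain owns.  The Python sketch below shows that
reading using the standard ipaddress module; the function name and the
idea of a configured list of domain prefixes are assumptions, not text
from the draft.

```python
import ipaddress
from typing import Iterable


def in_scope(source_prefix: str, domain_prefixes: Iterable[str]) -> bool:
    """True if source_prefix is fully contained in one of the domain prefixes."""
    src = ipaddress.ip_network(source_prefix)
    for prefix in domain_prefixes:
        dom = ipaddress.ip_network(prefix)
        # subnet_of() raises TypeError across address families,
        # so compare IP versions before calling it.
        if src.version == dom.version and src.subnet_of(dom):
            return True
    return False
```

For example, in_scope("192.0.2.16/28", ["192.0.2.0/24"]) holds, while a
prefix from an unrelated block does not pass.
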
> 
>    with or without DOTS server domain administrator consent.  If the
>    attack traffic is blocked, the Call Home DOTS server informs the Call
>    Home DOTS client that the attack is being mitigated.
> 
> This is just a normal 2.xx response code (and body) to the mitigation
> request?  It might be worth clarifying.
> 
>    If the attack traffic information is identified by the Call Home DOTS
>    server or the Call Home DOTS server domain administrator as
>    legitimate traffic, the mitigation request is rejected, and 4.09
>    (Conflict) is returned to the Call Home DOTS client.  The conflict-
> 
> There may be quite some delay involved if the administrator needs to
> decide.  Should we say more about (e.g.) using 5.03 and Max-Age in this
> case?
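
To sketch what such text might describe (this is a possible behavior,
not what the draft currently specifies): while the administrator has
not yet decided, the server answers 5.03 with a Max-Age telling the
client when to retry; once a decision exists it answers 4.09 or a 2.xx
code.  The enum, the function, and the 60-second default are
hypothetical, and 2.01 assumes a newly created mitigation request.

```python
from enum import Enum
from typing import Optional, Tuple


class Decision(Enum):
    PENDING = "pending"        # still waiting for administrator approval
    LEGITIMATE = "legitimate"  # traffic judged legitimate; request rejected
    BLOCKED = "blocked"        # attack traffic is being blocked


def response_for(decision: Decision,
                 retry_after_s: int = 60) -> Tuple[str, Optional[int]]:
    """Map a decision to (CoAP response code, Max-Age in seconds or None)."""
    if decision is Decision.PENDING:
        # CoAP uses Max-Age on 5.03 as the retry hint.
        return ("5.03", retry_after_s)
    if decision is Decision.LEGITIMATE:
        return ("4.09", None)
    return ("2.01", None)
```
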
> 
>    Once the request is validated by the Call Home DOTS server,
>    appropriate actions are enforced to block the attack traffic within
>    the source network.  The Call Home DOTS client is informed about the
>    progress of the attack mitigation following the rules in
>    [I-D.ietf-dots-rfc8782-bis].  For example, if the Call Home DOTS
>    server is embedded in a CPE, it can program the packet processor to
>    punt all the traffic from the compromised device to the target to
> 
> I think the sentence about "informed about the progress" might be
> misplaced at this location within the paragraph -- the example given seems to
> just be talking about the "appropriate actions" that are taken for blocking
> traffic, not any mitigation-status updates.
> 
> Section 3.3.2
> 
>    If a Carrier Grade NAT (CGN, including NAT64) is located between the
>    DOTS client domain and DOTS server domain, communicating an external
>    IP address in a mitigation request is likely to be discarded by the
>    Call Home DOTS server because the external IP address is not visible
>    locally to the Call Home DOTS server (see Figure 10).  The Call Home
>    DOTS server is only aware of the internal IP addresses/prefixes bound
>    to its domain.  Thus, the Call Home DOTS client MUST NOT include the
>    external IP address and/or port number identifying the suspect attack
>    source, but MUST include the internal IP address and/or port number.
> 
> We're likely to get the same "how will they know there's a NAT"
> complaints that we got for the base signal channel.  I don't have any great
> suggestions for trying to forestall such comments, though I do note that
> 8782 has some explicit text about "[t]his document does not make any
> recommendations about possible translator discovery mechanisms".
> 
> Also, it's amusing that for the base signal channel we said to *not*
> use internal addresses, but for call home we say you have to use
> internal addresses.  In the base signal channel we also said that we did
> not give recommendations on how to discover possible translator
> mechanisms...
> 
>    To that aim, the Call Home DOTS client SHOULD rely on mechanisms,
>    such as [RFC8512] or [RFC8513], to retrieve the internal IP address
> 
> ... yet here we seem to be making such recommendations!
> 
>    If a MAP Border Relay [RFC7597] or lwAFTR [RFC7596] is enabled in the
>    provider's domain to service its customers, the identification of an
>    attack source bound to an IPv4 address/prefix MUST also rely on
>    source port numbers because the same IPv4 address is assigned to
>    multiple customers.  The port information is required to
>    unambiguously identify the source of an attack.
> 
> [same question about how to know that they are in use]
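
The reason the port is needed can be shown with a small Python sketch.
It assumes a hypothetical binding table that maps a shared IPv4 address
plus a port range to a customer, which is roughly the information a
MAP Border Relay or lwAFTR holds; the table layout and function name
are illustrative only.

```python
from typing import Dict, Optional, Tuple

# Hypothetical binding table:
# (shared IPv4 address, (first port, last port)) -> customer id
PortRange = Tuple[int, int]
Bindings = Dict[Tuple[str, PortRange], str]


def find_customer(bindings: Bindings, ipv4: str,
                  port: Optional[int]) -> Optional[str]:
    """Return the single customer matching address and port, else None."""
    if port is None:
        # Without a port, the address identifies a customer only if unshared.
        matches = {cust for (addr, _), cust in bindings.items() if addr == ipv4}
        return matches.pop() if len(matches) == 1 else None
    for (addr, (lo, hi)), cust in bindings.items():
        if addr == ipv4 and lo <= port <= hi:
            return cust
    return None
```

With two customers sharing 192.0.2.1, a lookup without a port returns
None, which is the ambiguity the quoted MUST is guarding against.
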
> 
>    If a translator is enabled on the boundaries of the domain hosting
>    the Call Home DOTS server (e.g., a CPE with NAT enabled as shown in
>    Figures 11 and 12), the Call Home DOTS server uses the attack traffic
>    information conveyed in a mitigation request to find the internal
>    source IP address of the compromised device and blocks the traffic
> 
> In a similar vein, I expect to get some questions about how the call
> home DOTS server finds the internal source IP address from the attack
> traffic information conveyed.  I don't have a specific change to propose
> at this time, since I don't know, myself, but we should at least have
> some answer to give in response to such questions.
> 
> The text is also a little unclear on why we provide both Figures 11 and
> 12 -- while both cases are valid, we don't seem to have any discussion
> that highlights differences between the cases.  So perhaps we should say
> that the behavior of the call home DOTS server is the same whether or
> not the call home DOTS server is integrated with the CPE/NAT (if true)?
> 
> Section 3.3.3
> 
> I think the YANG module might benefit from being moved up a level or two
> in the section hierarchy.
> 
> Section 3.3.3.1
> 
> Should we give some indication that 'signal' is the import prefix for
> "ietf-dots-signal-channel" before going into the tree diagram?  (I do
> not know what the convention is in this regard.)
> 
> Section 3.3.3.2
> 
> I suggest reiterating the note from 8782 about needing to check the
> mapping output provided by YANG-to-CBOR in light of the situations where
> differing CBOR/JSON types can arise (e.g., enumerations and 64-bit
> quantities).
> 
> I guess it's implicit that we reuse the CBOR map keys 8 and 9 for
> lower-port and upper-port in the source-port-range array?
> 
> Section 3.3.3.3
> 
>    This module uses the common YANG types defined in [RFC6991] and the
>    data structure defined in [RFC8791].
> 
> (nit) I think we need another word here, maybe "data structure
> extension" or "data structure statement"?
> 
>        list source-icmp-type-range {
>          key "lower-type";
>          description
>            "ICMP type range. When only lower-type is
>             present, it represents a single ICMP type.";
> 
> It seems that the interpretation of the source-icmp-type-range list is
> dependent on the IP address family in use.  Presumably one is supposed
> to infer this from the source-prefix (though we don't say so), but the
> source-prefix is optional when these fields are used in the base signal
> channel.  It is not entirely clear whether it is safe to rely on the
> target-prefix for address-family determination, though (I do not recall
> any reason why DOTS signal channel doesn't work in the presence of a
> NAPT function).  Should the icmp attributes only be allowed if the
> source-prefix is present?
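
The ambiguity is real: type 8 is Echo in ICMPv4, whereas Echo Request
is type 128 in ICMPv6.  A Python sketch of the stricter rule floated
above (ICMP attributes only allowed when 'source-prefix' is present,
with the family inferred from it) could look like this.  The function
name and the decoded-dict input are assumptions, and refusing mixed
address families is an extra policy choice of this sketch.

```python
import ipaddress
from typing import Any, Dict, Optional


def icmp_family(scope: Dict[str, Any]) -> Optional[str]:
    """Return 'ICMPv4' or 'ICMPv6' for source-icmp-type-range, or None if
    the range is absent.  The family is inferred from source-prefix."""
    if "source-icmp-type-range" not in scope:
        return None
    prefixes = scope.get("source-prefix") or []
    if not prefixes:
        raise ValueError("source-icmp-type-range requires source-prefix")
    versions = {ipaddress.ip_network(p).version for p in prefixes}
    if len(versions) != 1:
        raise ValueError("mixed address families: ICMP types are ambiguous")
    return "ICMPv4" if versions == {4} else "ICMPv6"
```
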
> 
>      sx:augment-structure "/signal:dots-signal/signal:message-type/"
>                         + "signal:redirected-signal" {
>        description
>          "The alternate Call Home DOTS client.";
> 
> Is there something we can/should do for the redirected-signal
> augmentation to indicate that the alt-server and alt-server-record nodes
> are removed/useless?
> 
>            leaf alt-ch-client {
>              type string;
>              description
>                "FQDN of an alternate Call Home DOTS client.";
> 
> Can we discuss a bit what the implications of redirection are for (D)TLS
> mutual authentication of the post-redirection channel?  It seems that in
> most cases the alt-ch-client FQDN will be needed to perform certificate
> validation.  Perhaps, though, there would be a case where a call home
> server has a set of preconfigured call home clients (each with IP
> address(es) and credentials), so a redirection by IP address to a
> different client in that set would still be functional.  So, I am not
> sure that we need to make alt-ch-client a mandatory field (whether in
> the YANG or in the prose), but we probably do need to have some text
> earlier in the document covering the implications for authenticating the
> redirected-to peer.  I note that in the base signal channel, alt-server
> is mandatory, so we would have some precedent for just making
> alt-ch-client mandatory as well.
> 
> Section 4.2
> 
> Table 2 doesn't seem consistent with Table 1 -- Table 1 lists a couple
> parameters that admit multiple CBOR types, but Table 2 only lists a
> single CBOR Major Type for them.
> 
> Section 4.3
> 
> We don't have any visible note about removing TBA9 (and should probably
> add some text about 4 only being the *requested* value as well, though
> I'm pretty sure we'd know if there were other Standards Actions in the
> works that would be potentially requesting a conflicting value!).
> 
> Section 5
> 
> We should probably say something about how the call home channel is
> a potential vector for attack, with mitigation requests potentially
> causing the indicated device to be partially blocked or booted off the
> network entirely; mutual authentication of a trusted call home server,
> "a healthy dose of skepticism about the indicated attack" (or
> rather, local inspection of the indicated traffic), and involvement of
> the local administrator can mitigate the new risks that are opened up.
> This is related to what we currently have in the last paragraph, but we
> don't specifically discuss it as a new risk/attack vector due to this
> protocol.
> 
> We might also consider referencing the security considerations of RFC
> 8071 (NETCONF/RESTCONF call home) since the "considerations not
> associated with server authentication" are likely similar.
> 
> There may also be some considerations when the indicated attack
> source-prefix is in a private or local-use address range -- the "in
> scope" check at the call home server doesn't mean much.
> 
>    Common precautions mitigating DoS attacks are recommended, such as
>    temporarily blacklisting the source address after a set number of
>    unsuccessful authentication attempts.
> 
> [I note that we used the term "drop-list" in RFC 8783.]
> 
> Section 6
> 
> We should probably say something about how the use of DPI and similar
> IPS technologies will have privacy considerations of their own, but
> those considerations depend on the particular device or technology in
> question (and are thus out of scope for this document).
> 
>    Concretely, the protocol does not leak any new information that can
>    be used to ease surveillance.  In particular, the Call Home DOTS
>    server is not required to share information that is local to its
>    network (e.g., internal identifiers of an attack source) with the
>    Call Home DOTS client.
> 
> I guess it's true that we don't require the call-home server to share
> internal addresses with the call-home client, but we do require the
> call-home client to (only) use internal addresses in the mitigation
> request.  Presumably the client has to learn those addresses somehow,
> and one could argue that this protocol is requiring that to happen even
> if it doesn't convey them in-band in that direction.  So I think we need
> to say more about the privacy considerations of using internal addresses
> -- it's not safe to claim that it's out of scope.
> 
> Also, this paragraph seems to be the main part that is directly
> privacy-relevant in this section -- would the other paragraphs be a
> better fit in the Security Considerations section?  I guess the last
> paragraph does touch on privacy with the "not meant to track the
> activity of users", so it could stay here.
> 
>    Triggers to send a DOTS mitigation request to a Call Home DOTS server
>    are deployment-specific.  For example, a Call Home DOTS client may
>    rely on the output of some DDoS detection systems deployed within the
>    DOTS client domain to detect potential outbound DDoS attacks or on
>    abuse claims received from remote victim networks.  Such DDoS
>    detection and mitigation techniques are not meant to track the
>    activity of users, but to protect the Internet and avoid altering the
>    IP reputation of the DOTS client domain.
> 
> Perhaps we could say a little more about what steps this mechanism takes
> to avoid identifying users?  E.g., the indicated data refer only to the
> source and target addresses where suspected attack flows are present;
> while it does permit expressing flow-level granularity there is no
> in-band protocol element that would correlate one suspected-attack flow
> with another suspected-attack flow as being associated with the same
> user.  It is, however, possible that a faulty attack classification
> algorithm could consistently identify the legitimate behaviors of a
> particular user as being suspected attacks, in which case that user's
> traffic would consistently be flagged, but that user's traffic is still
> grouped in with the entire anonymity set of "all suspected attack
> traffic".
> 
> Section 9.2
> 
> If the toolchain supports it, we should probably refer to RFC 4632 as
> BCP 122 in the text when we reference it.
> 
> We may get someone asking for RFC 8612 to be normative since we expect
> the reader to be familiar with its terminology.  It is only an
> informational document and is not on the downref registry, so in some
> sense it is "safer" to preemptively move it to being a normative
> reference that will automatically get called out as a downref during the
> IETF LC announcement; on the other hand, the IESG does have leeway to
> just approve it without another IETF LC if it does need to change from
> informative to normative as a result of IETF LC or IESG comments.
> 
> I might suggest using a different slug for the [Sec] reference; the
> current short form is potentially misleading.
> 
> Appendix A
> 
>    The other approach is signaling the role of each DOTS agent (e.g., by
>    using the DOTS data channel).  For example, the DOTS agent in the
>    home network first initiates a DOTS data channel to the peer DOTS
>    agent in the ISP environment, at this time the DOTS agent in the home
>    network is the DOTS client and the peer DOTS agent in the ISP
>    environment is the DOTS server.  After that, the DOTS agent in the
>    home network retrieves the DOTS Call Home capability of the peer DOTS
>    agent.  If the peer supports the DOTS Call Home, the DOTS agent needs
>    to subscribe to the peer to use this extension.  [...]
> 
> I don't remember how such a capability negotiation on the data channel
> would work.  Is it supposed to just be keying off of the peer's
> supported YANG modules or something like that?  Might be worth
> clarifying.
>