[Dots] AD evaluation of draft-ietf-dots-signal-call-home-09

Benjamin Kaduk <kaduk@mit.edu> Tue, 13 October 2020 23:08 UTC

Date: Tue, 13 Oct 2020 16:08:19 -0700
From: Benjamin Kaduk <kaduk@mit.edu>
To: draft-ietf-dots-signal-call-home.all@ietf.org
Cc: dots@ietf.org
Message-ID: <20201013230819.GA50845@kduck.mit.edu>
Archived-At: <https://mailarchive.ietf.org/arch/msg/dots/1PFJNzY2ewKOGt7bct-AUtC0Jhk>

Hi all,

Nothing super-earth-shattering in here, but there's enough that we'll need
a revised I-D and some more WG input before I'm ready to start the IETF LC.

-Ben


YANG lint says:

ietf-dots-call-home@2020-07-07.yang:11: warning: imported module
"ietf-dots-signal-channel" not used

AFAICT that's a yanglint bug, since we do augment nodes from the signal
channel.

I see that this document has a few IPR notices against it; in
particular, the one at https://datatracker.ietf.org/ipr/3318/ does not
list any licensing terms ("Unwilling to Commit to the Provisions of a),
b), or c) Above", where a/b/c are "no license needed for
implementation"/"RAND with no fee"/"RAND with possible fee").
I believe we are working to see if the situation can be clarified and
the IPR disclosure updated, but I don't know that we need to delay any
progress until that has occurred.  That said, personally, I'm not very
comfortable about a protocol that potentially is encumbered by IPR with
unknown license terms.  However, I am willing to at least advance the
document to IETF LC if there is clear WG support for doing so, even
knowing that use of the protocol would potentially be subject to
arbitrary terms from the IPR holder.  I would really like to hear from
the WG members on this point.

I mention it in the section-by-section comments, but we introduce the
possibility of "wait for administrator approval" in the
mitigation-request processing pipeline, which could exceed a normal CoAP
request timeout.  I think we need to have some substantive text
discussing the expected behavior in this case.
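
To put rough numbers on the timeout concern, the sketch below computes the
derived timing values from the default CoAP transmission parameters in
RFC 7252, Section 4.8.  It is illustrative only: DOTS agents can negotiate
different values, a Confirmable request can be acknowledged first and
answered later with a separate response, and the approval delay passed to
exceeds_coap_window() is a hypothetical input, not something the draft
defines.

```python
# Default CoAP transmission parameters (RFC 7252, Section 4.8).
# A DOTS deployment may negotiate other values; these are base defaults.
ACK_TIMEOUT = 2.0          # seconds
ACK_RANDOM_FACTOR = 1.5
MAX_RETRANSMIT = 4
MAX_LATENCY = 100.0        # seconds
PROCESSING_DELAY = ACK_TIMEOUT


def max_transmit_span() -> float:
    """Time from the first to the last retransmission of a Confirmable message."""
    return ACK_TIMEOUT * (2 ** MAX_RETRANSMIT - 1) * ACK_RANDOM_FACTOR


def max_transmit_wait() -> float:
    """Time from first transmission until the sender gives up on an ACK."""
    return ACK_TIMEOUT * (2 ** (MAX_RETRANSMIT + 1) - 1) * ACK_RANDOM_FACTOR


def exchange_lifetime() -> float:
    """Time during which a Message ID must not be reused."""
    return max_transmit_span() + 2 * MAX_LATENCY + PROCESSING_DELAY


def exceeds_coap_window(approval_delay_s: float) -> bool:
    """True if a (hypothetical) administrator approval delay outlasts the
    window in which the sender still expects an acknowledgement."""
    return approval_delay_s > max_transmit_wait()


if __name__ == "__main__":
    print(max_transmit_span(), max_transmit_wait(), exchange_lifetime())
    print(exceeds_coap_window(600))  # e.g. a ten-minute approval delay
```

With these defaults, any approval step measured in minutes falls outside
the acknowledgement window, which is why the expected behavior needs to be
written down.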

Section 1.1

   Some of the DDoS attacks like spoofed RST or FIN packets, Slowloris,
   and Transport Layer Security (TLS) re-negotiation are difficult to
   detect on a home network device without adversely affecting its

(side note) TLS renegotiation as an attack is just keeping a TLS
connection open and repeatedly re-negotiating to cause the server to
burn CPU on signing operations?  I don't think I've heard of that one
before.

Section 1.2

   'DOTS signal channel Call Home' (or DOTS Call Home, for short) refers
   to a DOTS signal channel established at the initiative of a DOTS
   server.  That is, the DOTS server initiates a secure connection to a
   DOTS client, and uses that connection to receive the attack traffic
   information (e.g., attack sources) from the DOTS client.  More
   details are provided in Section 3.

I think this introductory section needs to be crystal clear for the
reader about the relationship between the "conventional" signal channel
DOTS client and the corresponding call home DOTS client for a given
mitigation process.  Some people will expect both things with "client"
in the name to be on the same device, and some people will expect the
(call home) client to be the device that makes mitigation requests, but
both cannot be right.  The phrase "role reversal" might be useful in
describing these interpretations, as might a forward reference to
section 1.4.  (Yes, I understand that there is no need for either DOTS
call home peer to be colocated with any base signal channel element, but
I think that the default scenario in many readers' minds will be the
simple role-reversal setup.)

Section 1.3

It's a little unfortunate that we are using the "DMS" acronym in the
figure here, when we only mention the terminology of RFC 8612 in section
2.  (That said, I'm willing to leave it as-is for now and see if anyone
complains.)

Section 3.1

   The DOTS signal channel Call Home preserves all but one of the DOTS
   client/server roles in the DOTS protocol stack, as compared to DOTS
   client-initiated DOTS signal channel protocol

I suppose one could quibble about whether "party allowed to behave
passively with respect to heartbeats" qualifies as a "client/server
role" ... I think I'm okay leaving this as-is for now, though.

   For example, a home network element (e.g., home router) co-located
   with a Call Home DOTS server is the (D)TLS server.  However, when

Figure 7 has a box labelled "Call Home DOTS server" and also "(D)TLS
client" (not server).  At first I thought this was trying to make an
analogy to a (normal) signal channel DOTS server as being the (D)TLS
server, but that would not be in a home network element.  So maybe this
is just a typo?

   calling home, the DOTS server initially assumes the role of the
   (D)TLS client, but the network element's role as a DOTS server

We may want to continue using "DOTS Call Home server" in these two lines.

   remains the same.  Furthermore, existing certificate chains and
   mutual authentication mechanisms between the DOTS agents are
   unaffected by the Call Home function.  This Call Home function

This may merit a bit more text, or at least a bit more thought, since we
are now asking the certificate validation to be used for a different
logical purpose.  While it's true that the entity acting as a (D)TLS
server is likely to be the same device as a regular DOTS server (or at
least operated by the same DMS provider), and so there is a fairly
strong analogue there in terms of the (D)TLS server certificate
validation procedures, the (D)TLS client is something of a different
entity than a traditional DOTS client.  It's true that RFC 8782 leaves
the specifics of the mutual authentication a bit under-specified (and
essentially just says that it has to happen somehow), but one might
imagine that the number of DOTS clients is fairly small and tied to
specific legal contracts, so manual provisioning (or provisioning during
onboarding) of client certificate information is reasonable.  For the
call home case, we should expect a lot more call home DOTS servers
(i.e., (D)TLS clients) and thus should probably have a better story for
automating the mutual authentication check.  Defining an
extendedKeyUsage value that indicates authorization to act as a call home
server would be one typical way to do so (it's perhaps unfortunate that
we didn't define an EKU for DOTS client usage), but if we're not going
to do that we should at least put some words in about how the mutual
authentication requirement remains but a different ACL may be needed for
call home than for traditional DOTS sessions.
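
The "different ACL" fallback can be sketched as two separate allow-lists
consulted after (D)TLS authentication has already succeeded.  Everything
here is an assumption made for illustration: the role names, the identity
strings, and the idea of keying the ACL on an authenticated peer name are
not taken from the draft or from RFC 8782.

```python
# Hypothetical allow-lists; names and identities are illustrative only.
SIGNAL_CHANNEL_CLIENTS = {"client1.example.com"}
CALL_HOME_SERVERS = {"cpe-0001.example.net", "cpe-0002.example.net"}


def authorized(peer_identity: str, role: str) -> bool:
    """Check an already-authenticated (D)TLS peer identity against the
    allow-list for the role it is trying to play."""
    acl = {
        "dots-client": SIGNAL_CHANNEL_CLIENTS,
        "call-home-server": CALL_HOME_SERVERS,
    }.get(role, set())
    return peer_identity in acl


if __name__ == "__main__":
    # A CPE may act as a call home server without being a DOTS client.
    print(authorized("cpe-0001.example.net", "call-home-server"))
    print(authorized("cpe-0001.example.net", "dots-client"))
```

The point of keeping the lists separate is that a valid certificate alone
does not say which of the two roles the peer is allowed to take.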

   enables the DOTS server co-located with a network element (possibly

(Still Call Home server, right?)

Section 3.2.1

   If the Call Home DOTS server does not receive any traffic from the
   peer Call Home DOTS client during the time span required to exhaust
   the maximum 'missing-hb-allowed' threshold, the Call Home DOTS server
   concludes the session is disconnected.  Then, the Call Home DOTS
   server MUST try to resume the (D)TLS session.

Why resume specifically, as opposed to the broader "initiate a new (D)TLS
connection" that could encompass both resumption and a full handshake?

Section 3.2.2

   If a Call Home DOTS client wants to redirect a Call Home DOTS server
   to another Call Home DOTS client, it MUST send a Non-confirmable PUT
   request to the predefined resource ".well-known/dots/redirect" with
   the new Call Home DOTS client FQDN or IP address in the body of the
   PUT similar to what is described in Section 4.6 of
   [I-D.ietf-dots-rfc8782-bis].  [...]

I suggest that we mention the actual element in the YANG module that
contains the structure that will be used as the PUT body, as this text
in isolation feels like it's attempting to define the protocol by way of
example and analogy, which is not a great pattern for protocol design.

Section 3.3.1

                                                                     In
      addition, the DOTS client MUST validate that attacker prefixes are
      within the scope of the DOTS server domain.

What does "within the scope" mean in the context of the base signal
channel?

   (i.e., the Call Home scenario depicted in Figure 7).  The 'target-
   uri' or 'target-fqdn' parameters can be included in a mitigation
   request for diagnostic purposes to notify the Call Home DOTS server
   domain administrator, but SHOULD NOT be used to determine the target
   IP addresses.  Note that 'target-prefix' becomes a mandatory
   attribute in the mitigation request signaling the attack information
   because 'target-uri' and 'target-fqdn' are optional attributes and
   'alias-name' will not be conveyed in a mitigation request.

I think we have to use normative language that 'target-prefix' is
mandatory for call home, since the "don't rely on target-fqdn or
target-uri" is not a MUST.  (Actually, I think they would have to be
"MUST NOT send", not just "MUST NOT rely on for identification", in
order for us to be able to get away with the current wording that states
it like a fact.)

Also, we might want to explain why 'alias-name' cannot be used (and we
don't need normative language to ensure it): they are created using the
data channel but there is no call home data channel (yet, at least).
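
The two constraints discussed above (a required 'target-prefix' and no
'alias-name') could be checked roughly as follows.  This is a sketch under
assumptions: the request scope is modelled as a plain dict keyed by the
attribute names quoted from the draft, and the function name and messages
are hypothetical.

```python
from __future__ import annotations


def check_call_home_scope(scope: dict) -> list[str]:
    """Return a list of problems found in a Call Home mitigation scope.

    An empty list means the two checks sketched here passed; it does not
    mean the request is valid in every other respect.
    """
    problems = []
    if not scope.get("target-prefix"):
        problems.append("target-prefix is required for Call Home")
    if "alias-name" in scope:
        # Aliases are created over the data channel, which Call Home lacks.
        problems.append("alias-name cannot be resolved without a data channel")
    return problems


if __name__ == "__main__":
    ok = {"target-prefix": ["2001:db8:c000::/128"],
          "source-prefix": ["2001:db8:123::/128"]}
    bad = {"target-fqdn": ["www.example.com"], "alias-name": ["web"]}
    print(check_call_home_scope(ok))
    print(check_call_home_scope(bad))
```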

   In order to help attack source identification by a Call Home DOTS
   server, the Call Home DOTS client SHOULD include in its mitigation
   request additional information such as 'source-port-range' or
   'source-icmp-type-range'.  The Call Home DOTS client may not include
   such information if 'source-prefix' conveys an IPv6 address/prefix.

I'm not sure what the "may not" is intending to convey, here.  Are these
mandatory for IPv4 prefixes?

   The Call Home DOTS server MUST check that the 'source-prefix' is
   within the scope of the Call Home DOTS server domain.  Note that in a

(nit) this "MUST" seems redundant with the text I quoted previously.

   The Call Home DOTS server MUST check that the 'source-prefix' is
   within the scope of the Call Home DOTS server domain.  Note that in a
   DOTS Call Home scenario, the Call Home DOTS server considers, by
   default, that any routeable IP prefix enclosed in 'target-prefix' is
   within the scope of the Call Home DOTS client.  [...]

We say "by default" -- how would some other behavior be activated?

   with or without DOTS server domain administrator consent.  If the
   attack traffic is blocked, the Call Home DOTS server informs the Call
   Home DOTS client that the attack is being mitigated.

This is just a normal 2.xx response code (and body) to the mitigation
request?  It might be worth clarifying.

   If the attack traffic information is identified by the Call Home DOTS
   server or the Call Home DOTS server domain administrator as
   legitimate traffic, the mitigation request is rejected, and 4.09
   (Conflict) is returned to the Call Home DOTS client.  The conflict-

There may be quite some delay involved if the administrator needs to
decide.  Should we say more about (e.g.) using 5.03 and Max-Age in this
case?
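
One possible shape for this, sketched under assumptions: the three-way
verdict, the function names, and the default retry interval are all
hypothetical, and the use of 2.01 for an accepted request follows the
usual handling of a new mitigation request in the base signal channel.
The code-point arithmetic (class in the top three bits, detail in the low
five) and the Max-Age option number are from RFC 7252.

```python
from enum import Enum


class Verdict(Enum):
    ACCEPTED = "accepted"      # mitigation will be enforced
    LEGITIMATE = "legitimate"  # traffic judged legitimate: reject request
    PENDING = "pending"        # waiting for an administrator decision


MAX_AGE_OPTION = 14  # CoAP Max-Age option number (RFC 7252)


def coap_code(cls: int, detail: int) -> int:
    """Encode a CoAP c.dd response code as its 8-bit value."""
    return (cls << 5) | detail


def respond(verdict: Verdict, retry_after_s: int = 60):
    """Return (response code, options) for a mitigation request."""
    if verdict is Verdict.ACCEPTED:
        return coap_code(2, 1), {}            # 2.01 Created
    if verdict is Verdict.LEGITIMATE:
        return coap_code(4, 9), {}            # 4.09 Conflict
    # Pending approval: 5.03 Service Unavailable, with Max-Age telling
    # the client how long to wait before asking again.
    return coap_code(5, 3), {MAX_AGE_OPTION: retry_after_s}


if __name__ == "__main__":
    print(respond(Verdict.PENDING, retry_after_s=120))
```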

   Once the request is validated by the Call Home DOTS server,
   appropriate actions are enforced to block the attack traffic within
   the source network.  The Call Home DOTS client is informed about the
   progress of the attack mitigation following the rules in
   [I-D.ietf-dots-rfc8782-bis].  For example, if the Call Home DOTS
   server is embedded in a CPE, it can program the packet processor to
   punt all the traffic from the compromised device to the target to

I think the sentence about "informed about the progress" might be
misplaced at this location within the paragraph -- the example given seems to
just be talking about the "appropriate actions" that are taken for blocking
traffic, not any mitigation-status updates.

Section 3.3.2

   If a Carrier Grade NAT (CGN, including NAT64) is located between the
   DOTS client domain and DOTS server domain, communicating an external
   IP address in a mitigation request is likely to be discarded by the
   Call Home DOTS server because the external IP address is not visible
   locally to the Call Home DOTS server (see Figure 10).  The Call Home
   DOTS server is only aware of the internal IP addresses/prefixes bound
   to its domain.  Thus, the Call Home DOTS client MUST NOT include the
   external IP address and/or port number identifying the suspect attack
   source, but MUST include the internal IP address and/or port number.

We're likely to get similar complaints about "how will they know there's
a NAT" that we did for the base signal channel.  I don't have any great
suggestions for trying to forestall such comments, though I do note that
8782 has some explicit text about "[t]his document does not make any
recommendations about possible translator discovery mechanisms".

Also, it's amusing that for the base signal channel we said to *not*
use internal addresses, but for call home we say you have to use
internal addresses.  In the base signal channel we also said that we did
not give recommendations on how to discover possible translator
mechanisms...

   To that aim, the Call Home DOTS client SHOULD rely on mechanisms,
   such as [RFC8512] or [RFC8513], to retrieve the internal IP address

... yet here we seem to be making such recommendations!

   If a MAP Border Relay [RFC7597] or lwAFTR [RFC7596] is enabled in the
   provider's domain to service its customers, the identification of an
   attack source bound to an IPv4 address/prefix MUST also rely on
   source port numbers because the same IPv4 address is assigned to
   multiple customers.  The port information is required to
   unambiguously identify the source of an attack.

[same question about how to know that they are in use]
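
Why the port is needed in the address-sharing case can be shown with a toy
lookup.  The table below is entirely hypothetical (addresses, port sets,
and customer names are made up) and does not model the actual MAP or
lwAFTR port-set algorithms; it only shows that an address alone matches
several customers while an address plus port matches one.

```python
# Hypothetical port-set assignments for one shared IPv4 address.
SHARED = {
    "192.0.2.1": [
        (range(1024, 2048), "customer-a"),
        (range(2048, 3072), "customer-b"),
    ],
}


def identify(address: str, port=None) -> list:
    """Return the customers that could be the source of the traffic."""
    entries = SHARED.get(address, [])
    if port is None:
        # Without a port, every customer sharing the address matches.
        return sorted(name for _, name in entries)
    return [name for ports, name in entries if port in ports]


if __name__ == "__main__":
    print(identify("192.0.2.1"))        # ambiguous
    print(identify("192.0.2.1", 2500))  # unambiguous
```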

   If a translator is enabled on the boundaries of the domain hosting
   the Call Home DOTS server (e.g., a CPE with NAT enabled as shown in
   Figures 11 and 12), the Call Home DOTS server uses the attack traffic
   information conveyed in a mitigation request to find the internal
   source IP address of the compromised device and blocks the traffic

In a similar vein, I expect to get some questions about how the call
home DOTS server finds the internal source IP address from the attack
traffic information conveyed.  I don't have a specific change to propose
at this time, since I don't know, myself, but we should at least have
some answer to give in response to such questions.

The text is also a little unclear on why we provide both Figures 11 and
12 -- while both cases are valid, we don't seem to have any discussion
that highlights differences between the cases.  So perhaps we should say
that the behavior of the call home DOTS server is the same whether or
not the call home DOTS server is integrated with the CPE/NAT (if true)?

Section 3.3.3

I think the YANG module might benefit from being moved up a level or two
in the section hierarchy.

Section 3.3.3.1

Should we give some indication that 'signal' is the import prefix for
"ietf-dots-signal-channel" before going into the tree diagram?  (I do
not know what the convention is in this regard.)

Section 3.3.3.2

I suggest reiterating the note from 8782 about needing to check the
mapping output provided by YANG-to-CBOR in light of the situations where
differing CBOR/JSON types can arise (e.g., enumerations and 64-bit
quantities).

I guess it's implicit that we reuse the CBOR map keys 8 and 9 for
lower-port and upper-port in the source-port-range array?
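
If that reading is right, one entry of the source-port-range array would
serialize as a two-element CBOR map with integer keys 8 and 9.  The sketch
below hand-encodes such a map using only the unsigned-integer and map
major types from the CBOR specification; the port values are arbitrary,
and the key assignment is the assumption stated in the question above,
not something confirmed by the draft.

```python
def cbor_uint(value: int, major: int = 0) -> bytes:
    """Encode a CBOR head: major type plus an unsigned argument."""
    if value < 0:
        raise ValueError("only unsigned values are handled here")
    if value < 24:
        return bytes([(major << 5) | value])
    for extra, size in ((24, 1), (25, 2), (26, 4), (27, 8)):
        if value < 1 << (8 * size):
            return bytes([(major << 5) | extra]) + value.to_bytes(size, "big")
    raise ValueError("value too large for CBOR")


def cbor_int_map(pairs: dict) -> bytes:
    """Encode a map whose keys and values are all unsigned integers."""
    out = cbor_uint(len(pairs), major=5)  # major type 5 = map
    for key, value in pairs.items():
        out += cbor_uint(key) + cbor_uint(value)
    return out


# Assumed key assignment: lower-port = 8, upper-port = 9.
LOWER_PORT, UPPER_PORT = 8, 9
entry = cbor_int_map({LOWER_PORT: 1000, UPPER_PORT: 2000})

if __name__ == "__main__":
    print(entry.hex())
```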

Section 3.3.3.3

   This module uses the common YANG types defined in [RFC6991] and the
   data structure defined in [RFC8791].

(nit) I think we need another word here, maybe "data structure
extension" or "data structure statement"?

       list source-icmp-type-range {
         key "lower-type";
         description
           "ICMP type range. When only lower-type is
            present, it represents a single ICMP type.";

It seems that the interpretation of the source-icmp-type-range list is
dependent on the IP address family in use.  Presumably one is supposed
to infer this from the source-prefix (though we don't say so), but the
source-prefix is optional when these fields are used in the base signal
channel.  It is not entirely clear whether it is safe to rely on the
target-prefix for address-family determination, though (I do not recall
any reason why DOTS signal channel doesn't work in the presence of a
NAPT function).  Should the icmp attributes only be allowed if the
source-prefix is present?
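
The stricter rule floated in that last question could look like the
following.  This is a sketch of one option, not the draft's behavior: it
assumes the scope is a dict keyed by the attribute names above, that
'source-prefix' is a list of prefix strings, and that mixed address
families should be rejected rather than guessed at.

```python
from __future__ import annotations

import ipaddress


def icmp_family(scope: dict) -> str | None:
    """Decide whether source-icmp-type-range means ICMPv4 or ICMPv6.

    Returns None when no ICMP type range is present.  Raises ValueError
    when the family cannot be determined from source-prefix.
    """
    if not scope.get("source-icmp-type-range"):
        return None
    prefixes = scope.get("source-prefix") or []
    if not prefixes:
        raise ValueError("source-icmp-type-range given without source-prefix")
    versions = {ipaddress.ip_network(p, strict=False).version for p in prefixes}
    if len(versions) != 1:
        raise ValueError("source-prefix mixes IPv4 and IPv6")
    return "ICMPv4" if versions == {4} else "ICMPv6"


if __name__ == "__main__":
    print(icmp_family({"source-prefix": ["2001:db8::/64"],
                       "source-icmp-type-range": [{"lower-type": 128}]}))
```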

     sx:augment-structure "/signal:dots-signal/signal:message-type/"
                        + "signal:redirected-signal" {
       description
         "The alternate Call Home DOTS client.";

Is there something we can/should do for the redirected-signal
augmentation to indicate that the alt-server and alt-server-record nodes
are removed/useless?

           leaf alt-ch-client {
             type string;
             description
               "FQDN of an alternate Call Home DOTS client.";

Can we discuss a bit what the implications of redirection are for (D)TLS
mutual authentication of the post-redirection channel?  It seems that in
most cases the alt-ch-client FQDN will be needed to perform certificate
validation.  Perhaps, though, there would be a case where a call home
server has a set of preconfigured call home clients (each with IP
address(es) and credentials), so a redirection by IP address to a
different client in that set would still be functional.  So, I am not
sure that we need to make alt-ch-client a mandatory field (whether in
the YANG or in the prose), but we probably do need to have some text
earlier in the document covering the implications for authenticating the
redirected-to peer.  I note that in the base signal channel, alt-server
is mandatory, so we would have some precedent for just making
alt-ch-client mandatory as well.

Section 4.2

Table 2 doesn't seem consistent with Table 1 -- Table 1 lists a couple
parameters that admit multiple CBOR types, but Table 2 only lists a
single CBOR Major Type for them.

Section 4.3

We don't have any visible note about removing TBA9 (and should probably
add some text about 4 only being the *requested* value as well, though
I'm pretty sure we'd know if there were other Standards Actions in the
works that would be potentially requesting a conflicting value!).

Section 5

We should probably say something about how the call home channel is
a potential vector for attack, with mitigation requests potentially
causing the indicated device to be partially blocked or booted off the
network entirely; mutual authentication of a trusted call home server,
"a healthy dose of skepticism about the indicated attack" (or
rather, local inspection of the indicated traffic), and involvement of
the local administrator can mitigate the new risks that are opened up.
This is related to what we currently have in the last paragraph, but we
don't specifically discuss it as a new risk/attack vector due to this
protocol.

We might also consider referencing the security considerations of RFC
8071 (NETCONF/RESTCONF call home) since the "considerations not
associated with server authentication" are likely similar.

There may also be some considerations when the indicated attack
source-prefix is in a private or local-use address range -- the "in
scope" check at the call home server doesn't mean much.
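
A small sketch of why the check is weak in that case, using Python's
standard ipaddress module.  The function name and the notion of a
configured list of domain prefixes are assumptions for illustration; the
draft does not define how "within the scope" is evaluated.

```python
from __future__ import annotations

import ipaddress


def classify_source(source_prefix: str, domain_prefixes: list[str]):
    """Return (in_scope, is_private) for a reported attack source.

    in_scope:   the prefix falls inside one of the domain's own prefixes.
    is_private: the prefix is in a private or local-use range, so many
                unrelated domains could "own" the same addresses.
    """
    src = ipaddress.ip_network(source_prefix, strict=False)
    in_scope = False
    for prefix in domain_prefixes:
        dom = ipaddress.ip_network(prefix, strict=False)
        # subnet_of() requires both networks to be the same IP version.
        if src.version == dom.version and src.subnet_of(dom):
            in_scope = True
            break
    return in_scope, src.is_private


if __name__ == "__main__":
    # In scope, but the same 10/8 space exists in countless other homes.
    print(classify_source("10.1.2.0/24", ["10.0.0.0/8"]))
```

An in-scope result for a private prefix says only that the address could
exist locally, not that the reported attacker is actually in this domain.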

   Common precautions mitigating DoS attacks are recommended, such as
   temporarily blacklisting the source address after a set number of
   unsuccessful authentication attempts.

[I note that we used the term "drop-list" in RFC 8783.]

Section 6

We should probably say something about how the use of DPI and similar
IPS technologies will have privacy considerations of their own, but the
details depend on the particular device or technology in question
(and are thus out of scope for this document).

   Concretely, the protocol does not leak any new information that can
   be used to ease surveillance.  In particular, the Call Home DOTS
   server is not required to share information that is local to its
   network (e.g., internal identifiers of an attack source) with the
   Call Home DOTS client.

I guess it's true that we don't require the call-home server to share
internal addresses with the call-home client, but we do require the
call-home client to (only) use internal addresses in the mitigation
request.  Presumably the client has to learn those addresses somehow,
and one could argue that this protocol is requiring that to happen even
if it doesn't convey them in-band in that direction.  So I think we need
to say more about the privacy considerations of using internal addresses
-- it's not safe to claim that it's out of scope.

Also, this paragraph seems to be the main part that is directly
privacy-relevant in this section -- would the other paragraphs be a
better fit in the Security Considerations section?  I guess the last
paragraph does touch on privacy with the "not meant to track the
activity of users", so it could stay here.

   Triggers to send a DOTS mitigation request to a Call Home DOTS server
   are deployment-specific.  For example, a Call Home DOTS client may
   rely on the output of some DDoS detection systems deployed within the
   DOTS client domain to detect potential outbound DDoS attacks or on
   abuse claims received from remote victim networks.  Such DDoS
   detection and mitigation techniques are not meant to track the
   activity of users, but to protect the Internet and avoid altering the
   IP reputation of the DOTS client domain.

Perhaps we could say a little more about what steps this mechanism takes
to avoid identifying users?  E.g., the indicated data refer only to the
source and target addresses where suspected attack flows are present;
while it does permit expressing flow-level granularity there is no
in-band protocol element that would correlate one suspected-attack flow
with another suspected-attack flow as being associated with the same
user.  It is, however, possible that a faulty attack classification
algorithm could consistently identify the legitimate behaviors of a
particular user as being suspected attacks, in which case that user's
traffic would consistently be flagged, but that user's traffic is still
grouped in with the entire anonymity set of "all suspected attack
traffic".

Section 9.2

If the toolchain supports it, we should probably refer to RFC 4632 as
BCP 122 in the text when we reference it.

We may get someone asking for RFC 8612 to be normative since we expect
the reader to be familiar with its terminology.  It is only an
informational document and is not on the downref registry, so in some
sense it is "safer" to preemptively move it to being a normative
reference that will automatically get called out as a downref during the
IETF LC announcement; on the other hand, the IESG does have leeway to
just approve it without another IETF LC if it does need to change from
informative to normative as a result of IETF LC or IESG comments.

I might suggest using a different slug for the [Sec] reference; the
current short form is potentially misleading.

Appendix A

   The other approach is signaling the role of each DOTS agent (e.g., by
   using the DOTS data channel).  For example, the DOTS agent in the
   home network first initiates a DOTS data channel to the peer DOTS
   agent in the ISP environment, at this time the DOTS agent in the home
   network is the DOTS client and the peer DOTS agent in the ISP
   environment is the DOTS server.  After that, the DOTS agent in the
   home network retrieves the DOTS Call Home capability of the peer DOTS
   agent.  If the peer supports the DOTS Call Home, the DOTS agent needs
   to subscribe to the peer to use this extension.  [...]

I don't remember how such a capability negotiation on the data channel
would work.  Is it supposed to just be keying off of the peer's
supported YANG modules or something like that?  Might be worth
clarifying.