[Dots] AD evaluation of draft-ietf-dots-signal-filter-control-03

Benjamin Kaduk <bkaduk@akamai.com> Wed, 13 May 2020 02:12 UTC

Date: Tue, 12 May 2020 19:12:22 -0700
From: Benjamin Kaduk <bkaduk@akamai.com>
To: draft-ietf-dots-signal-filter-control.all@ietf.org, kaduk@mit.edu
Cc: dots@ietf.org
Message-ID: <20200513021222.GY3811@akamai.com>
Archived-At: <https://mailarchive.ietf.org/arch/msg/dots/TYWdRloLP9Ws06ByAI5bspHNHFQ>

Hi all,

This one is also generally in good shape, with mostly nit-level comments.
(I didn't find it on github, so those are inline.)

I have a nagging suspicion that I'm repeating some comments I made on
the core protocol documents (having forgotten the response and thus why
the change in question should not be made); my apologies in advance for
my poor memory.

Section 1.1

   The DOTS data channel protocol [I-D.ietf-dots-data-channel] is used
   for bulk data exchange between DOTS agents to improve the
   coordination of parties involved in the response to the Distributed

nit: s/the/a/

   the inbound link to the DOTS client domain.  In other words, the DOTS
   client cannot use the DOTS data channel protocol to withdraw the
   accept-list filters when a DDoS attack is in progress.  This assumes
   that this DOTS client is the owner of the filtering rule.

nit: the last sentence feels a little disconnected from the previous
discussion at the moment; I'd suggest adding at the end ", since
otherwise this DOTS client would not be authorized to modify it anyway"
or similar.

Section 1.2

   This specification addresses the problems discussed in Section 1.1 by
   adding the capability of managing filtering rules using the DOTS
   signal channel protocol, which enables a DOTS client to request the
   activation (or deactivation) of filtering rules during a DDoS attack.

nit: the grammar here does not quite match up ("capability of" vs.
"using"); perhaps s/the capability of/a capability for/ (though other
fixes are possible).

   Sample examples are provided in Section 4, in particular:

nit: "sample examples" feels redundant; maybe "some examples" or just
"examples"?

   o  Section 4.1 illustrates how the filter control extension is used
      when conflicts with Access Control List (ACLs) are detected and
      reported by a DOTS server.

nit: s/List/Lists/

Section 3.1

   A filtering rule controlled by the DOTS signal channel is identified
   by its ACL name (Section 7.2 of [I-D.ietf-dots-data-channel]).  Note

The referenced section is just an example; the actual YANG definition is
in Section 4.3 thereof.

   The activation or deactivation of an ACL by the DOTS signal channel
   overrides the 'activation-type' (defined in Section 7.2 of
   [I-D.ietf-dots-data-channel]) a priori conveyed with the filtering
   rules using the DOTS data channel protocol.

Is there a way to "cancel" the signal-channel-made changes and revert to
the previous activation-type?  Or do we just make the client remember
the previous status and manually set that status in such a case?

Section 3.2.1

Why is the acl-name an optional attribute?  What would the
interpretation be if it was not provided?

   intended control of configured filtering rules.  Concretely, the DOTS
   client conveys 'acl-list' attribute with the following sub-attributes
   in the CBOR body of a mitigation request (see the YANG-encoded
   structure in Section 3.2.2.1):

nit: YANG is a data description language, not an encoding; perhaps
"YANG-formatted structure" or just "YANG structure" is better.

   As the attack evolves, DOTS clients can adjust the 'activation-type'
   of an ACL conveyed in a mitigation request or control other filters
   as necessary.  This can be achieved by sending a PUT request with a
   new 'mid' value.

This potentially involves a lot of new 'mid' values if many changes are
made using signal-channel filtering control over the course of a given
mitigation.  Is this going to cause a noticeable increase in the amount
of state required at the DOTS server?

   If the DOTS client receives a 5.03 (Service Unavailable) with a
   diagnostic payload indicating a failed ACL update as a response to an
   initial mitigation or a mitigation with adjusted scope, the DOTS
   client MUST immediately send a new request which repeats all the
   parameters as sent in the failed mitigation request but without
   including the ACL attributes.  After the expiry of Max-Age returned

Why does this need to be a MUST-level requirement?  What if the
situation has changed in the interim and the mitigation update is no
longer needed?

   in the 5.03 (Service Unavailable) response, the DOTS client retries
   with a new mitigation request (i.e., a new 'mid') that repeats all
   the parameters as sent in the failed mitigation request.

Perhaps mention "including the ACL update" for clarity?

Section 3.2.2.1

       +--rw acl-list* [acl-name] {control-filtering}?

A question for the YANG doctor, no doubt, but is there a need for a
feature indicator that gates the entire content of a given module (as
opposed to just using module-level granularity to indicate support)?

Section 3.2.2.2

     namespace
        "urn:ietf:params:xml:ns:yang:ietf-dots-signal-control";
     prefix signal-control;

Can we think of anything else that might be in scope in the future for
which "signal-control" would be ambiguous or a name conflict?

       list acl-list {
         key "acl-name";

Seeing this reminds me of a shadow of a memory that list keys are
implicitly mandatory, which seems relevant for an earlier comment.

         [...]
       }
       leaf activation-type {
         type ietf-data:activation-type;
         default "activate-when-mitigating";
         description
           "Sets the activation type of an ACL.";

This structure seems to bind us to having a single activation-type that
applies to all of the ACLs in the acl-list; furthermore, the global
structure of where we augment means that we can only have one (acl-list,
activation-type) pair within a given mitigation request.  I *believe*
that is consistent with the core signal protocol in terms of what's
expected to be in a single message, but please confirm.

Section 4

(The section heading should probably get the same treatment of "Sample
Examples" as was previously used, and the body text here as well.)

Section 4.1

   Host: {host}:{port}

nit(?): the data-channel doc just uses "example.com" for the host header
field; it's not clear that we need to diverge from that style (applies
throughout).

   The DOTS client can also decide to send a PUT request to deactivate
   the "an-accept-list" ACL, if suspect traffic is received from an
   accept-listed source (2001:db8:1234::/48).  The structure of that PUT
   is the same as the one shown in Figure 5.

   [...]
   Uri-Path: "mid=124"

Just as a sanity-check: this "mid" of 124 is consistent with the note in
Section 3.2.1 that adjusting the 'activation-type' "[...] can be
achieved by sending a PUT request with a new 'mid' value", so the 124
(vs. 123) is intentional and correct.

            "ietf-dots-signal-control:acl-list": [
              {
                "ietf-dots-signal-control:acl-name": "an-accept-list",
                "ietf-dots-signal-control:activation-type": "deactivate"
              }
            ]
           "lifetime": 3600

Shouldn't there be a comma after the ']'?
(Have the examples been run through a JSON validator?)
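
For what it's worth, a check along the following lines would catch this
kind of slip.  It is only a minimal Python sketch: the enclosing braces
and the names AS_QUOTED, WITH_COMMA, and is_valid_json are my own
assumptions for illustration and do not come from the draft.

```python
import json

# The fragment as quoted above, wrapped in braces so that it forms a
# complete JSON object. The wrapping is an assumption for illustration.
AS_QUOTED = '''{
  "ietf-dots-signal-control:acl-list": [
    {
      "ietf-dots-signal-control:acl-name": "an-accept-list",
      "ietf-dots-signal-control:activation-type": "deactivate"
    }
  ]
  "lifetime": 3600
}'''

# Same fragment with a comma added after the closing ']' of the list.
WITH_COMMA = AS_QUOTED.replace(']\n', '],\n', 1)


def is_valid_json(text):
    """Return True if text parses as JSON, False otherwise."""
    try:
        json.loads(text)
    except json.JSONDecodeError:
        return False
    return True


if __name__ == "__main__":
    print("as quoted:", is_valid_json(AS_QUOTED))
    print("with comma:", is_valid_json(WITH_COMMA))
```

Running every JSON example in the document through something like
is_valid_json before publication would be cheap insurance.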

Section 4.3

   Consider a DOTS client that contacts its DOTS server during 'idle'
   time to install an accept-list that rate-limits all (or a part
   thereof) traffic to be forwarded to 2001:db8:123::/48 as a last
   resort countermeasure whenever required.  It does so by sending, for

nit: I could imagine a little confusion as to whether the action
referenced by "it does so" is the installation of the accept-list (my
belief) or the rate-limiting.  Perhaps "Installing the accept-list can be
done by sending, [...]"?

   example, a PUT request shown in Figure 9.  The DOTS server installs

nit: either "the PUT request shown" or "a PUT request as shown".

                 "actions": {
                   "forwarding": "accept",
                   "rate-limit": "20.00"

20 bytes per second?  That seems rather small.
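
To put a number on "rather small", here is a back-of-the-envelope
sketch in Python.  It takes the bytes-per-second reading above at face
value; the function and constant names are hypothetical, and the only
outside fact used is the 40-byte fixed IPv6 header.

```python
# Value from the quoted example, read as bytes per second (the reading
# used in the comment above; treat the unit as an assumption here).
RATE_LIMIT_BYTES_PER_SECOND = 20.00

# Size of the fixed IPv6 header, with no extension headers or payload.
IPV6_FIXED_HEADER_BYTES = 40


def bits_per_second(rate_bytes_per_second):
    """Convert a byte rate to a bit rate."""
    return rate_bytes_per_second * 8


def seconds_per_bare_header(rate_bytes_per_second):
    """Time needed to pass one payload-less IPv6 header at this rate."""
    return IPV6_FIXED_HEADER_BYTES / rate_bytes_per_second


if __name__ == "__main__":
    print(bits_per_second(RATE_LIMIT_BYTES_PER_SECOND), "bit/s")
    print(seconds_per_bare_header(RATE_LIMIT_BYTES_PER_SECOND), "s per header")
```

Under those assumptions the limit works out to 160 bit/s, or one bare
IPv6 header every two seconds, which is hard to read as a useful
last-resort rate.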

   For some reason (e.g., the DOTS server, or the mitigator, is lacking
   a capability or capacity), the DOTS client is still receiving the
   attack traffic which saturates available links.  To soften the

nit: I think s/the attack traffic/attack traffic/ is fine grammar.

Section 5.1

   o  Note to the RFC Editor: Please delete (TBA1-TBA2-TBA3) once the
      CBOR key is assigned from the 1-16383 range.  Please update
      Table 1 accordingly.

IIRC the unallocated values in this range include both values with a
2-byte encoding and with a 3-byte encoding; do we have a reason to
prefer one or the other?  Since we're allocating from an "IETF Review"
range, we get to suggest a value to IANA and it's not immediately clear
to me how much say the DEs have in which value gets used.
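
For reference, the 2-byte vs. 3-byte split follows from how CBOR
encodes unsigned integers (major type 0).  The sketch below is a
minimal Python illustration; the function name is my own, and it says
nothing about which values are actually unallocated in the registry.

```python
def cbor_uint_encoded_length(value):
    """Return how many bytes CBOR needs to encode an unsigned integer.

    Values 0..23 fit in the initial byte; larger values take the initial
    byte plus a 1-, 2-, 4-, or 8-byte argument.
    """
    if value < 0:
        raise ValueError("not an unsigned integer")
    if value <= 23:
        return 1
    if value <= 0xFF:
        return 2
    if value <= 0xFFFF:
        return 3
    if value <= 0xFFFFFFFF:
        return 5
    if value <= 0xFFFFFFFFFFFFFFFF:
        return 9
    raise ValueError("too large for a CBOR unsigned integer")


if __name__ == "__main__":
    for key in (23, 24, 255, 256, 16383):
        print(key, cbor_uint_encoded_length(key))
```

So within 1-16383, keys up to 255 cost at most two bytes on the wire
and keys from 256 upward cost three; that is the trade-off behind the
question of which value to suggest to IANA.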

Section 6

We may get someone asking "this document uses YANG; why aren't you using
the YANG security considerations template?"  I guess the signal-channel
doc made it through without any particular note in that regard, so maybe
we'll be safe for this one, too.

I'd consider mentioning that with both signal and data channels
available for updating ACL activation status, there is potential for
"skew" wherein an update made in one place is ~immediately overridden by
another update.  Fortunately, there was already some degree of
asynchronicity/external-changes possible, so clients should already be
prepared for this sort of situation and will cope properly by
polling/OBSERVE notification.

We could also consider saying something about how the restriction to
only changing activation status over the signal channel (but not
creating new ACLs entirely) means that if a client hasn't prepared
adequately prior to an attack, it can get stuck in a bad place.  This,
of course, is also not really new with this document, but is a pretty
relevant consideration for its use.

Similarly, a reminder that bad things happen when a DOTS server fails to
maintain namespace separation across clients for ACL names could be in
order (but is not required).

   A compromised DOTS client can use the filtering control capability to
   exacerbate an ongoing attack.  Likewise, such compromised DOTS client
   may abstain from reacting to an ACL conflict notification received

nit: singular/plural mismatch "DOTS client"/"such compromised" (so
either "such a compromised" or "DOTS clients").

Thanks,

Ben
