Re: [Dots] Some notes on ietf-dots-signal-channel-02

"Konda, Tirumaleswar Reddy" <TirumaleswarReddy_Konda@McAfee.com> Wed, 09 August 2017 08:28 UTC

From: "Konda, Tirumaleswar Reddy" <TirumaleswarReddy_Konda@McAfee.com>
To: Dave Dolson <ddolson@sandvine.com>, "dots@ietf.org" <dots@ietf.org>
Thread-Topic: Some notes on ietf-dots-signal-channel-02
Date: Wed, 09 Aug 2017 08:28:13 +0000
Message-ID: <DM5PR16MB1788B1B880EEC2F7E9BB88AEEA8B0@DM5PR16MB1788.namprd16.prod.outlook.com>
References: <E8355113905631478EFF04F5AA706E98A906B8A4@wtl-exchp-2.sandvine.com>
In-Reply-To: <E8355113905631478EFF04F5AA706E98A906B8A4@wtl-exchp-2.sandvine.com>
Archived-At: <https://mailarchive.ietf.org/arch/msg/dots/XwcpPrLuC90JAPKFor-fjisvqbY>

Hi Dave,

Thanks for the review.  Please see inline.

From: Dots [mailto:dots-bounces@ietf.org] On Behalf Of Dave Dolson
Sent: Thursday, July 20, 2017 8:06 AM
To: dots@ietf.org<mailto:dots@ietf.org>
Subject: [Dots] Some notes on ietf-dots-signal-channel-02

Generally I think the document is pretty good, but I found a number of questions and nits.

BTW, is this document in github? I was going to make a pull request, but couldn't find it.

For target-protocol, I think it could be clarified using this reference: https://www.iana.org/assignments/protocol-numbers/protocol-numbers.xhtml

[TR] Yes, updated draft.

There is missing information about some filters:
fqdn: which FQDN is supposed to go here? I'm unclear on what would be mitigated? DNS? HTTP?

uri: does this mean an HTTP resource is under attack? It doesn't say HTTP.

[TR] Added the following line to address the above two comments:
FQDN and URI mitigation scopes may be thought of as a form of scope alias, in which the addresses to which the domain name or URI resolves represent the full scope of the mitigation.

lifetime:  I recommend a very large number be used for no timeout. (this might be a pet peeve of mine, but why do people like to use 0 to mean infinite?)  I think in the future we might want to use 0 to mean "end now".

[TR] Instead of a large number, we used -1 to indicate an indefinite lifetime.

CBOR supports binary byte strings. Therefore I suggest that IP addresses be represented as 4-byte or 16-byte byte-strings rather than the bulkier human-readable strings that may require ~40 bytes.
However, then prefixes would be represented like: "target-prefix": [ {"prefix": byte-string[16], "length": integer}]

At this time I also suggest supporting IPv4-mapped-v6 addresses to unify IPv4 and IPv6 code.

[TR] https://tools.ietf.org/html/rfc6021 defines YANG data types for IP addresses and prefixes (for both IPv4 and IPv6). IP addresses and prefixes are represented using the YANG string built-in type. We followed the encoding rules given in https://tools.ietf.org/html/draft-ietf-core-yang-cbor-04#section-5.4 to represent IP addresses and prefixes as CBOR text strings.
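
To make the size trade-off in the exchange above concrete, here is a minimal sketch using Python's standard ipaddress module. It only compares the raw payload lengths of the text and packed forms and does not perform real CBOR encoding; the helper names text_and_packed_len and to_mapped_v6 are hypothetical and not taken from the draft.

```python
import ipaddress

def text_and_packed_len(addr):
    """Return (longest text form length, packed binary length) in bytes."""
    ip = ipaddress.ip_address(addr)
    # ip.exploded is the uncompressed text form (worst case for IPv6).
    return len(ip.exploded), len(ip.packed)

def to_mapped_v6(addr):
    """Represent any address as IPv6, using the IPv4-mapped form ::ffff:a.b.c.d for IPv4."""
    ip = ipaddress.ip_address(addr)
    if ip.version == 6:
        return ip
    # 80 zero bits, 16 one bits, then the 32-bit IPv4 address.
    return ipaddress.IPv6Address(b"\x00" * 10 + b"\xff\xff" + ip.packed)

if __name__ == "__main__":
    print(text_and_packed_len("192.0.2.1"))
    print(text_and_packed_len("2001:db8::1"))
    print(to_mapped_v6("192.0.2.1"))
```

With the IPv4-mapped form, every address fits a fixed 16-byte string, which is the unification suggested in the review; the draft as described in the reply keeps the YANG text representation instead.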

Although multiple mitigation-ids may be set, this wording confused me:
   If two mitigation requests
   have overlapping mitigation scopes the mitigation request with higher
   numeric mitigation-id value will override the mitigation request with
   a lower numeric mitigation-id value
This sounds like higher-valued IDs are supposed to replace smaller values. But that isn't intended, I don't think. Both mitigation-IDs should be stored, but that comment is about logic in scrubbing rules and attributing bytes-dropped.

[TR] A higher-valued ID replaces a lower-valued ID only when their mitigation scopes overlap; otherwise the mitigation scopes of both the higher-valued and lower-valued IDs are used.
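
One possible reading of the override rule described above, sketched in Python. The function names and the dict-of-prefix-lists data model are assumptions for illustration only; the draft does not prescribe this algorithm, and scopes here are limited to IP prefixes.

```python
import ipaddress

def scopes_overlap(prefixes_a, prefixes_b):
    """True if any prefix in one scope overlaps any prefix in the other."""
    for pa in prefixes_a:
        for pb in prefixes_b:
            na, nb = ipaddress.ip_network(pa), ipaddress.ip_network(pb)
            # Prefixes of different IP versions never overlap.
            if na.version == nb.version and na.overlaps(nb):
                return True
    return False

def requests_in_effect(requests):
    """requests maps mitigation-id -> list of prefixes.

    Walk IDs from highest to lowest; a request stays in effect only if
    it overlaps no higher-valued request that is already in effect.
    """
    kept = []
    for mid in sorted(requests, reverse=True):
        if not any(scopes_overlap(requests[mid], requests[k]) for k in kept):
            kept.append(mid)
    return sorted(kept)
```

For example, with ID 1 covering 192.0.2.0/24, ID 2 covering 192.0.2.128/25, and ID 3 covering 198.51.100.0/24, IDs 2 and 3 remain in effect and ID 1 is overridden by ID 2.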

I think there may be a race condition to consider, when we allow for reordering:
If this sequence is sent:
  PUT  (ID=1)
  PUT (ID=1)
  DELETE (ID=1)
But this sequence is received:
  PUT (ID=1)
  DELETE (ID=1)
  PUT  (ID=1)
Then the ID 1 will be incorrectly stored. The solution is that the server needs to remember deleted IDs for some time.

[TR] Good point, updated draft.
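
The "remember deleted IDs for some time" idea can be sketched as a tombstone table. This is a minimal illustration, not the draft's mechanism: the class name, the tombstone_ttl parameter and its 300-second default, and the injectable clock are all assumptions.

```python
import time

class MitigationStore:
    """Stores mitigation requests and remembers recently deleted IDs."""

    def __init__(self, tombstone_ttl=300.0, clock=time.monotonic):
        self._active = {}      # mitigation-id -> scope
        self._deleted_at = {}  # mitigation-id -> time of DELETE
        self._ttl = tombstone_ttl
        self._clock = clock

    def put(self, mitigation_id, scope):
        """Store the request unless the ID was deleted recently."""
        deleted = self._deleted_at.get(mitigation_id)
        if deleted is not None:
            if self._clock() - deleted < self._ttl:
                return False  # likely a reordered PUT arriving after DELETE
            del self._deleted_at[mitigation_id]
        self._active[mitigation_id] = scope
        return True

    def delete(self, mitigation_id):
        """Remove the request (if any) and record a tombstone."""
        self._active.pop(mitigation_id, None)
        self._deleted_at[mitigation_id] = self._clock()

    def is_active(self, mitigation_id):
        return mitigation_id in self._active
```

The cost of this approach is that a client cannot legitimately reuse an ID until the tombstone expires, so the retention time has to be chosen against the expected reordering window.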

The URIs for PUT and DELETE are specified differently (v1 and version). I suspect they are intended to be the same, probably "v1" ?

[TR] Yes, fixed.

Currently the DELETE is specified to return an error if deleting something not found. What is the value of this error? Why not always return 2.02, thereby giving DELETE idempotent behavior.

[TR] Yes, updated draft (return code 2.02 will be returned even if mitigation-id is not found).

Regarding the 5min timeout after client DELETE, did we consider making that configurable in the delete message?

[TR] No.

This could allow a client to be a bit smarter. If timeout=0, it would mean stop right now.

[TR] Updated draft to align with the requirement SIG-005 in https://tools.ietf.org/html/draft-ietf-dots-requirements-06; SIG-005 does not discuss configurable active-but-terminating period.

I'm unclear on how I would determine bps-dropped or pps-dropped. What time denominator? (Generally I find rate measurements very difficult to compute and explain in a way that makes everyone happy.) Can we just let the client compute bytes/time ?

[TR] Bytes per second and packets per second are supported by network devices; BGP flow spec supports traffic rate-limiting in bps and pps.

The -dropped counters show a clear bias towards scrubbers that simply drop packets. Are there other actions to be considered? E.g., DSCP marking?

[TR] No, a DDoS mitigator does not drop all the traffic; it drops only attack traffic. I don't think DSCP remarking will help handle the DDoS attack!

Regarding "status", I'm unclear on what code 2 means. It sounds like all traffic is being dropped, but I don't consider that the most successful outcome.

[TR] No, only the attack traffic is dropped.

I think it might be better to say "Mitigation is in progress within capability" (in contrast to "exceeded capability" code 4).

[TR] No, "exceeded capability" is to discuss the scenario where the DDoS attack grows beyond the mitigating domain's capabilities (see the discussion in https://tools.ietf.org/html/draft-ietf-dots-architecture-04#section-3.2.3).

Regarding "observe", it says:
   A DOTS client that is no longer
   interested in receiving notifications from the DOTS server can simply
   "forget" the observation.
More specifically, doesn't it have to make the request with "Observe=1" ?

[TR] No. The client forgets the observation; if it then receives a notification from the server, it sends a Reset message, which causes the server to remove the associated entry from the list of observers. https://tools.ietf.org/html/rfc7641#section-3.6 discusses this mechanism, as well as using "Observe=1" to deregister the client. Updated the draft to suggest "Observe=1" as an alternative mechanism.

I think there may be some race conditions with the "observe" option.
E.g., these messages reordered:
  Observe=0
  Observe=1
I'm not sure what the solution is. (I didn't really read RFC7641 to see if it was discussed)

[TR] The server should return an error, but it's not discussed in RFC7641.

The efficacy update seems to use the same URI as the one for requesting mitigation. How would the server know which type of message? I suspect we intended to use a different URI.
Scanning the document, a second look at all of the URIs might be in order. We want the URI to indicate the type of operation, not inferred from the content of the body.

[TR] Updated draft to use if-match Option to indicate mitigation update request (this option is discussed in https://tools.ietf.org/html/rfc7252#section-5.10.8.1). I don't see the need for a different URI.

Regarding heartbeat, what are the consequences of failure? I'm unclear on what action should be taken.

[TR] The DOTS client will consider that the session is terminated and may initiate a new session using (D)TLS session resumption.

Please note that when under attack, round-trip times might be VERY large due to buffer bloat. A colleague of mine measured ping times exceeding 60s in a hotel!

[TR] Yes, RTT and packet loss could be high. Heartbeats are used for multiple reasons: to keep the NAT/firewall bindings alive, and to check that (D)TLS state is maintained by the peer so as to detect whether the DOTS session has terminated. To give the DOTS agents more flexibility, a CoAP heartbeat is used instead of the DTLS heartbeat.

On that note, should we give guidance about application-layer time-outs? A naïve implementation might pick something like 5s timeout. A client should be trying multiple transports and therefore have an async approach to writing the application.

[TR] Section 5.4.2 in this draft discusses various transmission parameters to deal with adverse network conditions (heartbeat timeout, max-retransmit and ack-timeout). https://tools.ietf.org/html/rfc7252#section-4.8.2 explains the various time values derived from the default transmission parameters. CoAP allows applications to configure the transmission parameters: the defaults can be modified so that max-retransmit = 7, the heartbeat timeout equals MAX_TRANSMIT_TIME (371 seconds), and ack-timeout is 2 seconds.
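
For readers who want to check the derived timers, the following sketch evaluates the MAX_TRANSMIT_SPAN and MAX_TRANSMIT_WAIT formulas from RFC 7252 section 4.8.2. The function names are illustrative, and ACK_RANDOM_FACTOR is assumed to stay at its default of 1.5. With max-retransmit = 7 and ack-timeout = 2 these formulas give 381 seconds for MAX_TRANSMIT_SPAN, so the 371-second figure quoted above may rest on a different derivation.

```python
def max_transmit_span(ack_timeout, max_retransmit, ack_random_factor=1.5):
    """Time from the first transmission to the last retransmission (seconds)."""
    return ack_timeout * (2 ** max_retransmit - 1) * ack_random_factor

def max_transmit_wait(ack_timeout, max_retransmit, ack_random_factor=1.5):
    """Time from the first transmission until the sender gives up (seconds)."""
    return ack_timeout * (2 ** (max_retransmit + 1) - 1) * ack_random_factor

if __name__ == "__main__":
    # CoAP defaults: ACK_TIMEOUT = 2 s, MAX_RETRANSMIT = 4.
    print(max_transmit_span(2, 4), max_transmit_wait(2, 4))
    # Parameters mentioned in the reply: max-retransmit = 7, ack-timeout = 2 s.
    print(max_transmit_span(2, 7), max_transmit_wait(2, 7))
```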

5.4.2 Configuration: why is this a POST?  I think this should be PUT, like the others, since the intent is to replace previous configuration.

[TR] Yes, updated draft to use PUT.

Regarding redirected signaling, where do you get response code 3.00 from? It isn't listed in https://tools.ietf.org/html/rfc7252#section-12.1.2 or https://www.iana.org/assignments/core-parameters/core-parameters.xhtml#response-codes  .  Can we use 3.00, or is there a reference you can add to the doc?

[TR] Updated IANA considerations sections in the draft to define 3.00 (alternate server) CoAP response code.

Do we intend to prohibit large datagrams that will be fragmented?

[TR] Yes.

These are not terrible, just perhaps less likely to all arrive. I think the wording should say that multiple mitigation requests should be created to keep the datagram size small.

[TR] It's discussed in https://tools.ietf.org/html/draft-ietf-dots-signal-channel-02#section-7.1 to split the DOTS signal into separate messages when the request size exceeds Path MTU.

Cheers,
-Tiru

(I didn't read the security sections in detail, expecting them to change.)

It looks like a lot of issues, but I think the comments are only possible because the document is good enough by having details.

[Tip: track github issues for the points I've raised, if you cannot answer/resolve them immediately]
-Dave
