Re: [Dots] Some notes on ietf-dots-signal-channel-02

Dave Dolson <ddolson@sandvine.com> Thu, 03 August 2017 13:47 UTC

From: Dave Dolson <ddolson@sandvine.com>
To: Dave Dolson <ddolson@sandvine.com>, "dots@ietf.org" <dots@ietf.org>
Date: Thu, 03 Aug 2017 13:47:16 +0000
Archived-At: <https://mailarchive.ietf.org/arch/msg/dots/f8vo6Qvwz71syMvz5uWoav3HEIE>

I know I sent this at a busy time, so perhaps it was forgotten. Would the authors care to comment on the points below?

-Dave

From: Dots [mailto:dots-bounces@ietf.org] On Behalf Of Dave Dolson
Sent: Wednesday, July 19, 2017 10:36 PM
To: dots@ietf.org
Subject: [Dots] Some notes on ietf-dots-signal-channel-02

Generally I think the document is pretty good, but I have a number of questions and nits.

BTW, is this document in github? I was going to make a pull request, but couldn't find it.

For target-protocol, I think the text could be clarified by citing this reference: https://www.iana.org/assignments/protocol-numbers/protocol-numbers.xhtml

There is missing information about some filters:
fqdn: which FQDN is supposed to go here? I'm unclear on what would be mitigated: DNS? HTTP?

uri: does this mean an HTTP resource is under attack? It doesn't say HTTP.

lifetime: I recommend a very large number be used for no timeout. (This might be a pet peeve of mine, but why do people like to use 0 to mean infinite?) I think in the future we might want to use 0 to mean "end now".

CBOR supports binary byte strings. Therefore I suggest that IP addresses be represented as 4-byte or 16-byte byte-strings rather than the bulkier human-readable strings that may require ~40 bytes.
However, then prefixes would be represented like: "target-prefix": [ {"prefix": byte-string[16], "length": integer}]

At this time I also suggest supporting IPv4-mapped-v6 addresses to unify IPv4 and IPv6 code.
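
To make the byte-string suggestion concrete, here is a minimal Python sketch using the standard ipaddress module. The helper name encode_prefix and the choice to always emit 16-byte IPv4-mapped values are assumptions for illustration, not anything taken from the draft. The returned dict mirrors the "prefix"/"length" layout above; a real implementation would hand it to a CBOR encoder.

```python
import ipaddress

# ::ffff:0:0/96, the IPv4-mapped IPv6 prefix (RFC 4291).
V4_MAPPED = bytes(10) + b"\xff\xff"

def encode_prefix(text):
    """Return {"prefix": 16-byte string, "length": int} for a textual prefix.

    Hypothetical helper: IPv4 prefixes are stored as IPv4-mapped IPv6 so
    that downstream code only ever handles 16-byte addresses.
    """
    net = ipaddress.ip_network(text)  # raises ValueError if host bits are set
    if net.version == 4:
        return {"prefix": V4_MAPPED + net.network_address.packed,
                "length": net.prefixlen + 96}
    return {"prefix": net.network_address.packed, "length": net.prefixlen}

v4 = encode_prefix("192.0.2.0/24")
v6 = encode_prefix("2001:db8::/32")
print(len(v4["prefix"]), v4["length"])
print(len(v6["prefix"]), v6["length"])
```

Both results carry a 16-byte address regardless of family, so matching code needs only one code path.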

Although multiple mitigation-ids may be set, this wording confused me:
   If two mitigation requests
   have overlapping mitigation scopes the mitigation request with higher
   numeric mitigation-id value will override the mitigation request with
   a lower numeric mitigation-id value
This sounds like higher-valued IDs are supposed to replace lower-valued ones, but I don't think that is intended. Both mitigation-IDs should be stored; I take the comment to be about precedence in the scrubbing rules and about attributing bytes-dropped.

I think there may be a race condition to consider, when we allow for reordering:
If this sequence is sent:
  PUT (ID=1)
  PUT (ID=1)
  DELETE (ID=1)
But this sequence is received:
  PUT (ID=1)
  DELETE (ID=1)
  PUT (ID=1)
Then ID 1 will be incorrectly stored: the server ends up holding a mitigation that the client's last action was to delete. The solution is for the server to remember deleted IDs for some time.
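
The "remember deleted IDs" idea can be sketched as a tombstone table with a time limit. This is only an illustration under stated assumptions: the class name, the 60-second default, and the injectable clock are all hypothetical, and the sketch ignores per-client scoping.

```python
import time

class MitigationStore:
    """Toy server-side store that refuses PUTs for recently deleted IDs."""

    def __init__(self, tombstone_ttl=60.0, clock=time.monotonic):
        self._active = {}       # mitigation-id -> scope
        self._deleted_at = {}   # mitigation-id -> time of DELETE (tombstone)
        self._ttl = tombstone_ttl   # assumed value, not from the draft
        self._clock = clock

    def _recently_deleted(self, mid):
        deleted = self._deleted_at.get(mid)
        if deleted is None:
            return False
        if self._clock() - deleted >= self._ttl:
            del self._deleted_at[mid]   # tombstone expired
            return False
        return True

    def put(self, mid, scope):
        """Store the mitigation unless the ID was deleted within the TTL."""
        if self._recently_deleted(mid):
            return False    # likely a delayed PUT that the DELETE overtook
        self._active[mid] = scope
        return True

    def delete(self, mid):
        self._active.pop(mid, None)
        self._deleted_at[mid] = self._clock()

    def active_ids(self):
        return sorted(self._active)

# Replay the reordered sequence: PUT, DELETE, PUT.
now = [0.0]
store = MitigationStore(tombstone_ttl=60.0, clock=lambda: now[0])
store.put(1, "scope-a")
store.delete(1)
late_put_accepted = store.put(1, "scope-a")
print(late_put_accepted, store.active_ids())
```

One cost of this approach is that a client cannot legitimately reuse an ID until the tombstone expires, so the retention time is a trade-off.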

The URIs for PUT and DELETE are specified differently (v1 and version). I suspect they are intended to be the same, probably "v1" ?

Currently the DELETE is specified to return an error if the item to delete is not found. What is the value of this error? Why not always return 2.02, thereby giving DELETE idempotent behavior?
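
A minimal sketch of the idempotent behavior suggested here, with a plain dict standing in for server state (the function name and the dict are hypothetical):

```python
def handle_delete(mitigations, mitigation_id):
    """Remove the mitigation if present; report 2.02 (Deleted) either way."""
    mitigations.pop(mitigation_id, None)
    return "2.02"

active = {1: "198.51.100.0/24"}
first = handle_delete(active, 1)
second = handle_delete(active, 1)   # e.g. a retransmitted DELETE
print(first, second, active)
```

A retransmitted or duplicated DELETE then gets the same answer as the original, so the client needs no special handling for "not found".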

Regarding the 5-minute timeout after a client DELETE, did we consider making that configurable in the delete message? This could allow a client to be a bit smarter. If timeout=0, it would mean stop right now.

I'm unclear on how I would determine bps-dropped or pps-dropped. What time denominator? (Generally I find rate measurements very difficult to compute and explain in a way that makes everyone happy.) Can we just let the client compute bytes/time ?
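
If the server reported only cumulative counters, a client could derive whatever rate it wants from two samples. The sketch below assumes each sample is a (timestamp in seconds, cumulative bytes dropped) pair; that sample format is an assumption for illustration, not something the draft defines.

```python
def rate_bps(prev, cur):
    """Average bits per second between two (timestamp_s, bytes_dropped) samples."""
    (t0, b0), (t1, b1) = prev, cur
    if t1 <= t0:
        raise ValueError("samples must be in increasing time order")
    return (b1 - b0) * 8 / (t1 - t0)

sample_a = (100.0, 1_000_000)
sample_b = (110.0, 3_500_000)
print(rate_bps(sample_a, sample_b))  # 2,500,000 bytes * 8 / 10 s = 2000000.0
```

The client picks the averaging interval simply by choosing which two samples to compare, which sidesteps the question of the server's time denominator.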

The -dropped counters show a clear bias towards scrubbers that simply drop packets. Are there other actions to be considered? E.g., DSCP marking?

Regarding "status", I'm unclear on what code 2 means. It sounds like all traffic is being dropped, but I don't consider that the most successful outcome. I think it might be better to say "Mitigation is in progress within capability" (in contrast to "exceeded capability" code 4).

Regarding "observe", it says:
   A DOTS client that is no longer
   interested in receiving notifications from the DOTS server can simply
   "forget" the observation.
More specifically, doesn't it have to make the request with "Observe=1"?

I think there may be some race conditions with the "observe" option.
E.g., if these messages are reordered:
  Observe=0
  Observe=1
I'm not sure what the solution is. (I didn't really read RFC 7641 to see if it was discussed.)

The efficacy update seems to use the same URI as the one for requesting mitigation. How would the server know which type of message? I suspect we intended to use a different URI.
Scanning the document, I think a second look at all of the URIs might be in order. We want the URI to indicate the type of operation, rather than having it inferred from the content of the body.

Regarding heartbeat, what are the consequences of failure? I'm unclear on what action should be taken.
Please note that when under attack, round-trip times might be VERY large due to buffer bloat. A colleague of mine measured ping times exceeding 60s in a hotel!

On that note, should we give guidance about application-layer time-outs? A naïve implementation might pick something like 5s timeout. A client should be trying multiple transports and therefore have an async approach to writing the application.

5.4.2 Configuration: why is this a POST? I think this should be a PUT, like the others, since the intent is to replace the previous configuration.

Regarding redirected signaling, where do you get response code 3.00 from? It isn't listed in https://tools.ietf.org/html/rfc7252#section-12.1.2 or https://www.iana.org/assignments/core-parameters/core-parameters.xhtml#response-codes  .  Can we use 3.00, or is there a reference you can add to the doc?

Do we intend to prohibit large datagrams that will be fragmented? These are not terrible, just perhaps less likely to all arrive. I think the wording should say that multiple mitigation requests should be created to keep the datagram size small.
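
One way a client might keep each datagram small is to split its target list greedily against a byte budget. In this sketch the budget, the per-prefix size estimate, and the function name are all hypothetical; a real client would measure the actual encoded size of each request.

```python
def split_targets(prefixes, max_bytes, encoded_size):
    """Greedily group prefixes so each group's estimated size fits max_bytes."""
    groups, current, used = [], [], 0
    for p in prefixes:
        size = encoded_size(p)
        if size > max_bytes:
            raise ValueError("single prefix exceeds the datagram budget")
        if current and used + size > max_bytes:
            groups.append(current)      # start a new mitigation request
            current, used = [], 0
        current.append(p)
        used += size
    if current:
        groups.append(current)
    return groups

# 18 bytes per prefix is an illustrative estimate (16-byte address plus overhead).
groups = split_targets([f"p{i}" for i in range(5)],
                       max_bytes=40, encoded_size=lambda p: 18)
print(groups)
```

Each resulting group would be sent as its own mitigation request, so losing one datagram affects only part of the scope.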

(I didn't read the security sections in detail, expecting them to change.)

This looks like a lot of issues, but I think comments like these are only possible because the document is detailed enough to review.

[Tip: track github issues for the points I've raised, if you cannot answer/resolve them immediately]
-Dave
