Re: [Dots] Comments for draft-reddy-dots-telemetry

"Konda, Tirumaleswar Reddy" <TirumaleswarReddy_Konda@McAfee.com> Tue, 20 August 2019 10:16 UTC

From: "Konda, Tirumaleswar Reddy" <TirumaleswarReddy_Konda@McAfee.com>
To: "Panwei (William)" <william.panwei@huawei.com>, "draft-reddy-dots-telemetry@ietf.org" <draft-reddy-dots-telemetry@ietf.org>
CC: dots <dots@ietf.org>, MeiLing Chen <chenmeiling@chinamobile.com>
Thread-Topic: Comments for draft-reddy-dots-telemetry
Date: Tue, 20 Aug 2019 10:15:20 +0000
Message-ID: <DM5PR16MB17050DE91DD44CD305D41789EAAB0@DM5PR16MB1705.namprd16.prod.outlook.com>
References: <30E95A901DB42F44BA42D69DB20DFA6A6DE651BA@nkgeml513-mbx.china.huawei.com>
In-Reply-To: <30E95A901DB42F44BA42D69DB20DFA6A6DE651BA@nkgeml513-mbx.china.huawei.com>
Archived-At: <https://mailarchive.ietf.org/arch/msg/dots/Mxb5E73bLSKscI0SG9vD1hqoLvY>

Hi Wei,

Thanks for the detailed review. Please see inline [TR]

From: Dots <dots-bounces@ietf.org> On Behalf Of Panwei (William)
Sent: Thursday, August 15, 2019 12:44 PM
To: draft-reddy-dots-telemetry@ietf.org
Cc: dots <dots@ietf.org>; MeiLing Chen <chenmeiling@chinamobile.com>
Subject: [Dots] Comments for draft-reddy-dots-telemetry

Hi authors,

First of all, I’m sorry for the late comments.
In general, I support this work; I think telemetry can help DDoS attack mitigation if it is used properly. Below are my comments.
1. I agree that DOTS telemetry should have a dedicated URI.

[TR] Yes, will add path-suffix '/telemetry'

2. In Section 4.1.1, I’m entirely OK with the idea of the ‘Total Traffic Normal Baseline’.

[TR] Thanks.

  2.1. I couldn’t figure out what the low, mid and high percentiles actually stand for, or how to interpret them. I think average bandwidth, peak bandwidth and usual bandwidth range (which contains minimum bandwidth and maximum bandwidth) are useful for reference.

[TR] Percentiles can be used for statistical analysis and are better than averages; see https://www.elastic.co/blog/averages-can-dangerous-use-percentile and https://www.dynatrace.com/news/blog/why-averages-suck-and-percentiles-are-great/
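
As a rough illustration of the point about averages, here is a minimal sketch. The sample values, the nearest-rank method, and the use of the 10th/50th/90th percentiles for low/mid/high are assumptions for illustration, not taken from the draft. A single short burst pulls the mean well away from typical traffic, while the percentiles barely move:

```python
import statistics


def percentile(samples, p):
    """Nearest-rank percentile of a non-empty list, for integer p in 1..100."""
    ordered = sorted(samples)
    # Integer ceiling of p * n / 100, avoiding floating-point rounding.
    rank = (p * len(ordered) + 99) // 100
    return ordered[rank - 1]


# Hypothetical per-minute bandwidth samples in Mbps, with one short burst.
samples_mbps = [100, 102, 98, 101, 99, 103, 97, 100, 101, 900]

mean = statistics.mean(samples_mbps)   # dominated by the single burst
p10 = percentile(samples_mbps, 10)     # assumed "low percentile"
p50 = percentile(samples_mbps, 50)     # assumed "mid percentile"
p90 = percentile(samples_mbps, 90)     # assumed "high percentile"

print(f"mean={mean} low={p10} mid={p50} high={p90}")
```

The peak value would still be reported separately, since the percentiles deliberately ignore the burst.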

There may be some confusion between the peak bandwidth and the maximum bandwidth. But IMO they are different: the peak bandwidth is a high burst value that only lasts for a short while, e.g., a few seconds or minutes, and the maximum bandwidth is a sustained high value that can last for a long time, e.g., a few hours.

[TR] I thought peak and maximum bandwidth were the same. Please point me to a reference distinguishing between peak and maximum bandwidth.

  2.2. The statistical results can differ considerably between time ranges; for example, the bandwidth may be much higher from 6:00 pm to 12:00 am than from 6:00 am to 12:00 pm. So for accuracy, it’s better to calculate the baseline separately for different time ranges.

[TR] I don’t think time ranges are required to create a baseline for statistical analysis. The bandwidth may also vary depending on the day of the week, holidays, specific events (e.g. games) and flash crowd scenarios (https://www.radware.com/resources/ddos_mitigation_layers.aspx)
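
To make the trade-off concrete, here is a minimal sketch of a per-time-range baseline. The function name, the fixed six-hour windows, and the sample values are hypothetical and not part of the draft. Note that hour-of-day windows alone do not capture the weekday, holiday or event effects mentioned above:

```python
from collections import defaultdict
from datetime import datetime
from statistics import median


def baseline_by_window(samples, window_hours=6):
    """Group (timestamp, mbps) samples into fixed hour-of-day windows.

    Returns {(start_hour, end_hour): (minimum, median, maximum)}.
    window_hours is assumed to divide 24 evenly.
    """
    buckets = defaultdict(list)
    for ts, mbps in samples:
        start = (ts.hour // window_hours) * window_hours
        buckets[(start, start + window_hours)].append(mbps)
    return {w: (min(v), median(v), max(v)) for w, v in sorted(buckets.items())}


# Hypothetical samples: lighter morning traffic, heavier evening traffic.
samples = [
    (datetime(2019, 8, 15, 7, 0), 80),
    (datetime(2019, 8, 15, 9, 30), 120),
    (datetime(2019, 8, 15, 19, 0), 400),
    (datetime(2019, 8, 15, 22, 15), 520),
]
baselines = baseline_by_window(samples)
for window, stats in baselines.items():
    print(window, stats)
```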

3. In Sections 4.1.3 and 4.1.4, I think ‘Total Attack Traffic’ and ‘Total Traffic’ should reflect the current attack traffic bandwidth and the current traffic bandwidth. So is a single current value enough? Are the low percentile, mid percentile, high percentile and peak values still needed?

[TR] Yes, percentile is required to analyze the attack pattern.

4. In Section 4.1.5, about the ‘Attack Details’:
  4.1. I don’t like the ‘vendor-id’ and ‘attack-id’ unless they are optional. The ‘attack-id’ is maintained by each vendor, and the number of ‘vendor-id’ and ‘attack-id’ combinations can be enormous, so it’s a burden for implementations to understand and map these elements, especially since the DOTS client needs to implement the ‘Attack Details’ attribute as described in Section 4.3.1.

  4.2. Here ‘attack-name’ uses a textual representation to express the attack type. Meiling also designed a mechanism to express the attack type in her draft (draft-chen-dots-attack-informations-02). Meiling’s mechanism is more complicated in its rules definition, but it may be easier to implement and understand. The textual representation here seems very easy because it has no rules definition, but it needs Natural Language Processing techniques. Are these NLP techniques easy to implement? With no unified rules, will the analysis results differ between implementations?

[TR] You may want to look into the discussion https://mailarchive.ietf.org/arch/msg/dots/uyq-AB4me7qZ2apuaw8b3J6JDnA

  4.3. For the ‘attack-severity’, I feel this element is too subjective. What are the criteria for distinguishing among ‘emergency’, ‘critical’ and ‘alert’?

[TR] Yes, it is subjective and only a hint.

5. In Section 4.2, on the ‘Mitigation Efficacy DOTS Telemetry Attributes’: besides ‘Total Attack Traffic’, I think ‘Total Traffic’ and ‘Total Pipe Capability’ can also be included here. In some cases the DOTS client can’t distinguish attack traffic from total traffic, so it will not be able to send ‘Total Attack Traffic’, but it can send the current ‘Total Traffic’ and ‘Total Pipe Capability’ to indicate the mitigation efficacy. This is also mentioned in Meiling’s draft.

[TR] If the traffic is scrubbed by the DDoS mitigation provider, the DOTS server already knows the ‘Total Traffic’. ‘Total pipe capability’ is a pre-mitigation attribute. The pipe capacity won’t change during a DDoS attack.

6. The telemetry attributes are divided into three categories: Pre-mitigation, Mitigation Efficacy, Mitigation Status. I think these categories are reasonable and clear. But I found that the attributes are basically all related to bandwidth. Bandwidth is useful for volume-based DDoS attacks, but for resource-based DDoS attacks, other attributes are needed.

[TR] Good point, will update draft.

  6.1. To assess a resource-based DDoS attack, session statistics will be helpful. These statistics can be computed along different dimensions: the number of sessions per protocol (TCP/UDP/ICMP), the number of sessions per source IP, the number of source IPs, etc.
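
A minimal sketch of these dimensions, assuming hypothetical session records of (source IP, IP protocol number); the record layout and the sample addresses are illustrative only:

```python
from collections import Counter

# Hypothetical session records: (source IP, IP protocol number).
sessions = [
    ("192.0.2.1", 6), ("192.0.2.1", 6), ("192.0.2.1", 17),
    ("192.0.2.2", 6), ("198.51.100.7", 1),
]

# IP protocol numbers: ICMP is 1, TCP is 6, UDP is 17.
PROTOCOL_NAMES = {1: "ICMP", 6: "TCP", 17: "UDP"}

# Number of sessions per protocol.
per_protocol = Counter(PROTOCOL_NAMES.get(proto, str(proto)) for _, proto in sessions)
# Number of sessions per source IP.
per_source = Counter(src for src, _ in sessions)
# Number of distinct source IPs.
source_count = len(per_source)

print(dict(per_protocol), dict(per_source), source_count)
```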

[TR] If it is a resource-based DDoS attack, what is the use of the number of sessions per source IP and the number of source IPs?

  6.2. These session statistics can be added to the ‘Total Traffic Normal Baseline’, and also to ‘Total Traffic’ and ‘Total Capability’. The YANG module tree, as I understand it, is attached at the end for reference.

[TR] What type of statistics of a session are you referring to ?

  6.3. Some other information which can help identify an attack can also be considered and included. For example, in some attacks the attackers establish many sessions with a very long lifetime, so the statistics of session lifetime may help.

[TR] Please point me to DDoS attacks using sessions with very long lifetime.

7. Discussion:
I’d like to raise a discussion here. I tried to consider the telemetry from the aspects of ‘why’, ‘what’, ‘who’, ‘where’, ‘when’ and ‘how’. ‘Why we need telemetry’ is described in Section 3 and ‘What are the telemetry attributes’ is described in Section 4. I summarize the remaining ‘who’, ‘where’, ‘when’ and ‘how’ as ‘how will we use this telemetry’, i.e., in which scenario which role will send which telemetry attributes over which channel. This is not described yet. So do we need to describe ‘how will we use this telemetry’?

[TR] The use of the telemetry is implementation specific. For example, a DOTS server can use the telemetry for statistical analysis or deep learning, or to notify the DOTS server’s security operations team.

Cheers,
-Tiru
----------------------------------------------------------
The YANG Module Tree for the statistics of session:
module: ietf-dots-telemetry
   +--rw pre-mitigation-telemetry
   |  +--rw total-traffic-normal-baseline* [start-time end-time]
   |  |  +--rw start-time           yang:date-and-time
   |  |  +--rw end-time             yang:date-and-time
   |  |  +--rw normal-bandwidth
   |  |  |  ......
   |  |  +--rw normal-session
   |  |     +--rw number-of-user  // The statistics of users (e.g., source IPs)
   |  |     |  +--rw average        uint64
   |  |     |  +--rw peak           uint64
   |  |     |  +--rw minimum        uint64
   |  |     |  +--rw maximum        uint64
   |  |     +--rw sessions-per-user  // The statistics of sessions per user (e.g., per source IP)
   |  |     |  +--rw average        uint64
   |  |     |  +--rw peak           uint64
   |  |     |  +--rw minimum        uint64
   |  |     |  +--rw maximum        uint64
   |  |     +--rw sessions-in-total  // The statistics of total sessions
   |  |     |  +--rw average        uint64
   |  |     |  +--rw peak           uint64
   |  |     |  +--rw minimum        uint64
   |  |     |  +--rw maximum        uint64
   |  |     +--rw sessions-per-protocol* [ip-version ip-protocol port]  // The statistics of sessions per protocol
   |  |        +--rw ip-version     inet:ip-version  // IPv4 or IPv6
   |  |        +--rw ip-protocol    uint8   // IP protocol number, for example ICMP is 1, TCP is 6, UDP is 17, etc.
   |  |        +--rw port           uint16  // TCP port/UDP port/ICMP type etc., this can indicate the type of upper layer protocol
   |  |        +--rw average        uint64
   |  |        +--rw peak           uint64
   |  |        +--rw minimum        uint64
   |  |        +--rw maximum        uint64
   |  +--rw total-capability
   |  |  +--rw pipe-capability
   |  |  |  ......
   |  |  +--rw session-capability
   |  |     +--rw maximum-users              uint64  // Maximum number of users (e.g., source IPs)
   |  |     +--rw maximum-sessions-per-user  uint64  // Maximum number of sessions per user (e.g., source IP)
   |  |     +--rw maximum-sessions-in-total  uint64  // Maximum number of total sessions
   |  |     +--rw maximum-sessions-per-protocol* [ip-version ip-protocol port]
   |  |        +--rw ip-version              inet:ip-version
   |  |        +--rw ip-protocol             uint8
   |  |        +--rw port                    uint16
   |  |        +--rw maximum                 uint64  // Maximum number of sessions for each protocol
   |  +--rw total-attack-traffic
   |  |  ......
   |  +--rw total-traffic
   |     +--rw current-bandwidth
   |     |  ......
   |     +--rw current-session
   |        +--rw current-users              uint64  // Current number of users (e.g., source IPs)
   |        +--rw current-sessions-per-user  uint64  // Average number of current sessions per user (e.g., source IP)
   |        +--rw current-sessions-in-total  uint64  // Current number of total sessions
   |        +--rw current-sessions-per-protocol* [ip-version ip-protocol port]
   |           +--rw ip-version              inet:ip-version
   |           +--rw ip-protocol             uint8
   |           +--rw port                    uint16
   |           +--rw current-sessions        uint64  // Current number of sessions for each protocol
   +--rw mitigation-efficacy-telemetry
   |  +--rw total-capability
   |  |  +--rw pipe-capability
   |  |  |  ......
   |  |  +--rw session-capability
   |  |     +--rw maximum-users              uint64  // Maximum number of users (e.g., source IPs)
   |  |     +--rw maximum-sessions-per-user  uint64  // Maximum number of sessions per user (e.g., source IP)
   |  |     +--rw maximum-sessions-in-total  uint64  // Maximum number of total sessions
   |  |     +--rw maximum-sessions-per-protocol* [ip-version ip-protocol port]
   |  |        +--rw ip-version              inet:ip-version
   |  |        +--rw ip-protocol             uint8
   |  |        +--rw port                    uint16
   |  |        +--rw maximum                 uint64  // Maximum number of sessions for each protocol
   |  +--rw total-attack-traffic
   |  |  ......
   |  +--rw total-traffic
   |     +--rw current-bandwidth
   |     |  ......
   |     +--rw current-session
   |        +--rw current-users              uint64  // Current number of users (e.g., source IPs)
   |        +--rw current-sessions-per-user  uint64  // Average number of current sessions per user (e.g., source IP)
   |        +--rw current-sessions-in-total  uint64  // Current number of total sessions
   |        +--rw current-sessions-per-protocol* [ip-version ip-protocol port]
   |           +--rw ip-version              inet:ip-version
   |           +--rw ip-protocol             uint8
   |           +--rw port                    uint16
   |           +--rw current-sessions        uint64  // Current number of sessions for each protocol
   +--rw mitigation-status-telemetry
      ......
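
As a hedged illustration of how the ‘session-capability’ and ‘current-session’ branches of the tree above might be compared, here is a minimal sketch. The class and function names and the sample numbers are hypothetical; only the leaf names come from the tree:

```python
from dataclasses import dataclass


@dataclass
class SessionCapability:
    maximum_users: int
    maximum_sessions_per_user: int
    maximum_sessions_in_total: int


@dataclass
class CurrentSession:
    current_users: int
    current_sessions_per_user: int
    current_sessions_in_total: int


def exceeded_limits(cap, cur):
    """Return the current-session leaf names whose value exceeds the capability."""
    checks = [
        ("current-users", cur.current_users, cap.maximum_users),
        ("current-sessions-per-user", cur.current_sessions_per_user,
         cap.maximum_sessions_per_user),
        ("current-sessions-in-total", cur.current_sessions_in_total,
         cap.maximum_sessions_in_total),
    ]
    return [name for name, value, limit in checks if value > limit]


# Hypothetical numbers: only the per-user session count is over its limit.
cap = SessionCapability(10_000, 50, 200_000)
cur = CurrentSession(8_000, 120, 180_000)
over = exceeded_limits(cap, cur)
print(over)
```

In this example the total bandwidth and the total session count could both look normal while a per-user limit is exceeded, which is the kind of resource-based signal that bandwidth attributes alone would not show.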

Regards & Thanks!
潘伟 Wei Pan
华为技术有限公司 Huawei Technologies Co., Ltd.
