[Dots] Comments for draft-reddy-dots-telemetry

"Panwei (William)" <william.panwei@huawei.com> Thu, 15 August 2019 07:14 UTC

From: "Panwei (William)" <william.panwei@huawei.com>
To: "draft-reddy-dots-telemetry@ietf.org" <draft-reddy-dots-telemetry@ietf.org>
CC: dots <dots@ietf.org>, MeiLing Chen <chenmeiling@chinamobile.com>
Date: Thu, 15 Aug 2019 07:14:12 +0000
Message-ID: <30E95A901DB42F44BA42D69DB20DFA6A6DE651BA@nkgeml513-mbx.china.huawei.com>
Archived-At: <https://mailarchive.ietf.org/arch/msg/dots/n7PJdu9WRBzztW_Qq2Z-XkHRpyw>
Subject: [Dots] Comments for draft-reddy-dots-telemetry

Hi authors,

First of all, I’m sorry for the late comments.
In general, I support this work; I think telemetry can help DDoS attack mitigation if it is used properly. Below are my comments.
1. I agree that DOTS telemetry should have a dedicated URI.
2. In Section 4.1.1, overall I’m OK with the idea of the ‘Total Traffic Normal Baseline’.
  2.1. I couldn’t figure out what the low, mid and high percentiles actually stand for, or how to interpret them. I think average bandwidth, peak bandwidth and the usual bandwidth range (consisting of the minimum bandwidth and the maximum bandwidth) are useful for reference. The peak bandwidth may be confused with the maximum bandwidth, but IMO they are different: the peak bandwidth is a high burst value that lasts only a short while, e.g., a few seconds or minutes, whereas the maximum bandwidth is a sustained high value that can last a long time, e.g., a few hours.
  2.2. The statistics can differ considerably between time ranges; for example, the bandwidth may be much higher from 6:00 pm to midnight than from 6:00 am to noon. So, for accuracy, it’s better to calculate the baseline separately for each time range.
3. In Sections 4.1.3 and 4.1.4, regarding ‘Total Attack Traffic’ and ‘Total Traffic’, I think they should reflect the current attack traffic bandwidth and the current traffic bandwidth, so is a single current value enough? Are the low-percentile, mid-percentile, high-percentile and peak values still needed?
4. In Section 4.1.5, about the ‘Attack Details’:
  4.1. I don’t like ‘vendor-id’ and ‘attack-id’ unless they are optional. Since ‘attack-id’ is maintained by each vendor, the number of ‘vendor-id’/‘attack-id’ combinations can be enormous, which makes understanding and mapping these elements a burden for implementations, especially because the DOTS client needs to implement the ‘Attack Details’ attribute as described in Section 4.3.1.
  4.2. Here, ‘attack-name’ is designed to use a textual representation to express the attack type. Meiling also designed a mechanism to express the attack type in her draft (draft-chen-dots-attack-informations-02). Meiling’s mechanism defines more complicated rules, but it may be easier to implement and understand. The textual representation here looks very easy because it defines no rules, but it requires Natural Language Processing (NLP) techniques. Are these NLP techniques easy to implement? And without unified rules, will different implementations produce different analysis results?
  4.3. The ‘attack-severity’ element feels too subjective to me. What are the criteria for distinguishing among ‘emergency’, ‘critical’ and ‘alert’?
5. In Section 4.2, regarding the ‘Mitigation Efficacy DOTS Telemetry Attributes’, besides ‘Total Attack Traffic’, I think ‘Total Traffic’ and ‘Total Pipe Capability’ can also be included here. In some cases the DOTS client can’t distinguish attack traffic from total traffic, so it cannot send ‘Total Attack Traffic’; it can, however, send the current ‘Total Traffic’ and ‘Total Pipe Capability’ to indicate the mitigation efficacy. This is also mentioned in Meiling’s draft.
6. The telemetry attributes are divided into three categories: Pre-mitigation, Mitigation Efficacy and Mitigation Status. I think these categories are reasonable and clear. However, I found that the attributes are mostly related to bandwidth. Bandwidth is useful for volume-based DDoS attacks, but resource-based DDoS attacks need other attributes.
  6.1. To assess resource-based DDoS attacks, session statistics would be helpful. These statistics can be collected along different dimensions: the number of sessions per protocol such as TCP/UDP/ICMP, the number of sessions per source IP, the number of source IPs, etc.
  6.2. These session statistics can be added to the ‘Total Traffic Normal Baseline’, as well as to ‘Total Traffic’ and ‘Total Capability’. The YANG module tree reflecting my understanding is attached at the end for reference.
  6.3. Other information that helps identify an attack can also be considered and included. For example, in some attacks the attackers establish many sessions with very long lifetimes, so statistics on session lifetime may help.

7. Discussion:
I’d like to raise a point for discussion here. I tried to consider telemetry from the aspects of ‘why’, ‘what’, ‘who’, ‘where’, ‘when’ and ‘how’. ‘Why we need telemetry’ is described in Section 3, and ‘What are the telemetry attributes’ is described in Section 4. I summarize the remaining ‘who’, ‘where’, ‘when’ and ‘how’ as ‘how will we use this telemetry’, i.e., in which scenario, which role sends which telemetry attributes over which channel. This is not described yet. So do we need to describe ‘how will we use this telemetry’?
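To make the distinction in items 2.1 and 2.2 concrete, here is a minimal Python sketch, not taken from the draft. It assumes bandwidth samples taken at a fixed interval over a single day, treats ‘peak’ as the highest individual sample and ‘minimum’/‘maximum’ as the lowest/highest moving average over a hypothetical `window` of consecutive samples, and groups samples into hypothetical 6-hour time-of-day ranges. All function names and parameters are illustrative assumptions.

```python
from dataclasses import dataclass
from datetime import datetime
from statistics import mean


@dataclass
class Baseline:
    average: float  # mean of all samples in the time range
    peak: float     # highest single sample: a short burst
    minimum: float  # lowest sustained level (moving average)
    maximum: float  # highest sustained level (moving average)


def sustained_levels(samples, window):
    """Moving averages over `window` consecutive samples.

    Treating a moving average as the 'sustained' level is an assumption
    made for this sketch only.
    """
    if len(samples) < window:
        return [mean(samples)]
    return [mean(samples[i:i + window]) for i in range(len(samples) - window + 1)]


def compute_baseline(samples, window):
    levels = sustained_levels(samples, window)
    return Baseline(average=mean(samples), peak=max(samples),
                    minimum=min(levels), maximum=max(levels))


def baselines_by_time_range(readings, range_hours=6, window=3):
    """Compute one baseline per time-of-day range (hypothetical helper).

    `readings` is a chronological list of (datetime, bandwidth) pairs
    assumed to cover a single day, so samples within a range are contiguous.
    """
    buckets = {}
    for ts, bandwidth in readings:
        start = (ts.hour // range_hours) * range_hours
        buckets.setdefault((start, start + range_hours), []).append(bandwidth)
    return {r: compute_baseline(v, window) for r, v in sorted(buckets.items())}


if __name__ == "__main__":
    day = datetime(2019, 8, 15)
    morning = [10, 12, 11, 50, 12, 11]  # one sample per hour, 06:00 to 11:00
    readings = [(day.replace(hour=6 + i), bw) for i, bw in enumerate(morning)]
    print(baselines_by_time_range(readings))
```

In this sketch a short spike sets `peak` directly but is diluted in `maximum` by the moving average, which mirrors the burst-versus-sustained distinction in 2.1; keying the result by time range reflects the suggestion in 2.2.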

----------------------------------------------------------
The YANG module tree for the session statistics:
module: ietf-dots-telemetry
   +--rw pre-mitigation-telemetry
   |  +--rw total-traffic-normal-baseline* [start-time end-time]
   |  |  +--rw start-time           yang:date-and-time
   |  |  +--rw end-time             yang:date-and-time
   |  |  +--rw normal-bandwidth
   |  |  |  ......
   |  |  +--rw normal-session
   |  |     +--rw number-of-user  // The statistics of users (e.g., source IPs)
   |  |     |  +--rw average        uint64
   |  |     |  +--rw peak           uint64
   |  |     |  +--rw minimum        uint64
   |  |     |  +--rw maximum        uint64
   |  |     +--rw sessions-per-user  // The statistics of sessions per user (e.g., per source IP)
   |  |     |  +--rw average        uint64
   |  |     |  +--rw peak           uint64
   |  |     |  +--rw minimum        uint64
   |  |     |  +--rw maximum        uint64
   |  |     +--rw sessions-in-total  // The statistics of total sessions
   |  |     |  +--rw average        uint64
   |  |     |  +--rw peak           uint64
   |  |     |  +--rw minimum        uint64
   |  |     |  +--rw maximum        uint64
   |  |     +--rw sessions-per-protocol* [ip-version ip-protocol port]  // The statistics of sessions per protocol
   |  |        +--rw ip-version     inet:ip-version  // IPv4 or IPv6
   |  |        +--rw ip-protocol    uint8   // IP protocol number, for example ICMP is 1, TCP is 6, UDP is 17, etc.
   |  |        +--rw port           uint16  // TCP port/UDP port/ICMP type etc., this can indicate the type of upper layer protocol
   |  |        +--rw average        uint64
   |  |        +--rw peak           uint64
   |  |        +--rw minimum        uint64
   |  |        +--rw maximum        uint64
   |  +--rw total-capability
   |  |  +--rw pipe-capability
   |  |  |  ......
   |  |  +--rw session-capability
   |  |     +--rw maximum-users              uint64  // Maximum number of users (e.g., source IPs)
   |  |     +--rw maximum-sessions-per-user  uint64  // Maximum number of sessions per user (e.g., source IP)
   |  |     +--rw maximum-sessions-in-total  uint64  // Maximum number of total sessions
   |  |     +--rw maximum-sessions-per-protocol* [ip-version ip-protocol port]
   |  |        +--rw ip-version              inet:ip-version
   |  |        +--rw ip-protocol             uint8
   |  |        +--rw port                    uint16
   |  |        +--rw maximum                 uint64  // Maximum number of sessions for each protocol
   |  +--rw total-attack-traffic
   |  |  ......
   |  +--rw total-traffic
   |     +--rw current-bandwidth
   |     |  ......
   |     +--rw current-session
   |        +--rw current-users              uint64  // Current number of users (e.g., source IPs)
   |        +--rw current-sessions-per-user  uint64  // Average number of current sessions per user (e.g., source IP)
   |        +--rw current-sessions-in-total  uint64  // Current number of total sessions
   |        +--rw current-sessions-per-protocol* [ip-version ip-protocol port]
   |           +--rw ip-version              inet:ip-version
   |           +--rw ip-protocol             uint8
   |           +--rw port                    uint16
   |           +--rw current-sessions        uint64  // Current number of sessions for each protocol
   +--rw mitigation-efficacy-telemetry
   |  +--rw total-capability
   |  |  +--rw pipe-capability
   |  |  |  ......
   |  |  +--rw session-capability
   |  |     +--rw maximum-users              uint64  // Maximum number of users (e.g., source IPs)
   |  |     +--rw maximum-sessions-per-user  uint64  // Maximum number of sessions per user (e.g., source IP)
   |  |     +--rw maximum-sessions-in-total  uint64  // Maximum number of total sessions
   |  |     +--rw maximum-sessions-per-protocol* [ip-version ip-protocol port]
   |  |        +--rw ip-version              inet:ip-version
   |  |        +--rw ip-protocol             uint8
   |  |        +--rw port                    uint16
   |  |        +--rw maximum                 uint64  // Maximum number of sessions for each protocol
   |  +--rw total-attack-traffic
   |  |  ......
   |  +--rw total-traffic
   |     +--rw current-bandwidth
   |     |  ......
   |     +--rw current-session
   |        +--rw current-users              uint64  // Current number of users (e.g., source IPs)
   |        +--rw current-sessions-per-user  uint64  // Average number of current sessions per user (e.g., source IP)
   |        +--rw current-sessions-in-total  uint64  // Current number of total sessions
   |        +--rw current-sessions-per-protocol* [ip-version ip-protocol port]
   |           +--rw ip-version              inet:ip-version
   |           +--rw ip-protocol             uint8
   |           +--rw port                    uint16
   |           +--rw current-sessions        uint64  // Current number of sessions for each protocol
   +--rw mitigation-status-telemetry
      ......
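To illustrate items 6.1 to 6.3 against the tree above, the following minimal Python sketch derives the `current-session` counters from a list of session records and compares them with `session-capability` maxima. The `Session` record and its fields, the one-hour threshold, the extra `long-lived-sessions` counter and the `exceeded_limits` helper are all hypothetical and not part of the tree; `current-sessions-per-user` is computed as an integer average because the tree types it as uint64.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Session:
    # Hypothetical session record; not defined by the draft or the tree.
    src_ip: str
    ip_version: int    # 4 or 6
    ip_protocol: int   # IP protocol number, e.g., 1 = ICMP, 6 = TCP, 17 = UDP
    port: int          # TCP/UDP port or ICMP type, as in the tree's `port` leaf
    lifetime_s: float  # seconds since the session was established


def current_session_stats(sessions, long_lived_s=3600):
    """Derive the tree's current-session counters from session records."""
    users = {s.src_ip for s in sessions}
    per_protocol = Counter((s.ip_version, s.ip_protocol, s.port) for s in sessions)
    return {
        "current-users": len(users),
        # Integer average, since the tree types this leaf as uint64.
        "current-sessions-per-user": len(sessions) // len(users) if users else 0,
        "current-sessions-in-total": len(sessions),
        "current-sessions-per-protocol": dict(per_protocol),
        # Hypothetical extra counter for item 6.3 (session lifetime).
        "long-lived-sessions": sum(1 for s in sessions if s.lifetime_s >= long_lived_s),
    }


def exceeded_limits(stats, capability):
    """Names of session-capability maxima that the current counters exceed."""
    current = {
        "maximum-users": stats["current-users"],
        "maximum-sessions-per-user": stats["current-sessions-per-user"],
        "maximum-sessions-in-total": stats["current-sessions-in-total"],
    }
    return [name for name, value in current.items()
            if name in capability and value > capability[name]]
```

Reading the `current-session` and `session-capability` branches together, as `exceeded_limits` does, is only one possible way to use these counters for detecting resource-based attacks.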

Regards & Thanks!
潘伟 Wei Pan
华为技术有限公司 Huawei Technologies Co., Ltd.