Re: [Dots] Comments for draft-reddy-dots-telemetry

"Konda, Tirumaleswar Reddy" <TirumaleswarReddy_Konda@McAfee.com> Fri, 23 August 2019 10:55 UTC

From: "Konda, Tirumaleswar Reddy" <TirumaleswarReddy_Konda@McAfee.com>
To: "Panwei (William)" <william.panwei@huawei.com>
CC: dots <dots@ietf.org>, "draft-reddy-dots-telemetry@ietf.org" <draft-reddy-dots-telemetry@ietf.org>
Date: Fri, 23 Aug 2019 10:54:58 +0000
Message-ID: <DM5PR16MB17051330D8429A080D61B2A5EAA40@DM5PR16MB1705.namprd16.prod.outlook.com>
References: <30E95A901DB42F44BA42D69DB20DFA6A6DE9CDD0@nkgeml513-mbx.china.huawei.com>
In-Reply-To: <30E95A901DB42F44BA42D69DB20DFA6A6DE9CDD0@nkgeml513-mbx.china.huawei.com>
Archived-At: <https://mailarchive.ietf.org/arch/msg/dots/X-IuEvzw8OW37hj4oGn7I0AKqH0>

Hi Wei,

Please see inline [TR2]

From: Panwei (William) <william.panwei@huawei.com>
Sent: Thursday, August 22, 2019 6:14 PM
To: Konda, Tirumaleswar Reddy <TirumaleswarReddy_Konda@McAfee.com>
Cc: dots <dots@ietf.org>; draft-reddy-dots-telemetry@ietf.org
Subject: RE: Comments for draft-reddy-dots-telemetry


Hi Tiru,

Please see inline.

Regards & Thanks!
潘伟 Wei Pan
华为技术有限公司 Huawei Technologies Co., Ltd.

From: Dots [mailto:dots-bounces@ietf.org] On Behalf Of Konda, Tirumaleswar Reddy
Sent: Tuesday, August 20, 2019 6:15 PM
To: Panwei (William) <william.panwei@huawei.com>; draft-reddy-dots-telemetry@ietf.org
Cc: dots <dots@ietf.org>; MeiLing Chen <chenmeiling@chinamobile.com>
Subject: Re: [Dots] Comments for draft-reddy-dots-telemetry

Hi Wei,

Thanks for the detailed review. Please see inline [TR]

From: Dots <dots-bounces@ietf.org> On Behalf Of Panwei (William)
Sent: Thursday, August 15, 2019 12:44 PM
To: draft-reddy-dots-telemetry@ietf.org
Cc: dots <dots@ietf.org>; MeiLing Chen <chenmeiling@chinamobile.com>
Subject: [Dots] Comments for draft-reddy-dots-telemetry


1. I agree that DOTS telemetry should have a dedicated URI.

[TR] Yes, will add path-suffix '/telemetry'

2. In Section 4.1.1, I’m OK overall with the idea of the ‘Total Traffic Normal Baseline’.

  2.1. I didn’t figure out what the low, mid and high percentile actually stand for, or how to understand them. I think average bandwidth, peak bandwidth and usual bandwidth range (which contains minimum bandwidth and maximum bandwidth) are useful for reference.

[TR] Percentiles can be used for statistical analysis and are better than averages, see https://www.elastic.co/blog/averages-can-dangerous-use-percentile and https://www.dynatrace.com/news/blog/why-averages-suck-and-percentiles-are-great/
[Wei] Thanks for sharing this, I have a better understanding of percentiles now.
For example, if we sample 100 times at regular intervals and sort the sample values in ascending order, then the value of the 10th sample is the 10th percentile, the value of the 50th sample is the 50th percentile, and the value of the 90th sample is the 90th percentile.

[TR2] Yes, for example in our case the 90th percentile says that 90% of the time, the total traffic usage is below the specified limit.
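
The sampling procedure described above can be sketched in Python. This is only an illustration under stated assumptions: it uses the nearest-rank definition of a percentile (the draft does not say which definition applies), and the function name and sample values are hypothetical.

```python
def nearest_rank_percentile(samples, p):
    """Return the p-th percentile (integer p, 1..100) by the nearest-rank method.

    Assumption: nearest-rank is used here only because it matches the
    "sort, then take the p-th of 100 samples" description in this thread.
    """
    if not samples:
        raise ValueError("samples must not be empty")
    if not 1 <= p <= 100:
        raise ValueError("p must be between 1 and 100")
    ordered = sorted(samples)
    # Integer ceiling of p * n / 100, giving a 1-based rank.
    rank = (p * len(ordered) + 99) // 100
    return ordered[rank - 1]


# Hypothetical data: 100 bandwidth samples (in Mbps) taken at regular intervals.
samples_mbps = list(range(100, 0, -1))

low = nearest_rank_percentile(samples_mbps, 10)
mid = nearest_rank_percentile(samples_mbps, 50)
high = nearest_rank_percentile(samples_mbps, 90)
print(low, mid, high)
```

With these samples, 90% of the observations are at or below the value returned for p = 90.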

But I still have some questions:

1.  The percentile looks more useful, but does that mean the average value is totally useless?

[TR2] It may not be useless, but it does not look necessary alongside percentiles (note that the 50th percentile is the median, which equals the average in a bell curve, see https://www.mathsisfun.com/data/standard-normal-distribution.html).
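
As a small illustration of why the median (50th percentile) and the average can differ once the data is not bell-shaped, here is a sketch using Python's standard `statistics` module. The sample values are made up for the example and do not come from the draft.

```python
import statistics

# Hypothetical bandwidth samples in Mbps: nine ordinary readings and one burst.
samples_mbps = [10, 10, 10, 10, 10, 10, 10, 10, 10, 1000]

average = statistics.mean(samples_mbps)    # pulled upward by the single burst
median = statistics.median(samples_mbps)   # the 50th percentile, unaffected by it

print(average, median)
```

Here the single burst dominates the average, while the median still reflects the typical reading.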


2.  The text in the draft is very brief, so I didn’t understand why time-range information is not required. Is there an existing standard for traffic baselines, covering what information the baseline contains, how to create it, and how to understand or use it? Can you give me a reference or elaborate on these?

[TR2] Several articles and products discuss baselining normal traffic and detecting anomalies using both statistical analysis and deep learning techniques.


3.  There are 100 percentile values, from 1% to 100%. What’s the reason for choosing only the three values 10%, 50% and 90%? Would more values be more helpful? Can all 100 values be sent?

[TR2] You don’t need all the values, see https://en.wikipedia.org/wiki/Percentile_rank#/media/File:PR_and_NCE.gif (we picked low, mid, high and peak values, which should cover most of the curve for baselining).

Would it be better to make this attribute a list, for example as below, to let the DOTS client decide which percentiles to send?
+--rw total-traffic-normal-baseline
   +--rw bandwidth-percentile* [percentile-position]
      +--rw percentile-position     uint8 // range is from 1 to 100
      +--rw percentile-value-pps    uint64
      +--rw percentile-value-bps    uint64

[TR2] Sure, works for me.
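
To make the list-based proposal concrete, here is a minimal Python sketch of entries keyed by percentile position. It is an assumption-laden illustration, not the draft's data model: the class and function names are hypothetical, the field names simply mirror the tree above, and the sample values are made up.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BandwidthPercentile:
    # Field names mirror the tree above; the class itself is hypothetical.
    percentile_position: int   # list key, expected range 1..100
    percentile_value_pps: int
    percentile_value_bps: int


def build_baseline(entries):
    """Validate entries and return them keyed by percentile position."""
    baseline = {}
    for entry in entries:
        if not 1 <= entry.percentile_position <= 100:
            raise ValueError("percentile-position must be between 1 and 100")
        if entry.percentile_position in baseline:
            # A list key must be unique, so reject duplicates.
            raise ValueError("duplicate percentile-position")
        baseline[entry.percentile_position] = entry
    return baseline


# The client chooses which positions to send; 100 stands in for the peak value.
baseline = build_baseline([
    BandwidthPercentile(10, 1_000, 8_000_000),
    BandwidthPercentile(50, 5_000, 40_000_000),
    BandwidthPercentile(90, 9_000, 72_000_000),
    BandwidthPercentile(100, 12_000, 96_000_000),
])
print(sorted(baseline))
```

Keying by position lets a client send any subset of percentiles without changing the structure.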

4. If the DOTS server wants to know some percentiles that the DOTS client didn’t send, can the DOTS server actively ask the DOTS client to send a specific percentile?

[TR2] I don’t see the need to complicate the mechanism.

There may be confusion between the peak bandwidth and the maximum bandwidth. But IMO they are different: the peak bandwidth is a burst high value that only lasts for a short while, e.g., a few seconds or minutes, and the maximum bandwidth is a sustained high value that can last for a long time, e.g., a few hours.

[TR] I thought peak and maximum bandwidth were the same. Please point me to a reference distinguishing between peak and maximum bandwidth.
[Wei] If the percentiles are expressed as a list, as I described above, I think we no longer need a separate peak bandwidth; the 100th percentile is the peak value.

[TR2] Yes.

  2.2. The statistical results can be quite different in different time ranges; for example, the bandwidth may be much higher from 6:00 pm to 12:00 am than from 6:00 am to 12:00 pm. So for accuracy, it’s better to calculate the baseline separately for different time ranges.

[TR] I don’t think time ranges are required to create a baseline for statistical analysis. The bandwidth may also vary depending on the day of the week, holidays, specific events (e.g. games) and flash crowd scenarios (https://www.radware.com/resources/ddos_mitigation_layers.aspx).
[Wei] I’m really confused. Can you elaborate on the baseline?

[TR2] I meant that the baseline is created by observing the traffic during peacetime; the observation period can vary from days to weeks or from weeks to months.

3. In Sections 4.1.3 and 4.1.4, about the ‘Total Attack Traffic’ and ‘Total Traffic’, I think they should reflect the current attack traffic bandwidth and the current traffic bandwidth, so is a single current value enough? Are the low percentile, mid percentile, high percentile and peak values still needed?

[TR] Yes, percentiles are required to analyze the attack pattern.
[Wei] Aren’t the ‘Total Attack Traffic’ and ‘Total Traffic’ instantaneous values? Why sample and compute statistics for these two attributes?

[TR2] The sampling and statistics help reveal the trends of the total traffic and total attack traffic.

4. In Section 4.1.5, about the ‘Attack Details’:
  4.1. I don’t like the ‘vendor-id’ and ‘attack-id’ unless they are optional. The ‘attack-id’ is maintained by each vendor, and the number of ‘vendor-id’ and ‘attack-id’ combinations can be enormous, so it’s a burden for implementations to understand and map these elements. This applies especially to the DOTS client, which needs to implement the ‘Attack Details’ attribute as described in Section 4.3.1.

  4.2. Here the ‘attack-name’ is designed to use a textual representation to express the attack type. Meiling also designed a mechanism to express the attack type in her draft (draft-chen-dots-attack-informations-02). Meiling’s mechanism is more complicated in its rule definitions, but it may be easier to implement and understand. The textual representation mechanism here seems very easy because it has no rule definitions, but it needs Natural Language Processing techniques. Are these NLP techniques easy to implement? Without unified rules, will different implementations produce different analysis results?

[TR] You may want to look into the discussion https://mailarchive.ietf.org/arch/msg/dots/uyq-AB4me7qZ2apuaw8b3J6JDnA
[Wei] I didn’t see a conclusion.

[TR2] The conclusion appears to be that a standard attack definition won’t work.

  4.3. For the ‘attack-severity’, I feel this element is too subjective. What are the standards for distinguishing among ‘emergency’, ‘critical’ and ‘alert’?

[TR] Yes, it is subjective and only a hint.
[Wei] So I don’t think this is needed.

[TR2] ‘attack-severity’ is optional and can be ignored by the DOTS server.

5. In Section 4.2, about the ‘Mitigation Efficacy DOTS Telemetry Attributes’, besides the ‘Total Attack Traffic’, I think ‘Total Traffic’ and ‘Total Pipe Capability’ can also be included here. In some cases the DOTS client can’t distinguish attack traffic from total traffic, so it will not be able to send the ‘Total Attack Traffic’, but it can send the current ‘Total Traffic’ and ‘Total Pipe Capability’ to indicate the mitigation efficacy. This is also mentioned in Meiling’s draft.

[TR] If the traffic is scrubbed by the DDoS mitigation provider, the DOTS server already knows the ‘Total Traffic’. ‘Total pipe capability’ is a pre-mitigation attribute. The pipe capacity won’t change during a DDoS attack.
[Wei] Is there a strong binding relation between the pre-mitigation telemetry and the mitigation efficacy telemetry? Must the DOTS client send the pre-mitigation telemetry before the mitigation efficacy telemetry?

[TR2] Yes

In my understanding, they are separate, so when considering just the mitigation efficacy telemetry, the ‘Total Traffic’ and the ‘Total Pipe Capability’ can also be used for measuring efficacy.

[TR2] The attributes ‘Total Traffic’ and ‘Total Pipe Capability’ are not specific to a mitigation, hence it is better not to include these parameters in every mitigation efficacy update for each mitigation request.

6. The telemetry attributes are divided into three categories: Pre-mitigation, Mitigation Efficacy, Mitigation Status. I think these categories are reasonable and clear. But I found that the attributes are basically all related to bandwidth. Bandwidth is useful for volume-based DDoS attacks, but for resource-based DDoS attacks, other attributes are needed.

[TR] Good point, will update draft.
[Wei] OK

  6.1. To assess a resource-based DDoS attack, session statistics will be helpful. These statistics can be collected along different dimensions: the number of sessions per protocol (TCP/UDP/ICMP), the number of sessions per source IP, the number of source IPs, etc.

[TR] If it is a resource-based DDoS attack, what is the use of the number of sessions per source IP and the number of source IPs?
[Wei] I will do more study to find some useful attributes and related examples.

[TR2] Thanks.

  6.2. These session statistics can be added to the ‘Total Traffic Normal Baseline’, and also to ‘Total traffic’ and ‘Total Capability’. The YANG module tree reflecting my understanding is attached at the end for reference.

[TR] What type of session statistics are you referring to?
[Wei] I mean all the attributes related to sessions.

[TR2] Got it, added the following initial attributes:
   o  The maximum number of simultaneous connections that are allowed to
      the target server and the threshold is transport-protocol
      specific. For example, the target server could support both HTTP/2 and HTTP/3.

   o  The maximum number of simultaneous connections that are allowed to
      the target server per client.

   o  The maximum number of simultaneous embryonic connections that are
      allowed to the target server and the threshold is transport-
      protocol specific.

   o  The maximum number of simultaneous embryonic connections that are
      allowed to the target server per client.
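
One way to picture the four thresholds listed above is as a small record checked against observed counts. The following Python sketch is purely illustrative: every name in it is hypothetical, the numbers are made up, and it is not the encoding used in the draft.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class ConnectionLimits:
    # Hypothetical field names, one per bullet above.
    max_connections: Dict[str, int]            # keyed by transport protocol
    max_connections_per_client: int
    max_embryonic_connections: Dict[str, int]  # keyed by transport protocol
    max_embryonic_connections_per_client: int


def exceeded_limits(limits: ConnectionLimits, protocol: str,
                    connections: int, busiest_client_connections: int,
                    embryonic: int, busiest_client_embryonic: int) -> List[str]:
    """Return the names of the thresholds that the observed counts exceed."""
    no_limit = float("inf")  # assumption: a missing protocol entry means no limit
    exceeded = []
    if connections > limits.max_connections.get(protocol, no_limit):
        exceeded.append("max_connections")
    if busiest_client_connections > limits.max_connections_per_client:
        exceeded.append("max_connections_per_client")
    if embryonic > limits.max_embryonic_connections.get(protocol, no_limit):
        exceeded.append("max_embryonic_connections")
    if busiest_client_embryonic > limits.max_embryonic_connections_per_client:
        exceeded.append("max_embryonic_connections_per_client")
    return exceeded


# Made-up thresholds: TCP for HTTP/2 and UDP for HTTP/3 get separate limits.
limits = ConnectionLimits(
    max_connections={"tcp": 10_000, "udp": 5_000},
    max_connections_per_client=100,
    max_embryonic_connections={"tcp": 2_000},
    max_embryonic_connections_per_client=20,
)
print(exceeded_limits(limits, "tcp", 12_000, 50, 2_500, 10))
```

Keeping the totals per transport protocol reflects the point that a server offering both HTTP/2 and HTTP/3 may have different limits for each.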


  6.3. Some other information which can help identify an attack can also be considered and included. For example, in some attacks the attackers establish many sessions with a very long lifetime, so the statistics of session lifetime may help.

[TR] Please point me to DDoS attacks using sessions with very long lifetime.
[Wei] I will do more study to find some useful attributes and related examples.

[TR2] Thanks.

7. Discussion:
I’d like to raise a discussion here. I tried to consider the telemetry from the aspects of ‘why’, ‘what’, ‘who’, ‘where’, ‘when’ and ‘how’. ‘Why we need telemetry’ is described in Section 3 and ‘What are the telemetry attributes’ is described in Section 4. I summarize the remaining ‘who’, ‘where’, ‘when’ and ‘how’ as ‘how will we use this telemetry’, i.e., in which scenario which role will send which telemetry attributes over which channel. This is not described yet. So do we need to describe ‘how will we use this telemetry’?

[TR] The use of the telemetry is implementation specific. For example, a DOTS server can use the telemetry for statistical analysis or deep learning, or to notify the DOTS server’s security operations team.
[Wei] But without use cases, how can we show which attributes are needed and which are not?

[TR2] As Med suggested, the detailed use cases can be explained in a separate draft.

Cheers,
-Tiru