Re: [netconf] Comments on draft-he-netconf-adaptive-collection-usecases

"hexm4@chinatelecom.cn" <hexm4@chinatelecom.cn> Tue, 05 April 2022 15:40 UTC

Date: Tue, 05 Apr 2022 23:39:47 +0800
From: "hexm4@chinatelecom.cn" <hexm4@chinatelecom.cn>
To: "bill.wu" <bill.wu@huawei.com>, Netconf <netconf@ietf.org>
Archived-At: <https://mailarchive.ietf.org/arch/msg/netconf/C_MwM5xEZq3GdATL7JGLOrjZJxA>

Hi, Qin:

I am glad that you have read through my draft-he-netconf-adaptive-collection-usecases, and thank you very much for your comments and questions. 

This draft mainly focuses on interface traffic collection on IP network devices (e.g., routers, switches, gateways, BRAS). As is well known, SNMP has been widely deployed in ISPs' networks, especially for collecting interface traffic. Because of its low efficiency and high demands on processing capacity, SNMP cannot perform real-time traffic sampling at the sub-second or millisecond level, which is necessary for deterministic services with bounded delay, jitter, and packet loss. gRPC-based telemetry can collect interface traffic at a much higher frequency, e.g., at millisecond intervals. However, gaining real-time traffic visibility by persistently sampling at millisecond intervals is impractical because of the considerable resource consumption: it exhausts the resources of the forwarding and control planes of the network device, as well as the computing and storage resources of the collector.

The goal of this draft is to state the problem and set forth the importance of an adaptive traffic data collection mechanism that captures real-time network state (for example, interface congestion caused by microbursts) at minimum resource consumption. Under normal, non-congested network conditions, which hold more than 95% of the time, a minutes-level sampling cycle is enough, because such conditions produce little jitter and loss. However, once a congestion state or congestion trend is detected, the sampling period must promptly be shortened to milliseconds so as to capture microburst traffic on the interface.
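As a rough illustration of why the adaptive approach saves resources, the sketch below counts samples per interface per day. The 5-minute and 10 ms periods come from the draft's example quoted further down this thread, and the 95% non-congested share comes from the paragraph above; treating the remaining 5% of the day as fast-sampled time is an assumption, and the sample count is only a proxy for real device and collector cost.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 s


def samples_per_day(period_s: float, share_of_day: float = 1.0) -> float:
    """Samples taken during `share_of_day` of one day at a fixed sampling period."""
    return SECONDS_PER_DAY * share_of_day / period_s


# Illustrative assumptions, not measured values.
NORMAL_PERIOD_S = 300.0     # 5-minute cycle while the interface is not congested
FAST_PERIOD_S = 0.010       # 10 ms cycle while congestion is suspected
NON_CONGESTED_SHARE = 0.95  # fraction of the day spent in the normal state

persistent = samples_per_day(FAST_PERIOD_S)
adaptive = (samples_per_day(NORMAL_PERIOD_S, NON_CONGESTED_SHARE)
            + samples_per_day(FAST_PERIOD_S, 1.0 - NON_CONGESTED_SHARE))

print(f"persistent 10 ms sampling: {persistent:,.0f} samples/day")
print(f"adaptive sampling:         {adaptive:,.0f} samples/day")
print(f"reduction factor:          {persistent / adaptive:.1f}x")
```

Under these assumptions the fast-sampled 5% of the day accounts for almost all of the roughly 432,000 adaptive samples, so the savings depend mainly on how quickly collection returns to the slow period once congestion clears.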

To monitor the congestion state of an interface in a timely manner, dedicated hardware is preferable to polling by the CPU on the main control board. As soon as an event is triggered by output queue overflow, queue depth exceeding a threshold, or excessively high link utilization, the sampling cycle must be switched to milliseconds to gain real-time visibility of interface traffic.
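The switching rule described above can be sketched as follows. This models only the decision logic, not the dedicated hardware that would detect the events. The class and field names, the threshold values, and the rule of returning to the slow period only after a few consecutive calm samples (a simple hysteresis to avoid flapping) are hypothetical choices for illustration.

```python
from dataclasses import dataclass

# Hypothetical values; real thresholds would be operator- and platform-specific.
QUEUE_DEPTH_THRESHOLD = 0.8   # fraction of the output queue in use
UTILIZATION_THRESHOLD = 0.9   # fraction of link capacity
NORMAL_PERIOD_S = 300.0       # minutes-level cycle
FAST_PERIOD_S = 0.010         # millisecond-level cycle
CALM_SAMPLES_TO_REVERT = 3    # consecutive calm fast samples before slowing down


@dataclass
class InterfaceState:
    queue_overflow_drops: int  # packets dropped because the output queue overflowed
    queue_depth: float         # current queue occupancy, 0.0 to 1.0
    utilization: float         # current link utilization, 0.0 to 1.0


def congestion_triggered(s: InterfaceState) -> bool:
    """True if any of the trigger events named in the text is present."""
    return (s.queue_overflow_drops > 0
            or s.queue_depth > QUEUE_DEPTH_THRESHOLD
            or s.utilization > UTILIZATION_THRESHOLD)


class AdaptiveSampler:
    """Chooses the next sampling period from the latest interface state."""

    def __init__(self) -> None:
        self.period_s = NORMAL_PERIOD_S
        self._calm = 0

    def update(self, s: InterfaceState) -> float:
        if congestion_triggered(s):
            # Any trigger switches (or keeps) the interface on the fast cycle.
            self.period_s = FAST_PERIOD_S
            self._calm = 0
        elif self.period_s == FAST_PERIOD_S:
            # Hysteresis: only slow down after several calm fast samples in a row.
            self._calm += 1
            if self._calm >= CALM_SAMPLES_TO_REVERT:
                self.period_s = NORMAL_PERIOD_S
                self._calm = 0
        return self.period_s


sampler = AdaptiveSampler()
calm = InterfaceState(queue_overflow_drops=0, queue_depth=0.1, utilization=0.3)
busy = InterfaceState(queue_overflow_drops=0, queue_depth=0.95, utilization=0.7)
print(sampler.update(calm))  # 300.0
print(sampler.update(busy))  # 0.01
```

Without some hysteresis, an interface hovering around a threshold would toggle between the two periods on every sample; how long to stay on the fast cycle is a tuning decision left open here.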

In Section 4.1, the multi-dimensional real-time portrait of interface traffic characteristics is the most important scenario for adaptive traffic data collection. Obtaining a holistic and genuine picture of interface traffic characteristics is a basic requirement for the statistical multiplexing model of IP networks, which is of great significance for traffic prediction, network planning, network capacity expansion, network optimization, etc.
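To illustrate why looking at traffic at several granularities yields a richer portrait, the sketch below views one hypothetical millisecond-level utilization trace through windows of different lengths. Time granularity is only one dimension of such a portrait, and the trace, function names, and window sizes are made up for illustration.

```python
def peak_window_mean(samples: list[float], window: int) -> float:
    """Highest mean over consecutive, non-overlapping windows of `window` samples."""
    means = [sum(samples[i:i + window]) / window
             for i in range(0, len(samples) - window + 1, window)]
    return max(means)


def granularity_portrait(samples: list[float], period_ms: int,
                         windows_ms: list[int]) -> dict[int, float]:
    """Peak utilization of the same trace as seen at several collection granularities."""
    return {w: peak_window_mean(samples, w // period_ms) for w in windows_ms}


# Hypothetical 10 s trace sampled every 10 ms: 30% baseline utilization
# with a single 200 ms burst at line rate starting at t = 5 s.
PERIOD_MS = 10
trace = [0.3] * 1000
trace[500:520] = [1.0] * 20

for window_ms, peak in granularity_portrait(trace, PERIOD_MS, [10, 1000, 10000]).items():
    print(f"{window_ms:>6} ms window: peak utilization {peak:.0%}")
```

The same burst appears as 100% utilization at 10 ms granularity, about 44% over 1 s windows, and only about 31% over the full 10 s, which is why coarse averages alone cannot reveal it.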

In Section 4.2, microburst traffic detection is another scenario for adaptive traffic data collection. Since it is impractical to sample persistently at millisecond intervals just to detect microbursts, the adaptive collection mechanism is the best approach to minimizing resource consumption.
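Once millisecond samples are available during a triggered window, a microburst can be flagged as a short run of samples above a utilization threshold. The function below is a minimal sketch; the threshold, the maximum duration that still counts as a microburst, and the use of link utilization as the measured quantity are assumptions, not definitions taken from the draft.

```python
def find_microbursts(samples: list[float], period_ms: int, threshold: float,
                     max_duration_ms: int) -> list[tuple[int, int]]:
    """Return (start_ms, duration_ms) for each run of samples at or above
    `threshold` that lasts no longer than `max_duration_ms`.

    Longer runs are treated as sustained congestion rather than microbursts.
    """
    bursts = []
    start = None
    # A trailing -inf sentinel closes any run still open at the end of the trace.
    for i, value in enumerate(samples + [float("-inf")]):
        if value >= threshold and start is None:
            start = i
        elif value < threshold and start is not None:
            duration_ms = (i - start) * period_ms
            if duration_ms <= max_duration_ms:
                bursts.append((start * period_ms, duration_ms))
            start = None
    return bursts


# Hypothetical 1 ms trace: a 3 ms burst, then a 50 ms period of sustained load.
trace = [0.2] * 10 + [0.95] * 3 + [0.2] * 10 + [0.97] * 50 + [0.2] * 5
print(find_microbursts(trace, period_ms=1, threshold=0.9, max_duration_ms=10))
# [(10, 3)]
```

Separating short bursts from sustained high load keeps the two cases apart: the first calls for microburst analysis, the second for ordinary congestion handling.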

As for use cases 3 and 4, to be more precise, they are two application instances of real-time interface traffic visibility. In fact, many more applications for different purposes can be built on real-time interface traffic visibility, such as traffic prediction, network capacity expansion, and network optimization.

draft-wang-netconf-adaptive-subscription defines a YANG data model for adaptive subscription, and I think that draft and my draft are complementary. I hope we can cooperate closely and work together to advance standardization.

Best regards.

Xiaoming He


hexm4@chinatelecom.cn
 
From: Qin Wu
Date: 2022-03-22 12:04
To: Netconf
CC: hexm4@chinatelecom.cn
Subject: Comments on draft-he-netconf-adaptive-collection-usecases
Hi, Xiaoming:
Thanks for presenting draft-he-netconf-adaptive-collection-usecases.
I have read through your document. If my understanding is correct, the big challenge you raise for data collection is the excessive resource consumption of millisecond-level collection, especially when massive data collected from interfaces needs to be processed at once, e.g., for batch data processing or microburst traffic detection. The consequence of this problem is instantaneous congestion of the output queue in the IP RAN, IP MEN, and IP backbone. Network congestion may further aggravate such congestion problems.
 
One example you give is that, compared with a 5-minute SNMP sampling cycle, 10-millisecond sampling increases the required resources by 30,000 times. I am wondering whether you have tested YANG Push telemetry and can give a resource consumption comparison.
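The 30,000 factor follows directly from the ratio of the two sampling periods, as the short check below shows. It counts samples per interface only; how per-sample CPU, memory, and bandwidth costs differ between SNMP polling and YANG Push telemetry is exactly the open question raised here and is not modeled.

```python
MS_PER_MINUTE = 60 * 1000
MS_PER_DAY = 24 * 60 * MS_PER_MINUTE

SNMP_PERIOD_MS = 5 * MS_PER_MINUTE  # 5-minute polling cycle = 300,000 ms
FAST_PERIOD_MS = 10                 # 10 ms telemetry sampling

snmp_samples = MS_PER_DAY // SNMP_PERIOD_MS  # 288 samples per day
fast_samples = MS_PER_DAY // FAST_PERIOD_MS  # 8,640,000 samples per day

print(fast_samples // snmp_samples)  # 30000
```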
 
In Section 4, you provide four interesting use cases for your proposed adaptive solution.
For use case 1, you want to provide a multi-dimensional real-time portrait of interface traffic. Yes, collecting traffic data at different collection rates does provide multi-dimensional data analysis. In addition, you can classify operational data into several categories and use the data node tags proposed in draft-ietf-netmod-node-tags-06 to capture different categories of characteristic data. This also helps you provide multi-dimensional real-time data analysis.
 
For use case 2, microburst detection relies on event trigger settings. I am not sure about your case: does the device require a built-in hardware design to support monitoring? I think draft-wang-netconf-adaptive-subscription only requires the device to support different data collection rates, and it builds on the XPath capability supported by YANG. But I think microburst detection is more related to data measurement, while draft-wang-netconf-adaptive-subscription is more related to streaming data reporting and exporting. They can be complementary.
 
For use case 3, congestion avoidance for deterministic networks, I believe it focuses only on a centralized controller-based solution and does not consider a distributed solution. As we found in our hackathon experiments, a centralized solution also has limitations for adaptive collection, e.g., service disruption, being overwhelmed by managing thousands of devices, proneness to error, and higher network resource consumption.
 
For use case 4, on-path telemetry based on adaptive traffic sampling, I am not sure this use case is limited to on-path telemetry; I see it as related not only to data plane telemetry but also to management plane telemetry. For management plane telemetry, the adaptive subscription proposed in draft-wang-netconf-adaptive-subscription is one candidate solution: gaining real-time network state and traffic visibility at minimum resource consumption is exactly what it proposes to do.
 
Lastly, I am happy to discuss this with you if you want to scope your problem space and target a NETCONF-specific solution.
 
-Qin