[OPSAWG] Review comments (Re: I-D Action: draft-ietf-opsawg-ntf-03.txt)

Alexander L Clemm <ludwig@clemm.org> Thu, 17 September 2020 23:24 UTC

Hello Haoyu and draft authors,

I have reviewed draft-ietf-opsawg-ntf-03, Network Telemetry Framework,
and have a number of comments and suggestions for your consideration to
further improve the document.  I am aware this is a bit lengthy, but
hopefully you will find at least some of it useful.  FWIW, here goes (in
sequential order in the document, not in order of relevance of the
comment):

- Introduction (p3):

As a general comment, I noticed that the document tends to conflate (1)
telemetry data, (2) modules of a framework architecture to generate,
collect, and process telemetry data, and (3) functionality of
applications or use cases that leverage such a framework architecture.
It would be useful to clearly distinguish between these concepts and to
call out their relationship here.

An example is the sentence "We show how network telemetry can meet ...
network operation requirements, and the challenges each telemetry module
is facing".  It is not clear at this point what "network telemetry"
really refers to.  Is it network telemetry data?  That does not meet
network operation requirements by itself, although it contributes to a
solution.  Or is it the framework?  But then, what does "telemetry
module" refer to?  It would presumably be part of the framework, not an
independent entity whose challenges are addressed through the framework.

- Motivation (p4):

For intent, you may want to add a reference to
draft-irtf-nmrg-ibn-concepts-definitions.

I am not sure if "actionable information" (and the activities used to
translate "network data" into such information) is part of telemetry.
To me, the "network data" referred to here is really the telemetry data.
Sure, this data will get processed, aggregated, abstracted, etc., but
telemetry is really the "raw data" that fuels all of those activities.
I think what is needed is a clear definition or introduction of what
"telemetry data" entails, and of what will be covered by the framework:
just the framework to generate and collect that data, or also other
applications "on top" that process that data further and use it for
different purposes (which would make this more of a service assurance
framework, not merely a telemetry framework IMHO).  Either way, I think
it would be good to frame the scope more clearly.

- Section 2.1 (p5):

There is an inline definition of intent that equates it with a policy.
This is not consistent with the definition used elsewhere.  Please refer
to the definition in, e.g., draft-irtf-nmrg-ibn-concepts-definitions.

For use cases, I think the two most important ones are missing; I would
suggest adding them, and in fact leading with them.  These are Security
(e.g., intrusion detection systems analyzing telemetry data to detect
suspicious activities and traffic) and Monitoring.  These are also
arguably the most important use cases for, e.g., Netflow/IPFIX today.
It is not clear whether flow records qualify as telemetry data in your
definition, but the document hints at them in several places.

- Section 2.x

In general, what is missing is a section 2.x that gives a brief overview
of what is considered as "telemetry data".  This is left to the
imagination of the reader.  Just statistics and snapshots of state?  Or
also config data?  Flow records?  What about event records and logs? 
Measurements?  Packets (including sampled ones) stored/copied for
analysis?  Control packets?  All of the above, or if not, what are
things that would not be considered "telemetry data"? 

- Section 2.2

The state of the art includes a lot more than SNMP, CLI, and syslog.  At
a minimum, you need to mention flow records here, and possibly also
measurements, YANG-Push, etc.

- Section 2.3 (p8)

Update the reference to YANG-Push (RFC 8641 and RFC 8639).

- Section 2.4

Missing here is a mention of flow information export (Netflow/IPFIX).

It is not clear what the document is trying to do here.  In some places
it sounds as if it argues that other types of telemetry (which ones?) or
other protocols/techniques (which ones? for what needs?) are needed, but
why?  I thought the purpose was to define a framework that describes
what pieces are needed, where different pieces fit, and how they are
expected to interwork, not so much to criticize the current state of the
art.  As an analysis of the state of the art, it misses important
pieces; for example, I am missing a mention of measurement, including
e.g. OWAMP/TWAMP and RFC 6812 (currently missing entirely from the
references, but widely deployed in the industry, so it should be added).

It is not clear whether in-network processing and action should be
included here.  See also my earlier comment.  This looks like scope
creep, including not just telemetry but also service assurance
functionality that would operate on top of it.

The mention of SDN and centralized processing is a bit confusing; its
purpose in this context is not clear.

- Section 3

It would be good to distinguish here between the need for a telemetry
framework vs the need for telemetry data.  The need for the latter is
clear; the need for the former should be explained as well.  

- Section 4.1

The document uses both "sub-pub" and "pub-sub", and it is not clear
whether these are meant to differ.  This may need an editorial scrub.

It is not clear why you mention on-request queries.  Do you consider the
ability to request any type of management data as part of the telemetry
framework?  If so, where do you draw the line between a telemetry
framework and a broader management framework (the telemetry framework
becomes fairly all-encompassing at that point)? 

I am not sure I would categorize complex data that has been synthesized
and processed using complex algorithms as "telemetry data".  This goes
again to the question of what is considered telemetry data.  Is it the
raw data generated by the network (that's where I would draw the line),
or does it include any data that can be derived from that data?  If it's
the latter, there is a slippery slope: you may need to include also data
derived from correlation with service and other data, learned models,
etc., and before you know it, it will encompass any data, information,
and knowledge involving a network.

- Figure 1 (p13)

I don't think the data type relationships as depicted are correct.
Simple data could be streamed in its own right, and its generation could
also be triggered through events.  I think what triggers the
generation/sending of data is independent of the type of data itself.
Really, you might have simple data and complex data (arguably, complex
data might not be telemetry data, but ok), and then different ways to
trigger generation of that data: as part of an event, as part of a
subscription (streaming), when some condition is met, etc.

- Section 4.2.1

There is no Section 4.2.2, and having a single subsection looks a bit
strange.  I suggest either removing 4.2.1 as a separate subsection,
elevating it to a second-level section, or adding a 4.2.2.

- Section 4.2.1.3.1

Your discussion of active, passive, and hybrid methods covers very
different types of information, anything from sampled packets to flow
records to measurements to OAM.  I think it would be good to
differentiate by type of information.  Also, please consider mentioning
RFC 6812 (IPSLA) in addition to OWAMP/TWAMP.

- Section 4.3 and Figure 4

While per se there is nothing wrong with what you describe here, and
perhaps it is best to keep things simple and basic as there will be
myriad implementation variations, I think expectations for what the
framework actually entails should be framed a bit better earlier in the
document.  After all this lead-up, finally seeing the proposed framework
feels a bit anti-climactic.  It essentially describes today's best
current practice, i.e., what a generic management agent on a node will
provide.  It does not cover aspects such as end-to-end measurements
(covering multiple nodes), security components (perhaps signing of data
to prevent tampering), or integration with the real resources and data
sources (this starts at the "data object"; what about collecting data
across multiple line cards, etc.?).  At a minimum, it would be good to
state what is in scope and what is out of scope.

In the tables of this section, since NETCONF and YANG are mentioned,
SMIv2 should probably be mentioned and referenced along with SNMP as well.

- Section 5

The difference between level 1 (dynamic) and level 2 (interactive)
telemetry is not very clear.  Also, any telemetry data can be used in a
closed loop, so it is not clear why closed-loop telemetry is singled out
as level 3 (you can use level 0 telemetry data for plenty of closed
loops).

- Section 6

For the security considerations, there are a number of additional
possible telemetry attack vectors that could be mentioned here, e.g.,
attacks that trigger the generation of telemetry data to exhaust network
resources as well as resources on the node, attacks aimed at falsifying
results and tampering with telemetry, and swamping of receivers (for
streaming data).


--- Alex

On 4/13/2020 11:59 AM, internet-drafts@ietf.org wrote:
> A New Internet-Draft is available from the on-line Internet-Drafts directories.
> This draft is a work item of the Operations and Management Area Working Group WG of the IETF.
>
>         Title           : Network Telemetry Framework
>         Authors         : Haoyu Song
>                           Fengwei Qin
>                           Pedro Martinez-Julia
>                           Laurent Ciavaglia
>                           Aijun Wang
>         Filename        : draft-ietf-opsawg-ntf-03.txt
>         Pages           : 34
>         Date            : 2020-04-13
>
> Abstract:
>    Network telemetry is the technology for gaining network insight and
>    facilitating efficient and automated network management.  It engages
>    various techniques for remote data collection, correlation, and
>    consumption.  This document provides an architectural framework for
>    network telemetry, motivated by the network operation challenges and
>    requirements.  As evidenced by some key characteristics and industry
>    practices, network telemetry covers technologies and protocols beyond
>    the conventional network Operations, Administration, and Management
>    (OAM).  It promises better flexibility, scalability, accuracy,
>    coverage, and performance and allows automated control loops to suit
>    both today's and tomorrow's network operation.  This document
>    clarifies the terminologies and classifies the modules and components
>    of a network telemetry system from several different perspectives.
>    The framework and taxonomy help to set a common ground for the
>    collection of related work and provide guidance for related technique
>    and standard developments.
>
>
> The IETF datatracker status page for this draft is:
> https://datatracker.ietf.org/doc/draft-ietf-opsawg-ntf/
>
> There are also htmlized versions available at:
> https://tools.ietf.org/html/draft-ietf-opsawg-ntf-03
> https://datatracker.ietf.org/doc/html/draft-ietf-opsawg-ntf-03
>
> A diff from the previous version is available at:
> https://www.ietf.org/rfcdiff?url2=draft-ietf-opsawg-ntf-03
>
>
> Please note that it may take a couple of minutes from the time of submission
> until the htmlized version and diff are available at tools.ietf.org.
>
> Internet-Drafts are also available by anonymous FTP at:
> ftp://ftp.ietf.org/internet-drafts/