[OPSAWG] Benjamin Kaduk's Abstain on draft-ietf-opsawg-ntf-12: (with COMMENT)

Benjamin Kaduk via Datatracker <noreply@ietf.org> Thu, 02 December 2021 23:54 UTC

From: Benjamin Kaduk via Datatracker <noreply@ietf.org>
To: The IESG <iesg@ietf.org>
Cc: draft-ietf-opsawg-ntf@ietf.org, opsawg-chairs@ietf.org, opsawg@ietf.org, ludwig@clemm.org
Reply-To: Benjamin Kaduk <kaduk@mit.edu>
Message-ID: <163848927236.16105.4220774638600851890@ietfa.amsl.com>
Date: Thu, 02 Dec 2021 15:54:32 -0800
Archived-At: <https://mailarchive.ietf.org/arch/msg/opsawg/zBLVHALKFGVgv7_iB44kJDkR508>

Benjamin Kaduk has entered the following ballot position for
draft-ietf-opsawg-ntf-12: Abstain

When responding, please keep the subject line intact and reply to all
email addresses included in the To and CC lines. (Feel free to cut this
introductory paragraph, however.)


Please refer to https://www.ietf.org/blog/handling-iesg-ballot-positions/
for more information about how to handle DISCUSS and COMMENT positions.


The document, along with other ballot positions, can be found here:
https://datatracker.ietf.org/doc/draft-ietf-opsawg-ntf/



----------------------------------------------------------------------
COMMENT:
----------------------------------------------------------------------

Thanks for making the applicability statement more prominent in the -12.

I think the document paints an exciting picture of a new mindset in
which to frame discussion of network monitoring and management (even if
it does stray too far into marketing language for my taste in places).
It doesn't do quite as well at convincing me that an entirely new
technology suite is merited (as opposed to just extending existing
protocols to align with the new mindset), but I am willing to admit the
possibility that the new technology suite is the right approach.

That said, I have strong misgivings about the current state of the
document, mostly relating to privacy considerations and the risk of
pervasive monitoring, so I am balloting Abstain.

While we do clearly say to not analyze individual users, we also have
guidance (e.g., in §2.1) that only says "no user packet content should
be collected".  However, packet contents are not the only things that
can be a threat to user privacy, and we've seen numerous instances where
just metadata about user flows is sufficient to draw strong conclusions
about user behavior that impact user privacy.  But if we try to
strengthen the requirement to prohibit collecting any data about user
packets, the utility of the system decreases greatly, and I don't see a
clear way to reconcile the impasse.

(There are also a few lingering references to "user flows", "user
packets", "user traffic", etc. in the main body text, especially in
§2.3.  I'm not convinced that all of the instances of these phrases are
compatible with the applicability statement.)

Furthermore, the applicability statement seems to be a case of wishful
thinking.  I do not see any proposals for technical measures to enforce
that data is not collected from networks where endpoints represent
users, and I also don't see any mechanisms to disincentivize such use
in favor of other, more privacy-friendly, alternatives.  So even if we
consider such usage of the network telemetry framework to be an abuse
case rather than a use case, if we are going to honestly document the
implications of the technology, I can't escape the conclusion that we
need to consider these scenarios in our assessment of whether we are
defining the right technology.


Though I am balloting Abstain, I will also offer some specific comments
on the document that might help improve it, even if I may not be
completely happy with the resulting document (for the reasons described
above).

It's pretty surprising to see a document that mentions autonomic
networking and aims to achieve self-managing networks make no reference
at all to the IETF ANIMA WG or its outputs, a group that is specifically
chartered to produce protocols and procedures for automated network
management.  In particular, it's my understanding that ANIMA has had
very little traction with network Intent thus far, and this document
references IRTF documents in many places (both for Intent and other
things).  Are we confident that these concepts are ready to move from
the IRTF into the engineering world?

Section 1

   Network visibility is the ability of management tools to see the
   state and behavior of a network, which is essential for successful

In the TLS WG we've sometimes seen participants use the term
"visibility" to include the plaintext of encrypted data flows.  While I
have no reason to believe that that's a universally held understanding
of the term, I mention it only to ask that clarification be provided if
the intent of the term here is to include such decryption capabilities.
If the intent is only to observe the normal visible wire image of the
protocol, I don't see particular need for clarification.

Section 2

   forward.  When a network's endpoints do not represent individual
   users (e.g. in industrial, datacenter, and infrastructure contexts),
   network operations can often benefit from large-scale data collection
   without breaching user privacy.

In the vein of my top-level remarks, I don't think that just "a network's
endpoints do not represent individual users" is sufficient to ensure
that large-scale data collection does not breach user privacy.  It
covers first-order effects, I think, but we've seen a lot of research
indicating that second- and higher-order analyses can still extract
information that reduces user privacy.

Section 2.1

   To preserve the privacy of end-users, no user packet content should
   be collected.  Specifically, the data objects generated, exported,
   and collected by a network telemetry application should not include
   any packet payload from traffic associated with end-users systems.

Also in the vein of my top-level remarks, while "do not include user
traffic payload" is a minimum requirement, and I'm happy to see it
stated clearly, it in and of itself is not sufficient to fully protect
end-user privacy.

Section 2.2

      visibility into networks.  The ultimate goal is to achieve the
      security with no, or only minimal, human intervention.

It's easy to achieve security without human intervention, if you're
willing to accept a high false positive rate and denial of legitimate
traffic.  Should we say something about tempering the security goal with
a need for not disrupting legitimate traffic flows?

Section 2.3

      Conventional OAM only covers a narrow range of data (e.g., SNMP
      only handles data from the Management Information Base (MIB)).

This argument feels a bit weak given that anyone with an OID arc (that
is, just about anyone) can add to the MIB.

Section 2.4

   Network telemetry has emerged as a mainstream technical term to refer

It's a little surprising to see network telemetry called a "mainstream"
term here, when up in §1 we said that it lacks an unambiguous
definition.

   *  Model-based: The telemetry data is modeled in advance which allows
      applications to configure and consume data with ease.
   [...]
   *  In-Network Customization: The data that is generated can be
      customized in network at run-time to cater to the specific need of
      applications.  This needs the support of a programmable data plane
      which allows probes with custom functions to be deployed at
      flexible locations.

I'm having a hard time seeing how data that's customized in-network at
runtime would be compatible with being modeled in advance.  Maybe the
disclaimer about "not expected to be held by every specific technique"
is intended to apply here, but it might be worth acknowledging the
tradeoff.

   *  In-band Data Collection: In addition to the passive and active
      data collection approaches, the new hybrid approach allows to
      directly collect data for any target flow on its entire forwarding
      path [I-D.song-opsawg-ifit-framework].

I'm pretty skeptical that the functionality that's claimed here (and in
the referenced draft) can be achieved while complying with the existing
requirements from current IETF RFCs.  I recognize that this is under the
"an ideal [solution] may also have" heading, but it still feels a little
premature to include.

Section 3.1

I'm having a really hard time seeing how figure 2 is internally
consistent if it lists "plain text" as the only option for data encoding
of data modelled using YANG (e.g., in the forwarding plane column).

Section 3.1.1

   network statistics and state data.  The management plane includes
   many protocols, including some that are considered "legacy", such as
   SNMP and syslog.  Regardless the protocol, management plane telemetry

It's not clear that we gain any real value from labeling SNMP and syslog
as "legacy".  Perhaps we should just skip the examples and avoid debate
on what is or isn't legacy (leaving each person to hold their own
opinion on that question)?

Section 3.1.2

      Then in case of an unusually poor UE KPI or a service
      disconnection, it is non-trivial to delimit and pinpoint the issue
      in the responsible protocol layer (e.g., the Transport Layer or
      the Network Layer), the responsible protocol (e.g., ISIS or BGP at
      the Network Layer), and finally the responsible device(s) with

I don't really follow the example of IS-IS or BGP "at the Network
Layer" -- in what sense do we use "network layer" here?

Section 3.3

I don't really understand the logic behind the direction of arrowheads
in Figure 4.  I'd be more inclined to just remove the figure than add
more explanatory text, though, as the relationships don't seem terribly
key to the core purpose of this document.

Section 5

   *  Authentication and signing of telemetry data to make data more
      trustworthy.

Signing is typically treated as a way to provide authentication; it
might make more sense to discuss "authentication and integrity
protection" in terms of the typical security properties we consider.

NITS

Section 1

   operations.  Based on the distinction of modules and function
   components, we can map the existing and emerging techniques and

It would be "distinction between" or "definition of", I think.

   protocols into the framework.  The framework can also simplify the
   designing, maintaining, and understanding a network telemetry system.

The "the" leading into "designing, maintaining, and understanding"
should be removed.

   The purpose of the framework and taxonomy is to set a common ground
   for the collection of related work and provide guidance for future
   technique and standard developments.  To the best of our knowledge,

s/technique/techniques/

Section 1.2

   AI:  Artificial Intelligence.  In network domain, AI refers to the
      machine-learning based technologies for automated network
      operation and other tasks.

"the network domain"

   SNMP:  Simple Network Management Protocol.  Version 1, 2, and 3 are
      specified in [RFC1157], [RFC3416], and [RFC3414], respectively.

RFC 3411 might be a better reference for SNMPv3, as it's the
architecture doc (rather than the user-based security model doc).

Section 2

   It is conceivable that an autonomic network [RFC7575] is the logical
   next step for network evolution following Software Defined Network

I think "Software Defined Networking" would fit better in this
situation.

   protocols are insufficient for these use cases.  The discussion
   underlines the need of new methods, techniques, and protocols, as
   well as the extensions of existing ones, which we assign under the

s/need of/need for/

Section 2.2

      Given increasingly sophisticated attack vector coupled with

"vectors" plural

      visibility into networks.  The ultimate goal is to achieve the
      security with no, or only minimal, human intervention.

s/the//

      visibility that is provided through network telemetry data.  Any
      violation must be notified immediately, potentially resulting in
      updates to how the policy or intent is applied in the network to

The subject of the verb "notified" is the target of the notification,
not the thing that the notification is about.  So "reported" might fit
better here.

      operators need to evaluate how they can deliver the services that
      can meet the SLA based on realtime network telemetry data,
      including data from network measurements.

s/deliver the services/deliver services/

Section 2.3

   *  Comprehensive data is needed from packet processing engines to
      traffic manager, from line cards to main control board, from user
      flows to control protocol packets, from device configurations to
      operations, and from physical layer to application layer.

It's possible to read this as a set of "from A to B" relations where A
is sending data to B.  I think that's not the intent, and this is just
intended to show a broad spread of scenarios across many different
axes; if that's the case, I'd suggest "... needed, ranging from".

   *  The conventional passive measurement techniques can either consume
      excessive network resources and render excessive redundant data,

Something seems awry around "render excessive redundant data", to the
extent that I can't extract meaning and propose an alternative.

Section 2.4

      overall network automation needs.  Efforts are made to normalize
      the data representation and unify the protocols, so to simplify
      data analysis and provide integrated analysis across heterogeneous

"so as to"

Section 2.5

      network with a low data sampling rate.  Only when issues arise or
      critical trends emerge should telemetry data source be modified
      and telemetry data rates boosted as needed.

I think we need "the telemetry data source".

Section 3.1.2

   *  An example of the control plane telemetry is the BGP monitoring
      protocol (BMP), it is currently used for monitoring the BGP routes

I'd end the sentence at this comma to avoid a comma splice.

Section 3.2

      responsible for configuring the desired data that might not be
      directly available form data sources.  The subscription data can

s/form/from/

Section 5

   *  Protocol transport used telemetry data and inherent security
      capabilities;

There seems to be a word or two missing here, maybe "used for" and "its
inherent".

Section A.3.6

   Various data planes raises unique OAM requirements.  IETF has

s/raises/raise/
