[I2nsf] Benjamin Kaduk's Discuss on draft-ietf-i2nsf-nsf-monitoring-data-model-15: (with DISCUSS and COMMENT)

Benjamin Kaduk via Datatracker <noreply@ietf.org> Wed, 16 February 2022 07:26 UTC
From: Benjamin Kaduk via Datatracker <noreply@ietf.org>
To: The IESG <iesg@ietf.org>
Cc: draft-ietf-i2nsf-nsf-monitoring-data-model@ietf.org, i2nsf-chairs@ietf.org, i2nsf@ietf.org, dunbar.ll@gmail.com
Auto-Submitted: auto-generated
Precedence: bulk
Reply-To: Benjamin Kaduk <kaduk@mit.edu>
Message-ID: <164499636418.12383.5348778569888235022@ietfa.amsl.com>
Date: Tue, 15 Feb 2022 23:26:04 -0800
Archived-At: <https://mailarchive.ietf.org/arch/msg/i2nsf/1UI9tA1kZ8uQaTwvveIpkKW5I5E>
Benjamin Kaduk has entered the following ballot position for
draft-ietf-i2nsf-nsf-monitoring-data-model-15: Discuss

When responding, please keep the subject line intact and reply to all
email addresses included in the To and CC lines. (Feel free to cut this
introductory paragraph, however.)


Please refer to https://www.ietf.org/blog/handling-iesg-ballot-positions/
for more information about how to handle DISCUSS and COMMENT positions.


The document, along with other ballot positions, can be found here:
https://datatracker.ietf.org/doc/draft-ietf-i2nsf-nsf-monitoring-data-model/



----------------------------------------------------------------------
DISCUSS:
----------------------------------------------------------------------

(1) I'm not sure I understand the motivation for recommending (in §6.3.4) that
the HTTP Cookie header field be included in a notification about a Web
Attack Event.  In general, the cookie field can contain very sensitive
information, including credentials, and it is very risky to be sending the
cookies around outside of their primary protocol context.  Perhaps, if we
are fully confident that the NSF has correctly identified an attack, it
might be useful to send the cookies around, but I think there are still some
scenarios (e.g., a compromised end-user browser) where the cookies in an
attack request are still confidential information that should not be
disclosed.  Could we say more about why it is recommended to always include
the cookies or weaken the recommendation?

(2) I'm not sure I understand the relationship between the different pieces
of information listed in the information model (§6.7.1) for firewall
counters.  My understanding is that typically a firewall will function as if
it were a "bump in the wire" on a particular wire, processing all traffic
into and out of a given part of the network (at least on a particular
interface), and that the internal network might contain multiple machines
that reside on multiple network prefixes.  So when we have the information
model that looks to be the counters reported by a firewall security
function, I don't know how to interpret fields like "Source IP address" and
"Destination port", which are typically tied to a particular flow or
machine, whereas the firewall covers many different flows and potentially
many machines as well.  Is the intent to report on firewall behavior on a
per-flow granularity, akin to what IPFIX does?  That seems likely to produce
a very high volume of log information and it's not clear how useful it would
be to the NSF data collector.  The YANG data model uses a list with the
policy name as the list key, indicating that perhaps the intent is to show
what a given firewall policy has done, but (a) that should be made clear
from the description in the information model, and (b) it still doesn't help
relate the different 4-tuple components to each other.

(3) The information model gives a list of DPI action types that's prefaced
with "e.g.", indicating that it is giving examples only and is not
comprehensive, but the dpi-type YANG typedef is modeled as an enumeration
that does not allow extensibility for future values not listed here.  This
seems like an internal inconsistency that needs to be rectified, whether by
claiming to be a comprehensive list or by switching the YANG to use
extensible identities to represent the DPI action types.

(4) Please confirm that we have achieved the intended level of consistency
between the information model and the YANG data model, in light of the
remarks in my COMMENT section around Sections 6.3.2 through 6.3.5, 6.4.3,
and 6.5.1.


----------------------------------------------------------------------
COMMENT:
----------------------------------------------------------------------

I share the surprise of the directorate reviewer that we need to report on
disk, CPU, RAM, and interface statistics in the NSF-specific model
(rather than using a more generic OAM functionality for those reports), but
I guess it is harmless to duplicate the functionality.

The way that the timestamp grouping is integrated into the data nodes seems
quite unusual to me.  This is partly because it's repeated in each element
of a list (rather than being a single timestamp that applies to all entries,
which the "the time of the message" description would imply), and partly
because we typically think of YANG as providing a data model for the state
of the device being monitored, whereas this data node is more a function of
when data is retrieved than a function of the state of the node.

Section 2

   *  Subscription: An agreement initialized by the NSF data collector
      to receive monitoring information from an NSF.  The method to
      subscribe follows the method explained in [RFC5277].

It looks like RFC 5277 is specific to NETCONF; do we want to reference RFC
8650 for RESTCONF notifications as well?

Section 3

   *  The I2NSF User that is the security administrator can configure a
      policy that is triggered on a specific event occurring in the NSF
      or the network [RFC8329]
      [I-D.ietf-i2nsf-consumer-facing-interface-dm].  If an NSF data
      collector detects the specified event, it configures additional
      security functions as defined by policies.

I wonder if it might be more appropriate to indicate that it is the security
controller (which is often, but not required to be, the data collector) that
would configure the additional security functions.

Section 4.1

   I2NSF Record:  A record is defined as an item of information that is
      kept to be looked at and used in the future.  Typically, records
      are information generated by a system entity (e.g., NSF) that is
      based on operational and informational data (i.e., various changes
      in system characteristics), and are generated at particular
      instants to be kept without any changes afterward.  A set of
      records has an ordering in time based on when they are generated.

(side note) I think you should probably keep this text as it is now, but in
a certain sense, you could get a set of records (most notably when from
different sources) and even though they all have timestamps attached, it may
not always be possible to recover a strict ordering in time for the events,
owing to clock skew, clock precision, and the possibility of parallel
execution.  But trying to describe that subtlety would detract from the
point being made here and would (in my opinion) take more words than it's
worth.

Section 4.2

   A specific task of an I2NSF User is to process I2NSF Policy Rules.

If the User is a human, would it make more sense to say that they "provide"
the policy rules, rather than to "process" them?  The current "process"
formulation seems to imply that the User is involved in applying or taking
action based on the policy rules, and I'm not sure if that's the intent.

Section 6

   *  Emission type: The cause type for the message to be emitted.  It
      can be "on-change", "periodic", or "on-request".  An "on-change"

Does the emission type matter at all when the acquisition method is "query"?
(That is, does it only apply when the acquisition method is "subscription"?)

Section 6.1.3

   *  usage: Specifies the size of disk space used.

The YANG instantiation of this represents 'usage' as percent, so we probably
don't want to say "size", which implies a non-percentage amount.

Section 6.2.4

It's not clear that the src-mac/dst-mac should always be included in the
traffic flow event.  They make sense in some situations, but (for example)
if the NSF is in the middle of a provider network, the MAC addresses will
just be from its neighboring routers in the provider and do not convey
information about the customer traffic that is triggering the
notification/report.

   *  arrival-rate: Arrival rate of packets of the traffic flow in
      packet per second calculated from the beginning of the flow.

   *  arrival-throughput: Arrival rate of packets of the traffic flow in
      bytes per second calculated from the beginning of the flow.

Thank you for making a clear definition of how these rates are computed.
That said, I am not sure that "calculated from the beginning of the flow"
will be the most useful value, as a more instantaneous rate might also be of
interest (e.g., measured over the past minute or ten minutes).
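
To illustrate the difference, here is a minimal Python sketch that computes
both a since-flow-start rate and a rate over a recent window.  The FlowRate
class, its method names, and the 60-second window are hypothetical and do
not come from the draft; this only shows how far the two values can diverge
for a flow that is quiet for most of its lifetime and then bursts.

```python
from collections import deque


class FlowRate:
    """Tracks packet arrivals for one flow (hypothetical helper, not from the draft)."""

    def __init__(self, start_time, window_seconds=60.0):
        self.start_time = start_time
        self.window_seconds = window_seconds
        self.total_packets = 0
        self.recent = deque()  # (timestamp, packet_count) pairs inside the window

    def record(self, timestamp, packets):
        self.total_packets += packets
        self.recent.append((timestamp, packets))

    def lifetime_rate(self, now):
        """Packets per second calculated from the beginning of the flow."""
        elapsed = now - self.start_time
        return self.total_packets / elapsed if elapsed > 0 else 0.0

    def windowed_rate(self, now):
        """Packets per second over the most recent window only."""
        cutoff = now - self.window_seconds
        # Drop samples that have aged out of the window.
        while self.recent and self.recent[0][0] <= cutoff:
            self.recent.popleft()
        return sum(count for _, count in self.recent) / self.window_seconds


flow = FlowRate(start_time=0.0, window_seconds=60.0)
flow.record(10.0, 100)       # quiet early in the flow
flow.record(3590.0, 60000)   # burst in the final minute of the hour
lifetime = flow.lifetime_rate(now=3600.0)
windowed = flow.windowed_rate(now=3600.0)
```

With this sample data the windowed rate is 1000 packets per second while the
since-start rate is under 17, so a burst late in a long-lived flow is nearly
invisible in the value the draft currently defines.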

Section 6.3.1

   *  attack-src-ip: The IP address of the source of the DDoS attack.

Is this really going to always be a single IP address, even for a
*distributed* denial of service attack?
I see that the YANG models this as a leaf-list, which suggests that this
should be described using the plural sense of the terms.

However, it is still unclear that attempting to enumerate every IP address
seen participating in a DDoS attack is a useful thing to do.  In other
contexts (e.g., the DOTS WG), we limit ourselves to just picking an
arbitrary sampling of the "top talkers" and conserving server resources by
not trying to list out the long tail of attack IPs.
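
As a rough illustration of the "top talkers" idea, the following Python
sketch keeps only the busiest few sources and drops the long tail.  The
top_talkers function, the limit of 3, and the sample addresses are all
assumptions for illustration; this is not how DOTS or the draft specifies
the selection.

```python
from collections import Counter


def top_talkers(packet_sources, limit=3):
    """Return the `limit` source addresses with the most packets, busiest first."""
    counts = Counter(packet_sources)
    return [address for address, _ in counts.most_common(limit)]


# Documentation-range addresses (RFC 5737), used purely as sample data.
observed = (
    ["192.0.2.1"] * 500
    + ["198.51.100.7"] * 300
    + ["203.0.113.9"] * 200
    + ["192.0.2.50"] * 2
    + ["192.0.2.51"] * 1
)
reported = top_talkers(observed, limit=3)
```

Here the two sources that sent only a handful of packets are left out of the
report, which bounds the size of the leaf-list regardless of how many
addresses take part in the attack.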

   *  dst-port: The port number that the attack traffic aims at.

Likewise, might an attack target a range of ports?

Section 6.3.2, 6.3.3, 6.3.4

We do not list an "action" information element here, but there is a data
element for the log-action in the YANG data model in §8.

Section 6.3.3

We do not list information elements here for attack-rate or
attack-throughput, but there are such data elements in the corresponding
YANG model in §8.  (I think that is likely to be an error in the YANG rather
than an omission from here, though.)

   *  protocol: The employed transport layer protocol. e.g., TCP or UDP.

   *  app: The employed application layer protocol. e.g., HTTP or FTP.

It might be worth considering whether QUIC would be listed as the transport
or application layer protocol (or both).

Section 6.3.4, 6.3.5

We do not list an information element here for "severity", but such an
element exists in the corresponding YANG data model in §8.  (I think that
it is likely to be an error in the YANG rather than an omission from here,
though.)

Section 6.3.4

   *  req-user-agent: The HTTP User-Agent header field of the request.

Perhaps no action is needed at this time, but there is some effort underway
at the W3C to deprecate the user-agent header field in favor of something
like RFC 8942 Client Hints (see, e.g.,
https://blog.chromium.org/2021/05/update-on-user-agent-string-reduction.html).
The user-agent value may not remain a valuable piece of information for much
longer.

Section 6.4.3

We list "cause" as a potential additional information element here, but
there does not seem to be a way to represent that information in the YANG
data model in §8.

Section 6.5.1

I'm slightly curious what the src-user information would be used for when
processing the DPI logs.  It doesn't seem inherently problematic to include
this information, but I am not sure when it would be useful.

We list "action" as an information model element here, but I don't see a way
to represent that in the YANG data model in §8.  It may have been misplaced
in the ietf-nsf-detection-ddos container, which has such a leaf-list that is
not reflected in the corresponding information model.

Section 6.6.1

Would it make sense to have a "current rate/throughput" (computed over the
past, e.g., 1 or 10 minutes as previously) to supplement the average and
peak rates that are already listed?

Section 6.7.1

As I remarked on §6.5.1, some clarity on why the name of the I2NSF User that
generated the policy is worth reporting would be useful.  (This sentiment
applies throughout the document, but I will stop repeating it.)

Likewise, my remark from §6.6.1 about "current rate" seems to apply here as
well.

Section 7

It's interesting that the subsection structure under §6 doesn't quite match
up to the YANG tree structure for the i2nsf-event notification's
sub-event-type choice.  (To be clear, it may or may not be problematic that
they don't match; I just don't know the motivation for doing it this way as
I read through from the top.)

Section 8

I would consider using a more complicated grouping structure so that the
'message' leaf (and probably severity and timestamp as well, if not
vendor-name and nsf-name) currently in common-monitoring-data does not need
to appear under the lists in the tree of data nodes.  The 'message' seems
tailored for notifications only.

I don't understand why there is no configuration leaf for the
virus-detection and VoIP/VoCN notifications (e.g., the "enabled" and
"dampening-period" that we have for the other feature-controlled
functionality).

Some of the nodes that are using uint32 might merit bumping up to a wider
type.  E.g., the ddos attack-rate is currently modeled as uint32, but an 809
Mpps attack was apparently seen a couple of years ago, which is close enough
to values that are unrepresentable in uint32 to cause some worry about
future growth in attack size.
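
The arithmetic behind that worry can be checked with a short Python sketch.
The variable names are mine, and the only input taken from the text is the
809 Mpps figure; the rest is plain integer arithmetic on the uint32 range.

```python
UINT32_MAX = 2**32 - 1          # 4,294,967,295
observed_pps = 809_000_000      # the 809 Mpps attack mentioned above

fits_uint32 = observed_pps <= UINT32_MAX
headroom = UINT32_MAX / observed_pps

# Count how many times the observed rate can double before it no longer fits.
doublings_until_overflow = 0
rate = observed_pps
while rate <= UINT32_MAX:
    rate *= 2
    doublings_until_overflow += 1
```

The observed rate still fits, but with only about a factor of 5.3 to spare:
three doublings of attack size would produce a value that a uint32 leaf
cannot represent.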

     typedef log-action {
       type enumeration {

I would suggest adding more description to most of the enum values.  E.g.,
what scope does a "block-ip" or "block-service" block apply to?

     identity dampening-type {
       base characteristics;
       description
         "The type of message dampening to stop the rapid transmission
          of messages. The dampening types are on-repetition and
          no-dampening";

Making a claim like this, that there are only two possible types, seems more
aligned with a YANG enumeration than an identity-based scheme.  (But I do
not see a strong motivation to change it, at this time.)

     identity protocol {
       description
         "An identity used to enable type choices in leaves
          and leaflists with respect to protocol metadata. This is used
          to identify the type of protocol that goes through the NSF.";

Why are we defining our own set of identities to identify protocols?  Is
there no suitable prior art we could import?

     grouping characteristics {
       description
         "A set of characteristics of a notification.";

Note that this grouping is used in the system-interface, nsf-firewall, and
nsf-policy-hits lists, so the description of being "of a notification" is
not accurate at present.  (The grouping's members like "dampening-type"
don't seem to make much sense in a data node tree.)

       leaf acquisition-method {
         [...]
       leaf emission-type {

Also, it's slightly surprising that the "acquisition-method" and "emission-type"
are included in the notification payloads, when to some extent they must be
known already from the context in which the notification is received.

         case i2nsf-system-detection-event {
           container i2nsf-system-detection-event {
             description
               "This notification is sent when a security-sensitive
                authentication action fails.";

This description does not seem to match the name of the event/case here.

             list changes {
               key policy-name;
               description
                 "Describes the modification that was made to the
                  configuration. The minimum information that must be
                  provided is the name of the policy that has been
                  altered (added, modified, or removed).
                  This list can be extended with the detailed
                  information about the specific changes made to the
                  configuration based on the implementation.";

Should we say this is only applicable to the configuration-change events?

         case i2nsf-traffic-flows {
           container i2nsf-traffic-flows {
           [...]
             leaf protocol {
               type identityref {
                 base protocol;
               }
               description
                 "The protocol type for nsf-detection-intrusion
                  notification";

but this isn't the nsf-detection-intrusion notification.

         case i2nsf-nsf-log-dpi {
           if-feature "i2nsf-nsf-log-dpi";
           container i2nsf-nsf-log-dpi {

(Per above,) it seems like this container is missing a "uses log-action".

             leaf end-time {
               type yang:date-and-time;
               description
                 "The time stamp indicating when the attack ended. If
                  the attack is still undergoing when sending out the
                  notification, this field can be empty.";

Empty or omitted?

             leaf-list attack-src-port {
               type inet:port-number;
               description
                 "The transport layer source ports of the DDoS attack";
             }
             leaf-list attack-dst-port {
               type inet:port-number;
               description
                 "The transport layer destination ports of the DDoS
                  attack";

We might say something about how not all ports will have been seen on all
the corresponding src/dest IP addresses.

             leaf file-type {
               type string;
               description
                 "The type of file virus code is found in (if
                  applicable).";
               reference
                 "IANA Website: Media Types";

I don't think media type is a common reference for the notion of "file
type", as might be reflected by the file's suffix.

Section 10

While this text probably suffices to convey the needed requirements, I note
that the REGEXT working group has a long-established formulation for
expressing what seems to be essentially the same requirement.  E.g., from
RFC 9095:

%  This document uses the prefix "b-dn" for the namespace
%  "urn:ietf:params:xml:ns:epp:b-dn" throughout.  Implementations cannot
%  assume that any particular prefix is used and must employ a
%  namespace-aware XML parser and serializer to interpret and output the
%  XML documents.

It might be worth considering reusing this established formulation instead
of creating a new one.
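
For readers less familiar with why the prefix must not be assumed, here is a
small Python sketch using the standard library's ElementTree parser.  The
namespace URI, element names, and prefixes are hypothetical stand-ins and
are not the ones this document registers; the point is only that lookup by
namespace URI gives the same answer whatever prefix the sender chose.

```python
import xml.etree.ElementTree as ET

NS = "urn:example:monitoring"  # hypothetical namespace, not the draft's

# Two equivalent documents that bind different prefixes to the same namespace.
doc_a = f'<m:event xmlns:m="{NS}"><m:severity>high</m:severity></m:event>'
doc_b = (
    f'<other:event xmlns:other="{NS}">'
    "<other:severity>high</other:severity></other:event>"
)


def severity(xml_text):
    """Look up the severity element by namespace URI, never by prefix."""
    root = ET.fromstring(xml_text)
    node = root.find(f"{{{NS}}}severity")
    return node.text if node is not None else None


results = [severity(doc_a), severity(doc_b)]
```

A consumer that matched on the literal string "m:severity" would handle the
first document and silently miss the second.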

Section 12

I suggest cautioning consumers of the input and output of the system access
log to take care when processing those contents, in light of common shell
vulnerabilities relating to quoting and wildcard expansion.
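
As one hedged example of the kind of care I have in mind, the Python sketch
below quotes a logged command before it is embedded in a shell command line.
The log entry is made up, and shlex.quote is just one way to do this on a
POSIX shell; a consumer that never passes log contents to a shell at all is
safer still.

```python
import shlex

logged_command = "cat notes.txt; rm -rf *"  # hypothetical access-log entry

# Unsafe: a shell would treat ';' as a separator and expand the '*'.
unsafe = "echo " + logged_command

# Safer: the whole entry becomes one literal shell word.
safe = "echo " + shlex.quote(logged_command)

# Splitting the quoted form recovers the entry as a single argument.
tokens = shlex.split(safe)
```

With the quoted form, the logged text reaches the command as one inert
argument rather than being reinterpreted by the shell.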

   Additionally, many of the data nodes in this YANG module such as
   containers "i2nsf-system-user-activity-log", "i2nsf-system-detection-
   event", and "i2nsf-nsf-detection-voip-vocn" are privacy sensitive.
   They may describe specific or aggregate user activity including
   associating user names with specific IP addresses; or users with
   specific network usage.

Let's also add another couple sentences here: "They also may describe the
specific commands that were run by users and the resulting output.  Any
sensitive information in that command input or output will be visible to the
NSF data collector and potentially other entities, and care must be taken to
protect the confidentiality of such data from unauthorized parties."

NITS

Section 1

   This document defines an information model of an NSF monitoring
   interface that provides visibility into an NSF for the NSF data
   collector.  Note that an NSF data collector is defined as an entity
   to collect NSF monitoring data from an NSF, such as Security
   Controller.  It specifies the information and illustrates the methods
   that enable an NSF to provide the information required in order to be
   monitored in a scalable and efficient way via the NSF Monitoring
   Interface.  [...]

I think in the last quoted sentence the "it specifies" is referring again to
"this document" rather than the NSF data collector mentioned in the
preceding sentence.  We might state "this document" again specifically, or
put the "note" sentence inside parentheses (but not both), to clarify what
the pronoun is referring to.

Section 4

   Every system entity creates information about some context with
   defined I2NSF monitoring data, and so every entity can be an I2NSF
   component.  [...]

We might want to add another few words to clarify that when we say "every
(system) entity" we are referring to every entity within a certain scope.
(I.e., what is that scope?)

Section 4.1

      something that happens which may be of interest.  Examples for an
      event are a fault, a change in status, crossing a threshold, or an

s/for/of/

   I2NSF Record:  A record is defined as an item of information that is
      kept to be looked at and used in the future.  Typically, records
      are information generated by a system entity (e.g., NSF) that is
      based on operational and informational data (i.e., various changes

s/that is/that are/

      entity or NSF.  The examples of records include as user
      activities, device performance, and network status.  They are

s/include as/include/

Section 4.2

   In I2NSF monitoring, a notification is used to deliver either an
   event and a record via the I2NSF Monitoring Interface.  The
   difference between the event and record is the timing by which the
   notifications are emitted.  An event is emitted as soon as it happens

I think s/event and a record/event or a record/, to match with the preceding
"either".

Section 6.1.5

   *  severity: The severity level of the message.  There are total
      levels, i.e., critical, high, middle, and low.

s/total/four/

Section 6.3.1

   *  attack-type: The type of DoS or DDoS Attack, i.e., SYN flood, ACK
      flood, SYN-ACK flood, FIN/RST flood, TCP Connection flood, UDP
      flood, ICMP flood, HTTPS flood, HTTP flood, DNS query flood, DNS
      reply flood, SIP flood, SSL flood, and NTP amplification flood.

Please consider s/SSL/TLS/.

Section 6.4.1

   Access logs record administrators' login, logout, and operations on a
   device.  By analyzing them, security vulnerabilities can be
   identified.  [...]

I'd suggest s/security vulnerabilities can be/some security vulnerabilities
can be/.

Section 8

         enum block-service{

missing space before open curly brace.

     typedef dpi-type{

ditto

Please look for more instances; there seem to be too many to list
individually.

     identity application-protocol {
       base protocol;
       description
         "Base identity for Application protocol. Note that popular
          application protocols (e.g., HTTP, HTTPS, FTP, POP3, and
          IMAP) are handled in this YANG module, rather than all
          the existing application protocols.";

I suggest saying "a subset of" rather than "popular", to avoid sparking
arguments about what protocols are "popular".

             leaf src-ip {
               type inet:ip-address-no-zone;
               description
                 "The source IPv4 (or IPv6) address of the flow";
             }
             leaf dst-ip {
               type inet:ip-address-no-zone;
               description
                 "The destination IPv4 (or IPv6) address of the flow";

I think it would be okay to omit the parentheses and just say "IPv4 or
IPv6".