Re: [ippm] Tsvart early review of draft-ietf-ippm-ioam-flags-06

Tal Mizrahi <tal.mizrahi.phd@gmail.com> Sun, 03 October 2021 12:40 UTC

References: <163063564823.15336.14662092029682255691@ietfa.amsl.com>
In-Reply-To: <163063564823.15336.14662092029682255691@ietfa.amsl.com>
From: Tal Mizrahi <tal.mizrahi.phd@gmail.com>
Date: Sun, 03 Oct 2021 15:40:29 +0300
Message-ID: <CABUE3XkMnySVLtKx5EDaKGCt39tYhk03a4-8yOkm7Vwgffo1rg@mail.gmail.com>
To: Bernard Aboba <bernard.aboba@gmail.com>, IPPM Chairs <ippm-chairs@ietf.org>
Cc: tsv-art@ietf.org, draft-ietf-ippm-ioam-flags.all@ietf.org, IETF IPPM WG <ippm@ietf.org>
Archived-At: <https://mailarchive.ietf.org/arch/msg/ippm/GaWkuGjz8K5mjf0s8IEjzJCO2vo>
Subject: Re: [ippm] Tsvart early review of draft-ietf-ippm-ioam-flags-06

Dear Bernard,

Many thanks for the thorough review.

Please see the responses and text update suggestions below, marked with [TM].
Please let us know if there are any further comments.

On Fri, Sep 3, 2021 at 5:20 AM Bernard Aboba via Datatracker
<noreply@ietf.org> wrote:
>
> Reviewer: Bernard Aboba
> Review result: Ready with Issues
>
> Transport Area Review Team (TSV-ART)
> Document: draft-ietf-ippm-ioam-flags-06
> Reviewer: Bernard Aboba
> Verdict: Ready with Issues
>
> Overall comment
>
> The review request indicated:
> "Please review this document, specifically for security considerations around
> amplification attacks or similar concerns."
>
> Indeed, the amplification attacks do appear to be an important potential issue
> with the document.
>
> In Section 8 it is stated:
>
> "  IOAM is assumed to be deployed in a restricted administrative domain,
>    thus limiting the scope of the threats above and their affect.  This
>    is a fundamental assumtion with respect to the security aspects of"
>
> While this is a common argument, it does not impress me, particularly now that
> security breaches have become so common. So I'd prefer to start from the
> assumption that the administrative domain will eventually be breached.
>
> For example, one might start from the assumption that:
>
> 1. An attacker can find a way to inject packets with one or more of the flags
>    defined in the document set to a value of their choosing.
> 2. An attacker can compromise the firmware load of an IOAM encapsulating node
>    so as to affect execution of the algorithms defined in the document.
>
> Then based on these assumptions, see what mitigations can be added to minimize
> the damage.
>
> For example, one might minimize the damage by adding a mandatory-to-implement
> "circuit breaker".
>

[TM] You make a valid point regarding the basic assumption and how it
is described.
The following text edit is proposed:
OLD:
   An attacker may attempt to overload network devices by injecting
   synthetic packets that include an IOAM Trace Option with one or more
   of the flags defined in this document.  Similarly, an on-path
   attacker may maliciously set one or more of the flags of transit
   packets.
...
   IOAM is assumed to be deployed in a restricted administrative domain,
   thus limiting the scope of the threats above and their affect.  This
   is a fundamental assumtion with respect to the security aspects of
   IOAM, as further discussed in [I-D.ietf-ippm-ioam-data].
NEW:
   IOAM is assumed to be deployed in a restricted administrative domain,
   thus limiting the scope of the threats described in this section and
   their effect.  This is a fundamental assumption with respect to the
   security aspects of IOAM, as further discussed in
   [I-D.ietf-ippm-ioam-data].  Nevertheless, security threats should
   still be considered and mitigated.  Specifically, an attacker may
   attempt to overload network devices by injecting synthetic packets
   that include an IOAM Trace Option with one or more of the flags
   defined in this document.  Similarly, an on-path attacker may
   maliciously set one or more of the flags of transit packets.
...
END

The rest of this section (marked by "..." above), as currently
structured, provides more details about the threats and mitigations.
A "circuit breaker" may itself create a DoS threat, since an attacker
who is able to trip it could turn it against the network. The
currently proposed mitigation methods are intended to limit the impact
of a potential attack, not to "break" anything.

> Overall, it seems like the aspect of most concern is the Loopback Flag, which
> seems like it offers significant magnification potential.
>
> Comments on individual sections
>
> Section 4.1
>
>    The reason for allowing a
>    single data field per hop is to minimize the impact of amplification
>    attacks.
>
> [BA] This will be effective if we can assume that nodes are running compliant
> firmware. But what if an attacker can run firmware of their choosing? What are
> the mitigations that can be applied by a node if it detects that another node
> is adding more than one data field per hop?
>

[TM] Right. Suggested text:
OLD:
   An IOAM trace option that has the Loopback flag set MUST have the
   value '1' in the most significant bit of IOAM-Trace-Type, and '0' in
   the rest of the bits of IOAM-Trace-Type.  Thus, every transit node
   that processes this trace option only adds a single data field, which
   is the Hop_Lim and node_id data field.  The reason for allowing a
   single data field per hop is to minimize the impact of amplification
   attacks.
NEW:
   An IOAM trace option that has the Loopback flag set MUST have the
   value '1' in the most significant bit of IOAM-Trace-Type, and '0' in
   the rest of the bits of IOAM-Trace-Type.  Thus, every transit node
   that processes this trace option only adds a single data field, which
   is the Hop_Lim and node_id data field.  A transit node that receives
   a packet with an IOAM trace option that has the Loopback flag set,
   and whose IOAM-Trace-Type does not have '1' in the most significant
   bit and '0' in the rest of the bits, MUST NOT loop back a copy of
   the packet.  The reason for allowing a single data field per hop is
   to minimize the impact of amplification attacks.
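
As a rough illustration of the proposed check, the following Python
sketch shows how a transit node might decide whether to loop back a
copy of a packet. It assumes the 24-bit IOAM-Trace-Type field of
[I-D.ietf-ippm-ioam-data], in which the most significant bit
corresponds to the Hop_Lim and node_id data field; the function and
constant names are hypothetical and not taken from any implementation.

```python
# Hypothetical sketch: assumes a 24-bit IOAM-Trace-Type field as in
# [I-D.ietf-ippm-ioam-data]; names are illustrative only.
TRACE_TYPE_BITS = 24

# Only the most significant bit set: Hop_Lim and node_id, nothing else.
LOOPBACK_ONLY_TRACE_TYPE = 1 << (TRACE_TYPE_BITS - 1)  # 0x800000


def may_loop_back(loopback_flag: bool, trace_type: int) -> bool:
    """Return True only if a looped-back copy is permitted.

    A copy is looped back only when the Loopback flag is set AND the
    IOAM-Trace-Type is exactly '1' in the MSB and '0' in all other bits.
    Any other Trace-Type value with the Loopback flag set is not looped
    back, which bounds each hop's contribution to a single data field.
    """
    if not loopback_flag:
        return False
    return trace_type == LOOPBACK_ONLY_TRACE_TYPE
```

For example, may_loop_back(True, 0x800000) permits the loopback, while
may_loop_back(True, 0xC00000) does not, because a second data field
has been requested.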

> Section 4.1.1
>
>    If an IOAM encapsulating node incorporates the Loopback flag into all
>    the traffic it forwards it may lead to an excessive amount of looped
>    back packets, which may overload the network and the encapsulating
>    node.  Therefore, an IOAM encapsulating node that supports the
>    Loopback flag MUST support the ability to incorporate the Loopback
>    flag selectively into a subset of the packets that are forwarded by
>    it.
>
> [BA] This MUST is a fairly basic capability (e.g. a lot of damage can be
> done even by nodes obeying this requirement). It seems like more should
> be required.

[TM] Right, and the rest of Section 4.1.1 indeed specifies more
functionality that mitigates these amplification attacks.

>
>    Various methods of packet selection and sampling have been previously
>    defined, such as [RFC7014] and [RFC5475].  Similar techniques can be
>    applied by an IOAM encapsulating node to apply Loopback to a subset
>    of the forwarded traffic.
>
>    The subset of traffic that is forwarded or transmitted with a
>    Loopback flag SHOULD NOT exceed 1/N of the interface capacity on any
>    of the IOAM encapsulating node's interfaces.  It is noted that this
>    requirement applies to the total traffic that incorporates a Loopback
>    flag, including traffic that is forwarded by the IOAM encapsulating
>    node and probe packets that are generated by the IOAM encapsulating
>    node.  In this context N is a parameter that can be configurable by
>    network operators.  If there is an upper bound, M, on the number of
>    IOAM transit nodes in any path in the network, then it is recommended
>    to use an N such that N >> M.  The rationale is that a packet that
>    includes the Loopback flag triggers a looped back packet from each
>    IOAM transit node along the path for a total of M looped back
>    packets.  Thus, if N >> M then the number of looped back packets is
>    significantly lower than the number of data packets forwarded by the
>    IOAM encapsulating node.  If there is no prior knowledge about the
>    network topology or size, it is recommended to use N>100.
>
> [BA] This is OK as far as it goes. But what happens if a node detects that
> amplification is occurring somewhere (e.g. another node isn't obeying the
> SHOULD NOT) Is a circuit breaker triggered?
>

[TM] For a node to detect that another node is malfunctioning, some
stateful monitoring functionality is required. We currently do not
require this functionality, although an operator may choose to deploy
counters with threshold-based notifications to the management plane at
specific locations in the network in order to detect such behavior.
We are not sure it would be useful to describe this in the document,
as it seems to be common practice.
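
To make the operator practice mentioned above more concrete, here is a
minimal Python sketch of a per-interface counter with a threshold-based
notification. It is not something the draft requires. The class name,
the per-interval byte counting, and the `notify` callback (a stand-in
for a management-plane notification) are all hypothetical; the
threshold simply reuses the 1/N-of-interface-capacity bound from
Section 4.1.1.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class LoopbackRateMonitor:
    """Hypothetical per-interface counter with a threshold notification.

    Counts bytes of Loopback-flagged traffic seen on one interface during
    a measurement interval and reports when that volume exceeds 1/N of
    what the interface can carry in the same interval.
    """
    capacity_bps: float               # interface capacity, bits per second
    n: int                            # the 1/N fraction; the draft recommends N > 100
    interval_s: float = 1.0           # length of one measurement interval
    notify: Callable[[float], None] = print  # stand-in for a management-plane alert
    _bytes: int = field(default=0, init=False, repr=False)

    def count(self, packet_len_bytes: int) -> None:
        """Account for one Loopback-flagged packet seen on the interface."""
        self._bytes += packet_len_bytes

    def end_interval(self) -> bool:
        """Close the interval; notify and return True if over threshold."""
        observed_bps = self._bytes * 8 / self.interval_s
        self._bytes = 0  # start counting the next interval from zero
        exceeded = observed_bps > self.capacity_bps / self.n
        if exceeded:
            self.notify(observed_bps)
        return exceeded
```

Such a counter only raises an alarm; any reaction to it would be left
to the management plane rather than performed automatically by the node.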

> Section 4.2
>
>    An IOAM node that supports the reception and processing of the
>    Loopback flag MUST support the ability to limit the rate of the
>    looped back packets.  The rate of looped back packets SHOULD be
>    limited so that the number of looped back packets is significantly
>    lower than the number of packets that are forwarded by the device.
>    The looped back data rate SHOULD NOT exceed 1/N of the interface
>    capacity on any of the IOAM node's interfaces.  It is recommended to
>    use N>100.  Depending on the IOAM node's architecture considerations,
>    the loopback response rate may be limited to a lower number in order
>    to avoid loading the IOAM node.
>
> [BA] Here the SHOULD does not seem strong enough to me. If for example,
> another node is forwarding more than 1/N, and this is detected, shouldn't
> some mandatory circuit breaker be triggered?
>
> Section 5
>
>    If the volume of traffic that incorporates the Active flag is large,
>    it may overload the network and the IOAM node(s) that process the
>    active measurement packet.  Thus, the rate of the traffic that
>    includes the Active flag rate SHOULD NOT exceed 1/N of the interface
>    capacity on any of the IOAM node's interfaces.  It is recommended to
>    use N>100.  Depending on the IOAM node's architecture considerations,
>    the rate of Active-enabled IOAM packets may be limited to a lower
>    number in order to avoid loading the IOAM node.
>
> [BA] Again, the SHOULD NOT does not seem strong enough to me. I'd suggest
> that some kind of mandatory circuit breaker should be required.

[TM] Regarding the last two comments, as mentioned above, we do not
require an IOAM transit node to be able to detect when another IOAM
node does not limit its rate. The assumption is that if there *is* an
attacker, the current node limits its own rate in order to limit the
impact of the attack and avoid amplifying it. However, the current
node does not necessarily implement any monitoring or intrusion
detection that reports potential attacks.
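
One common way a node could implement the local rate limiting
described in Sections 4.2 and 5 is a token bucket. The draft does not
mandate a particular algorithm, so the following Python sketch is only
an illustration under that assumption: tokens are counted in bits and
refilled at 1/N of the interface capacity, and a packet that finds too
few tokens is simply not looped back. The class name, the burst size
parameter, and the injectable clock are hypothetical.

```python
import time
from typing import Callable


class LoopbackRateLimiter:
    """Hypothetical token bucket capping looped-back traffic at capacity/N."""

    def __init__(self, capacity_bps: float, n: int, burst_bits: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        if n <= 0:
            raise ValueError("n must be positive")
        self.rate_bps = capacity_bps / n  # long-term budget: 1/N of capacity
        self.burst_bits = burst_bits      # largest burst allowed at once
        self.tokens = burst_bits          # start with a full bucket
        self.clock = clock
        self.last = clock()

    def allow(self, packet_len_bytes: int) -> bool:
        """Return True if a looped-back packet of this size may be sent."""
        now = self.clock()
        # Refill in proportion to elapsed time, capped at the burst size.
        self.tokens = min(self.burst_bits,
                          self.tokens + (now - self.last) * self.rate_bps)
        self.last = now
        needed = packet_len_bytes * 8
        if self.tokens >= needed:
            self.tokens -= needed
            return True
        return False  # drop the loopback response rather than queue it
```

The bucket tolerates short bursts up to `burst_bits` while bounding the
long-term rate, and it needs no knowledge of other nodes' behavior. The
same structure could be applied to Active-flagged packets.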

>
> NITs
>
> Abstract
>
>    a path between two points in the network.  This document defines two
>    new flags in the IOAM Trace Option headers, specifically the the
>
> s/the the/the/

[TM] Will be fixed.

>
> Section 8
>
>    IOAM is assumed to be deployed in a restricted administrative domain,
>    thus limiting the scope of the threats above and their affect.  This
>    is a fundamental assumtion with respect to the security aspects of
>
> s/assumtion/assumption/

[TM] Will be fixed.

>
>    IOAM, as further discussed in [I-D.ietf-ippm-ioam-data].
>
>
>

Thanks,
Tal.
