[Ippm-ioam-ix-dt] IOAM Virtual Meeting Summary, December 11th, 2019

Tal Mizrahi <tal.mizrahi.phd@gmail.com> Wed, 11 December 2019 14:05 UTC

From: Tal Mizrahi <tal.mizrahi.phd@gmail.com>
Date: Wed, 11 Dec 2019 16:05:08 +0200
Message-ID: <CABUE3XkZvZymnr73Ys5cCuXLni0tvXY-CHfhs5DOekC35XAcHw@mail.gmail.com>
To: ippm-ioam-ix-dt@ietf.org
Archived-At: <https://mailarchive.ietf.org/arch/msg/ippm-ioam-ix-dt/Duil7LpwqmKkF1rCNOXGVbG1rX4>

=========================================
IPPM IOAM Immediate Exporting Design Team
Virtual meeting
December 11th, 2019, 07:00 UTC
Webex meeting
=========================================

Attendees:
Shwetha Bhandari, Frank Brockners, Barak Gafni, Greg Mirsky, Tal Mizrahi,
Mickey Spiegel.

Minutes by Tal Mizrahi.


Summary:
========
- Loopback flag: amplification attacks will be mitigated by minimizing the
size of the looped-back packets. Tal will propose new text that describes
this and create a pull request.
- Active flag: the use case of the active flag should be explained in
further detail. Tal will propose updated text.
- Active flag: cloning will be mentioned as an example in the flag draft,
but the mechanism will not be specified in the flag draft. Specifically,
security issues with cloning are out of scope.
- The next virtual meeting will be held on December 18th, 07:00 UTC.


Introduction:
=============
Tal: on our agenda we have the loopback flag and amplification attacks. Any
other issues?
Barak: I submitted a pull request about the DEX draft. I would like to
discuss it. One issue is the loopback. Another issue is how to export queue
occupancy in the data draft.
Tal: sure, we will start with the loopback issue and then continue with the
other issues.


Loopback flag and amplification attacks:
========================================
Tal: one of the main issues that came up regarding the loopback flag is
amplification attacks. There are a few possible ways of mitigating this
threat: (1) Rate limiting at the transit nodes. (2) Packet length
asymmetry: require probe packets with the loopback flag to be long, while
looped back packets are truncated to be significantly shorter. (3)
Cryptographic-based approach in which a token is used to verify the
identity of the sender. This does not prevent the threat, but limits its
scope. These are the possible directions to tackle this problem. Are there
any other directions?
Mickey: why do we need a response from every node? Why isn't one packet
good enough?
Barak: it is like Traceroute vs. Ping.
Frank: you don't know where you failed. Traceroute allows you to detect the
failure location.
Mickey: another possible solution is that the node that drops the packet
turns the packet around.
Frank: in some cases yes, but not generically. Not all devices will support
this. A bit similar to direct exporting.
Greg: I am confused. Traceroute is for troubleshooting, and when you lose
continuity you can use Traceroute to localize the problem. Comparing
loopback to direct exporting - these are mechanisms that achieve different
things.
Frank: whether it is for monitoring or for debugging does not necessarily
change the protocol.
Barak: debugging and monitoring can be the same thing in some cases.
Greg: when you lose continuity you perform a troubleshooting action such as
Traceroute. With IOAM you do not necessarily have a trigger for
troubleshooting, since you do not have a continuity check.
Frank: I agree. You need to have some means for continuity checking, which
is a trigger for the loopback. The question is how do you detect the
location of the problem within a single round trip time?
Greg: that is what active OAM is for.
Frank: IOAM originally was intended to be similar to UDP Pinger, using
probe traffic. It is used for simulating the live traffic, and when you
detect a problem you use the loopback in order to isolate the problem
within a single RTT.
Greg: so loopback collects data on the way back?
Tal: no. Following the discussion at IETF 106 we decided to remove that,
and now the loopback does not collect data on the way back.
Greg: so why do you need to collect on the way forward? It is similar to
Linktrace in Ethernet (ITU-T Y.1731).
Tal: right, that would be similar to IOAM without any data collected.
Greg: Linktrace already exists. It does not collect data.
Tal: right. We are looking for something similar to Linktrace.
Specifically, you can choose not to collect data.
Greg: but there is no need to collect data.
Tal: like Linktrace, we want each device along the path to send its ID,
which can be done using the Node ID data field.
Greg: most network paths are unidirectional, and the response is over IP
networks.
Frank: how is Ethernet Linktrace related to IP networks? There is no
equivalent to Linktrace in IP networks.
Greg: the question is what you are trying to achieve. Failure localization
that can be done by OAM?
Frank: but let's not confuse L2 with L3. Continuity checks and Linktraces
are in L2. The UDP Pinger is in L4 and we are trying to do something
similar for IP networks with IOAM and the Loopback flag.
Greg: why do you need to collect data along the path?
Frank: it may be useful for the operator.
Greg: it is not needed. There is no need for other information beyond which
node responded last.
Mickey: I do not agree regarding port information. It is useful to know
which port is problematic.
Greg: in my view you do not need anything beyond which node responded last.
Mickey: you are comparing to historic solutions. It was not there in
previous solutions, but it still may be useful in a new solution.
Greg: overloading the function is a concern, as it becomes too heavy. If
loopback is only forward and response, that is fine. But if you are
collecting information, that creates the amplification problem. If you
say that in loopback you do not collect any information, that would solve
it.
Frank: that will not necessarily address the concern. For example, a
routing loop will cause multiple responding packets.
Greg: how do you create a routing loop?
Frank: imagine there already is a routing loop, and you are sending
loopback. That means you are sending loopback responses until the TTL
expires. That is a significant amplification. That is a problem. We want to
say that when you use loopback, the rate that is looped back is not
significant.
Greg: I agree.
Frank: maybe we can restrict loopback to just sending back the Node ID.
Greg: in IP networks you do not need to collect any information, since the
source IP is in fact the Node ID.
Mickey: adding the data is not necessarily a problem, but generating many
packets per second is the amplification issue at hand.
Greg: we are talking about an attack vector against the source. We want to
protect the source.
Mickey: the number of packets is much more significant than the packet
length.
Greg: if we want to localize the failure then we only need a response.
Tal: if I am hearing you correctly, you are suggesting that we limit the
response to a very short packet, and in that way reduce the impact of
amplification.
Greg: as Frank suggests, also to limit how often you can send, and how
often you can respond.
Tal: I will create a pull request where we limit the amount of data on the
way back.
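
Illustration (not part of the discussion): the routing-loop scenario Frank
describes, and the two mitigations mentioned above (short looped-back
packets, and limiting how often a node responds), can be put into rough
numbers. The sketch below is only a toy model under stated assumptions:
every hop in the loop honors the loopback flag, the probe circulates until
its TTL expires, and all names and parameter values (TokenBucket,
loop_responses, the TTL, packet lengths, rates) are hypothetical and do not
come from any draft.

```python
class TokenBucket:
    """Hypothetical per-node limiter on looped-back copies."""

    def __init__(self, rate, burst):
        self.rate = rate      # tokens added per second
        self.burst = burst    # maximum number of stored tokens
        self.tokens = burst   # start with a full bucket
        self.last = 0.0       # time of the previous refill

    def allow(self, now):
        # Refill according to elapsed time, capped at the burst size.
        self.tokens = min(self.burst,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


def loop_responses(ttl, loop_nodes, make_limiter=None, hop_delay=1e-5):
    """Count the looped-back copies one probe triggers while it
    circulates among loop_nodes nodes until its TTL expires."""
    limiters = [make_limiter() if make_limiter else None
                for _ in range(loop_nodes)]
    sent = 0
    now = 0.0
    for hop in range(ttl):
        limiter = limiters[hop % loop_nodes]
        # Without a limiter, every visit produces a looped-back copy.
        if limiter is None or limiter.allow(now):
            sent += 1
        now += hop_delay
    return sent


def byte_amplification(responses, probe_len, response_len):
    """Bytes returned to the source per byte of probe sent."""
    return responses * response_len / probe_len


unlimited = loop_responses(ttl=64, loop_nodes=4)
limited = loop_responses(ttl=64, loop_nodes=4,
                         make_limiter=lambda: TokenBucket(rate=10, burst=1))
print(unlimited, byte_amplification(unlimited, 1000, 1000))
print(unlimited, byte_amplification(unlimited, 1000, 100))
print(limited, byte_amplification(limited, 1000, 100))
```

In this toy model, truncating the looped-back copies reduces the byte
amplification but leaves the packet count at one response per hop, which is
in line with Mickey's remark that the number of packets matters more than
their length. Only the per-node limit reduces the number of responses.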

Active flag:
============
Tal: regarding the active flag. Cloning may create the same amplification
problem. One can argue that the flag draft only mentions cloning as an
example, and does not define the specification of the cloning mechanism.
Therefore, solving the amplification problem is not within the scope of the
current draft.
Mickey: I agree. It is important to mention cloning, but I believe we need
to say the specification of the mechanism is not within the scope of this
document.
Greg: what we consider sufficient information to understand the active
flag may not be sufficient for the people who review the document, and
especially for the people who implement the active flag. I do not
understand the purpose of the active flag from the document. Does it
replace active OAM? There needs to be more explanation regarding what it
tries to achieve. Otherwise it may be abused.
Mickey: this bit makes sure the packet is terminated by the decapsulating
node.
Greg: what are we trying to achieve?
Mickey: it is not intended to replace other OAM mechanisms.
Greg: does it mean that the packet is not meant to leave the IOAM domain?
Tal: it is meant to indicate termination by the decapsulating node. It is a
bit that can be utilized by an active OAM protocol, but is not intended to
replace active OAM protocols.
Greg: but we already have active OAM, like BFD. Why do you need to inject
synthetic packets?
Barak: please explain the question.
Greg: the IOAM processing is the same, whether the active flag is set or
not.
Tal: right. Fate sharing is one reason for that.
Greg: BFD requires bidirectional paths. It will not necessarily work with
IOAM, which is unidirectional.
Mickey: you are trying to get the information to your collector.
Greg: why do we need that active flag?
Barak: it simplifies the operation of the encapsulating node.
Mickey: we are collecting data along the path. Data flows might not give
you everything, since the network may be quiet. Also, for performance
issues it is going to help.
Greg: you can collect this information from the nodes. For example, with
segment routing we can collect information from the nodes.
Mickey: it is meant to be IOAM-specific but not transport specific.
Greg: there is no clear understanding of why active is needed.
Tal: I am taking an action item to add more text that clarifies the
motivation for the active flag.
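
Illustration (not part of the discussion): the behavior described above,
where IOAM processing is the same whether or not the active flag is set and
the flag only tells the decapsulating node to terminate the packet, might
be sketched as follows. This is a minimal sketch under assumptions: the bit
values, the IoamFlag and decapsulate names, and the export callback are all
hypothetical and do not reflect the encoding or procedures of the flag
draft.

```python
from enum import IntFlag


class IoamFlag(IntFlag):
    # Placeholder bit values for illustration only; they are NOT the
    # encoding defined in the flag draft.
    LOOPBACK = 0x1
    ACTIVE = 0x2


def decapsulate(flags, ioam_data, payload, export):
    """Hypothetical decapsulating-node handling of one packet.

    Returns the payload to forward, or None if the packet is
    terminated at this node.
    """
    # The collected IOAM data is handled the same way whether or not
    # the active flag is set.
    export(ioam_data)
    if flags & IoamFlag.ACTIVE:
        # Active (synthetic) packet: terminate here, do not forward.
        return None
    # Regular data packet: strip IOAM and forward the payload.
    return payload


exported = []
print(decapsulate(IoamFlag.ACTIVE, b"node-data", b"probe", exported.append))
print(decapsulate(IoamFlag(0), b"node-data", b"user", exported.append))
```

In this sketch the only difference between the two calls is the return
value: the collector sees the same exported data in both cases, while the
probe payload is dropped and the user payload is forwarded.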

AOB
===
Barak: please review my pull request.
Tal: next meeting will be next week.
