Re: [ippm] Adoption call for draft-mizrahi-ippm-ioam-flags Re: Regarding draft-mizrahi-ippm-ioam-flags

On Thu, Aug 1, 2019 at 11:48 PM Frank Brockners (fbrockne)
<fbrockne@cisco.com> wrote:
>
>
>
> > -----Original Message-----
> > From: Tom Herbert <tom@quantonium.net>
> > Sent: Friday, 2 August 2019 00:27
> > To: Frank Brockners (fbrockne) <fbrockne@cisco.com>
> > Cc: Greg Mirsky <gregimirsky@gmail.com>; IPPM Chairs <ippm-
> > chairs@ietf.org>; IETF IPPM WG <ippm@ietf.org>
> > Subject: Re: [ippm] Adoption call for draft-mizrahi-ippm-ioam-flags Re:
> > Regarding draft-mizrahi-ippm-ioam-flags
> >
> > On Thu, Aug 1, 2019 at 12:12 PM Frank Brockners (fbrockne)
> > <fbrockne@cisco.com> wrote:
> > >
> > > Hi Greg,
> > >
> > >
> > >
> > > Please see inline…
> > >
> > >
> > >
> > > From: Greg Mirsky <gregimirsky@gmail.com>
> > > Sent: Thursday, 1 August 2019 20:54
> > > To: Frank Brockners (fbrockne) <fbrockne@cisco.com>
> > > Cc: Tom Herbert <tom@quantonium.net>; IPPM Chairs
> > > <ippm-chairs@ietf.org>; IETF IPPM WG <ippm@ietf.org>
> > > Subject: Re: [ippm] Adoption call for draft-mizrahi-ippm-ioam-flags
> > > Re: Regarding draft-mizrahi-ippm-ioam-flags
> > >
> > >
> > >
> > > Hi Frank,
> > >
> > > thank you for your expedient response and the clarification, much
> > appreciated. I have some follow-up questions but your response, in my opinion,
> > supports my original evaluation of the draft that it is not ready for WG adoption.
> > I don't agree that the presumed benefits of the proposed Loopback flag
> > outweigh the risks that were called out during the meeting and were pointed out by Tom
> > and me.
> > >
> > > Also, thank you for informing everyone that a design team is forming to define
> > the use of the Immediate flag. I think that that flag should be introduced along
> > with the clear and firm specification of its utilization.
> > >
> > > And I'm still not clear about how the Active flag can be used. You suggest that
> > it is intended as complementary to "an operator who uses his own probing".
> > What could such "own probing" be? Why would the operator use well-known
> > standard-based active OAM for fault management and performance
> > monitoring?
> > >
> > >
> > >
> > > …FB: draft-lapukhov-dataplane-probe-01 is an example of an operator’s
> > approach to probing. I’ve also seen deployments where the probing is integrated
> > with the application – i.e. part of the application solution, which is another
> > example domain where specific health checks are used.
> > >
> > >
> > >
> > > And, going back to the scenario in DC. I wonder why the well-known
> > Traceroute is not sufficient?
> > >
> > >
> > >
> > > …FB: In the scenario discussed below, detection speed was the driving factor –
> > the IOAM loopback solution gives you an indication of the failed link in less than
> > 1 RTT.
> >
> > Frank,
> >
> > I'm doubtful it would be practical to set loopback on every packet given the
> > amplification characteristic, which means that either it's done as a periodic
> > probe or on demand when the application has reason to suspect a failing link. In
> > either case, it seems like the latency to detect and identify a failing link would be
> > greater than 1 RTT. Am I missing something?
>
> Tom,
>
> you would not set loopback on every packet. Let me re-explain the deployment scenario:
>
> * Operator runs a custom application UDP probe - which makes probe traffic follow all paths the application uses.

Frank,

If the operator is tunneling everything, IOAM could probably be
attached to every packet and active probing may not be needed, i.e. the
peer tunnel endpoint could reflect the forward path IOAM information
on packets in the reverse path of the tunnel. This motivates an
"endpoint reflect flag" that I mentioned previously.

> * On detecting failure of a specific probe for a specific connection, IOAM tracing is turned on with loopback for *that* connection.

I assume by connection you mean path in this context.

How is failure of a specific path determined? If just one probe is
lost, that could be the result of a transient condition like
congestion. It seems like multiple probes need to fail before link
failure should be suspected. So the time to detect a failed path might
be the period of sending a probe plus some time to observe the failure
of multiple probes. This might be several RTTs.
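
A minimal sketch of that estimate, assuming a fixed probe period and
that a probe is declared lost one RTT after it was sent. The function
name, the timeout rule, and the example numbers are hypothetical and
do not come from the draft or from any implementation:

```python
def worst_case_detection_time(probe_period_s: float,
                              rtt_s: float,
                              losses_required: int) -> float:
    """Worst-case time from link failure until a path is declared failed.

    Assumptions (illustrative only):
      * probes are sent every `probe_period_s` seconds,
      * a probe counts as lost if no reply arrives within one RTT,
      * `losses_required` consecutive losses are needed before the
        path is suspected, to filter out transient drops.
    """
    if losses_required < 1:
        raise ValueError("losses_required must be >= 1")
    if probe_period_s <= 0 or rtt_s <= 0:
        raise ValueError("probe_period_s and rtt_s must be positive")
    # Worst case: the link fails just after a probe got through, so the
    # k-th doomed probe leaves k periods after the failure, and its loss
    # is noticed one RTT later.
    return losses_required * probe_period_s + rtt_s


if __name__ == "__main__":
    # Hypothetical numbers: 10 ms probe period, 1 ms RTT, 3 losses.
    t = worst_case_detection_time(0.010, 0.001, 3)
    print(f"detection takes about {t / 0.001:.0f} RTTs")
```

With these made-up numbers the detection step alone is on the order
of tens of RTTs, before any loopback or traceroute probe is sent.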

> * Once IOAM tracing is turned on, you can detect the node/link where traffic is stuck within one RTT. I.e. identification can be done in 1 RTT, once you detected the failure.
>
These probes might also be dropped due to transient conditions, so
once a candidate link is determined it might make sense to probe some
more to verify.

> So in other words, you only need the IOAM trace option with loopback added to a very small set of packets. In an ideal world even one packet would be sufficient.
>
But we don't live in an ideal world and "small set of packets" may be
relative :-). Consider a provider that has N possible paths that
applications follow and M intermediate nodes in each path. Now suppose
that there is a common link in all paths that fails and that each prober
in step one of your algorithm detects the failed link. So the loopback
probes generate a flood of O(N*M) packets in the network. In a large
scale deployment N*M could be a large number to the extent that the
probes themselves create congestion in the network. There are some
classic examples similar to this where synchronized UDP probes have
resulted in bricking an application. The answer to this problem is
to avoid synchronizing probes, but that probably means longer periods to
send the probe.
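
To put rough numbers on the O(N*M) concern, here is a small sketch.
It assumes every path is probed once with loopback set and that each
intermediate node returns exactly one copy; the function names and
the example values are hypothetical:

```python
def loopback_copies(n_paths: int, m_nodes: int) -> int:
    """Upper bound on looped-back copies for one round of probing.

    Assumes one loopback probe per path and one returned copy per
    intermediate node on that path (illustrative assumption).
    """
    if n_paths < 0 or m_nodes < 0:
        raise ValueError("n_paths and m_nodes must be non-negative")
    return n_paths * m_nodes


def burst_rate_pps(n_paths: int, m_nodes: int, window_s: float) -> float:
    """Average return-packet rate if all probers react within `window_s`."""
    if window_s <= 0:
        raise ValueError("window_s must be positive")
    return loopback_copies(n_paths, m_nodes) / window_s


if __name__ == "__main__":
    # Hypothetical deployment: 10,000 paths, 5 intermediate nodes each,
    # all probers reacting to the same failure within 10 ms.
    print(loopback_copies(10_000, 5), "returned copies")
    print(f"{burst_rate_pps(10_000, 5, 0.010):.0f} packets/s on average")
```

Spreading the same probes over a longer window lowers the rate
proportionally, which is the trade-off against detection speed.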

In any case, my point is that the whole time from when a link fails to
when an endpoint node is able to identify the failed link needs to be
taken into account when comparing loopback method and traceroute. A
single loopback or traceroute probe is just one component of the
algorithm above, so the net speedup we get from loopback may be
limited per Amdahl's law. It might help to have some more specifics
on the link failure detection algorithm including some estimates about
the time required for the whole process and how multiple probers avoid
creating problems like congestion.
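
The Amdahl's law point can be sketched as follows. The fraction of
the end-to-end time spent in the localization step and the speedup of
that step are hypothetical inputs, not measurements:

```python
def amdahl_speedup(fraction_improved: float, step_speedup: float) -> float:
    """Overall speedup when only part of a process gets faster.

    fraction_improved: share of the total time (0..1) spent in the step
                       being accelerated, e.g. localizing the failed link.
    step_speedup:      how many times faster that step becomes, e.g.
                       loopback compared to hop-by-hop traceroute.
    """
    if not 0.0 <= fraction_improved <= 1.0:
        raise ValueError("fraction_improved must be in [0, 1]")
    if step_speedup <= 0:
        raise ValueError("step_speedup must be positive")
    return 1.0 / ((1.0 - fraction_improved)
                  + fraction_improved / step_speedup)


if __name__ == "__main__":
    # Hypothetical: localization is 25% of the total time from failure
    # to identification, and loopback makes that step 5x faster.
    print(f"overall speedup: {amdahl_speedup(0.25, 5.0):.2f}x")
```

If detection dominates the total time, even a very large speedup of
the localization step changes the end-to-end figure only slightly.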

Thanks,
Tom

> Frank
>
> >
> > Tom
> >
> > >
> > >
> > >
> > > Cheers, Frank
> > >
> > >
> > >
> > > Regards,
> > >
> > > Greg
> > >
> > >
> > >
> > > On Thu, Aug 1, 2019 at 12:32 PM Frank Brockners (fbrockne)
> > <fbrockne@cisco.com> wrote:
> > >
> > >
> > > Some additional notes on the different flags - restating and expanding the
> > discussion we had at the WG meeting in Montreal:
> > >
> > > Loopback flag:
> > > The loopback flag was inspired by a specific use case, which could be
> > summarized as "rapid identification of a failed link/node in a DC": In a DC (read:
> > controlled/specific domain), one runs UDP probes (draft-lapukhov-dataplane-
> > probe-01) over a v6 fabric. In case a UDP probe detects a failure, one adds the
> > IOAM trace option and enables loopback mode - i.e. every node sends a copy
> > back to the source in addition to forwarding the packet. Correlating the
> > information from both ends allows one to pinpoint the failed node/link rapidly
> > and gives one a view of the overall forwarding topology. This use-case was
> > implemented in FD.io/VPP roughly 2 years ago and was also showcased at IETF
> > bits-n-bites. There is a rough outline of the open source implementation
> > available here: https://jira.fd.io/browse/VPP-471 .
> > > In more generic words: Loopback mode is like all IOAM, a domain specific
> > feature. Loopback mode is to enrich an existing (here the dataplane-probe)
> > active OAM mechanism.
> > > Reading through the comments below, it proves that the current draft is
> > indeed a good basis for the discussion and it also clearly shows that we need to
> > add a section to the document that expands on how loopback mode is expected
> > to be used.
> > >
> > > Immediate export flag:
> > > Per the WG discussion in Montreal - and the follow up breakout meeting
> > (https://mailarchive.ietf.org/arch/msg/ippm/Do9kJ9ED_grmTqwcZHSdpy3CmRk
> > ):
> > > The plan is to consolidate the IOAM-related content for a new "immediate
> > export option" from draft-song-ippm-postcard-based-telemetry-04 and the
> > description of the immediate export flag in draft-mizrahi-ippm-ioam-flags  into a
> > new draft.
> > >
> > > Active flag:
> > > The active flag is not to replace any existing active OAM mechanisms - but
> > rather allow an operator who uses his own probing along with IOAM to flag a
> > packet as a probe packet.
> > >
> > > Security considerations for flags in the context of PNF vs. VNF:
> > > Thanks for raising the point. It would be great to see specifics/details
> > discussed here on the list, so that those could be incorporated into the security
> > section.
> > >
> > > Thanks, Frank
> > >
> > > > -----Original Message-----
> > > > From: ippm <ippm-bounces@ietf.org> On Behalf Of Tom Herbert
> > > > Sent: Thursday, 1 August 2019 00:41
> > > > To: Greg Mirsky <gregimirsky@gmail.com>
> > > > Cc: IPPM Chairs <ippm-chairs@ietf.org>; IETF IPPM WG <ippm@ietf.org>
> > > > Subject: Re: [ippm] Adoption call for draft-mizrahi-ippm-ioam-flags Re:
> > > > Regarding draft-mizrahi-ippm-ioam-flags
> > > >
> > > > On Wed, Jul 31, 2019 at 11:53 AM Greg Mirsky <gregimirsky@gmail.com>
> > > > wrote:
> > > > >
> > > > > Dear Authors,
> > > > > thank you for bringing this proposal for the discussion. When
> > > > > considering WG
> > > > AP, I use the following criteria:
> > > > >
> > > > > is the document reasonably well-written; does it address a
> > > > > practical problem; is the proposed solution viable?
> > > > >
> > > > > On the first point, I commend you - the draft is easy to read.
> > > > > On the second point, I have several questions:
> > > > >
> > > > > What is the benefit of using Loopback flag in the Trace mode?
> > > >
> > > > This is unclear to me also. Additionally, I am concerned that
> > > > the protocol blindly reflects the packet back to the source without any
> > > > regard to what else the packet contains. For instance, if a TCP
> > > > packet is reflected by ten intermediate nodes this is nonsensical.
> > > > The possibility of an amplification attack is obvious and in fact
> > > > mentioned in the security section, however I'm skeptical that the proposed
> > mitigation of rate limiting is sufficient.
> > > >
> > > > Minimally, it seems like the reflected packets should be wrapped in
> > > > ICMP to mitigate spoofing attacks. Also, I wonder if traceroute
> > > > methodology could be used for tracing, i.e. one sent packet results
> > > > in at most one return packet (ICMP), to mitigate the amplification problem.
> > > >
> > > > Tom
> > > >
> > > > > Why is it important to limit the applicability of Loopback to only Trace
> > mode?
> > > > > What is the benefit of collecting the same, as I understand the
> > > > > description,
> > > > data on the return path to the source?
> > > > > What is the benefit of using the Active flag compared to existing
> > > > > active OAM
> > > > protocols?
> > > > > What is the benefit of using the Immediate flag compared to the
> > > > > Postcard-Based
> > > > Telemetry (PBT) proposal?
> > > > >
> > > > > On the third point, I'd appreciate your clarification on these points:
> > > > >
> > > > > In which transports (I find that iOAM encapsulation has been
> > > > > proposed for all
> > > > known transports) you've envisioned to use Loopback flag?
> > > > > The third bullet in Section 5 refers to a replica of the data
> > > > > packet that follows
> > > > the same path as the original packet. What controls that replication?
> > > > > The last paragraph in the Security Consideration section relies on
> > > > > "restricted
> > > > administrative domain" to mitigate the threat of malicious attacks
> > > > using a combination of iOAM extensions. That might be the case when
> > > > operating in a PNF environment, but it is much more challenging to
> > > > maintain such a trusted domain in VNF environment. How can these new
> > > > security risks be mitigated in a VNF environment?
> > > > >
> > > > > Appreciate your consideration and clarifications to my questions.
> > > > >
> > > > > Regards,
> > > > > Greg
> > > > >
> > > > > On Thu, Jul 25, 2019 at 2:07 PM Brian Trammell (IETF)
> > > > > <ietf@trammell.ch>
> > > > wrote:
> > > > >>
> > > > >> hi Greg,
> > > > >>
> > > > >> Thanks for the feedback; absolutely, we can do this the normal way.
> > Authors:
> > > > let's do a normal two-week adoption call for this document before
> > > > publishing the update.
> > > > >>
> > > > >> This adoption call starts now.
> > > > >>
> > > > >> IPPM, please respond to this message with an indication to the
> > > > >> mailing list of
> > > > your support for adopting draft-mizrahi-ippm-ioam-flags as a working
> > > > group document, in partial fulfillment of our charter milestone
> > > > "submit a Standards Track draft on inband OAM based measurement
> > methodologies to the IESG"
> > > > (obviously, depending on how many documents we end up sending to the
> > > > IESG, we may have to change the plurality of this milestone). If you
> > > > do not support this, please send a message to the list explaining why.
> > > > >>
> > > > >> Thanks, cheers,
> > > > >>
> > > > >> Brian (as IPPM co-chair)
> > > > >>
> > > > >>
> > > > >> > On 25 Jul 2019, at 13:15, Greg Mirsky <gregimirsky@gmail.com> wrote:
> > > > >> >
> > > > >> > Dear Chairs, et al.,
> > > > >> > I appreciate that editors of draft-ietf-ippm-ioam-data followed
> > > > >> > on the
> > > > decision of the WG reached at the meeting in Prague to extract
> > > > material not directly related to the definition of iOAM data
> > > > elements from the document. The new draft was presented earlier this
> > > > week and generated many comments. I feel that it would be right to
> > > > discuss the draft and its relevance to the charter of the IPPM WG before
> > starting WG adoption poll.
> > > > >> >
> > > > >> > Regards,
> > > > >> > Greg
> > > > >>
> > > > > _______________________________________________
> > > > > ippm mailing list
> > > > > ippm@ietf.org
> > > > > https://www.ietf.org/mailman/listinfo/ippm
> > > >