Re: [mpls] Comment on draft-chen-mpls-p2mp-ingress-protection

Hi Nobo,

Here's a quick summary response (which is why I am top-posting).

Yes, it is hard to determine precisely what failure has occurred.
Doing so depends on the network topology, whether the CE can run BFD, and
so on.  The set of ways to determine the failure is large and is NOT
specified in this draft.

What this draft does specify are different "failure-detection modes",
which need to be signaled so that the Backup Ingress and Merge Points know
what they need to do.  Each mode is intended to address a different
scenario.  Remember that what matters is whether traffic continues to flow
and whether the repair might result in duplicate traffic.

For instance, if the CE determines that it can't send traffic to the
Ingress, then traffic won't flow through the Ingress, so the CE can send
the traffic to the Backup Ingress instead.  This is the source-detected
mode.
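
To make those two questions concrete (does traffic keep flowing, and is it
duplicated), here is a toy Python sketch.  It is not taken from the draft:
Scenario, copies_delivered, and the two policy flags are hypothetical names
made up for illustration, and the model assumes a single CE, a single
primary ingress, and a single backup ingress that never fails itself.  In
particular, "the backup forwards whatever the CE hands it" is only one
possible reading of source-detected mode.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Scenario:
    """Hypothetical snapshot of one failure case (not from the draft)."""
    ingress_forwarding: bool      # ground truth: primary ingress can still forward
    ce_detects_failure: bool      # CE's view, e.g. CE<->primary BFD is down
    backup_detects_failure: bool  # backup's view, e.g. backup<->primary BFD is down


def copies_delivered(s: Scenario,
                     ce_sends_to_both: bool,
                     backup_trusts_own_detection: bool) -> int:
    """Count copies of a packet that enter the LSP: 0 = loss, 2 = duplicate."""
    # Step 1: where does the CE send the packet?
    if ce_sends_to_both:
        to_primary, to_backup = True, True
    else:
        to_backup = s.ce_detects_failure
        to_primary = not to_backup

    # Step 2: does the backup ingress import what it receives?
    if backup_trusts_own_detection:
        backup_imports = s.backup_detects_failure
    else:
        backup_imports = True  # assumption: forward whatever the CE sends

    via_primary = to_primary and s.ingress_forwarding
    via_backup = to_backup and backup_imports
    return int(via_primary) + int(via_backup)


# CE-side BFD is down, backup-side BFD is still up, node is alive.
ce_only = Scenario(ingress_forwarding=True,
                   ce_detects_failure=True,
                   backup_detects_failure=False)
print(copies_delivered(ce_only, False, True))    # 0: traffic is dropped
print(copies_delivered(ce_only, False, False))   # 1: backup forwards on receipt

# Only the backup<->primary link is down; CE sends to both ingresses.
link_only = Scenario(ingress_forwarding=True,
                     ce_detects_failure=False,
                     backup_detects_failure=True)
print(copies_delivered(link_only, True, True))   # 2: duplicate traffic

# Real node failure seen by everyone.
node_down = Scenario(ingress_forwarding=False,
                     ce_detects_failure=True,
                     backup_detects_failure=True)
print(copies_delivered(node_down, False, True))  # 1: repair works
```

The first case is the one Nobo describes below: two nodes decide
independently, disagree, and traffic is lost.  The third is the duplicate
case.  Which combination of policies applies is exactly what a signaled
mode would have to pin down.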

Regards,
Alia

On Mon, Dec 16, 2013 at 6:01 PM, Nobo Akiya (nobo) <nobo@cisco.com> wrote:

> Hi Huaimo,
>
> Thanks for the response!
>
> The fundamental point is that a node failure will bring down the relevant
> BFD sessions in your proposal, but a BFD failure may not mean a node
> failure.
>
> Please see comments inline.
>
> > -----Original Message-----
> > From: Huaimo Chen [mailto:huaimo.chen@huawei.com]
> > Sent: Sunday, December 15, 2013 9:01 PM
> > To: Nobo Akiya (nobo);
> > draft-chen-mpls-p2mp-ingress-protection@tools.ietf.org
> > Cc: mpls@ietf.org
> > Subject: RE: Comment on draft-chen-mpls-p2mp-ingress-protection
> >
> > Hi Nobo,
> >
> >     Thanks for your comments!
> >     See my answers/explanations inline below.
> >
> > Best Regards,
> > Huaimo
> >
> > -----Original Message-----
> > From: mpls-bounces@ietf.org [mailto:mpls-bounces@ietf.org] On Behalf Of
> > Nobo Akiya (nobo)
> > Sent: Thursday, November 07, 2013 2:51 AM
> > To: draft-chen-mpls-p2mp-ingress-protection@tools.ietf.org
> > Cc: mpls@ietf.org
> > Subject: [mpls] Comment on draft-chen-mpls-p2mp-ingress-protection
> >
> > Hi Authors,
> >
> > I didn't get a chance to comment on your draft at the WG session today,
> > so sending to the list. My concern is similar to what Greg Mirsky
> > stated. Simply running 2 independent BFD sessions (as described in the
> > slides) will have issues.
> >
> > Huaimo: In normal operations, the source CE sends the traffic to the
> > primary ingress, which imports the traffic into the primary LSP. The CE
> > does not send the traffic to the backup ingress.
> > When the primary ingress fails, the source CE will detect the failure
> > of the primary ingress through the BFD between the CE and the primary
> > ingress and switch the traffic to the backup ingress. The backup
> > ingress will also detect the failure of the primary ingress through the
> > BFD between the backup ingress and the primary ingress, and put the
> > traffic from the source CE into the backup LSP to the next hops of the
> > primary ingress, where the traffic is merged into the primary LSP.
> > When the link between the backup ingress and the primary ingress fails,
> > the source CE will continue to send the traffic to the primary ingress,
> > and will not send the traffic to the backup ingress. The traffic will
> > be delivered to the destinations via the primary LSP.
> > It seems that it works as expected.
> > Can you give more details about "Simply running 2 independent BFD
> > sessions (as described in the slides) will have issues."?
>
> One example would be the case where BFD between CE and primary ingress
> fails but not BFD between primary ingress and backup ingress. Section 3.1
> says:
>
> [snip]
>    The backup ingress does not import any traffic from the source into
>    the backup LSP in normal operations.  When it detects a failure
>    involving the primary ingress, it imports the traffic from the source
>    into the backup LSP to the next hops of the primary ingress, where
>    the traffic is merged into the primary LSP.
> [snip]
>
> In this case, the CE will be sending traffic to the backup ingress, but
> the backup ingress isn't sending the traffic into the backup LSP, since
> BFD from the backup ingress to the primary ingress is still up. Because
> the proposal [currently] requires a synchronized outcome from multiple
> devices that each make independent decisions, this is an area that needs
> further attention.
>
> What this proposal really needs to detect is a failure anywhere along
> the following path:
> CE --> primary ingress --> primary LSP --> in-band at least to nexthop
> Running multiple single-hop BFD sessions doesn't accurately achieve this.
>
> For the backup activation, you probably want to have one failure
> detection, or one node making the decision based on multiple failure
> detections ... ideally.
>
> >
> >
> > > 3.  Ingress Failure Detection
> > >
> > >   Exactly how the failure of the ingress (e.g.  R1 in Figure 1) is
> > >   detected is out of scope for this document.
> >
> > I believe that, at the least, the "failure" should be defined in the
> > draft. Without a definition, readers can interpret it as a complete
> > node outage, an outage of all involved links, an outage of just the
> > primary-backup link, or even something else. And without defining what
> > the "failure" is, it's difficult to figure out the right techniques to
> > detect it. That can easily result in divergent detection
> > implementations, so that the described solution does not kick in as
> > expected.
> >
> > Huaimo: The primary ingress node failure in this draft is similar to the
> > node failure in RFC 4090.
>
> For non-ingress protection, failure of a downstream node from the
> upstream perspective is clear: there is no need to differentiate between
> link failure and node failure. What is proposed in the document is a
> detection mechanism where the (link failure & node alive) scenario can
> create inconsistent backup takeover. In addition, because you have
> multiple "detector" nodes, failure of a single-hop BFD session cannot be
> assumed to be a node failure. It can just be a link failure, or an LC
> failure on one node which doesn't impact any corresponding traffic. I
> think you can see from the above that there's a bit of difference between
> the "failure" for which you want the backup activated and the "failure"
> which the proposed detection mechanism detects. This is why I think it'll
> be beneficial to define what the "failure" is.
>
> -Nobo
>
> >
> > -Nobo
> >
> > _______________________________________________
> > mpls mailing list
> > mpls@ietf.org
> > https://www.ietf.org/mailman/listinfo/mpls