Re: [mpls] Comment on draft-chen-mpls-p2mp-ingress-protection

Hi Alia,

Thank you for summarizing the response.

I agree that the source-detected mode makes sense. For the other modes, if we want to preserve them and also want a deterministic detection mechanism, exploring the use of ICCP might be a good idea, as Greg Mirsky has pointed out in another thread.
-Nobo

From: Alia Atlas [mailto:akatlas@gmail.com]
Sent: Tuesday, December 17, 2013 12:20 PM
To: Nobo Akiya (nobo)
Cc: Huaimo Chen; draft-chen-mpls-p2mp-ingress-protection@tools.ietf.org; mpls@ietf.org
Subject: Re: [mpls] Comment on draft-chen-mpls-p2mp-ingress-protection

Hi Nobo,

Here's a quick summary response (which is why I am top-posting).

Yes, it is hard to determine precisely which failure has occurred. Doing so depends on the network topology, whether the CE can run BFD, and so on. The set of ways to determine the failure is large and is NOT specified in this draft.

What this draft does specify are different "failure-detection modes", which need to be signaled so that the Backup Ingress and the Merge Points know what they need to do. Each mode is intended to address a different scenario. Remember that what matters is whether traffic continues to flow and whether the repair might result in duplicate traffic.

For instance, if the CE determines that it can't send traffic to the Ingress, then traffic won't flow, and the CE can send the traffic to the Backup Ingress instead. This is the source-detected mode.
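As an illustration of why a single decision point avoids both black holes and duplicates, here is a minimal Python sketch of the source-detected mode. It assumes the CE steers traffic to exactly one ingress at a time and that, in this mode, the Backup Ingress imports whatever source traffic it receives into the backup LSP without consulting its own BFD session. The function name `forward` and this reachability model are hypothetical, not taken from the draft.

```python
def forward(ce_reaches_primary: bool) -> dict:
    """Return which LSP carries the source traffic (hypothetical model)."""
    # Single decision point: the CE picks exactly one ingress.
    to_backup = not ce_reaches_primary
    # Assumed source-detected behavior: the Backup Ingress imports any
    # traffic it receives from the source, ignoring its own BFD state.
    return {"primary_lsp": not to_backup, "backup_lsp": to_backup}


for reachable in (True, False):
    paths = forward(reachable)
    # Exactly one LSP carries traffic: no black hole and no duplicates.
    assert sum(paths.values()) == 1
    print("CE reaches primary:", reachable, paths)
```

Because only the CE decides, the Backup Ingress can never disagree with it; the trade-off is that everything rests on the CE's own detection.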

Regards,
Alia

On Mon, Dec 16, 2013 at 6:01 PM, Nobo Akiya (nobo) <nobo@cisco.com> wrote:
Hi Huaimo,

Thanks for the response!

The fundamental issue is that, in your proposal, a node failure will cause the relevant BFD sessions to fail, but a BFD failure does not necessarily mean a node failure.

Please see comments inline.

> -----Original Message-----
> From: Huaimo Chen [mailto:huaimo.chen@huawei.com]
> Sent: Sunday, December 15, 2013 9:01 PM
> To: Nobo Akiya (nobo); draft-chen-mpls-p2mp-ingress-protection@tools.ietf.org
> Cc: mpls@ietf.org
> Subject: RE: Comment on draft-chen-mpls-p2mp-ingress-protection
>
> Hi Nobo,
>
>     Thanks for your comments!
>     See my answers/explanations inline below.
>
> Best Regards,
> Huaimo
>
> -----Original Message-----
> From: mpls-bounces@ietf.org [mailto:mpls-bounces@ietf.org] On Behalf Of
> Nobo Akiya (nobo)
> Sent: Thursday, November 07, 2013 2:51 AM
> To: draft-chen-mpls-p2mp-ingress-protection@tools.ietf.org
> Cc: mpls@ietf.org
> Subject: [mpls] Comment on draft-chen-mpls-p2mp-ingress-protection
>
> Hi Authors,
>
> I didn't get a chance to comment on your draft at the WG session today, so
> I am sending my comments to the list. My concern is similar to the one Greg
> Mirsky raised. Simply running two independent BFD sessions (as described in
> the slides) will cause problems.
>
> Huaimo: In normal operations, source CE sends the traffic to the primary
> ingress, which imports the traffic into the primary LSP. It does not send the
> traffic to the backup ingress.
> When the primary ingress fails, the source CE will detect the failure of the
> primary ingress through the BFD session between the CE and the primary ingress
> and switch the traffic to the backup ingress. The backup ingress will also
> detect the failure of the primary ingress through the BFD between the
> backup ingress and the primary ingress, and put the traffic from the source
> CE into the backup LSP to the next hops of the primary ingress, where the
> traffic is merged into the primary LSP.
> When the link between the backup ingress and the primary ingress fails, the
> source CE will continue to send the traffic to the primary ingress, and will
> not send the traffic to the backup ingress. The traffic will be delivered to the
> destinations via the primary LSP.
> It seems that it works as expected.
> Can you give more details about "Simply running 2 independent BFD
> sessions (as described in the slides) will have issues."?
One example is the case where the BFD session between the CE and the primary ingress fails but the BFD session between the primary ingress and the backup ingress does not. Section 3.1 says:

[snip]
   The backup ingress does not import any traffic from the source into
   the backup LSP in normal operations.  When it detects a failure
   involving the primary ingress, it imports the traffic from the source
   into the backup LSP to the next hops of the primary ingress, where
   the traffic is merged into the primary LSP.
[snip]

In this case, the CE will send traffic to the backup ingress, but the backup ingress will not import that traffic into the backup LSP, since its BFD session to the primary ingress is still up. Because the proposal (currently) requires a consistent outcome from multiple devices that each make independent decisions, this area needs further attention.
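The four combinations of BFD state can be enumerated to make the inconsistency explicit. This is a minimal Python sketch under stated assumptions: the CE sends to the primary ingress while its CE-PI session is up and otherwise to the backup ingress, and the backup ingress imports source traffic only when its own BI-PI session is down. The function `outcome` and the state labels are hypothetical illustrations of the behavior described in this thread, not code from the draft.

```python
from itertools import product


def outcome(ce_pi_bfd_up: bool, bi_pi_bfd_up: bool) -> str:
    """Traffic outcome when the CE and Backup Ingress decide independently."""
    ce_sends_to_backup = not ce_pi_bfd_up   # CE's own decision
    backup_imports = not bi_pi_bfd_up       # Backup Ingress's own decision
    if not ce_sends_to_backup:
        # Backup Ingress receives nothing, so there is nothing to duplicate.
        return "primary LSP"
    if backup_imports:
        return "backup LSP"
    # The CE switched over, but the Backup Ingress did not: black hole.
    return "dropped"


for ce_up, bi_up in product((True, False), repeat=2):
    print(f"CE-PI up={ce_up!s:5} BI-PI up={bi_up!s:5} -> {outcome(ce_up, bi_up)}")
```

The row with the CE-PI session down and the BI-PI session up is the case described above: the two detectors disagree and traffic is dropped.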

What this proposal really needs to detect are failures along the following path:
CE --> primary ingress --> primary LSP --> in-band at least to the next hop
Running multiple single-hop BFD sessions does not accurately achieve this.

Ideally, for backup activation, you would want either a single failure detection, or a single node making the decision based on multiple failure detections.
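The "one node making the decision" idea can be sketched as follows. This is a minimal Python illustration that assumes some single node already has per-segment health for the protected data path; how it obtains those reports is not addressed here, and the names `Detections`, `activate_backup` and `act_on` are hypothetical. The key property is that both the CE's steering and the backup ingress's import follow from one boolean, so they cannot disagree.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Detections:
    """Hypothetical health reports collected at a single decision node."""
    ce_to_primary: bool        # CE -> primary ingress segment healthy
    primary_to_nexthops: bool  # primary ingress -> primary LSP next hops healthy
    primary_to_backup: bool    # primary <-> backup link (not on the data path)


def activate_backup(d: Detections) -> bool:
    """Activate the backup only if the path CE -> primary ingress ->
    primary LSP next hops is broken. The primary<->backup link is ignored
    on purpose because the protected traffic never crosses it."""
    return not (d.ce_to_primary and d.primary_to_nexthops)


def act_on(decision: bool) -> tuple[str, bool]:
    """Derive both actions from the same decision."""
    ce_target = "backup ingress" if decision else "primary ingress"
    backup_imports = decision
    return ce_target, backup_imports


only_pi_bi_link_down = Detections(ce_to_primary=True, primary_to_nexthops=True,
                                  primary_to_backup=False)
print(act_on(activate_backup(only_pi_bi_link_down)))  # ('primary ingress', False)
```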

>
>
> > 3.  Ingress Failure Detection
> >
> >   Exactly how the failure of the ingress (e.g.  R1 in Figure 1) is
> >   detected is out of scope for this document.
>
> I believe that, at a minimum, the "failure" should be defined in the draft.
> Without a definition, readers may interpret it as a complete node outage,
> an outage of all involved links, an outage of just the primary-backup link,
> or something else entirely. Without defining what the "failure" is, it is
> also difficult to figure out the right techniques to detect it. That can
> easily lead to divergent detection implementations, with the result that
> the described solution does not kick in as expected.
>
> Huaimo: The primary ingress node failure in this draft is similar to the node
> failure in RFC 4090.
Outside ingress protection, the failure of a downstream node is clear from the upstream node's perspective: there is no need to differentiate between link failure and node failure. What the document proposes, however, is a detection mechanism in which a (link failure & node alive) scenario can cause an inconsistent backup takeover. In addition, because there are multiple "detector" nodes, the failure of a single-hop BFD session cannot be assumed to be a node failure. It may just be a link failure, or a line card (LC) failure on one node that does not affect any of the corresponding traffic. As the above shows, there is a difference between the "failure" for which you want the backup activated and the "failure" that the proposed detection mechanism actually detects. This is why I think it would be beneficial to define what the "failure" is.
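To show that a down BFD session does not map one-to-one to "activate the backup", the sketch below lists a few root causes, what each single-hop session would observe, and whether backup activation is wanted, then checks whether each detector's state alone predicts the wanted action. The scenario set is a hypothetical, illustrative (not exhaustive) toy model; in particular, the line-card case assumes a PI line card that faces only the BI.

```python
# Hypothetical, illustrative (not exhaustive) failure scenarios.
# Each value: (CE-PI BFD up, BI-PI BFD up, backup activation wanted)
SCENARIOS = {
    "primary ingress node down":           (False, False, True),
    "CE-PI link down, primary node alive": (False, True, True),
    "PI-BI link down, primary node alive": (True, False, False),
    "PI line card facing BI fails":        (True, False, False),
}

DETECTORS = {"CE (CE-PI BFD)": 0, "Backup Ingress (BI-PI BFD)": 1}


def is_unambiguous(index: int) -> bool:
    """True if one detector's BFD state alone always matches whether
    backup activation is wanted, across the scenarios above."""
    wanted_for_state = {}
    for states in SCENARIOS.values():
        observed, wanted = states[index], states[2]
        # The same observed state must never map to two different answers.
        if wanted_for_state.setdefault(observed, wanted) != wanted:
            return False
    return True


for name, index in DETECTORS.items():
    print(name, "unambiguous:", is_unambiguous(index))
```

Within this toy set, the backup ingress sees its BFD session go down both when the primary node dies (activation wanted) and when only the PI-BI link or a line card fails (activation not wanted), which is exactly the ambiguity that a definition of "failure" would need to resolve.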

-Nobo

>
> -Nobo
>
> _______________________________________________
> mpls mailing list
> mpls@ietf.org
> https://www.ietf.org/mailman/listinfo/mpls