RE: [PWE3] BFD for MPLS PWs

Luca,

Your entire email suggests that all that OAM stuff isn't really that critical. To some extent I even agree with you. When you run MPLS over SONET or SDH, a pure MPLS or PW failure is rare and does not warrant a heavy investment in PW OAM. If I were an operator, I don't think that I would run BFD over every PW.

However, your comment about ATM not being used as the Internet Protocol is nonsense. The Internet is by design a connectionless network with completely different properties from the services for which ATM was designed. There are many different network architectures and many different services, and some of these require OAM implementations that do not apply to the Internet.

The irony of the situation is that even though you don't see much value in OAM, you are proposing solutions that complicate the implementation of it: you believe that in many cases defects should be reported twice, once via in-band notifications and once via PW Status.

My proposal is to define OAM in a way that is as simple as possible. For each defect there is a well-defined, minimal set of consequent actions. That seems the obvious way to minimize the burden that OAM could potentially impose. One would think that this would resonate with you. Yet, for some reason, and I don't understand why, you are fighting it.

I can't prevent you from writing emails to explain what you believe is the best solution, but it is worth considering that several of us have spent a lot of time defining the OAM Message Mapping draft. Therefore, if you believe that there is a fundamentally better solution, I would rather see you work it out in a document with sufficient detail so that we can compare apples with apples.

Peter

 -----Original Message-----
From: Luca Martini [mailto:lmartini@cisco.com]
Sent: Tuesday, August 22, 2006 6:47 PM
To: Busschbach, Peter B (Peter)
Cc: Swallow George; 'Thomas D. Nadeau'; Pignataro Carlos; Morrow Monique; pwe3 WG (E-mail); Danny McPherson; Agarwal Rahul; Stewart (stbryant) Bryant
Subject: Re: [PWE3] BFD for MPLS PWs

Busschbach, Peter B (Peter) wrote: 

Luca,

I would suggest that you look at the e-mail archive. If I remember correctly there were strong opinions about making the PW status messages mandatory.

I remember that there was a strong consensus to use PW Status instead of Label Withdraw. In that sense PW Status is indeed mandatory. However, I don't remember ever having seen a discussion about the mandatory use of PW Status for *every* single PW and AC defect. It would be helpful if you could identify that email exchange.

Further comments in-line.

Peter

-----Original Message-----
From: Luca Martini [mailto:lmartini@cisco.com]
Sent: Friday, August 18, 2006 6:15 PM
To: Busschbach, Peter B (Peter)
Cc: Swallow George; 'Thomas D. Nadeau'; Pignataro Carlos; Morrow Monique; pwe3 WG (E-mail); Danny McPherson; Agarwal Rahul; Stewart (stbryant) Bryant
Subject: Re: [PWE3] BFD for MPLS PWs

Busschbach, Peter B (Peter) wrote:

To move the discussion forward, let me follow up on my own question.

I believe that there is a broad consensus that insertion of AIS, as mandated by draft-ietf-pwe3-atm-encap-11, implies that the PE will NOT send an additional PW Status message to report the same defect.

Here I am assuming you mean insertion of an AIS alarm at the PE toward the local attachment circuit.

No. My example was about LOS, an AC defect that, according to draft-ietf-pwe3-atm-encap-11, triggers the PE to insert F4 AIS over the PW.

We should also send a status message with status "0x00000008 - Local PSN-facing PW (ingress) Receive Fault" in this case.
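For reference, the status value mentioned above is one bit of the 32-bit PW Status code. A minimal Python sketch of encoding and decoding these bits follows. The bit names follow the list in RFC 4447 (Section 5.4.2) as best recalled here, so check them against the RFC; the function names are hypothetical.

```python
# PW Status code bits, following the list in RFC 4447, Section 5.4.2.
# A value of 0x00000000 means the PW is forwarding (all failures cleared).
PW_STATUS_BITS = {
    0x00000001: "Pseudowire Not Forwarding",
    0x00000002: "Local Attachment Circuit (ingress) Receive Fault",
    0x00000004: "Local Attachment Circuit (egress) Transmit Fault",
    0x00000008: "Local PSN-facing PW (ingress) Receive Fault",
    0x00000010: "Local PSN-facing PW (egress) Transmit Fault",
}


def decode_pw_status(status: int) -> list[str]:
    """Return the names of all fault bits set in a 32-bit PW Status value."""
    if status == 0:
        return ["Pseudowire forwarding (clear all failures)"]
    return [name for bit, name in PW_STATUS_BITS.items() if status & bit]


def encode_pw_status(*fault_bits: int) -> int:
    """OR individual fault bits together into a single PW Status value."""
    value = 0
    for bit in fault_bits:
        value |= bit
    return value


if __name__ == "__main__":
    # The status proposed above for a failed MPLS path.
    status = encode_pw_status(0x00000008)
    print(hex(status), decode_pw_status(status))
```

Because the code is a bit mask, one message can report several simultaneous faults, e.g. an AC receive fault and a PSN-facing receive fault together.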

Remember that the MPLS path might have gone down completely because somewhere a router was misconfigured and can now only forward IP packets.

This message would be immediate, and much faster than any VCCV path fault detection scheme.

You seem to be confusing fault detection and fault notification. My example was about AC defects, but let's look at the PW defects that you are addressing.

No, I wanted to understand which kind of fault you are referring to: AC or PSN faults.

If for some reason the MPLS path goes down, you need some mechanism to find out that that is the case. You could do this via BFD or Y.1711, or you could wait until you receive an RSVP error notification. Since, according to your assumption, the control plane stays up, a PE will not detect the failure through an LDP session failure.

More likely, the PE will receive an OSPF/IS-IS route update or an LDP Label Withdraw. Detecting LDP session failures is always a last resort. I have experienced this only once in 5 years of running a real network. There was always some other event first.

PW Status only serves to inform a PE about a defect detected by its peer. But then the peer needs a way to detect the defect. Two PEs that do nothing other than send each other PW Status messages will never detect a defect.

Defects are communicated by the network. Some people like circuit-based architectures and want every circuit to monitor and communicate defects. This was the ATM concept, and clearly it did not work, as we are not running ATM as the Internet Protocol.

Is PW Status faster? It depends. Let's assume you use BFD for failure detection. A PE enters the defect state at expiry of the control detection time. In the next control packet it sends to its peer, it indicates that it entered the defect state. Since the two sessions are asynchronous, there may be a time lag between defect detection in PE1 and (BFD) notification to PE2, but that lag is smaller than the timer value. 
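A rough sketch of this timing argument, assuming BFD asynchronous mode, no transmit jitter, a detection multiplier of 3, and taking "the timer value" to mean the transmit interval. The multiplier and the function names are assumptions for illustration.

```python
def bfd_detection_time_ms(tx_interval_ms: float, detect_mult: int = 3) -> float:
    """Approximate time from the failure until the detecting PE declares the defect.

    Assumes asynchronous mode with no jitter: the session is declared down
    once detect_mult consecutive control packets have been missed.
    """
    return tx_interval_ms * detect_mult


def peer_notified_by_ms(tx_interval_ms: float, detect_mult: int = 3) -> float:
    """Upper bound on when the other PE learns of the defect.

    The detecting PE reports the defect in its next control packet, which it
    sends less than one transmit interval after declaring the defect.
    """
    return bfd_detection_time_ms(tx_interval_ms, detect_mult) + tx_interval_ms


if __name__ == "__main__":
    for label, interval_ms in [("10 ms", 10.0), ("10 min", 10 * 60 * 1000.0)]:
        print(f"{label}: defect declared after {bfd_detection_time_ms(interval_ms):.0f} ms, "
              f"peer notified within {peer_notified_by_ms(interval_ms):.0f} ms")
```

With a 10 ms interval this gives detection after 30 ms and peer notification within 40 ms; with a 10-minute interval detection alone takes 30 minutes.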

Consider two scenarios. In the first case, the operator requires sub-50-ms failure detection and restoration and therefore provisions a 10-ms timer. Within 10 ms after failure detection, both sides know about the defect. With PW Status, that is hard to achieve, especially since an MPLS failure may bring hundreds of PWs down, each of which would trigger transmission of a PW Status message.

A 10 ms timer is not realistic. Anything below 10 seconds will not scale in a real network. Why would one spend so many resources to protect against possible bugs? Under normal operation you will get a notification from the PSN that something has gone down.

OK, and generating hundreds of ATM AIS alarms is going to be easier? No, it would take far more resources. If this is your concern, I suggest using the Grouping TLV (or Group ID) and wildcard PW Status messages. This mechanism was designed specifically for ATM to improve the down-time response.
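A minimal sketch of why grouping helps when one LSP failure affects many PWs. It assumes every PW carries a Group ID from the PW Grouping TLV and that one wildcard status message can cover a group whose members are all affected. The `Pseudowire` type, the function names, and the rule of falling back to per-PW messages for a partly affected group are hypothetical illustrations, not procedures taken from RFC 4447.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Pseudowire:
    pw_id: int     # hypothetical PW identifier
    group_id: int  # Group ID signaled in the PW Grouping TLV


def per_pw_status_messages(affected: list[Pseudowire]) -> list[tuple[str, int]]:
    """One PW Status message for every affected PW."""
    return [("pw", pw.pw_id) for pw in affected]


def grouped_status_messages(affected: list[Pseudowire],
                            all_pws: list[Pseudowire]) -> list[tuple[str, int]]:
    """One wildcard message per fully affected group; per-PW messages otherwise."""
    members: dict[int, set[int]] = defaultdict(set)
    for pw in all_pws:
        members[pw.group_id].add(pw.pw_id)
    hit: dict[int, set[int]] = defaultdict(set)
    for pw in affected:
        hit[pw.group_id].add(pw.pw_id)
    messages: list[tuple[str, int]] = []
    for group_id, pw_ids in hit.items():
        if pw_ids == members[group_id]:
            messages.append(("group", group_id))  # single wildcard message
        else:
            messages.extend(("pw", pw_id) for pw_id in sorted(pw_ids))
    return messages


if __name__ == "__main__":
    # Hypothetical example: 300 PWs share group 1 and all ride the failed LSP.
    all_pws = [Pseudowire(i, group_id=1) for i in range(300)]
    affected = list(all_pws)
    print(len(per_pw_status_messages(affected)), "per-PW messages")
    print(len(grouped_status_messages(affected, all_pws)), "grouped message(s)")
```

The saving depends on how the operator assigns Group IDs: grouping the PWs that share an LSP lets a single failure collapse into one message.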

Alternatively, consider that the operator provisions 10-minute timers. In that case, it might take several minutes before PE1 informs PE2 about the defect, and the use of PW Status would result in a much faster failure notification. But in this case, what is the point? If it is acceptable that defect detection might take 30 minutes, it does not seem necessary to inform the peer PE about the defect within a much smaller time interval.

I think that it might very well be acceptable that a very rare, bug-related network defect is detected more slowly, especially since in 99.9999% of cases this will not happen.

RFC 4447 states "The PW status signaling procedures described in this section MUST be fully implemented." The document itself, however, specifies HOW to use PW Status signaling but is vague about WHEN to use PW Status signaling. The latter is addressed in the encapsulation drafts, such as atm-encap, and is fully defined in OAM-MAP (draft-ietf-pwe3-oam-msg-map-04).

This is not my interpretation. Saying that the procedures described in this section MUST be fully implemented implies that the protocol will send and receive status messages when appropriate. The trigger to send the status messages is attachment-circuit specific, and is therefore described in the encapsulation drafts.

But the atm-encapsulation draft that I referenced specifies transmission of F4 AIS, not of PW Status. 

That is an omission that we can still fix. Matthew had at one point suggested that we remove this entire section.

The notion that RFC 4447 should be interpreted as a mandate to send a PW Status message for every single defect is an incorrect interpretation of the spirit of the standard, does not reflect WG consensus and, if implemented, would lead to inefficient implementations.

I disagree, and I have seen no indication that the WG interpreted RFC 4447 in this fashion.

Perhaps you should read OAM-MSG-MAP. It clearly shows that the people who worked on OAM never thought that PW Status would be transmitted for every single PW and AC defect.

I realize this, and we need to resolve the problem.

I will summarize what I believe is the best solution in a separate e-mail.

Thanks.
Luca

Therefore, I strongly disagree with the point of view that Tom and Luca have formulated regarding the use of BFD for both fault detection and status signaling. When this option is used, there is, IMO, no need to send PW Status messages. The current text of OAM-MAP is in line with this view.

I would suggest that you look at the e-mail archive. If I remember correctly there were strong opinions about making the PW status messages mandatory.

The current text of OAM-MAP needs to be changed.

Luca

Peter

-----Original Message-----
From: Busschbach, Peter B (Peter) [mailto:busschbach@lucent.com]
Sent: Wednesday, August 16, 2006 4:40 PM
To: 'Luca Martini'
Cc: Swallow George; 'Thomas D. Nadeau'; Pignataro Carlos; Morrow Monique; pwe3 WG (E-mail); Danny McPherson; Agarwal Rahul; Stewart (stbryant) Bryant
Subject: RE: [PWE3] BFD for MPLS PWs

Luca,

Section 7.4 of the ATM Encapsulation draft (draft-ietf-pwe3-atm-encap-11.txt) mandates that upon LOS the ingress PE inserts F4 AIS for every affected VPC. Is it your opinion that because of RFC 4447 the ingress PE must send a PW Status message in addition to F4 AIS insertion?

Peter

-----Original Message-----
From: Luca Martini [mailto:lmartini@cisco.com]
Sent: Tuesday, August 15, 2006 6:29 PM
To: Busschbach, Peter B (Peter)
Cc: 'Thomas D. Nadeau'; Swallow George; Pignataro Carlos; Morrow Monique; pwe3 WG (E-mail); Danny McPherson; Agarwal Rahul; Stewart (stbryant) Bryant
Subject: Re: [PWE3] BFD for MPLS PWs

Busschbach, Peter B (Peter) wrote:

In my opinion, the work on VCCV and OAM-MAP provides a further specification of RFC 4447 and in fact overrules the requirement that LDP Status signaling must be used.

Peter, 

We agreed a long time ago that the LDP status messaging was going to be mandatory; in fact, people insisted I make it mandatory.

So I would have to say that the LDP status MUST always be used as mandated in RFC 4447, and the BFD status messaging is an optional and, in my opinion, not very useful option when LDP is in use.

Luca

A minor comment on your email:

I am not sure what you mean by "the de-facto status must always rely on LDP". Either you declare that the PW is down (1) when BFD OR LSP indicate a PW failure, or (2) based only on the LDP status. In my mind "de-facto status" implies (2), whereas the beginning of your email says that (1) is the proposed procedure.
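The two readings contrasted above can be written as two explicit policies. A minimal sketch; the enum and function names are hypothetical, and treating "LSP" as a separate LSP-level connectivity check is an assumption about what the text means.

```python
from enum import Enum


class PwDownPolicy(Enum):
    OAM_OR = 1    # reading (1): down when BFD or the LSP check reports a failure
    LDP_ONLY = 2  # reading (2): down based only on the LDP PW Status


def pw_is_down(policy: PwDownPolicy, bfd_down: bool, lsp_check_down: bool,
               ldp_status_fault: bool) -> bool:
    """Decide the PW state from the available indications under one policy."""
    if policy is PwDownPolicy.OAM_OR:
        # Either data-plane indication is sufficient; LDP status is not consulted.
        return bfd_down or lsp_check_down
    # Only the LDP-signaled PW Status counts; data-plane OAM is ignored.
    return ldp_status_fault
```

The point of the contrast is visible in the code: under (2), a BFD failure alone never takes the PW down, so "de-facto status" cannot mean (2) if (1) is the proposed procedure.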

Peter

_______________________________________________
pwe3 mailing list
pwe3@ietf.org
https://www1.ietf.org/mailman/listinfo/pwe3
