Re: [mpls] proposed drafts for aligning MPLS-TP PSC linear protection protocol to transport requirements

Hi all,

I agree with the points raised by Huub. One further consideration: in a 
typical network without the exerciser, the APS will essentially sit in an 
idle state sending "keep alive" messages for years, until a fault is 
detected on the working path.  At that time APS must "wake up" and execute 
a switch to protection. The purpose of the exerciser is to verify that the 
APS state machine has not suffered some obscure fault condition which 
would otherwise remain undetected until a second fault occurs.  It is 
these "silent failures" that cause service outages.

Regards,

Malcolm

Huub van Helvoort <huubatwork@gmail.com> 
Sent by: mpls-bounces@ietf.org
22/07/2013 07:11 PM
Please respond to
huubatwork@gmail.com

To
"Eric Osborne (eosborne)" <eosborne@cisco.com>
cc
"Huub helvoort \(huub.van.helvoort@huawei.com\)" 
<huub.van.helvoort@huawei.com>, "mpls@ietf.org" <mpls@ietf.org>
Subject
Re: [mpls] proposed drafts for aligning MPLS-TP PSC linear protection 
protocol to transport requirements

Hello Eric,

You replied:

 > Inline with EO#, trimmed a bit.

My response inline [Huub2]

>>
>> There was a two-week ITU-T plenary meeting, and after that I took
>> (and still have) a holiday.
>
> If you're on holiday, why are you working?

[Huub2] It is too hot outside, and I had dug out my logbooks from
before 1984, which I want to return to storage as soon as possible  ^_^

>  I thought that was a uniquely American trait.

[Huub2] maybe it is infectious...

> Is this stuff that much fun to you? :)

[Huub2] my wife does not like the piles of logbooks and I like
to share my knowledge

> ...
>
>>> i) can you explain EXER at a higher level?  I'm not looking for a
>>> description of the state machine changes, and I'm not looking for the
>>> one line "It allows the FSM to be tested".  We have all of that in
>>> the draft and in the equivalent ITU specs.
>>>
>>> What I'd like to understand about EXER is where it came from.  The
>>> ITU specs that define it are pretty hard to follow, they seem to
>>> assume the reader already knows what EXER is and what problem it
>>> solves.  It feels very much like a mechanism used to catch a very
>>> specific implementation bug, back when transport gear was far less
>>> debuggable than what we have today.
>>
>> [Huub] EXER was not designed/intended to be used for bug finding
>> although it will detect problems with implementation.
>>
>> [Huub] EXER was designed to verify that the state-machine at the
>> far end is able to respond to APS/PSC messages it receives from
>> the local end.
>> Even though state-machines should be tested extensively, there is
>> no 100% warranty. It can still have stopped due to external
>> circumstances, be in a deadlock due to an unforeseen order of events,
>> etc.
>
> EO#  I agree with your last two statements.  To me, though, that
 > very text is a reasonable argument against EXER.
 > There is no mechanism within a protocol which can guarantee that
 > the entire state machine is 100% perfect.

[Huub2] indeed

> A node could be able to respond to EXER/RR but be unable to process
 > a real failure properly, either due to bug or to (as you indicate)
 > some unforeseen combination of external events.

[Huub2] in the case you describe, the node would have to send periodic
NR _AND_ respond to the EXER while not responding to a real SF/SD;
this would be a very complex state-machine fault.

> The class of problems which EXER can catch but which will not be
 > detectable by other means (e.g. CC/CV, see below) seems pretty small.

[Huub2] the EXER will detect the most common "stuck-at" problems.

> In the transport world, what sort of problems does EXER _actually find_?
 >  I'm not asking about things it _could_ find, but things that it
 > *does*.

[Huub2] stuck-at, deadlock

>> [Huub] note that APS/PSC should continue to operate even if no
>> control plane is available.
>
> EO#  I agree, but this is not relevant to the discussion at hand.
 > Lots of work was done in TP to ensure that it would function without
 > the ability to forward IP packets (which I think is what you mean by
 > 'no control plane').  None of that has anything to do with the set
 > of messages and states in PSC.

[Huub2] I did mean to say that the exercise request cannot be sent
via the control plane, so it should be in the dataplane. Since the
APS/PSC state-machine is exercised the EXER command should be part
of the APS/PSC protocol.

>> The EXER is to enable an operator to
>> take corrective action before a protection switch request fails
>> and the 50ms switch time is not met.
>>
>>> No other state machines that I'm familiar with (RSVP, LDP, BGP, OSPF,
>>> ISIS) have explicit signaling in them just to ask the neighbor
>>> whether it *would* be broken if it were, in the future, to be given a
>>> particular input.
>>
>> [Huub] All these rely on a control plane, some of them include
>> "keepalive" messages to see if the far end responds.
>
> EO#  PSC has keepalives; see section 4.1 of rfc6378 - "The purpose of
 > the continual messages is to verify that the PSC session is still
 > alive."

[Huub2] it only proves that the state-machine can send NR periodically,
it does not prove that it will respond to any external/remote events.
The EXER will give much more confidence of being alive.
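
A minimal sketch of the distinction drawn above, assuming a toy far-end
endpoint rather than the actual RFC 6378 state machine. The class
`FarEnd`, its `stuck` flag, and the helpers `keepalive_ok` and
`exercise_ok` are hypothetical names introduced only for illustration.
Only the message names NR, EXER and RR come from this thread.

```python
from dataclasses import dataclass


@dataclass
class FarEnd:
    """Toy far-end protection endpoint (hypothetical, not the real FSM)."""

    stuck: bool = False  # True models a "stuck-at" fault in request handling

    def keepalive(self) -> str:
        # Periodic transmission keeps running whether or not the
        # request-handling part of the state machine still works.
        return "NR"

    def receive(self, request: str) -> str:
        if self.stuck:
            # Input is ignored; the endpoint just repeats NR.
            return "NR"
        if request == "EXER":
            # Acknowledge the exercise request with a Reverse Request.
            return "RR"
        return "NR"


def keepalive_ok(peer: FarEnd) -> bool:
    # Passes as long as the peer is still transmitting NR.
    return peer.keepalive() == "NR"


def exercise_ok(peer: FarEnd) -> bool:
    # Passes only if the peer actually reacts to a received request.
    return peer.receive("EXER") == "RR"


healthy, faulty = FarEnd(), FarEnd(stuck=True)
print(keepalive_ok(healthy), exercise_ok(healthy))  # True True
print(keepalive_ok(faulty), exercise_ok(faulty))    # True False
```

In this toy model the stuck endpoint passes the keepalive check and
fails only the exercise check, which is the case described above.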

> The next sentence says "If no valid PSC message is received, over a
 > period of several continual messages intervals, the last valid
 > received message remains applicable."  As an implementor, what that
 > means to me is that I time out only after the loss of a few
 > retransmissions.

[Huub2] yes, this is part of the PDU validation process
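
The implementor's reading quoted above (time out only after several
missed retransmissions, with the last valid message applying until
then) might be sketched as follows. The class `PeerLiveness`, the
5-second interval and the threshold of three missed messages are
assumptions chosen for illustration, not values taken from this thread.

```python
from typing import Optional


class PeerLiveness:
    """Tracks the last valid message received from the remote end."""

    def __init__(self, interval_s: float = 5.0, max_missed: int = 3):
        self.interval_s = interval_s      # assumed retransmission interval
        self.max_missed = max_missed      # assumed tolerance before timing out
        self.last_rx: Optional[float] = None
        self.last_msg: Optional[str] = None

    def on_valid_message(self, msg: str, now: float) -> None:
        # Only messages that pass PDU validation should reach this point.
        self.last_rx = now
        self.last_msg = msg

    def is_alive(self, now: float) -> bool:
        # Alive until more than max_missed intervals pass with no valid message.
        if self.last_rx is None:
            return False
        return (now - self.last_rx) <= self.interval_s * self.max_missed


peer = PeerLiveness()
peer.on_valid_message("NR", now=0.0)
print(peer.is_alive(now=10.0))  # True: two intervals missed, still tolerated
print(peer.is_alive(now=16.0))  # False: more than three intervals missed
print(peer.last_msg)            # NR: the last valid message is retained
```

Note that this check only observes that messages keep arriving; it says
nothing about whether the sender would react to a request.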

>>> Part of my reluctance to get behind EXER has been
>>> that I don't feel comfortable with the idea of keeping a 30-year-old
>>> workaround in a protocol.
>>
>> [Huub] it is NOT a workaround, it is an essential part of the
>> protocol.
>
> EO#  In a TDM world, I can see this point.  APS is carried in the
 > frame header, so the receipt of an APS message doesn't mean that
 > there's any intelligence behind it as it could just be the hardware
 > repeating the last APS overhead that it sent.  If a PSC message is
 > sent it must have been sent deliberately, as during steady state
 > we have periodic retransmissions.  Do you agree?

[Huub2] similarly, hardware (or software) can continuously send
NR messages. So I agree that there is no difference between TDM
and packet APS/PSC.

>>> Is there more to it than that?  Have I
>>> misread and misunderstood EXER?  Does modern transport gear ever
>>> actually detect a problem via EXER/RR that wasn't obvious to the
>>> operator using other means?
>>
>> [Huub] if there is no control plane I have no other means.
>> What means are available to verify if a state-machine that is
>> in a stable state is still functioning?
>
> EO#  Periodic retransmission of current state by the remote side.
 > This performs the exact same function as a keepalive in any other
 > protocol, and I think we agree that the keepalive function in
 > protocols such as OSPF is sufficient to ensure the sanity of the
 > remote end.

[Huub2] see above for the possibility of stuck-at repeating only
the NR message.

> Many, many protocols use keepalives as a sort of belt-and-suspenders
 > failure detection mechanism.  In the IP world these mechanisms are
 > far, far less useful than they used to be as we now have BFD.

[Huub2] I would consider the "hello" message to have the same purpose
as the EXER message: check if the remote BFD session is still up.

> That brings us to CC/CV.   Lots of work was done to ensure that it
 > did not require IP to function.  I cannot believe that any TP
 > implementation would ship without some sort of CC/CV, and isn't
 > that a strong enough mechanism to detect the failure of the remote
 > end?

[Huub2] CC/CV only checks the continuity and connectivity of a path.
It has no relation at all to the APS/PSC state-machine, other than
that when CC/CV raises a signal fail defect, the resulting SF event
triggers the APS/PSC state-machine.

Regards, Huub.

-- 
*****************************************************************
               Please remember, you are unique, just like everyone else