Re: [mpls] proposed drafts for aligning MPLS-TP PSC linear protection protocol to transport requirements

Malcolm.BETTS@zte.com.cn Wed, 24 July 2013 15:24 UTC

In-Reply-To: <51EDBC18.4060104@gmail.com>
References: <22257C41A415324A984CD03D63344E271F1B7A8B@TELMBA002RM001.telecomitalia.local> <20ECF67871905846A80F77F8F4A2757210288D02@xmb-rcd-x09.cisco.com> <51ED1541.70108@gmail.com> <20ECF67871905846A80F77F8F4A275721028A463@xmb-rcd-x09.cisco.com> <51EDBC18.4060104@gmail.com>
To: huubatwork@gmail.com
From: Malcolm.BETTS@zte.com.cn
Date: Wed, 24 Jul 2013 11:22:33 -0400
Cc: "Huub helvoort (huub.van.helvoort@huawei.com)" <huub.van.helvoort@huawei.com>, "mpls@ietf.org" <mpls@ietf.org>, mpls-bounces@ietf.org
Subject: Re: [mpls] proposed drafts for aligning MPLS-TP PSC linear protection protocol to transport requirements

Hi all,

I agree with the points raised by Huub. One further consideration: in a
typical network without the exerciser, the APS will essentially sit in an
idle state, sending "keep alive" messages for years until a fault is
detected on the working path. At that time the APS must "wake up" and
execute a switch to protection. The purpose of the exerciser is to verify
that the APS state machine has not suffered some obscure fault condition
which would otherwise remain undetected until a second fault occurs. It is
these "silent failures" that cause service outages.
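To make the distinction concrete, here is a minimal Python sketch of a far end that has suffered such a silent failure. It is a deliberately simplified model, not the ITU-T or IETF state machine: `FarEnd`, its `stuck` flag and the two check functions are hypothetical names, and only the NR, EXER and RR requests are modelled (no priorities, timers or actual switching). It shows that a stuck endpoint still passes a check based on the periodic keep-alive, while an exercise round trip (EXER answered by RR) exposes it.

```python
from enum import Enum


class Req(Enum):
    """Subset of APS/PSC requests used in this sketch."""
    NR = "No Request"
    EXER = "Exercise"
    RR = "Reverse Request"


class FarEnd:
    """Hypothetical model of the remote APS/PSC endpoint."""

    def __init__(self, stuck: bool = False) -> None:
        # stuck=True models a "silent failure": the transmit side keeps
        # repeating its last message, but the request logic no longer runs.
        self.stuck = stuck

    def continual_message(self) -> Req:
        # Periodic keep-alive while idle; still sent when stuck.
        return Req.NR

    def on_receive(self, req: Req) -> Req:
        if self.stuck:
            return Req.NR  # input is ignored, the last message is repeated
        if req is Req.EXER:
            return Req.RR  # exercise acknowledged; no traffic is switched
        return Req.NR


def keepalive_ok(far: FarEnd) -> bool:
    """Passes as long as the far end keeps sending NR."""
    return far.continual_message() is Req.NR


def exercise_ok(far: FarEnd) -> bool:
    """Passes only if the far end actually reacts to EXER with RR."""
    return far.on_receive(Req.EXER) is Req.RR
```

A healthy `FarEnd()` passes both checks; `FarEnd(stuck=True)` passes `keepalive_ok` but fails `exercise_ok`, which is the class of fault the exerciser is meant to reveal before a real switch is needed.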

Regards,

Malcolm




From: Huub van Helvoort <huubatwork@gmail.com>
Sent by: mpls-bounces@ietf.org
Date: 22/07/2013 07:11 PM
Please respond to: huubatwork@gmail.com
To: "Eric Osborne (eosborne)" <eosborne@cisco.com>
cc: "Huub helvoort (huub.van.helvoort@huawei.com)" <huub.van.helvoort@huawei.com>, "mpls@ietf.org" <mpls@ietf.org>
Subject: Re: [mpls] proposed drafts for aligning MPLS-TP PSC linear protection protocol to transport requirements

Hello Eric,

You replied:

> Inline with EO#, trimmed a bit.

My response inline [Huub2]

>>
>> There was a two-week ITU-T plenary meeting, and after that I took
>> (and still have) a holiday.
>
> If you're on holiday, why are you working?

[Huub2] It is too hot outside, and I had dug out my logbooks from
before 1984, which I want to return to storage as soon as possible  ^_^

>  I thought that was a uniquely American trait.

[Huub2] maybe it is infectious...

> Is this stuff that much fun to you? :)

[Huub2] my wife does not like the piles of logbooks and I like
to share my knowledge

> ...
>
>>> i) can you explain EXER at a higher level?  I'm not looking for a
>>> description of the state machine changes, and I'm not looking for the
>>> one line "It allows the FSM to be tested".  We have all of that in
>>> the draft and in the equivalent ITU specs.
>>>
>>> What I'd like to understand about EXER is where it came from.  The
>>> ITU specs that define it are pretty hard to follow; they seem to
>>> assume the reader already knows what EXER is and what problem it
>>> solves.  It feels very much like a mechanism used to catch a very
>>> specific implementation bug, back when transport gear was far less
>>> debuggable than what we have today.
>>
>> [Huub] EXER was not designed/intended to be used for bug finding,
>> although it will detect implementation problems.
>>
>> [Huub] EXER was designed to verify that the state-machine at the
>> far end is able to respond to APS/PSC messages it receives from
>> the local end.
>> Even though state-machines should be tested extensively, there is
>> no 100% guarantee. A state-machine can still have stopped due to
>> external circumstances, or be in a deadlock due to an unforeseen
>> order of events, etc.
>
> EO#  I agree with your last two statements.  To me, though, that
> very text is a reasonable argument against EXER.
> There is no mechanism within a protocol which can guarantee that
> the entire state machine is 100% perfect.

[Huub2] indeed

> A node could be able to respond to EXER/RR but be unable to process
> a real failure properly, either due to a bug or to (as you indicate)
> some unforeseen combination of external events.

[Huub2] in the case you describe, the node has to send periodic
NR _AND_ respond to the EXER while not responding to a real SF/SD;
this would be a very complex state-machine fault.

> The class of problems which EXER can catch but which will not be
> detectable by other means (e.g. CC/CV, see below) seems pretty small.

[Huub2] the EXER will detect the most common "stuck-at" problems.

> In the transport world, what sort of problems does EXER _actually find_?
> I'm not asking about things it _could_ find, but things that it *does*.

[Huub2] stuck-at, deadlock

>> [Huub] note that APS/PSC should continue to operate even if no
>> control plane is available.
>
> EO#  I agree, but this is not relevant to the discussion at hand.
> Lots of work was done in TP to ensure that it would function without
> the ability to forward IP packets (which I think is what you mean by
> 'no control plane').  None of that has anything to do with the set
> of messages and states in PSC.

[Huub2] I did mean to say that the exercise request cannot be sent
via the control plane, so it should be in the data plane. Since it is
the APS/PSC state-machine that is exercised, the EXER command should
be part of the APS/PSC protocol.

>> The EXER is to enable an operator to
>> take corrective action before a protection switch request fails
>> and the 50ms switch time is not met.
>>
>>> No other state machines that I'm familiar with (RSVP, LDP, BGP, OSPF,
>>> ISIS) have explicit signaling in them just to ask the neighbor
>>> whether it *would* be broken if it were, in the future, to be given a
>>> particular input.
>>
>> [Huub] All these rely on a control plane; some of them include
>> "keepalive" messages to see if the far end responds.
>
> EO#  PSC has keepalives; see section 4.1 of rfc6378 - "The purpose of
> the continual messages is to verify that the PSC session is still
> alive."

[Huub2] it only proves that the state-machine can send NR periodically;
it does not prove that it will respond to any external/remote events.
The EXER gives much more confidence that the far end is alive.

> The next sentence says "If no valid PSC message is received, over a
> period of several continual messages intervals, the last valid
> received message remains applicable."  As an implementor, what that
> means to me is that I time out only after the loss of a few
> retransmissions.

[Huub2] yes, this is part of the PDU validation process.
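Eric's reading of the continual-message rule, and the stuck-at concern raised in reply, can both be seen in a small receive-side sketch. This is an illustration under stated assumptions, not the RFC 6378 procedure: `PscReceiver`, `interval_s` and `missed_limit` are hypothetical names, and treating three missed intervals as "several" is an arbitrary choice. Invalid PDUs are dropped, the last valid request remains applicable, and loss is declared only when messages stop arriving.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PscReceiver:
    """Hypothetical receive side of a PSC endpoint (simplified)."""
    interval_s: float               # continual-message interval (assumed)
    missed_limit: int = 3           # "several" intervals; illustrative value
    last_valid: Optional[str] = None
    last_rx: Optional[float] = None

    def on_message(self, request: str, now: float, valid: bool = True) -> None:
        # PDUs that fail validation are discarded and do not refresh state.
        if valid:
            self.last_valid = request
            self.last_rx = now

    def remote_request(self) -> Optional[str]:
        # The last valid received message remains applicable.
        return self.last_valid

    def loss_detected(self, now: float) -> bool:
        # Only the *absence* of messages is detected; their content is not
        # checked. Before any valid message, no loss is declared here.
        if self.last_rx is None:
            return False
        return now - self.last_rx > self.missed_limit * self.interval_s
```

Because `loss_detected` only looks at arrival times, a far end that keeps emitting NR every interval never trips it, which is the stuck-at case that periodic retransmission alone cannot reveal.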

>>> Part of my reluctance to get behind EXER has been
>>> that I don't feel comfortable with the idea of keeping a 30-year-old
>>> workaround in a protocol.
>>
>> [Huub] it is NOT a workaround, it is an essential part of the
>> protocol.
>
> EO#  In a TDM world, I can see this point.  APS is carried in the
> frame header, so the receipt of an APS message doesn't mean that
> there's any intelligence behind it, as it could just be the hardware
> repeating the last APS overhead that it sent.  If a PSC message is
> sent it must have been sent deliberately, as during steady state
> we have periodic retransmissions.  Do you agree?

[Huub2] Similarly, hardware (or software) can continuously send
NR messages. So I agree that there is no difference between TDM
and packet APS/PSC.

>>> Is there more to it than that?  Have I
>>> misread and misunderstood EXER?  Does modern transport gear ever
>>> actually detect a problem via EXER/RR that wasn't obvious to the
>>> operator using other means?
>>
>> [Huub] if there is no control plane I have no other means.
>> What means are available to verify if a state-machine that is
>> in a stable state is still functioning?
>
> EO#  Periodic retransmission of current state by the remote side.
> This performs the exact same function as a keepalive in any other
> protocol, and I think we agree that the keepalive function in
> protocols such as OSPF is sufficient to ensure the sanity of the
> remote end.

[Huub2] see above for the possibility of a stuck-at condition
repeating only the NR message.

> Many, many protocols use keepalives as a sort of belt-and-suspenders
> failure detection mechanism.  In the IP world these mechanisms are
> far, far less useful than they used to be as we now have BFD.

[Huub2] I would consider the "hello" message to have the same purpose
as the EXER message: check if the remote BFD session is still up.

> That brings us to CC/CV.   Lots of work was done to ensure that it
> did not require IP to function.  I cannot believe that any TP
> implementation would ship without some sort of CC/CV, and isn't
> that a strong enough mechanism to detect the failure of the remote
> end?

[Huub2] CC/CV only checks the continuity and connectivity of a path.
It has no relation at all to the APS/PSC state-machine, other than
that when CC/CV causes a signal fail defect, the resulting SF event
triggers the APS/PSC state-machine.
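The separation described here can be sketched as follows. The CC/CV monitor only supervises the arrival of CC/CV packets on the working path, and its sole coupling to protection is the SF input it produces. `CcMonitor`, `local_request` and the 3.5 detection multiplier are assumptions made for illustration, not taken from any specification cited in this thread.

```python
from dataclasses import dataclass


@dataclass
class CcMonitor:
    """Hypothetical CC/CV continuity supervision of one path."""
    period_s: float            # CC/CV transmission period (illustrative)
    multiplier: float = 3.5    # assumed detection multiplier
    last_cc_rx: float = 0.0    # arrival time of the last CC/CV packet

    def on_cc(self, now: float) -> None:
        self.last_cc_rx = now

    def signal_fail(self, now: float) -> bool:
        # Defect if no CC/CV packet arrived within multiplier * period.
        return now - self.last_cc_rx > self.multiplier * self.period_s


def local_request(working: CcMonitor, now: float) -> str:
    # The only link from CC/CV to protection: a path defect becomes an SF
    # input to the APS/PSC state machine. Nothing here exercises that
    # state machine itself, so a healthy CC/CV result says nothing about
    # whether the far-end PSC logic would react to the SF.
    return "SF" if working.signal_fail(now) else "NR"
```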

Regards, Huub.



-- 
*****************************************************************
Remember, you are unique, just like everyone else.