Re: [Mpls-interop] Local and Remote

All, first apologies for my behaviour on the MEAD team call on Tuesday,
showing frustration is not an acceptable way of holding a technical
discussion.

Looking at the chain of email I see signs of convergence but no easy
route to simple text.  I saw a comment from Sasha on the network
management requirements draft:

"Section 5.5.2 deals with alarm suppression and defined a MUST
requirement to suppress "superfluous alarms". However, the notion of the
"superfluous alarm" is not defined (neither directly in the draft nor by
reference to other sources). Taking into account that there are more
than 50 messages discussing AIS/FDI on this list, it seems that the
superfluosity of an alarm is in the eye of the beholder..."

That triggered a thought that perhaps we have a better way to describe
the problem:

Since we have very different backgrounds first some basic definitions:

Fault:  An unintended event that impacts a network.  Examples: hardware
failure; software bug; configuration error.

Defect:  The observed perturbation of a signal.  Examples: loss of CC
messages; loss of light (following a cable cut fault).

Primary defect:  The first defect that is observed as the result of a
fault.

Consequential defect:  Any other defect that is caused by the same
fault.

As an example consider a node that has a 40G OTN interface carrying
4xODU2, with one ODU2 carrying GFP-mapped MPLS-TP LSPs.  Some of these
MPLS-TP LSPs transit a switch/crossconnect function and leave the node;
others terminate locally on a PW end point.

If the fiber carrying the 40G signal is cut adjacent to the node:

The primary defect would be loss of optical signal.
Loss of OTU frame, loss of ODU, loss of CC messages on the terminated
LSP, and loss of CC messages on the terminated PW are all consequential
defects observed in this "local" node.

At some (remote) node the LSPs and the PWs that transited the node
adjacent to the fiber cut will be terminated.  Assuming no
recovery/repair action has taken place, a loss of CC defect will be
observed for both the LSP and the PW at this (remote) node.  These are
consequential defects.
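
The definitions and the fiber-cut example above can be sketched in
Python.  This is an illustration only, not a proposed mechanism: the
Defect record, the observation times and the fault_id field are all
hypothetical, and the sketch simply assumes that each defect has already
been attributed to its fault (which is the hard part that the OAM
toolkit would have to support).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Defect:
    node: str      # node that observes the defect ("local" or "remote")
    layer: str     # layer or entity observing it (optical, OTU, ODU, LSP, PW)
    name: str      # e.g. "loss of optical signal"
    time: float    # observation time in seconds (hypothetical values)
    fault_id: str  # fault the defect is attributed to (ASSUMED to be known)


def classify(defects):
    """Label every defect as 'primary' or 'consequential'.

    Primary: the first defect observed as the result of a fault.
    Consequential: any other defect caused by the same fault.
    """
    first = {}
    for d in defects:
        best = first.get(d.fault_id)
        if best is None or d.time < best.time:
            first[d.fault_id] = d
    return [(d, "primary" if d is first[d.fault_id] else "consequential")
            for d in defects]


def alarms_to_raise(defects):
    """Suppress alarms for consequential defects; keep the primary ones."""
    return [d for d, label in classify(defects) if label == "primary"]


# The fiber cut adjacent to the "local" node, with made-up timings.
defects = [
    Defect("local", "optical", "loss of optical signal", 0.000, "fiber-cut"),
    Defect("local", "OTU", "loss of OTU frame", 0.001, "fiber-cut"),
    Defect("local", "ODU", "loss of ODU", 0.002, "fiber-cut"),
    Defect("local", "LSP", "loss of CC", 0.010, "fiber-cut"),
    Defect("local", "PW", "loss of CC", 0.010, "fiber-cut"),
    Defect("remote", "LSP", "loss of CC", 0.010, "fiber-cut"),
    Defect("remote", "PW", "loss of CC", 0.010, "fiber-cut"),
]

for d in alarms_to_raise(defects):
    print(d.node, d.layer, d.name)
```

With these inputs only the loss of optical signal at the local node
survives suppression; the other six defects are labelled consequential.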

So perhaps we could (in the OAM requirements document) include a
requirement that states:

It MUST be possible to differentiate between primary and consequential
defects.  The OAM toolkit MUST support mechanisms to support this
differentiation.

Comments:
1) This includes an implicit requirement that defect information is
passed from a server layer to a client layer within a node.
2) The (simple) example above is only to illustrate the concepts; many
other scenarios will need to be analyzed to determine if a proposed
solution adequately addresses the requirement.
3) If we take this approach, do we need definitions for fault and
primary/consequential defect?

In the NM requirements we could state:

It MUST be possible to suppress alarms that result from consequential
defects.

Hope this helps.

Malcolm Betts
Nortel Networks
Phone: +1 613 763 7860 (ESN 393)
email: betts01@nortel.com

-----Original Message-----
From: mpls-interop-bounces@ietf.org
[mailto:mpls-interop-bounces@ietf.org] On Behalf Of Adrian Farrel
Sent: Wednesday, May 06, 2009 10:18 AM
To: Sprecher, Nurit (NSN - IL/Hod HaSharon); mpls-interop@ietf.org
Subject: Re: [Mpls-interop] Local and Remote

Hi,

> * We have a requirement in the document for "Remote Defect Indication"
> and it specifies that "The MPLS-TP OAM toolset MUST provide a function
> to enable an End Point to notify its associated End Point of the
> detection of a fault or defect that it detects on a PW, LSP or Section
> between them." I.e. the requirement refers to remote as the
> associated endpoint at the same layer!

Why the exclamation point?

The associated end-point at the same layer is certainly not local. Not
local = remote.

The defect indication comes from somewhere else (not local). That means
it is a remote indication.

> *  I would like to refer to the last paragraph in your e-mail:
>> When a server layer provides a connection that is used as a link in
>> the client layer, a server layer fault (that is remote in the server
>> layer) may be reported to the client layer as a fault in the client
>> layer link (that is local in the client
>> layer)
> Assuming that we have an LSP that traversed nodes A-B-C-D-E-F-G and we
> have a fault in the link between D and E. According to the above the
> server layer needs to notify the client layer.

Well, I said "may," but I agree it would be useful.

> Is it to the client layer in nodes D and E or to the endpoints of the 
> client layers, i.e. nodes A and G?

To D and E.
The server layer is only providing connectivity between D and E.
It has (and should have) no knowledge of the client services carried
over the connection.
As far as the server layer is concerned, it provides a connection from D
to E.
The client layer decides to use this connection as the link D-E in the
client layer, and decides to route the LSP AG over that link.

> If it is node D and E (this is local) they need now to notify nodes A 
> and G

First question is "why?"

It may help to think of the client layer network as a single layer
network and resolve the issues there first.

Let us suppose that the fault was in the client layer at node D. How
would A and G find out about the fault?

Now suppose that the fault is in the client layer link D-E. How would A
and G find out about the fault?

Now we can come back to your question. The fault is in the server layer
between D and E. It is reported to D and E. D and E map this to a client
layer fault on the link D-E. How would A and G find out about the fault?

> but according to the requirement document and to the framework they 
> may be MIPs of the LSPs but they cannot initiate OAM messages. 
> Therefore, it seems that the endpoint of the server layer needs to 
> notify the endpoint of the client layers of the fault (remote). And 
> IMO we need to be clear about the requirement. And also to extend the 
> definition in the first bullet for remote (not necessarily endpoint).

The answer is that the end points of the link (in the client layer) or
the connection (in the server layer) may be MEPs. When they detect
faults they raise alarms to their management systems.

When the fault is detected by the end points (as surely it will be) they
can perform fault localisation to isolate the fault. That is, the MEPs
can consult the MIPs and (since the MIPs are allowed to respond)
determine where the fault is.
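
As a rough illustration of that localisation step, here is a minimal
Python sketch.  It assumes an LSP over nodes A-B-C-D-E-F-G with the
break on link D-E, and it stands in for the OAM exchange with a
hypothetical "reachable" callable (does this node answer the MEP's
query?).  The function names are invented for the example and do not
correspond to any defined OAM tool.

```python
def localise(path, reachable):
    """Walk the path outwards from the probing MEP (path[0]).

    Returns (last_responding_node, first_silent_node), or None if every
    node answers.  The fault lies somewhere between those two nodes; the
    probe alone cannot tell a failed link from a failed far-side node.
    """
    last_ok = path[0]
    for node in path[1:]:
        if not reachable(node):
            return (last_ok, node)
        last_ok = node
    return None


def probe_from(path, broken_link):
    """Build a stand-in for the OAM probe: nodes beyond the break are silent."""
    a, b = broken_link
    cut = max(path.index(a), path.index(b))  # first node past the break
    return lambda node: path.index(node) < cut


lsp = list("ABCDEFG")
from_a = localise(lsp, probe_from(lsp, ("D", "E")))

reverse = lsp[::-1]
from_g = localise(reverse, probe_from(reverse, ("D", "E")))

print(from_a, from_g)
```

Probing from A isolates the fault to the span between D and E, and
probing from G reaches the same span from the other side, without D or
E having to initiate any OAM messages themselves.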

If, in the opinion of the deployer, the end points of an LSP need to
know about the location of a fault at detection time (I have serious
doubts about this) then he can designate every node along the path as
a MEP.

> * Assuming that the failure is in the first (server) link. As a result
> we have loss of continuity in the LSPs that transmit over this failed
> link and in the PWs that are attached to these LSPs. As I understand 
> from your definition, as these fault are detected by the endpoint (to 
> which the link is attached), all these faults are considered locals!

No.
Please read again.

The fault is local if the fault exists in a resource that is local (in
the layer that is detecting the fault).

In your example, G detects a fault in the LSP, but G does not detect a
fault in the link FG or at the node G. Therefore the fault is not local.

Still in your example, E detects a fault in the server connection DE
(say DxyzE).
- If the fault is in E or in the link zE, the fault is local in the
  server layer.
- If the fault is not in E or in the link zE, the fault is not local in
  the server layer.
In *both* cases, once the fault has been reported to the client layer,
the fault is local in the client layer (it is a failure of the link DE).

It may help to note that local and remote are relative terms. I.e.
"local to E" is not the same as "local to G".
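
The local/remote rule above can be written down as a small Python
sketch.  It is only an illustration under simple assumptions: a fault is
modelled either as a node name or as a link given by its two end nodes,
and the helper names is_local and to_client_layer are made up for the
example.

```python
def is_local(detector, fault):
    """True if the fault is in the detecting node or in a directly attached link.

    'fault' is a node name (str) or a link as a 2-tuple of node names.
    The answer is always relative to 'detector'.
    """
    if isinstance(fault, tuple):
        return detector in fault
    return fault == detector


def to_client_layer(server_connection_ends):
    """Report a server layer fault upwards.

    The client layer sees only the link that the server connection
    provides, so any fault inside the connection becomes a fault of that
    client layer link, whatever its location in the server layer.
    """
    return tuple(server_connection_ends)


# Server connection D-x-y-z-E provides the client layer link D-E.
server_local = is_local("E", ("z", "E"))    # fault in link zE
server_remote = is_local("E", ("x", "y"))   # fault in link xy

client_fault = to_client_layer(("D", "E"))
client_at_e = is_local("E", client_fault)
client_at_g = is_local("G", client_fault)

print(server_local, server_remote, client_at_e, client_at_g)
```

The same fault in link xy is not local to E in the server layer, is
local to E once it has been mapped to the client layer link DE, and is
still not local to G.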

> Is there still a requirement to distinguish between these failures? 
> Note that all of these are now local.

Yes, there is a requirement to distinguish.
No, they are not all local.

I would just like to point out that we are now very far into
implementation and not standardisation. It is not the job of the IETF
(in my personal opinion) to tell people how to build boxes.

Thanks,
Adrian

-----Original Message-----
From: mpls-interop-bounces@ietf.org
[mailto:mpls-interop-bounces@ietf.org] On Behalf Of ext Adrian Farrel
Sent: Wednesday, May 06, 2009 3:16 PM
To: mpls-interop@ietf.org
Subject: [Mpls-interop] Local and Remote

I know that Malcolm and Huub are going to work on the definition of
local and remote.

Huub usefully typed during the meeting yesterday that "remote" might be
better stated as "non-local."

Watching some emails this morning, I wonder whether part of the
confusion arises from
- where the fault is detected
- where the fault exists

A fault can (IMHO) only be detected locally. That is, local to the point
of detection.

A fault may be reported (to the management plane) from the node of
detection (a local report), or may be signaled to another node (through
the OAM or control plane) and reported (to the management plane) from
that other node (a remote report).

The fault that is detected may be in the detecting node, or in a
resource directly connected to that node (such as a link). This is a
local fault.

But the fault may also exist in a node or resource that is not local to
the detecting node. This is a remote fault.

If an LSP continuity check fails, the fault is detected by an end point
(local detection), but the fault might be somewhere out in the network
(remote fault).

When a server layer provides a connection that is used as a link in the
client layer, a server layer fault (that is remote in the server layer)
may be reported to the client layer as a fault in the client layer link
(that is local in the client layer). The client layer does not know
about the route or resources used in the server layer (we MUST assume
clean layer separation) so the client layer does not need to know about
the location of the server layer remote fault. If the client layer
requests the server layer to repair the client layer link (i.e. to
repair the server layer connection) the server layer may need to
consider the location of the server layer fault.

Am I wrong?

Is this really so complicated?

Thanks,
Adrian

_______________________________________________
Mpls-interop mailing list
Mpls-interop@ietf.org
https://www.ietf.org/mailman/listinfo/mpls-interop
