Updated draft response to OIF on 1:n protection

"Adrian Farrel" <adrian@olddog.co.uk> Thu, 08 June 2006 20:44 UTC

Reply-To: Adrian Farrel <adrian@olddog.co.uk>
From: Adrian Farrel <adrian@olddog.co.uk>
To: ccamp@ops.ietf.org
Subject: Updated draft response to OIF on 1:n protection
Date: Thu, 08 Jun 2006 20:43:50 +0100
Organization: Old Dog Consulting

Hi,

Looking at the latest input from the OIF, I think we have to perform
some type of meld on the two incoming communications and respond
to them together.

Here is my attempt building on what we had before. Please comment on or
off the list.

Thanks,

Adrian

==

Dear Jim,

Thank you for your communication to CCAMP on the use of GMPLS to provide
1:n protection at the OIF UNI and the OIF E-NNI dated 20th May 2006 and
for your updates received on 2nd June 2006.

We are grateful for this opportunity to comment, but we note that this
type of communication requesting clarifications is better suited to a
mailing list discussion than to official communications that, by their
nature, have a slow turn-around. This opinion is considerably
reinforced by the process we have gone through here, in which a revision
to the OIF communication arrived while CCAMP was still drafting its
response. It seems to us that if official lines of communication are to
be followed then they have to be adhered to, but if iterative discussions
are needed (as has proved to be the case here) then it would be possible
to respond far more dynamically using mailing lists.

The appropriate place for discussions of GMPLS protocols is the CCAMP
working group mailing list. Details of how to subscribe to the mailing
list can be found at
http://www.ietf.org/html.charters/ccamp-charter.html

Anyway, the CCAMP chairs are keen to ensure smooth communications with
the OIF and have consulted as widely as they could in the short time
available in order to update the response that we had already drafted
to your original enquiries.

We hope that our answers are satisfactory.

In the remainder of our response we have quoted extracts from your two
communications as:

>1> For a quote from the first communication dated 20th May 2006

>2> For a quote from your second communication dated 2nd June 2006

>1> Future updates to OIF UNI and E-NNI signaling may include a feature
>1> for 1:N connection protection. The attached document presents
>1> requirements for these features. Recently a review was completed
>1> of RFCs 4426, 4427 and 4428 and IETF drafts that may be able to
>1> implement this function (including draft-ietf-ccamp-gmpls-recovery-
>1> e2e-signaling-03 and draft-ietf-ccamp-gmpls-segment-recovery-02).
>1> It appears that the abstract messages from RFC 4426 provide much
>1> of this functionality, however several questions resulted from this
>1> review. OIF would appreciate review and comments from IETF
>1> CCAMP on the following items.
>1>
>1> 1.) OIF would appreciate knowing if there are protocol features in
>1> other IETF documents relevant to 1:N protection .

We would like to suggest that, in order to utilize advanced features of
the GMPLS control plane protocol, engineers should be familiar with the
full set of GMPLS RFCs and Internet-Drafts. These are listed on the
CCAMP charter page and can be downloaded free of charge by clicking on
the links.

Although not all of this work is directly related to protection and
restoration, it should be noted that any protocol aspect present for a
working path may also be required for a protection path. Protocol
engineers must, therefore, be familiar with the details of the protocol
before attempting to provide advanced functions like protection.

>2> 1) OIF would appreciate CCAMP's guidance as to whether CCAMP has
>2> defined standards for any similar form of restoration, i.e., one
>2> that protects a group of LSPs at once over a local span, by
>2> shifting these LSPs from their original link within the span over
>2> to a backup link.  It should be noted that
>2> - the backup link may be a different type than the original
>2> (e.g., OC192 rather than OC48), so that GMPLS signaling rather
>2> than underlying SONET/SDH link protection is used to perform
>2> the switchover; and
>2>
>2> - it is intended that the affected LSPs be shifted using a
>2> single signaling interaction rather than separate interactions
>2> per individual LSP in order to reduce the signaling overhead
>2> required.
>2>
>2> We believe that some of the existing work, especially for segment
>2> recovery, may be helpful, but may not meet the exact requirements
>2> of the service that has been proposed within OIF. Any pointers to
>2> existing drafts or RFCs, however, would be greatly appreciated.

There are two principal ways in which the objectives you cite can be
met, and both of these techniques are, to our certain knowledge, 
implemented and successfully deployed.

a) Link-level protection. This technique relies on the protection
   of the underlying link outside the scope of GMPLS. Thus, the
   TE link over which one or more LSPs are provisioned is actually
   supported by more than one underlying link. When one link fails,
   the traffic that it was carrying (the LSPs) is transferred to
   another link. This type of protection is transparent to GMPLS
   although it could leverage GMPLS fault notification procedures.

   You can learn some more about link-level protection by reading
   RFC3945 and RFC4426 (where it is referred to as Span Protection).

   Please note that the links used in this mode do not need to be
   of the same type. This is not link bundling.
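   As an illustration only, the Python sketch below models the point
   made above: a TE link is supported by two underlying links of
   different types, and a failure of one is repaired beneath GMPLS
   without any per-LSP re-signaling. The class names, link names and
   label values are hypothetical, and any span-level coordination (such
   as the RFC4426 Failure Indication) is not modelled.

```python
from dataclasses import dataclass, field


@dataclass
class ComponentLink:
    name: str        # hypothetical link name
    rate: str        # underlying links need not be of the same type
    up: bool = True


@dataclass
class TELink:
    """A TE link as seen by GMPLS, supported by two underlying links."""
    working: ComponentLink
    protect: ComponentLink
    lsp_labels: dict = field(default_factory=dict)  # LSP name -> signaled label
    active: str = "working"

    def underlying(self) -> ComponentLink:
        # The underlying link currently carrying the LSPs.
        return self.working if self.active == "working" else self.protect

    def fail_working(self) -> int:
        """Fail the working link and return the number of LSPs re-signaled."""
        self.working.up = False
        if self.protect.up:
            # The switch happens beneath GMPLS; labels are left untouched.
            self.active = "protect"
        return 0


te = TELink(ComponentLink("link-1", "OC48"), ComponentLink("link-2", "OC192"),
            {"lsp1": 5, "lsp2": 7})
print(te.fail_working(), te.underlying().rate, te.lsp_labels)
# prints: 0 OC192 {'lsp1': 5, 'lsp2': 7}
```

   The LSPs keep their labels and no signaling message is generated per
   LSP, even though the replacement link is of a different type.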

b) LSP Hierarchies. This technique relies on nesting multiple LSPs
   within another LSP. Most familiar in packet technologies, this
   process is also applicable to non-packet technologies where
   appropriate adaptation is available.

   By nesting multiple LSPs within another LSP, it is possible to
   reroute them all simply by rerouting the nesting LSP. Thus any
   protection scheme that can be applied to the nesting LSP can be
   applied to the nested LSPs in a single stage. Such procedures
   are, therefore, fully available for GMPLS control.

   You can read more about LSP hierarchies in RFC4206.
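   A similarly simplified sketch of an LSP hierarchy follows. It assumes,
   for illustration, that a nesting LSP carries a list of nested LSPs
   which treat it as a single hop, so rerouting the nesting LSP moves all
   of them with one signaling operation. The names are hypothetical and
   no real RSVP-TE processing is modelled.

```python
from dataclasses import dataclass, field


@dataclass
class LSP:
    name: str
    route: list                                 # node names along the LSP
    nested: list = field(default_factory=list)  # LSPs carried inside this one


def reroute(outer: LSP, new_route: list) -> int:
    """Reroute an LSP and return how many LSPs had to be re-signaled.

    Only the nesting LSP is re-signaled: the nested LSPs see it as a single
    hop, so their own state is unchanged.
    """
    outer.route = list(new_route)
    return 1


def data_paths(outer: LSP) -> dict:
    # In this sketch, traffic on each nested LSP follows the nesting LSP's route.
    return {inner.name: list(outer.route) for inner in outer.nested}


fa = LSP("outer", ["A", "B", "D"], [LSP(f"lsp{i}", ["A", "D"]) for i in range(10)])
print(reroute(fa, ["A", "C", "D"]), len(data_paths(fa)))  # prints: 1 10
```

   Ten nested LSPs change their physical path while only one LSP is
   re-signaled; the nested LSPs still record the single hop A to D.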

Excellent though the procedures documented in
draft-ietf-ccamp-gmpls-segment-recovery are, we are unsure as to 
the "exact requirements of the service that has been proposed
within OIF" and so cannot be sure which procedures to advise for
the problem as you have described it.

>2> 2) Reviewing some of the existing RFC text, we note that RFC 4426
>2> section 2.5.2 states "it MAY be possible for the LSPs on the working
>2> link to be mapped to the protection link without re-signaling each
>2> individual LSP" and "it MAY be possible to change the component
>2> links without needing to re-signal each individual LSP".
>2> This text appears to refer to the use of SONET/SDH link protection
>2> in such a way that the labels for each LSP remain the same. Does
>2> this imply, however, that an action that changes the local
>2> labels for the affected LSPs then requires re-signaling of each
>2> individual LSP, or is there a "bulk" mechanism to change labels
>2> for a group of LSPs simultaneously?

Your question is confusing in the light of the referenced section. The
section describes the messages required to achieve span protection.
Clearly, if a span is protected, then all LSPs carried over that span
may be transparently protected. This is how normal link protection 
operates and there is nothing clever going on.

Obviously (hopefully this is obvious) if you change the label in use on
a link for a particular LSP then the NEs at each end of the link need 
to know that information since both the sender and the receiver need to
use the correct label. This applies for each LSP whose label you change.
The accepted mechanism in the control plane for exchanging labels is the
signaling protocol, so it follows that, if you wish to change the label
in use for an LSP on a link, you must engage signaling.

You should observe that an NE may change the label in use on a link at
any time using the RSVP-TE protocol. All that is required (assuming a
unidirectional LSP) is a trigger Resv message carrying a new label.
Considerations of the impact to user traffic are left as an exercise for
the reader.
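To make the per-LSP nature of label changes concrete, the following
sketch computes the trigger Resv messages needed when the labels of some
LSPs on a link change. ResvMessage is a hypothetical stand-in for an
RSVP-TE Resv carrying a Label object, and unidirectional LSPs are
assumed.

```python
from dataclasses import dataclass


@dataclass
class ResvMessage:
    """Hypothetical stand-in for an RSVP-TE Resv carrying a Label object."""
    session: str  # identifies the LSP
    label: int


def relabel(lsp_labels: dict, new_labels: dict) -> list:
    """Return the trigger Resv messages needed to move LSPs to new labels.

    Assumes unidirectional LSPs, where the downstream node selects the label
    and reports it upstream in a Resv: one message per LSP whose label changes.
    """
    messages = []
    for session, label in new_labels.items():
        if lsp_labels.get(session) != label:
            messages.append(ResvMessage(session, label))
            lsp_labels[session] = label
    return messages


labels = {"lsp1": 10, "lsp2": 11, "lsp3": 12}
sent = relabel(labels, {"lsp1": 20, "lsp2": 11, "lsp3": 22})
print([m.session for m in sent])  # prints: ['lsp1', 'lsp3']
```

Moving two of three LSPs to new labels yields two Resv messages; the LSP
whose label is unchanged needs none.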

It is unclear how the "bulk" mechanism you propose could operate unless
it was well-known that all labels are going to change in the same way. 
So perhaps you are suggesting that a single signaling message might 
itemise all of the LSPs and show each new label. If this is really a 
significant issue (i.e., you feel it is absolutely imperative to reduce
the number of signaling messages) then you should consider RSVP message
bundling.
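The sketch below illustrates what RSVP message bundling (the Bundle
message of RFC 2961) does and does not save. The max_per_bundle
parameter is a hypothetical simplification of the real limit on message
size, and the sub-messages are represented as plain (LSP, label) tuples.

```python
def bundle(sub_messages: list, max_per_bundle: int) -> list:
    """Group per-LSP RSVP messages into Bundle messages (after RFC 2961).

    Each LSP still has its own sub-message; bundling only reduces the number
    of messages on the wire. max_per_bundle is a hypothetical simplification
    of the real limit on message size.
    """
    if max_per_bundle < 1:
        raise ValueError("max_per_bundle must be at least 1")
    return [sub_messages[i:i + max_per_bundle]
            for i in range(0, len(sub_messages), max_per_bundle)]


updates = [(f"lsp{i}", 100 + i) for i in range(25)]  # (LSP, new label) pairs
bundles = bundle(updates, 10)
print(len(bundles), sum(len(b) for b in bundles))  # prints: 3 25
```

With 25 relabelled LSPs and at most 10 sub-messages per bundle, three
messages go on the wire, but all 25 LSPs are still itemised individually.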

>2> 3) RFC 4426 describes the sending of the Failure Indication
>2> Message upon detection of failure by a slave device.  It is
>2> our belief that the same mechanism could also be used when
>2> the slave device is triggered to send an indication due to
>2> management system intervention (cases are mentioned in RFC
>2> 4427 but not in 4426), and we would like to know if CCAMP
>2> concurs with this.
>2> An example of where this might occur is where the master
>2> and slave devices are in different management domains.

As you correctly observe, RFC4427 section 4.13 describes exactly this
case where management plane intervention causes a Failure Indication,
and it is useful for forced or controlled switch-over.

You should note that RFC4426 section 2.5.1 says of the Failure
Indication message...
   This message is sent from the slave to the master to indicate the
   identities of one or more failed working links.  This message MAY not
   be necessary when the transport plane technology itself provides for
   such a notification.

It could also be the case that the message is not necessary when the
failure indication is conveyed to the master node by the management
plane. That is to say, in the case of management plane intervention there
is no specific requirement for the intervention to be directed to the
slave and to cause a Failure Indication message to be sent to the master;
the management plane intervention could instead consist of a notification
sent from the management plane to both the slave and the master.

The absence of this discussion within the GMPLS RFCs owes much to the
fact that they are largely control plane specifications with some notes
about the management plane for additional helpfulness.

Your final example about the use of this technique where the master and
slave are in different management domains is interesting, but the use of
a control plane means that you should consider the control plane 
domains, not just the management domains.

>1> 4.) A goal of the 1:N protection is to use a bulk notification and
>1> recovery procedure, based on RFC 4427 section 4.15. However, that
>1> RFC states the corresponding recovery switching actions are
>1> performed at the LSP level. It would be useful to know if bulk
>1> processing could be applied to recovery of individual connection
>1> segments on the failed span, not entire LSPs.

>2> 4) RFC 4427, section 4.15 discusses bulk recovery for a failed span,
>2> and suggests that the recovery switching message to recovered LSP
>2> ratio may be 1 or greater.  OIF would like to know if it is possible
>2> to define procedures such that the ratio is much less than 1, 
>2> i.e., a message that causes bulk recovery actions on a number of
>2> LSPs.

We believe that you have missed the point of section 4.15 of RFC4427.
This section is describing the case where all or only some of the LSPs
carried on a span are protected by a single recovery message exchange 
(full or partial span protection). In the case of partial span
protection it is possible that not all LSPs on the span will be
protected. Thus, the discussion of message to LSP ratios refers to the
number of recovery messages needed to protect the LSPs on a span.

The expression of the ratios is probably unclear, but the subsequent
text explains the situation.

Let us assume that there are S LSPs on the span, and s LSPs protected by
a protection message. Consider the ratio S/s.

If S/s = 1, one message has been used to protect all LSPs on the span.
(Full recovery)

If S/s > 1, more than one message is used to protect all of the LSPs on
the span OR not all LSPs on the span are protected. (Partial recovery)

Clearly a ratio of less than one would be particularly odd!
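The classification above can be written as a small function. This is a
sketch under the definitions just given (S LSPs on the span, s LSPs
protected by one message); comparing the integers directly avoids
floating-point division.

```python
def classify_span_recovery(S: int, s: int) -> str:
    """Classify recovery on a span using the ratio S/s.

    S: number of LSPs carried on the span.
    s: number of those LSPs protected by one recovery message.
    """
    if S <= 0 or s <= 0:
        raise ValueError("S and s must be positive")
    if s > S:
        # S/s < 1 would mean one message protects more LSPs than the span carries.
        raise ValueError("S/s < 1: more LSPs protected than are on the span")
    if s == S:
        return "full"     # S/s == 1: one message covers every LSP on the span
    return "partial"      # S/s > 1: more messages needed, or some LSPs unprotected


print(classify_span_recovery(40, 40), classify_span_recovery(40, 10))
# prints: full partial
```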

It should be obvious from wider reading of the RFCs (4426, 4427, and
3473) that the whole point of the Failure Indication is to be able to
report on more than one LSP failure at a time.

>2> 5) RFC 4426 defines a "master" and "slave" role for dedicated 1+1
>2> and 1:1 span protection and a "source" and "destination" role for
>2> control of end-to-end restoration and for reversion.  We believe
>2> that "source" and "destination" mean the initiator and receiver
>2> of the LSP (as opposed to the source and destination of data
>2> in-band).

The terms "source" and "destination" are standard.

For unidirectional LSPs, the "source" is the source of data on the LSP,
also known as the ingress. Where a control plane is used, signaling
progresses from the source (also known as the head-end) toward the
destination.

Similarly, the "destination" is the destination of data on the LSP, also
known as the egress. Where a control plane is used, signaling progresses
from the source to the destination (also known as the tail-end).

By common convention, for bidirectional LSPs set up by the control
plane, the "source" remains the signaling source (ingress) and the
"destination" is the signaling destination (egress). Traffic flowing
in the reverse direction is referred to as reverse direction traffic
and flows from destination to source.

Very probably there is an ITU-T architectural term for these end points
of LSPs.

Note that RFC 4426 is very careful to state:
   The end-to-end recovery models discussed in this
   document apply to segment protection where the source and destination
   refer to the protected segment rather than the entire LSP.

Should this still be unclear to you, RFC4426 section 1 states:
   Consider the control plane message flow during the establishment of
   an LSP.  This message flow proceeds from an initiating (or source)
   node to a terminating (or destination) node, via a sequence of
   intermediate nodes.  A node along the LSP is said to be "upstream"
   from another node if the former occurs first in the sequence.  The
   latter node is said to be "downstream" from the former node.  That
   is, an "upstream" node is closer to the initiating node than a node
   further "downstream".  Unless otherwise stated, all references to
   "upstream" and "downstream" are in terms of the control plane message
   flow.

The terms "master" and "slave" are introduced to describe the trigger
points for protection activity and are defined clearly in section 2.3
of RFC4426.
   Consider two adjacent nodes, A and B.  Under 1:1 protection, a
   dedicated link j between A and B is pre-assigned to protect working
   link i.  Link j may be carrying (pre-emptable) Extra Traffic.  A
   failure affecting link i results in the corresponding LSP(s) being
   restored to link j.  Extra Traffic being routed over link j may need
   to be pre-empted to accommodate the LSPs that have to be restored.

   Once a fault is isolated/localized, the affected LSP(s) must be moved
   to the protection link.  The process of moving an LSP from a failed
   (working) link to a protection link must be initiated by one of the
   nodes, A or B.  This node is referred to as the "master".  The other
   node is called the "slave".  The determination of the master and the
   slave may be based on configured information or protocol specific
   requirements.

Thus, the "master" is responsible for initiating the switchover, and the
slave is responsible for keeping up with the state changes.
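As an illustration, the sketch below traces the abstract messages
exchanged between master and slave for 1:1 span protection, based on our
reading of the Failure Indication, Switchover Request and Switchover
Response messages in RFC4426 section 2.5.1. The Node class and the tuple
format of the trace are hypothetical, and pre-emption of Extra Traffic
is not modelled.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    role: str                # "master" or "slave", e.g. from configuration
    active_link: str = "i"   # working link i; link j is the protection link


def span_switchover(master: Node, slave: Node, detector: Node) -> list:
    """Trace the abstract messages for a 1:1 span switchover (sketch only).

    The master initiates moving the LSPs from link i to link j; the slave
    keeps its state in step. Each trace entry is (from, to, message).
    """
    if master.role != "master" or slave.role != "slave":
        raise ValueError("roles must be one master and one slave")
    trace = []
    if detector is slave:
        # The slave reports the failed working link to the master.
        trace.append((slave.name, master.name, "Failure Indication"))
    # The master initiates the switchover to the protection link.
    trace.append((master.name, slave.name, "Switchover Request"))
    slave.active_link = "j"
    # The slave confirms that the switchover has been performed.
    trace.append((slave.name, master.name, "Switchover Response"))
    # Sketch only: both ends are on link j once the exchange completes;
    # the exact ordering of local actions at the master is not modelled.
    master.active_link = "j"
    return trace


a, b = Node("A", "master"), Node("B", "slave")
for hop in span_switchover(a, b, detector=b):
    print(hop)
```

If the master itself detects the failure, the trace starts directly with
the Switchover Request.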

>1> Further, it would be helpful to understand why the actions are
>1> performed by source and destination nodes rather than master and
>1> slave nodes. It may be appropriate to reuse the master/slave roles
>1> in the reversion process just as is done in the switchover process.

>2> We are not clear on the rationale for when control
>2> plane roles are based on master/slave vs. source/destination:
>2> it appears that local span actions are controlled using
>2> master/slave while remote actions are controlled using
>2> source/destination, however the reasoning for control of
>2> reversion is less clear to us.  Any clarification of the
>2> rationale for using master/slave vs. source/destination
>2> control would be appreciated.

As explained by the definitions of the terms, there is a distinction
between the node that invokes a switchover process (the master) and a
node that performs the process. For example, a Bridge and Switch Request
message is sent by the source node after it has bridged traffic back to 
both working and protection links simply because the source node has 
performed the bridging and is the only node that can know this fact.

In other words, whether the source is master or slave depends on the
protection scheme in use and the nature of the operation. It should be
a simple matter when considering a protection scheme and the necessary
protocol exchanges and switchover actions to determine which of the
source and destination must play the master or slave role.

>1> In addition, RFC 4426 does not include an abstract message similar
>1> to the Failure Indication Message to request the beginning of the
>1> reversion procedure. It may be beneficial to include a message from
>1> the slave device to initiate reversion, just as there is a Failure
>1> Indication Message to initiate switchover. (RFC 4426 states that the
>1> Failure Indication Message may not be needed when the transport 
>1> plane technology itself provides such a notification. The same may
>1> apply when a failure is cleared; however, there should still be an
>1> optional message to trigger the reversion process.)

>2> 6) We believe that it may be useful in some cases of reversion to
>2> allow a "slave" device to request reversion using an abstract 
>2> message similar to the Failure Indication Message.  An example
>2> case is (again) when the "master" and "slave" devices are in
>2> different management domains, such that reversion is initiated from
>2> the management domain of the "slave" device.  We request CCAMP 
>2> comment on this suggestion.

Reversion is described as an administrative procedure in RFC4426 and
RFC4427 quite deliberately. In our view it should not be a rapid 
response to a specific situation triggered through the control plane
by the 'master', but should be a considered operation under the control
of administrative policy. The trigger is, therefore, outside the scope 
of the control plane. This discussion can be seen in section 4.13 of 
RFC4427.

We believe that your suggestion does not change this view, but that you
are proposing that the control plane be used as a transport for a
management plane request. You are suggesting that a management station
in the management domain that contains the slave sends the request to
the slave, and that the slave then delivers the request through the
control plane to the master. In the absence of any specific control plane
requirement for this message, we believe that the correct architectural
approach is for management plane messages to be delivered in the
management plane. Thus, if there is a need for management plane
coordination between separate management plane domains, this should be
arranged through an appropriate management plane peering point where
the correct policies can be applied.


We hope this answers your questions, and we would be happy to enter into
further dialog on these topics.

In conclusion, it may be helpful to the OIF to know the status of two
CCAMP drafts related to recovery:
draft-ietf-ccamp-gmpls-recovery-e2e-signaling-03 and
draft-ietf-ccamp-gmpls-segment-recovery-02. Both completed CCAMP
working group last call in early 2005. Since then they have been
implemented and tested. The drafts are stable and complete, and are
queued in the IETF process waiting to become RFCs.



Best regards,
Adrian Farrel and Deborah Brungard
CCAMP co-chairs