Response to your questions on 1:n protection

"Adrian Farrel" <adrian@olddog.co.uk> Mon, 26 June 2006 15:49 UTC

Reply-To: Adrian Farrel <adrian@olddog.co.uk>
From: Adrian Farrel <adrian@olddog.co.uk>
To: Jim Jones <Jim.D.Jones@alcatel.com>
Cc: ccamp@ops.ietf.org, "Brungard, Deborah A, ALABS" <dbrungard@att.com>, Ross Callon <rcallon@juniper.net>, fenner@rearch.att.com
Subject: Response to your questions on 1:n protection
Date: Mon, 26 Jun 2006 16:43:30 +0100
Organization: Old Dog Consulting

Dear Jim,

Thank you for your communication to CCAMP on the use of GMPLS to provide
1:n protection at the OIF UNI and the OIF E-NNI dated 20th May 2006 and
for your updates received on 2nd June 2006.

We are grateful for this opportunity to comment, but we note that this
type of communication requesting clarifications is better suited to a
mailing list discussion than to official communications that, by their
nature, have a slow turn-around. This opinion is considerably
reinforced by our experience here, where a revision to the OIF
communication was generated while CCAMP was still drafting its
response. It seems to us that if official lines of communication
are to be followed then they have to be adhered to, but if iterative
discussions are needed (as has proved to be the case here) then it would
be possible to respond far more dynamically using mailing lists.

The appropriate place for discussions of GMPLS protocols is the CCAMP
working group mailing list. Details of how to subscribe to the mailing
list can be found at
http://www.ietf.org/html.charters/ccamp-charter.html

Anyway, the CCAMP chairs are keen to ensure smooth communications with
the OIF and have consulted as widely as they could in the short time in
order to update the response that we had already drafted to your
original enquiries.

We hope that our answers are satisfactory.

In the remainder of our response we have quoted extracts from your two
communications as:

>1> For a quote from the first communication dated 20th May 2006

>2> For a quote from your second communication dated 2nd June 2006

>1> Future updates to OIF UNI and E-NNI signaling may include a feature
>1> for 1:N connection protection. The attached document presents
>1> requirements for these features. Recently a review was completed
>1> of RFCs 4426, 4427 and 4428 and IETF drafts that may be able to
>1> implement this function (including draft-ietf-ccamp-gmpls-recovery-
>1> e2e-signaling-03 and draft-ietf-ccamp-gmpls-segment-recovery-02).
>1> It appears that the abstract messages from RFC 4426 provide much
>1> of this functionality, however several questions resulted from this
>1> review. OIF would appreciate review and comments from IETF
>1> CCAMP on the following items.
>1>
>1> 1.) OIF would appreciate knowing if there are protocol features in
>1> other IETF documents relevant to 1:N protection.

We would like to suggest that, in order to utilize advanced features of
the GMPLS control plane protocol, engineers should be familiar with the
full set of GMPLS RFCs and Internet-Drafts. These are listed on the
CCAMP charter page and can be downloaded free of charge by clicking on
the links.

Although not all of this work is directly related to protection and
restoration, it should be noted that any protocol aspect present for a
working path may also be required for a protection path. Protocol
engineers must, therefore, be familiar with the details of the protocol
before attempting to provide advanced functions like protection.

>2> 1) OIF would appreciate CCAMP's guidance as to whether CCAMP has
>2> defined standards for any similar form of restoration, i.e., one
>2> that protects a group of LSPs at once over a local span, by
>2> shifting these LSPs from their original link within the span over
>2> to a backup link.  It should be noted that
>2> - the backup link may be a different type than the original
>2> (e.g., OC192 rather than OC48), so that GMPLS signaling rather
>2> than underlying SONET/SDH link protection is used to perform
>2> the switchover; and
>2>
>2> - it is intended that the affected LSPs be shifted using a
>2> single signaling interaction rather than separate interactions
>2> per individual LSP in order to reduce the signaling overhead
>2> required.
>2>
>2> We believe that some of the existing work, especially for segment
>2> recovery, may be helpful, but may not meet the exact requirements
>2> of the service that has been proposed within OIF. Any pointers to
>2> existing drafts or RFCs, however, would be greatly appreciated.

There are two principal ways in which the objectives you cite can be
met, and both of these techniques are, to our certain knowledge,
implemented and successfully deployed.

a) Link-level protection. This technique relies on the protection
  of the underlying link outside the scope of GMPLS. Thus, the
  TE link over which one or more LSPs are provisioned is actually
  supported by more than one underlying link. When one link fails,
  the traffic that it was carrying (the LSPs) is transferred to
  another link. This type of protection is transparent to GMPLS
  although it could leverage GMPLS fault notification procedures.

  You can learn some more about link-level protection by reading
  RFC3945 and RFC4426 (where it is referred to as Span Protection).

  Please note that the links used in this mode do not need to be
  of the same type. This is not link bundling.

b) LSP Hierarchies. This technique relies on nesting multiple LSPs
  within another LSP. Most familiar in packet technologies, this
  process is also applicable to non-packet technologies where
  appropriate adaptation is available.

  By nesting multiple LSPs within another LSP, it is possible to
  reroute them all simply by rerouting the nesting LSP. Thus any
  protection scheme that can be applied to the nesting LSP can be
  applied to the nested LSPs in a single stage. Such procedures
  are, therefore, fully available for GMPLS control.

  You can read more about LSP hierarchies in RFC4206.
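
As a rough illustration of the point made in (b), the sketch below models
a nesting LSP that carries several nested LSPs and shows that a single
reroute of the nesting LSP moves all of them. The class, method, and node
names (NestingLsp, reroute, path_of, "A", "B", ...) are assumptions made
for this example only; they are not taken from RFC4206 or from any GMPLS
implementation.

```python
from dataclasses import dataclass, field


@dataclass
class NestingLsp:
    """Hypothetical model of an LSP that carries other LSPs (illustration only)."""
    name: str
    path: list                                   # ordered node names the nesting LSP traverses
    nested: list = field(default_factory=list)   # names of the LSPs carried inside it

    def reroute(self, new_path):
        """Move the nesting LSP to a new path; return how many nested LSPs moved with it."""
        self.path = list(new_path)
        return len(self.nested)


def path_of(nested_name, nesting):
    """A nested LSP follows whatever path its nesting LSP currently uses."""
    if nested_name not in nesting.nested:
        raise KeyError(nested_name)
    return nesting.path


tunnel = NestingLsp("H-LSP-1", ["A", "B", "C"], nested=["lsp-1", "lsp-2", "lsp-3"])
moved = tunnel.reroute(["A", "D", "C"])   # one operation on the nesting LSP
```

The nested LSPs are never touched individually: only the nesting LSP
changes, which is why any protection scheme applied to it covers them all.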

You will also want to note that the Notify message [RFC3473] defines a 
single signaling message capable of providing a bulk notification procedure. 
Refer to section 12 of draft-ietf-ccamp-gmpls-recovery-e2e-signaling for 
further descriptions, and note that this technique is also applicable in 
segment protection.

Excellent though the procedures documented in
draft-ietf-ccamp-gmpls-segment-recovery are, we are unsure as to
the "exact requirements of the service that has been proposed
within OIF" and so cannot be sure which procedures to advise for
the problem as you have described it.

>2> 2) Reviewing some of the existing RFC text, we note that RFC 4426
>2> section 2.5.2 states "it MAY be possible for the LSPs on the working
>2> link to be mapped to the protection link without re-signaling each
>2> individual LSP" and "it MAY be possible to change the component
>2> links without needing to re-signal each individual LSP".
>2> This text appears to refer to the use of SONET/SDH link protection
>2> in such a way that the labels for each LSP remain the same. Does
>2> this imply, however, that an action that changes the local
>2> labels for the affected LSPs then requires re-signaling of each
>2> individual LSP, or is there a "bulk" mechanism to change labels
>2> for a group of LSPs simultaneously?

Your question is confusing in the light of the referenced section. The
section describes the messages required to achieve span protection.
Clearly, if a span is protected, then all LSPs carried over that span
may be transparently protected. This is how normal link protection
operates and there is nothing clever going on.

Obviously (hopefully this is obvious) if you change the label in use on
a link for a particular LSP then the NEs at each end of the link need
to know that information since both the sender and the receiver need to
use the correct label. This applies for each LSP whose label you change.
The accepted mechanism in the control plane for exchanging labels is the
signaling protocol, so it follows that, if you wish to change the label
in use for an LSP on a link, you must engage signaling.

You should observe that an NE may change the label in use on a link at
any time using the RSVP-TE protocol. All that is required (assuming a
unidirectional LSP) is a trigger Resv message carrying a new label.
Considerations of the impact to user traffic are left as an exercise for
the reader.

It is unclear how the "bulk" mechanism you propose could operate unless
it was well-known that all labels are going to change in the same way.
So perhaps you are suggesting that a single signaling message might
itemise all of the LSPs and show each new label. If this is really a
significant issue (i.e., you feel it is absolutely imperative to reduce
the number of signaling messages) then you should consider RSVP message
bundling.

Otherwise, as mentioned above, the Notify message provides a bulk mechanism.

>2> 3) RFC 4426 describes the sending of the Failure Indication
>2> Message upon detection of failure by a slave device.  It is
>2> our belief that the same mechanism could also be used when
>2> the slave device is triggered to send an indication due to
>2> management system intervention (cases are mentioned in RFC
>2> 4427 but not in 4426), and we would like to know if CCAMP
>2> concurs with this.
>2> An example of where this might occur is where the master
>2> and slave devices are in different management domains.

As you correctly observe, RFC4427 section 4.13 describes exactly this
case where management plane intervention causes a Failure Indication,
and it is useful for forced or controlled switch-over.

You should note that RFC4426 section 2.5.1 says of the Failure
Indication message...
  This message is sent from the slave to the master to indicate the
  identities of one or more failed working links.  This message MAY not
  be necessary when the transport plane technology itself provides for
  such a notification.

It could also be the case that the message MAY not be necessary where
the failure indication is conveyed to the master node by the management
plane. That is to say, in the case of management plane intervention,
there is no specific requirement for the intervention to be made at the
slave so that the slave sends a Failure Indication message to the
master. The management plane intervention could instead consist of a
notification sent from the management plane to both the slave and the
master.

The absence of this discussion within the GMPLS RFCs owes much to the
fact that they are largely control plane specifications with some notes
about the management plane for additional helpfulness.

Your final example about the use of this technique where the master and
slave are in different management domains is interesting, but the use of
a control plane means that you should consider the control plane
domains, not just the management domains.

>1> 4.) A goal of the 1:N protection is to use a bulk notification and
>1> recovery procedure, based on RFC 4427 section 4.15. However, that
>1> RFC states the corresponding recovery switching actions are
>1> performed at the LSP level. It would be useful to know if bulk
>1> processing could be applied to recovery of individual connection
>1> segments on the failed span, not entire LSPs.

>2> 4) RFC 4427, section 4.15 discusses bulk recovery for a failed span,
>2> and suggests that the recovery switching message to recovered LSP
>2> ratio may be 1 or greater.  OIF would like to know if it is possible
>2> to define procedures such that the ratio is much less than 1,
>2> i.e., a message that causes bulk recovery actions on a number of
>2> LSPs.

We believe that you have missed the point of section 4.15 of RFC4427.
This section is describing the case where all or only some of the LSPs
carried on a span are protected by a single recovery message exchange
(full or partial span protection). In the case of partial span
protection it is possible that not all LSPs on the span will be
protected. Thus, the discussion of message to LSP ratios refers to the
number of recovery messages needed to protect the LSPs on a span.

The expression of the ratios is probably unclear, but the subsequent
text explains the situation.

Let us assume that there are S LSPs on the span, and s LSPs protected by
a protection message. Consider the ratio S/s.

If S/s = 1, one message has been used to protect all LSPs on the span.
(Full recovery)

If S/s > 1, more than one message is used to protect all of the LSPs on
the span OR not all LSPs on the span are protected. (Partial recovery)

Clearly a ratio of less than one would be particularly odd!
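
The arithmetic above can be sketched in a few lines. This is only an
illustration of the S/s ratio as described in this message; the function
and parameter names are assumptions, and the function simply rejects the
case that would give a ratio below one.

```python
def classify_span_recovery(lsps_on_span, lsps_per_message):
    """Classify recovery using the ratio S/s from the text above.

    lsps_on_span     -- S, the number of LSPs on the span
    lsps_per_message -- s, the number of LSPs protected by one protection message
    """
    if lsps_on_span < 1 or lsps_per_message < 1:
        raise ValueError("both counts must be at least 1")
    if lsps_per_message > lsps_on_span:
        # This would give S/s < 1, which the text rules out.
        raise ValueError("one message cannot protect more LSPs than the span carries")
    ratio = lsps_on_span / lsps_per_message
    return ("full" if ratio == 1 else "partial"), ratio
```

For example, a span carrying 8 LSPs where one message protects all 8
gives a ratio of 1 (full recovery), while protecting 2 LSPs per message
gives a ratio of 4 (partial recovery).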

It should be obvious from wider reading of the RFCs (4426, 4427, and
3473) that the whole point of the Failure Indication is to be able to
report on more than one LSP failure at a time.

>2> 5) RFC 4426 defines a "master" and "slave" role for dedicated 1+1
>2> and 1:1 span protection and a "source" and "destination" role for
>2> control of end-to-end restoration and for reversion.  We believe
>2> that "source" and "destination" mean the initiator and receiver
>2> of the LSP (as opposed to the source and destination of data
>2> in-band).

The terms "source" and "destination" are standard.

For unidirectional LSPs, the "source" is the source of data on the LSP,
also known as the ingress. Where a control plane is used, signaling
progresses from the source (also known as the head-end).

Similarly, the "destination" is the destination of data on the LSP, also
known as the egress. Where a control plane is used, signaling progresses
from the source to the destination (also known as the tail-end).

By common convention, for bidirectional LSPs set up by the control
plane, the "source" remains the signaling source (ingress) and the
"destination" is the signaling destination (egress). Traffic flowing
in the reverse direction is referred to as reverse direction traffic
and flows from destination to source.

Very probably there is an ITU-T architectural term for these end points
of LSPs.

Note that RFC 4426 is very careful to state:
  The end-to-end recovery models discussed in this
  document apply to segment protection where the source and destination
  refer to the protected segment rather than the entire LSP.

Should this still be unclear to you, RFC4426 section 1 states
  Consider the control plane message flow during the establishment of
  an LSP.  This message flow proceeds from an initiating (or source)
  node to a terminating (or destination) node, via a sequence of
  intermediate nodes.  A node along the LSP is said to be "upstream"
  from another node if the former occurs first in the sequence.  The
  latter node is said to be "downstream" from the former node.  That
  is, an "upstream" node is closer to the initiating node than a node
  further "downstream".  Unless otherwise stated, all references to
  "upstream" and "downstream" are in terms of the control plane message
  flow.

The terms "master" and "slave" are introduced to describe the trigger
points for protection activity and are defined clearly in section 2.3
of RFC4426.
  Consider two adjacent nodes, A and B.  Under 1:1 protection, a
  dedicated link j between A and B is pre-assigned to protect working
  link i.  Link j may be carrying (pre-emptable) Extra Traffic.  A
  failure affecting link i results in the corresponding LSP(s) being
  restored to link j.  Extra Traffic being routed over link j may need
  to be pre-empted to accommodate the LSPs that have to be restored.

  Once a fault is isolated/localized, the affected LSP(s) must be moved
  to the protection link.  The process of moving an LSP from a failed
  (working) link to a protection link must be initiated by one of the
  nodes, A or B.  This node is referred to as the "master".  The other
  node is called the "slave".  The determination of the master and the
  slave may be based on configured information or protocol specific
  requirements.

Thus, the "master" is responsible for initiating the switchover, and the
slave is responsible for keeping up with the state changes.
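
The division of roles quoted above can be sketched as a small model of
1:1 span protection. This is a simplified illustration under stated
assumptions, not an implementation of RFC4426: the Span class, its field
names, and the tuple-based log are invented for the example, and the
message labels are informal stand-ins for the abstract messages.

```python
from dataclasses import dataclass, field


@dataclass
class Span:
    """Hypothetical 1:1 protected span between two adjacent nodes."""
    master: str                                      # node that initiates the switchover
    slave: str                                       # node that follows the state change
    working: list = field(default_factory=list)      # LSPs on working link i
    protection: list = field(default_factory=list)   # Extra Traffic on protection link j
    log: list = field(default_factory=list)          # (sender, receiver, message) records

    def working_link_failed(self, detected_by):
        """Move all LSPs from the failed working link to the protection link."""
        # If the slave detects the fault, it first tells the master.
        if detected_by == self.slave:
            self.log.append((self.slave, self.master, "failure indication"))
        # Only the master initiates the move to the protection link.
        self.log.append((self.master, self.slave, "switchover request"))
        preempted = self.protection                  # Extra Traffic is pre-empted
        self.protection, self.working = self.working, []
        self.log.append((self.slave, self.master, "switchover response"))
        return preempted


span = Span("A", "B", working=["lsp-1", "lsp-2"], protection=["extra-1"])
preempted = span.working_link_failed(detected_by="B")
```

Whichever node detects the fault, the switchover request in this sketch
always originates at the master, which mirrors the definition quoted from
section 2.3.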

>1> Further, it would be helpful to understand why the actions are
>1> performed by source and destination nodes rather than master and
>1> slave nodes. It may be appropriate to reuse the master/slave roles
>1> in the reversion process just as is done in the switchover process.

>2> We are not clear on the rationale for when control
>2> plane roles are based on master/slave vs. source/destination:
>2> it appears that local span actions are controlled using
>2> master/slave while remote actions are controlled using
>2> source/destination, however the reasoning for control of
>2> reversion is less clear to us.  Any clarification of the
>2> rationale for using master/slave vs. source/destination
>2> control would be appreciated.

As explained by the definitions of the terms, there is a distinction
between the node that invokes a switchover process (the master) and a
node that performs the process. For example, a Bridge and Switch Request
message is sent by the source node after it has bridged traffic back to
both working and protection links simply because the source node has
performed the bridging and is the only node that can know this fact.

In other words, whether the source is master or slave depends on the
protection scheme in use and the nature of the operation. It should be
a simple matter when considering a protection scheme and the necessary
protocol exchanges and switchover actions to determine which of the
source and destination must play the master or slave role.

>1> In addition, RFC 4426 does not include an abstract message similar
>1> to the Failure Indication Message to request the beginning of the
>1> reversion procedure. It may be beneficial to include a message from
>1> the slave device to initiate reversion, just as there is a Failure
>1> Indication Message to initiate switchover. (RFC 4426 states that the
>1> Failure Indication Message may not be needed when the transport
>1> plane technology itself provides such a notification. The same may
>1> apply when a failure is cleared; however, there should still be an
>1> optional message to trigger the reversion process.)

>2> 6) We believe that it may be useful in some cases of reversion to
>2> allow a "slave" device to request reversion using an abstract
>2> message similar to the Failure Indication Message.  An example
>2> case is (again) when the "master" and "slave" devices are in
>2> different management domains, such that reversion is initiated from
>2> the management domain of the "slave" device.  We request CCAMP
>2> comment on this suggestion.

Reversion is described as an administrative procedure in RFC4426 and
RFC4427 quite deliberately. In our view it should not be a rapid
response to a specific situation triggered through the control plane
by the 'master', but should be a considered operation under the control
of administrative policy. The trigger is, therefore, outside the scope
of the control plane. This discussion can be seen in section 4.13 of
RFC4427.

We believe that your suggestion does not change this view, but that you
are proposing that the control plane be used as a transport for a
management plane request. You are suggesting that a management station
in the management domain that contains the slave sends the request to
the slave, the slave would then deliver the request through the control
plane to the master. In the absence of any specific control plane
requirement for this message, we believe that the correct architectural
approach is for management plane messages to be delivered in the
management plane. Thus, if there is a need for management plane
coordination between separate management plane domains, this should be
arranged through an appropriate management plane peering point where
the correct policies can be applied.


We hope this answers your questions, and we would be happy to enter into
further dialog on these topics.

In conclusion, it may be helpful to the OIF to know the status of two
CCAMP drafts related to recovery.
draft-ietf-ccamp-gmpls-recovery-e2e-signaling-03 and
draft-ietf-ccamp-gmpls-segment-recovery-02 both completed CCAMP
working group last call in early 2005. Since then they have been
implemented and tested. The drafts are stable and complete, and are
queued in the IETF process waiting to become RFCs.

Best regards,
Adrian Farrel and Deborah Brungard
CCAMP co-chairs