Re: Updated draft response to OIF on 1:n protection

Lou Berger <lberger@labn.net> Mon, 12 June 2006 12:48 UTC

Date: Mon, 12 Jun 2006 08:42:34 -0400
To: Adrian Farrel <adrian@olddog.co.uk>
From: Lou Berger <lberger@labn.net>
Subject: Re: Updated draft response to OIF on 1:n protection
Cc: ccamp@ops.ietf.org
In-Reply-To: <005b01c68b3b$191d9be0$c2849ed9@your029b8cecfe>
References: <005b01c68b3b$191d9be0$c2849ed9@your029b8cecfe>

Adrian,
         I like what you have so far, but there is one additional item to add:

Bulk notifications *are* supported in the recovery drafts using the 
Notify message.  Notify messages could be used to provide a "single 
signaling interaction" and "a bulk notification ... 
procedure".  Protection/recovery could then be provided on a per-LSP 
basis (per-segment protection) or via the hierarchy approach you 
mentioned below.  (Bulk) reversion is also supported via Notify 
messages; see Section 12 of 
draft-ietf-ccamp-gmpls-recovery-e2e-signaling.  Note that these 
procedures also apply to segment recovery.
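
For illustration only, here is a minimal Python sketch of the grouping 
idea; the class and field names are hypothetical and are not the 
RSVP-TE Notify encoding defined in RFC 3473 or the recovery drafts.

# Conceptual sketch only -- hypothetical structures, not the RSVP-TE wire format.
from dataclasses import dataclass
from typing import List

@dataclass
class LspId:
    # Identifies one affected LSP (e.g., tunnel endpoint, tunnel ID, LSP ID).
    tunnel_endpoint: str
    tunnel_id: int
    lsp_id: int

@dataclass
class BulkNotification:
    # A single notification listing every LSP affected by the same event.
    reporting_node: str
    event: str                      # e.g., "link failure" or "revert"
    affected_lsps: List[LspId]

def notify_in_bulk(node: str, event: str, affected: List[LspId]) -> BulkNotification:
    # One message carries the whole list instead of one message per LSP.
    return BulkNotification(reporting_node=node, event=event, affected_lsps=affected)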

Lou

At 03:43 PM 6/8/2006, Adrian Farrel wrote:

>Hi,
>
>Looking at the latest input from the OIF, I think we have to perform
>some type of meld on the two incoming communications and respond
>to them together.
>
>Here is my attempt building on what we had before. Please comment on or
>off the list.
>
>Thanks,
>
>Adrian
>
>==
>
>Dear Jim,
>
>Thank you for your communication to CCAMP on the use of GMPLS to provide
>1:n protection at the OIF UNI and the OIF E-NNI dated 20th May 2006 and
>for your updates received on 2nd June 2006.
>
>We are grateful for this opportunity to comment, but we note that this
>type of communication requesting clarifications is better suited to
>a mailing list discussion than to official communications that, by their
>nature, have a slow turn-around. This opinion has been considerably
>reinforced by the process we have gone through here, with a revision to
>the OIF communication being generated while CCAMP was still drafting
>its response. It seems to us that if official lines of communication
>are to be used then they must be adhered to, but if iterative
>discussions are needed (as has proved to be the case here) then it would
>be possible to respond far more dynamically using mailing lists.
>
>The appropriate place for discussions of GMPLS protocols is the CCAMP
>working group mailing list. Details of how to subscribe to the mailing
>list can be found at
>http://www.ietf.org/html.charters/ccamp-charter.html
>
>Anyway, the CCAMP chairs are keen to ensure smooth communications with
>the OIF and have consulted as widely as they could in the short time
>available in order to update the response that we had already drafted to
>your original enquiries.
>
>We hope that our answers are satisfactory.
>
>In the remainder of our response we have quoted extracts from your two
>communications as:
>
>>1> For a quote from the first communication dated 20th May 2006
>
>>2> For a quote from your second communication dated 2nd June 2006
>
>>1> Future updates to OIF UNI and E-NNI signaling may include a feature
>>1> for 1:N connection protection. The attached document presents
>>1> requirements for these features. Recently a review was completed
>>1> of RFCs 4426, 4427 and 4428 and IETF drafts that may be able to
>>1> implement this function (including draft-ietf-ccamp-gmpls-recovery-
>>1> e2e-signaling-03 and draft-ietf-ccamp-gmpls-segment-recovery-02).
>>1> It appears that the abstract messages from RFC 4426 provide much
>>1> of this functionality, however several questions resulted from this
>>1> review. OIF would appreciate review and comments from IETF
>>1> CCAMP on the following items.
>>1>
>>1> 1.) OIF would appreciate knowing if there are protocol features in
>>1> other IETF documents relevant to 1:N protection.
>
>We would like to suggest that, in order to utilize advanced features of
>the GMPLS control plane protocol, engineers should be familiar with the
>full set of GMPLS RFCs and Internet-Drafts. These are listed on the
>CCAMP charter page and can be downloaded free of charge by clicking on
>the links.
>
>Although not all of this work is directly related to protection and
>restoration, it should be noted that any protocol aspect present for a
>working path may also be required for a protection path. Protocol
>engineers must, therefore, be familiar with the details of the protocol
>before attempting to provide advanced functions like protection.
>
>>2> 1) OIF would appreciate CCAMP's guidance as to whether CCAMP has
>>2> defined standards for any similar form of restoration, i.e., one
>>2> that protects a group of LSPs at once over a local span, by
>>2> shifting these LSPs from their original link within the span over
>>2> to a backup link.  It should be noted that
>>2> - the backup link may be a different type than the original
>>2> (e.g., OC192 rather than OC48), so that GMPLS signaling rather
>>2> than underlying SONET/SDH link protection is used to perform
>>2> the switchover; and
>>2>
>>2> - it is intended that the affected LSPs be shifted using a
>>2> single signaling interaction rather than separate interactions
>>2> per individual LSP in order to reduce the signaling overhead
>>2> required.
>>2>
>>2> We believe that some of the existing work, especially for segment
>>2> recovery, may be helpful, but may not meet the exact requirements
>>2> of the service that has been proposed within OIF. Any pointers to
>>2> existing drafts or RFCs, however, would be greatly appreciated.
>
>There are two principal ways in which the objectives you cite can be
>met, and both of these techniques are, to our certain knowledge, 
>implemented and successfully deployed.
>
>a) Link-level protection. This technique relies on the protection
>   of the underlying link outside the scope of GMPLS. Thus, the
>   TE link over which one or more LSPs are provisioned is actually
>   supported by more than one underlying link. When one link fails,
>   the traffic that it was carrying (the LSPs) is transferred to
>   another link. This type of protection is transparent to GMPLS
>   although it could leverage GMPLS fault notification procedures.
>
>   You can learn some more about link-level protection by reading
>   RFC3945 and RFC4426 (where it is referred to as Span Protection).
>
>   Please note that the links used in this mode do not need to be
>   of the same type. This is not link bundling.
>
>b) LSP Hierarchies. This technique relies on nesting multiple LSPs
>   within another LSP. Most familiar in packet technologies, this
>   process is also applicable to non-packet technologies where
>   appropriate adaptation is available.
>
>   By nesting multiple LSPs within another LSP, it is possible to
>   reroute them all simply by rerouting the nesting LSP. Thus any
>   protection scheme that can be applied to the nesting LSP can be
>   applied to the nested LSPs in a single stage. Such procedures
>   are, therefore, fully available for GMPLS control.
>
>   You can read more about LSP hierarchies in RFC4206.
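>
>As a purely illustrative sketch of the nesting idea (the Python class
>and attribute names below are hypothetical and do not correspond to any
>RFC 4206 encoding or API):
>
># Conceptual sketch of LSP nesting -- hypothetical names, not an RFC 4206
># implementation.
>class Lsp:
>    def __init__(self, name, path):
>        self.name = name
>        self.path = path          # hops currently in use
>
>class HierarchicalLsp(Lsp):
>    def __init__(self, name, path):
>        super().__init__(name, path)
>        self.nested = []          # client LSPs carried inside this H-LSP
>
>    def reroute(self, new_path):
>        # Moving the nesting LSP implicitly moves every nested LSP,
>        # because the clients reference the H-LSP, not its hops.
>        self.path = new_path
>
>h = HierarchicalLsp("h-lsp", ["A", "B", "C"])
>h.nested.extend(Lsp("client-%d" % i, ["A", "h-lsp", "C"]) for i in range(3))
>h.reroute(["A", "D", "C"])        # one action protects all nested LSPs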
>
>Excellent though the procedures documented in
>draft-ietf-ccamp-gmpls-segment-recovery are, we are unsure as to the 
>"exact requirements of the service that has been proposed
>within OIF" and so cannot be sure which procedures to advise for
>the problem as you have described it.
>
>>2> 2) Reviewing some of the existing RFC text, we note that RFC 4426
>>2> section 2.5.2 states "it MAY be possible for the LSPs on the working
>>2> link to be mapped to the protection link without re-signaling each
>>2> individual LSP" and "it MAY be possible to change the component
>>2> links without needing to re-signal each individual LSP".
>>2> This text appears to refer to the use of SONET/SDH link protection
>>2> in such a way that the labels for each LSP remain the same. Does
>>2> this imply, however, that an action that changes the local
>>2> labels for the affected LSPs then requires re-signaling of each
>>2> individual LSP, or is there a "bulk" mechanism to change labels
>>2> for a group of LSPs simultaneously?
>
>Your question is confusing in the light of the referenced section. The
>section describes the messages required to achieve span protection.
>Clearly, if a span is protected, then all LSPs carried over that span
>may be transparently protected. This is how normal link protection 
>operates and there is nothing clever going on.
>
>Obviously (hopefully this is obvious) if you change the label in use on
>a link for a particular LSP then the NEs at each end of the link 
>need to know that information since both the sender and the receiver need to
>use the correct label. This applies for each LSP whose label you change.
>The accepted mechanism in the control plane for exchanging labels is the
>signaling protocol, so it follows that, if you wish to change the label
>in use for an LSP on a link, you must engage signaling.
>
>You should observe that an NE may change the label in use on a link at
>any time using the RSVP-TE protocol. All that is required (assuming a
>unidirectional LSP) is a trigger Resv message carrying a new label.
>Considerations of the impact to user traffic are left as an exercise for
>the reader.
>
>It is unclear how the "bulk" mechanism you propose could operate unless
>it were well known that all labels are going to change in the same
>way. So perhaps you are suggesting that a single signaling message
>might itemise all of the LSPs and show each new label. If this is
>really a significant issue (i.e., you feel it is absolutely imperative
>to reduce the number of signaling messages) then you should consider
>RSVP message bundling.
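>
>As a rough illustration of how bundling could reduce the number of
>signaling interactions (the Python structures below are hypothetical
>and are not the RSVP Bundle message format defined in RFC 2961):
>
># Hypothetical sketch: many per-LSP label changes, one signaling exchange.
>from dataclasses import dataclass, field
>from typing import List
>
>@dataclass
>class LabelChange:
>    lsp_name: str
>    new_label: int                # new label, as carried in a trigger Resv
>
>@dataclass
>class Bundle:
>    # One message exchange carrying many per-LSP sub-messages.
>    sub_messages: List[LabelChange] = field(default_factory=list)
>
>changes = [LabelChange("lsp-%d" % i, new_label=100 + i) for i in range(4)]
>bundle = Bundle(sub_messages=changes)   # 4 label changes, 1 bundled message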
>
>>2> 3) RFC 4426 describes the sending of the Failure Indication
>>2> Message upon detection of failure by a slave device.  It is
>>2> our belief that the same mechanism could also be used when
>>2> the slave device is triggered to send an indication due to
>>2> management system intervention (cases are mentioned in RFC
>>2> 4427 but not in 4426), and we would like to know if CCAMP
>>2> concurs with this.
>>2> An example of where this might occur is where the master
>>2> and slave devices are in different management domains.
>
>As you correctly observe, RFC4427 section 4.13 describes exactly this
>case where management plane intervention causes a Failure Indication,
>and it is useful for forced or controlled switch-over.
>
>You should note that RFC4426 section 2.5.1 says of the Failure
>Indication message...
>   This message is sent from the slave to the master to indicate the
>   identities of one or more failed working links.  This message MAY not
>   be necessary when the transport plane technology itself provides for
>   such a notification.
>
>It could also be the case that the message MAY not be necessary where
>the failure indication is conveyed to the master node by the
>management plane. That is to say, there is no specific requirement (in
>the case of management plane intervention) for the intervention to be
>applied to the slave, causing a Failure Indication message to be sent
>to the master - the management plane intervention could consist of a
>notification sent to both the slave and the master from the management
>plane.
>
>The absence of this discussion within the GMPLS RFCs owes much to the
>fact that they are largely control plane specifications with some notes
>about the management plane for additional helpfulness.
>
>Your final example about the use of this technique where the master and
>slave are in different management domains is interesting, but the use of
>a control plane means that you should consider the control plane 
>domains, not just the management domains.
>
>>1> 4.) A goal of the 1:N protection is to use a bulk notification and
>>1> recovery procedure, based on RFC 4427 section 4.15. However, that
>>1> RFC states the corresponding recovery switching actions are
>>1> performed at the LSP level. It would be useful to know if bulk
>>1> processing could be applied to recovery of individual connection
>>1> segments on the failed span, not entire LSPs.
>
>>2> 4) RFC 4427, section 4.15 discusses bulk recovery for a failed span,
>>2> and suggests that the recovery switching message to recovered LSP
>>2> ratio may be 1 or greater.  OIF would like to know if it is possible
>>2> to define procedures such that the ratio is much less than 1,
>>2> i.e., a message that causes bulk recovery actions on a number of
>>2> LSPs.
>
>We believe that you have missed the point of section 4.15 of RFC4427.
>This section is describing the case where all or only some of the LSPs
>carried on a span are protected by a single recovery message 
>exchange (full or partial span protection). In the case of partial span
>protection it is possible that not all LSPs on the span will be
>protected. Thus, the discussion of message to LSP ratios refers to the
>number of recovery messages needed to protect the LSPs on a span.
>
>The expression of the ratios is probably unclear, but the subsequent
>text explains the situation.
>
>Let us assume that there are S LSPs on the span, and s LSPs protected by
>a protection message. Consider the ratio S/s.
>
>If S/s = 1, one message has been used to protect all LSPs on the span.
>(Full recovery)
>
>If S/s > 1, more than one message is used to protect all of the LSPs on
>the span OR not all LSPs on the span are protected. (Partial recovery)
>
>Clearly a ratio of less than one would be particularly odd!
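>
>A small worked example of the arithmetic, with illustrative values only:
>
># Worked example of the S/s ratio described above.
>S = 10           # LSPs carried on the span
>s_full = 10      # LSPs covered by the one protection message
>s_partial = 5    # only some LSPs covered by the message
>print(S / s_full)     # 1.0 -> full recovery with a single message
>print(S / s_partial)  # 2.0 -> partial recovery, or a second message is needed
># A ratio below 1 would require s > S, i.e., protecting more LSPs than the
># span carries, which is why it would be "particularly odd".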
>
>It should be obvious from wider reading of the RFCs (4426, 4427, and
>3473) that the whole point of the Failure Indication is to be able to
>report on more than one LSP failure at a time.
>
>>2> 5) RFC 4426 defines a "master" and "slave" role for dedicated 1+1
>>2> and 1:1 span protection and a "source" and "destination" role for
>>2> control of end-to-end restoration and for reversion.  We believe
>>2> that "source" and "destination" mean the initiator and receiver
>>2> of the LSP (as opposed to the source and destination of data
>>2> in-band).
>
>The terms "source" and "destination" are standard.
>
>For unidirectional LSPs, the "source" is the source of data on the LSP,
>also known as the ingress. Where a control plane is used, signaling
>progresses from the source (also known as the head-end).
>
>Similarly, the "destination" is the destination of data on the LSP, also
>known as the egress. Where a control plane is used, signaling progresses
>from the source to the destination (also known as the tail-end).
>
>By common convention, for bidirectional LSPs set up by the control
>plane, the "source" remains the signaling source (ingress) and the
>"destination" is the signaling destination (egress). Traffic flowing
>in the reverse direction is referred to as reverse direction traffic
>and flows from destination to source.
>
>Very probably there is an ITU-T architectural term for these end points
>of LSPs.
>
>Note that RFC 4426 is very careful to state:
>   The end-to-end recovery models discussed in this
>   document apply to segment protection where the source and destination
>   refer to the protected segment rather than the entire LSP.
>
>Should this still be unclear to you, RFC4426 section 1 states
>   Consider the control plane message flow during the establishment of
>   an LSP.  This message flow proceeds from an initiating (or source)
>   node to a terminating (or destination) node, via a sequence of
>   intermediate nodes.  A node along the LSP is said to be "upstream"
>   from another node if the former occurs first in the sequence.  The
>   latter node is said to be "downstream" from the former node.  That
>   is, an "upstream" node is closer to the initiating node than a node
>   further "downstream".  Unless otherwise stated, all references to
>   "upstream" and "downstream" are in terms of the control plane message
>   flow.
>
>The terms "master" and "slave" are introduced to describe the trigger
>points for protection activity and are defined clearly in section 2.3
>of RFC4426.
>   Consider two adjacent nodes, A and B.  Under 1:1 protection, a
>   dedicated link j between A and B is pre-assigned to protect working
>   link i.  Link j may be carrying (pre-emptable) Extra Traffic.  A
>   failure affecting link i results in the corresponding LSP(s) being
>   restored to link j.  Extra Traffic being routed over link j may need
>   to be pre-empted to accommodate the LSPs that have to be restored.
>
>   Once a fault is isolated/localized, the affected LSP(s) must be moved
>   to the protection link.  The process of moving an LSP from a failed
>   (working) link to a protection link must be initiated by one of the
>   nodes, A or B.  This node is referred to as the "master".  The other
>   node is called the "slave".  The determination of the master and the
>   slave may be based on configured information or protocol specific
>   requirements.
>
>Thus, the "master" is responsible for initiating the switchover, and the
>slave is responsible for keeping up with the state changes.
>
>>1> Further, it would be helpful to understand why the actions are
>>1> performed by source and destination nodes rather than master and
>>1> slave nodes. It may be appropriate to reuse the master/slave roles
>>1> in the reversion process just as is done in the switchover process.
>
>>2> We are not clear on the rationale for when control
>>2> plane roles are based on master/slave vs. source/destination:
>>2> it appears that local span actions are controlled using
>>2> master/slave while remote actions are controlled using
>>2> source/destination, however the reasoning for control of
>>2> reversion is less clear to us.  Any clarification of the
>>2> rationale for using master/slave vs. source/destination
>>2> control would be appreciated.
>
>As explained by the definitions of the terms, there is a distinction
>between the node that invokes a switchover process (the master) and a
>node that performs the process. For example, a Bridge and Switch Request
>message is sent by the source node after it has bridged traffic back 
>to both working and protection links simply because the source node 
>has performed the bridging and is the only node that can know this fact.
>
>In other words, whether the source is master or slave depends on the
>protection scheme in use and the nature of the operation. It should be
>a simple matter when considering a protection scheme and the necessary
>protocol exchanges and switchover actions to determine which of the
>source and destination must play the master or slave role.
>
>>1> In addition, RFC 4426 does not include an abstract message similar
>>1> to the Failure Indication Message to request the beginning of the
>>1> reversion procedure. It may be beneficial to include a message from
>>1> the slave device to initiate reversion, just as there is a Failure
>>1> Indication Message to initiate switchover. (RFC 4426 states that the
>>1> Failure Indication Message may not be needed when the transport 
>>1> plane technology itself provides such a notification. The same may
>>1> apply when a failure is cleared; however, there should still be an
>>1> optional message to trigger the reversion process.)
>
>>2> 6) We believe that it may be useful in some cases of reversion to
>>2> allow a "slave" device to request reversion using an abstract
>>2> message similar to the Failure Indication Message.  An example
>>2> case is (again) when the "master" and "slave" devices are in
>>2> different management domains, such that reversion is initiated from
>>2> the management domain of the "slave" device.  We request CCAMP 
>>2> comment on this suggestion.
>
>Reversion is described as an administrative procedure in RFC4426 and
>RFC4427 quite deliberately. In our view it should not be a rapid 
>response to a specific situation triggered through the control plane
>by the 'master', but should be a considered operation under the control
>of administrative policy. The trigger is, therefore, outside the 
>scope of the control plane. This discussion can be seen in section 
>4.13 of RFC4427.
>
>We believe that your suggestion does not change this view, but that you
>are proposing that the control plane be used as a transport for a
>management plane request. You are suggesting that a management station
>in the management domain that contains the slave sends the request to
>the slave, and that the slave would then deliver the request through the
>control plane to the master. In the absence of any specific control plane
>requirement for this message, we believe that the correct architectural
>approach is for management plane messages to be delivered in the 
>management plane. Thus, if there is a need for management plane 
>coordination between separate management plane domains, this should be
>arranged through an appropriate management plane peering point where 
>the correct policies can be applied.
>
>
>We hope this answers your questions, and we would be happy to enter into
>further dialog on these topics.
>
>In conclusion, it may be helpful to the OIF to know the status of two
>CCAMP drafts related to recovery.
>draft-ietf-ccamp-gmpls-recovery-e2e-signaling-03 and
>draft-ietf-ccamp-gmpls-segment-recovery-02 both completed CCAMP
>working group last call in early 2005. Since then they have been
>implemented and tested. The drafts are stable and complete, and are
>queued in the IETF process waiting to become RFCs.
>
>
>
>Best regards,
>Adrian Farrel and Deborah Brungard
>CCAMP co-chairs