Re: Updated draft response to OIF on 1:n protection

"Adrian Farrel" <adrian@olddog.co.uk> Mon, 12 June 2006 17:04 UTC

Message-ID: <00d601c68e41$80532970$0a23fea9@your029b8cecfe>
Reply-To: Adrian Farrel <adrian@olddog.co.uk>
From: Adrian Farrel <adrian@olddog.co.uk>
To: Lou Berger <lberger@labn.net>
Cc: ccamp@ops.ietf.org
References: <005b01c68b3b$191d9be0$c2849ed9@your029b8cecfe> <7.0.1.0.2.20060612081409.06ee4e28@labn.net>
Subject: Re: Updated draft response to OIF on 1:n protection
Date: Mon, 12 Jun 2006 17:57:37 +0100
Organization: Old Dog Consulting

Thanks Lou,

Yes, you are right.
I will fold that into the reply.

Adrian
----- Original Message ----- 
From: "Lou Berger" <lberger@labn.net>
To: "Adrian Farrel" <adrian@olddog.co.uk>
Cc: <ccamp@ops.ietf.org>
Sent: Monday, June 12, 2006 1:42 PM
Subject: Re: Updated draft response to OIF on 1:n protection


> Adrian,
>         I like what you have so far, but there is one additional item to 
> add:
>
> Bulk notifications *are* supported in the recovery drafts using the Notify 
> message.  Notify messages could be used to provide a "single signaling 
> interaction" and "a bulk notification ... procedure".  Protection/recovery 
> could then be provided on a per LSP basis (per segment protection) or use 
> the hierarchy approach you mentioned below.  (Bulk) reversion is also 
> supported via Notify messages, see section 12 of 
> draft-ietf-ccamp-gmpls-recovery-e2e-signaling.  Note that these procedures 
> also apply to segment recovery.
>
> Lou
>
> At 03:43 PM 6/8/2006, Adrian Farrel wrote:
>
>>Hi,
>>
>>Looking at the latest input from the OIF, I think we have to perform
>>some type of meld on the two incoming communications and respond
>>to them together.
>>
>>Here is my attempt building on what we had before. Please comment on or
>>off the list.
>>
>>Thanks,
>>
>>Adrian
>>
>>==
>>
>>Dear Jim,
>>
>>Thank you for your communication to CCAMP on the use of GMPLS to provide
>>1:n protection at the OIF UNI and the OIF E-NNI dated 20th May 2006 and
>>for your updates received on 2nd June 2006.
>>
>>We are grateful for this opportunity to comment, but we note that this
>>type of communication requesting clarifications is better suited to a 
>>mailing list discussion than to official communications that, by their
>>nature, have a slow turn-around. This opinion is considerably reinforced 
>>by the process we have gone through here with a revision to
>>the OIF communication being generated while CCAMP is trying to draft its 
>>response. It seems to us that if official lines of communication are to be 
>>followed then they have to be adhered to, but if iterative
>>discussions are needed (as has proved to be the case here) then it would
>>be possible to respond far more dynamically using mailing lists.
>>
>>The appropriate place for discussions of GMPLS protocols is the CCAMP
>>working group mailing list. Details of how to subscribe to the mailing
>>list can be found at
>>http://www.ietf.org/html.charters/ccamp-charter.html
>>
>>Anyway, the CCAMP chairs are keen to ensure smooth communications with
>>the OIF and have consulted as widely as they could in the short time
>>available in order to update the response that we had already drafted
>>to your original enquiries.
>>
>>We hope that our answers are satisfactory.
>>
>>In the remainder of our response we have quoted extracts from your two
>>communications as:
>>
>>>1> For a quote from the first communication dated 20th May 2006
>>
>>>2> For a quote from your second communication dated 2nd June 2006
>>
>>>1> Future updates to OIF UNI and E-NNI signaling may include a feature
>>>1> for 1:N connection protection. The attached document presents
>>>1> requirements for these features. Recently a review was completed
>>>1> of RFCs 4426, 4427 and 4428 and IETF drafts that may be able to
>>>1> implement this function (including draft-ietf-ccamp-gmpls-recovery-
>>>1> e2e-signaling-03 and draft-ietf-ccamp-gmpls-segment-recovery-02).
>>>1> It appears that the abstract messages from RFC 4426 provide much
>>>1> of this functionality, however several questions resulted from this
>>>1> review. OIF would appreciate review and comments from IETF
>>>1> CCAMP on the following items.
>>>1>
>>>1> 1.) OIF would appreciate knowing if there are protocol features in
>>>1> other IETF documents relevant to 1:N protection.
>>
>>We would like to suggest that, in order to utilize advanced features of
>>the GMPLS control plane protocol, engineers should be familiar with the
>>full set of GMPLS RFCs and Internet-Drafts. These are listed on the
>>CCAMP charter page and can be downloaded free of charge by clicking on
>>the links.
>>
>>Although not all of this work is directly related to protection and
>>restoration, it should be noted that any protocol aspect present for a
>>working path may also be required for a protection path. Protocol
>>engineers must, therefore, be familiar with the details of the protocol
>>before attempting to provide advanced functions like protection.
>>
>>>2> 1) OIF would appreciate CCAMP's guidance as to whether CCAMP has
>>>2> defined standards for any similar form of restoration, i.e., one
>>>2> that protects a group of LSPs at once over a local span, by
>>>2> shifting these LSPs from their original link within the span over
>>>2> to a backup link.  It should be noted that
>>>2> - the backup link may be a different type than the original
>>>2> (e.g., OC192 rather than OC48), so that GMPLS signaling rather
>>>2> than underlying SONET/SDH link protection is used to perform
>>>2> the switchover; and
>>>2>
>>>2> - it is intended that the affected LSPs be shifted using a
>>>2> single signaling interaction rather than separate interactions
>>>2> per individual LSP in order to reduce the signaling overhead
>>>2> required.
>>>2>
>>>2> We believe that some of the existing work, especially for segment
>>>2> recovery, may be helpful, but may not meet the exact requirements
>>>2> of the service that has been proposed within OIF. Any pointers to
>>>2> existing drafts or RFCs, however, would be greatly appreciated.
>>
>>There are two principal ways in which the objectives you cite can be
>>met, and both of these techniques are, to our certain knowledge, 
>>implemented and successfully deployed.
>>
>>a) Link-level protection. This technique relies on the protection
>>   of the underlying link outside the scope of GMPLS. Thus, the
>>   TE link over which one or more LSPs are provisioned is actually
>>   supported by more than one underlying link. When one link fails,
>>   the traffic that it was carrying (the LSPs) is transferred to
>>   another link. This type of protection is transparent to GMPLS
>>   although it could leverage GMPLS fault notification procedures.
>>
>>   You can learn some more about link-level protection by reading
>>   RFC3945 and RFC4426 (where it is referred to as Span Protection).
>>
>>   Please note that the links used in this mode do not need to be
>>   of the same type. This is not link bundling.
>>
>>b) LSP Hierarchies. This technique relies on nesting multiple LSPs
>>   within another LSP. Most familiar in packet technologies, this
>>   process is also applicable to non-packet technologies where
>>   appropriate adaptation is available.
>>
>>   By nesting multiple LSPs within another LSP, it is possible to
>>   reroute them all simply by rerouting the nesting LSP. Thus any
>>   protection scheme that can be applied to the nesting LSP can be
>>   applied to the nested LSPs in a single stage. Such procedures
>>   are, therefore, fully available for GMPLS control.
>>
>>   You can read more about LSP hierarchies in RFC4206.
>>
>>Excellent though the procedures documented in
>>draft-ietf-ccamp-gmpls-segment-recovery are, we are unsure as to the 
>>"exact requirements of the service that has been proposed
>>within OIF" and so cannot be sure which procedures to advise for
>>the problem as you have described it.
>>
>>>2> 2) Reviewing some of the existing RFC text, we note that RFC 4426
>>>2> section 2.5.2 states "it MAY be possible for the LSPs on the working
>>>2> link to be mapped to the protection link without re-signaling each
>>>2> individual LSP" and "it MAY be possible to change the component
>>>2> links without needing to re-signal each individual LSP".
>>>2> This text appears to refer to the use of SONET/SDH link protection
>>>2> in such a way that the labels for each LSP remain the same. Does
>>>2> this imply, however, that an action that changes the local
>>>2> labels for the affected LSPs then requires re-signaling of each
>>>2> individual LSP, or is there a "bulk" mechanism to change labels
>>>2> for a group of LSPs simultaneously?
>>
>>Your question is confusing in the light of the referenced section. The
>>section describes the messages required to achieve span protection.
>>Clearly, if a span is protected, then all LSPs carried over that span
>>may be transparently protected. This is how normal link protection 
>>operates and there is nothing clever going on.
>>
>>Obviously (hopefully this is obvious) if you change the label in use on
>>a link for a particular LSP then the NEs at each end of the link need to 
>>know that information since both the sender and the receiver need to
>>use the correct label. This applies for each LSP whose label you change.
>>The accepted mechanism in the control plane for exchanging labels is the
>>signaling protocol, so it follows that, if you wish to change the label
>>in use for an LSP on a link, you must engage signaling.
>>
>>You should observe that an NE may change the label in use on a link at
>>any time using the RSVP-TE protocol. All that is required (assuming a
>>unidirectional LSP) is a trigger Resv message carrying a new label.
>>Considerations of the impact to user traffic are left as an exercise for
>>the reader.
>>
>>It is unclear how the "bulk" mechanism you propose could operate unless
>>it was well-known that all labels are going to change in the same way. So 
>>perhaps you are suggesting that a single signaling message might itemise 
>>all of the LSPs and show each new label. If this is really a significant 
>>issue (i.e., you feel it is absolutely imperative to reduce
>>the number of signaling messages) then you should consider RSVP message
>>bundling.
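The bundling point can be illustrated with a toy message count. The dictionaries below are simplified stand-ins, not the RSVP-TE wire encoding; the function names are invented for this sketch.

```python
# Simplified stand-ins for signaling messages; this is not the RSVP-TE
# wire encoding, only an illustration of message counts.
from typing import Dict, List

def per_lsp_messages(new_labels: Dict[str, int]) -> List[dict]:
    """Changing a label per LSP: one trigger Resv message for each."""
    return [{"type": "Resv", "lsp": lsp, "label": label}
            for lsp, label in new_labels.items()]

def bundle(new_labels: Dict[str, int]) -> dict:
    """RSVP message bundling: one Bundle message wrapping the Resvs."""
    return {"type": "Bundle", "messages": per_lsp_messages(new_labels)}

labels = {"lsp-1": 17, "lsp-2": 18, "lsp-3": 19}
assert len(per_lsp_messages(labels)) == 3    # three separate interactions
assert bundle(labels)["type"] == "Bundle"    # a single interaction
assert len(bundle(labels)["messages"]) == 3  # still one Resv per LSP inside
```

Note that bundling reduces the number of signaling interactions on the wire but not the number of per-LSP label exchanges carried inside them.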
>>
>>>2> 3) RFC 4426 describes the sending of the Failure Indication
>>>2> Message upon detection of failure by a slave device.  It is
>>>2> our belief that the same mechanism could also be used when
>>>2> the slave device is triggered to send an indication due to
>>>2> management system intervention (cases are mentioned in RFC
>>>2> 4427 but not in 4426), and we would like to know if CCAMP
>>>2> concurs with this.
>>>2> An example of where this might occur is where the master
>>>2> and slave devices are in different management domains.
>>
>>As you correctly observe, RFC4427 section 4.13 describes exactly this
>>case where management plane intervention causes a Failure Indication,
>>and it is useful for forced or controlled switch-over.
>>
>>You should note that RFC4426 section 2.5.1 says of the Failure
>>Indication message...
>>   This message is sent from the slave to the master to indicate the
>>   identities of one or more failed working links.  This message MAY not
>>   be necessary when the transport plane technology itself provides for
>>   such a notification.
>>
>>It could also be the case that the message MAY not be necessary in the
>>case where the failure indication is conveyed to the master node by the
>>management plane. That is to say, there is no specific requirement (in
>>the case of management plane intervention) for the intervention to be to 
>>the slave and causing a Failure Indication message to be sent to the
>>master - the management plane intervention could consist of a notification 
>>sent to both the slave and the master from the management
>>plane.
>>
>>The absence of this discussion within the GMPLS RFCs owes much to the
>>fact that they are largely control plane specifications with some notes
>>about the management plane for additional helpfulness.
>>
>>Your final example about the use of this technique where the master and
>>slave are in different management domains is interesting, but the use of
>>a control plane means that you should consider the control plane domains, 
>>not just the management domains.
>>
>>>1> 4.) A goal of the 1:N protection is to use a bulk notification and
>>>1> recovery procedure, based on RFC 4427 section 4.15. However, that
>>>1> RFC states the corresponding recovery switching actions are
>>>1> performed at the LSP level. It would be useful to know if bulk
>>>1> processing could be applied to recovery of individual connection
>>>1> segments on the failed span, not entire LSPs.
>>
>>>2> 4) RFC 4427, section 4.15 discusses bulk recovery for a failed span,
>>>2> and suggests that the recovery switching message to recovered LSP
>>>2> ratio may be 1 or greater.  OIF would like to know if it is possible
>>>2> to define procedures such that the ratio is much less than 1,
>>>2> i.e., a message that causes bulk recovery actions on a number of
>>>2> LSPs.
>>
>>We believe that you have missed the point of section 4.15 of RFC4427.
>>This section is describing the case where all or only some of the LSPs
>>carried on a span are protected by a single recovery message exchange 
>>(full or partial span protection). In the case of partial span
>>protection it is possible that not all LSPs on the span will be
>>protected. Thus, the discussion of message to LSP ratios refers to the
>>number of recovery messages needed to protect the LSPs on a span.
>>
>>The expression of the ratios is probably unclear, but the subsequent
>>text explains the situation.
>>
>>Let us assume that there are S LSPs on the span, and s LSPs protected by
>>a protection message. Consider the ratio S/s.
>>
>>If S/s = 1, one message has been used to protect all LSPs on the span.
>>(Full recovery)
>>
>>If S/s > 1, more than one message is used to protect all of the LSPs on
>>the span OR not all LSPs on the span are protected. (Partial recovery)
>>
>>Clearly a ratio of less than one would be particularly odd!
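A minimal worked example of the S/s arithmetic above (illustrative only, with hypothetical numbers):

```python
# Worked example of the S/s ratio described above: S LSPs on the span,
# s LSPs covered by each protection message.

def ratio(S: int, s: int) -> float:
    if s <= 0:
        raise ValueError("a protection message must cover at least one LSP")
    return S / s

assert ratio(8, 8) == 1.0  # full recovery: one message covers all 8 LSPs
assert ratio(8, 4) == 2.0  # partial: each message covers only 4 of the 8
```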
>>
>>It should be obvious from wider reading of the RFCs (4426, 4427, and
>>3473) that the whole point of the Failure Indication is to be able to
>>report on more than one LSP failure at a time.
>>
>>>2> 5) RFC 4426 defines a "master" and "slave" role for dedicated 1+1
>>>2> and 1:1 span protection and a "source" and "destination" role for
>>>2> control of end-to-end restoration and for reversion.  We believe
>>>2> that "source" and "destination" mean the initiator and receiver
>>>2> of the LSP (as opposed to the source and destination of data
>>>2> in-band).
>>
>>The terms "source" and "destination" are standard.
>>
>>For unidirectional LSPs, the "source" is the source of data on the LSP,
>>also known as the ingress. Where a control plane is used, signaling
>>progresses from the source (also known as the head-end).
>>
>>Similarly, the "destination" is the destination of data on the LSP, also
>>known as the egress. Where a control plane is used, signaling progresses
>>from the source to the destination (also known as the tail-end).
>>
>>By common convention, for bidirectional LSPs set up by the control
>>plane, the "source" remains the signaling source (ingress) and the
>>"destination" is the signaling destination (egress). Traffic flowing
>>in the reverse direction is referred to as reverse direction traffic
>>and flows from destination to source.
>>
>>Very probably there is an ITU-T architectural term for these end points
>>of LSPs.
>>
>>Note that RFC 4426 is very careful to state:
>>   The end-to-end recovery models discussed in this
>>   document apply to segment protection where the source and destination
>>   refer to the protected segment rather than the entire LSP.
>>
>>Should this still be unclear to you, RFC4426 section 1 states
>>   Consider the control plane message flow during the establishment of
>>   an LSP.  This message flow proceeds from an initiating (or source)
>>   node to a terminating (or destination) node, via a sequence of
>>   intermediate nodes.  A node along the LSP is said to be "upstream"
>>   from another node if the former occurs first in the sequence.  The
>>   latter node is said to be "downstream" from the former node.  That
>>   is, an "upstream" node is closer to the initiating node than a node
>>   further "downstream".  Unless otherwise stated, all references to
>>   "upstream" and "downstream" are in terms of the control plane message
>>   flow.
>>
>>The terms "master" and "slave" are introduced to describe the trigger
>>points for protection activity and are defined clearly in section 2.3
>>of RFC4426.
>>   Consider two adjacent nodes, A and B.  Under 1:1 protection, a
>>   dedicated link j between A and B is pre-assigned to protect working
>>   link i.  Link j may be carrying (pre-emptable) Extra Traffic.  A
>>   failure affecting link i results in the corresponding LSP(s) being
>>   restored to link j.  Extra Traffic being routed over link j may need
>>   to be pre-empted to accommodate the LSPs that have to be restored.
>>
>>   Once a fault is isolated/localized, the affected LSP(s) must be moved
>>   to the protection link.  The process of moving an LSP from a failed
>>   (working) link to a protection link must be initiated by one of the
>>   nodes, A or B.  This node is referred to as the "master".  The other
>>   node is called the "slave".  The determination of the master and the
>>   slave may be based on configured information or protocol specific
>>   requirements.
>>
>>Thus, the "master" is responsible for initiating the switchover, and the
>>slave is responsible for keeping up with the state changes.
>>
>>>1> Further, it would be helpful to understand why the actions are
>>>1> performed by source and destination nodes rather than master and
>>>1> slave nodes. It may be appropriate to reuse the master/slave roles
>>>1> in the reversion process just as is done in the switchover process.
>>
>>>2> We are not clear on the rationale for when control
>>>2> plane roles are based on master/slave vs. source/destination:
>>>2> it appears that local span actions are controlled using
>>>2> master/slave while remote actions are controlled using
>>>2> source/destination, however the reasoning for control of
>>>2> reversion is less clear to us.  Any clarification of the
>>>2> rationale for using master/slave vs. source/destination
>>>2> control would be appreciated.
>>
>>As explained by the definitions of the terms, there is a distinction
>>between the node that invokes a switchover process (the master) and a
>>node that performs the process. For example, a Bridge and Switch Request
>>message is sent by the source node after it has bridged traffic back to 
>>both working and protection links simply because the source node has 
>>performed the bridging and is the only node that can know this fact.
>>
>>In other words, whether the source is master or slave depends on the
>>protection scheme in use and the nature of the operation. It should be
>>a simple matter when considering a protection scheme and the necessary
>>protocol exchanges and switchover actions to determine which of the
>>source and destination must play the master or slave role.
>>
>>>1> In addition, RFC 4426 does not include an abstract message similar
>>>1> to the Failure Indication Message to request the beginning of the
>>>1> reversion procedure. It may be beneficial to include a message from
>>>1> the slave device to initiate reversion, just as there is a Failure
>>>1> Indication Message to initiate switchover. (RFC 4426 states that the
>>>1> Failure Indication Message may not be needed when the transport
>>>1> plane technology itself provides such a notification. The same may
>>>1> apply when a failure is cleared; however, there should still be an
>>>1> optional message to trigger the reversion process.)
>>
>>>2> 6) We believe that it may be useful in some cases of reversion to
>>>2> allow a "slave" device to request reversion using an abstract
>>>2> message similar to the Failure Indication Message.  An example
>>>2> case is (again) when the "master" and "slave" devices are in
>>>2> different management domains, such that reversion is initiated from
>>>2> the management domain of the "slave" device.  We request CCAMP
>>>2> comment on this suggestion.
>>
>>Reversion is described as an administrative procedure in RFC4426 and
>>RFC4427 quite deliberately. In our view it should not be a rapid response 
>>to a specific situation triggered through the control plane
>>by the 'master', but should be a considered operation under the control
>>of administrative policy. The trigger is, therefore, outside the scope of 
>>the control plane. This discussion can be seen in section 4.13 of RFC4427.
>>
>>We believe that your suggestion does not change this view, but that you
>>are proposing that the control plane be used as a transport for a 
>>management plane request. You are suggesting that a management station
>>in the management domain that contains the slave sends the request to the 
>>slave, the slave would then deliver the request through the control
>>plane to the master. In the absence of any specific control plane
>>requirement for this message, we believe that the correct architectural
>>approach is for management plane messages to be delivered in the 
>>management plane. Thus, if there is a need for management plane 
>>coordination between separate management plane domains, this should be
>>arranged through an appropriate management plane peering point where the 
>>correct policies can be applied.
>>
>>
>>We hope this answers your questions, and we would be happy to enter into
>>further dialog on these topics.
>>
>>In conclusion, it may be helpful to the OIF to know the status of two
>>CCAMP drafts related to recovery.
>>draft-ietf-ccamp-gmpls-recovery-e2e-signaling-03 and
>>draft-ietf-ccamp-gmpls-segment-recovery-02 both completed CCAMP
>>working group last call in early 2005. Since then they have been
>>implemented and tested. The drafts are stable and complete, and are
>>queued in the IETF process waiting to become RFCs.
>>
>>
>>
>>Best regards,
>>Adrian Farrel and Deborah Brungard
>>CCAMP co-chairs
>>