[mpls] AD review : working group last call draft-ietf-mpls-tp-psc-itu-01

"Adrian Farrel" <adrian@olddog.co.uk> Tue, 21 January 2014 20:49 UTC

From: Adrian Farrel <adrian@olddog.co.uk>
To: 'Loa Andersson' <loa@pi.nu>, mpls@ietf.org
Date: Tue, 21 Jan 2014 20:49:03 -0000
Message-ID: <0bc301cf16ea$3981cd00$ac856700$@olddog.co.uk>
MIME-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
Thread-Index: Ac8W6jHMZ8uZf2LMS56KIUdobofaHQ==
Content-Language: en-gb
Cc: mpls-ads@tools.ietf.org, mpls-chairs@tools.ietf.org, draft-ietf-mpls-tp-psc-itu@tools.ietf.org
Subject: [mpls] AD review : working group last call draft-ietf-mpls-tp-psc-itu-01
Precedence: list
Reply-To: adrian@olddog.co.uk

Hi,

Congratulations on sailing a very difficult course between technical and
political. You have produced a readable and coherent document.

I have conducted my usual AD review of this document early (i.e., during
working group last call) on request of the working group chairs. The
purpose of my review is to catch and clean up any issues that might
otherwise be found during IETF last call and IESG evaluation. Thus, the
intention is to produce a higher quality document and ensure smoother
passage through those later stages. As should be obvious, by reviewing 
the document before it has seen working group last call and been updated
I may have found more issues and concerns than I would have done had I
reviewed it later.

Nevertheless, I find the document to be basically sound. There are quite
a lot of comments below, but they are almost exclusively editorial in 
nature. The editorial comments fall into three categories:

- pure nits (should be entirely uncontentious)
- issues of clarity and readability (hopefully these are also easy to
  accept, but please check that my "cleaning up" of your text has 
  preserved the meaning you intended)
- issues of spin/politics (sometimes saying the same thing in slightly
  different words can make everyone happy)

I think I found only two technical issues - in Section 9.1.1 and
9.1.3.3.

In all cases I have tried to suggest alternative wording so that you 
don't have to guess what I mean. But please (please, please) do not 
feel that I am insisting on any of these changes or on my precise words:
I am just trying to polish and move this document along; everything is
open for discussion.

One last point: if some enthusiastic native speaker was to review and
edit the final revision (perhaps in return for beer or an 
acknowledgement) they would be doing us all a great favor. I have not
pointed out every minor issue of language in my review.

Thanks for the work.

Adrian

===

The Abstract and the Introduction say:

   Two modes are defined in
   this document: Protection State Coordination (PSC) mode and Automatic
   Protection Switching (APS) mode.

   This document describes the behavior of the PSC protocol including
   priority logic and state machine when all the capabilities associated
   with the APS mode are enabled.

This leaves the question: where is the protocol including priority logic 
and state machine defined for the PSC mode? I hope the answer is "in 
RFC 6378, in which case you can easily add a node such as:

  The PSC protocol behavior for the PSC mode is as defined in RFC 6378.

---

The last paragraph of the Introduction reads

   This document updates RFC 6378 in that the capability advertisement
   method defined here is an addition to that document.  For an existing
   implementation of RFC 6378, it is recommended to be updated with the
   bug-fixes in [I-D.ietf-mpls-psc-updates] and the capability
   advertisement in this document.

I suggest replacing this with the text below. There are two reasons for
my suggested changes:

1. It looks very odd for this document to recommend using changes in 
   another document. The result of such a statement is likely to be a
   requirement that you bundle the two documents together. I think that
   [I-D.ietf-mpls-psc-updates] can stand on its own.

2. While you can recommend that the installed base is updated, you 
   cannot force it to happen. It is, therefore, useful to draw attention
   to the important backward compatibility text you have in Section 9.3.

So my suggested text is:

   This document updates RFC 6378 by adding a capability advertisement
   mechanism. It is recommended that existing implementations of RFC
   6378 should be updated to support this capability, however the issue 
   of backward compatibility with existing implementations is described
   in Section 9.3.

---

Section 4

   In this document, the priorities of FS and SF-P are swapped and the
   priority of Clear SF (SFc) is raised.  In addition to the priority
   modification, this document introduces the use of Freeze command in
   Appendix C.  The reasons for these changes are explained in the
   following sub-sections from technical and network operational
   aspects.

The issue I have with this text is that a swap has to be relative to 
something. So we should make it clear what is in 6378 and then describe
what this document does...

   [RFC6378] defines the priority of FS to be higher than that of SF-P.
   That document also defines the priority of Clear SF (SFc) to be low.
   This document the defines the Priority Modification capability
   whereby the priorities of FS and SF-P are swapped and the priority of
   Clear SF (SFc) is raised.  In addition, this capability introduces 
   the use of Freeze command as described in Appendix C.  The reasons 
   for these changes are explained in the following sub-sections from 
   technical and network operational aspects.

---

Section 4.1

No technical change, just ease of reading.

OLD
   Setting the priority of any input that is supposed to be signaled to
   the other end to be higher than that of SF-P can result in
   unpredictable protection switching state, when the protection path
   has failed and consequently the PSC communication stopped.
NEW
   When the protection path fails PSC communication may stop as a 
   result. In this case, if any input that is supposed to be signaled to
   the other end has a higher priority that SF-P then this can result in
   unpredictable protection switching state.
END

---

Section 4.1

   According to Section 2.4 of RFC 5654 [RFC5654] it MUST be possible to
   operate an MPLS-TP network without using a control plane.  This means
   that external switch commands, e.g., FS, can be transferred to the
   remote Label Edge Router (LER) only by using the PSC communication
   channel and should not rely on the presence of a control plane.

This paragraph has several issues.

1. It is not true! Not using a control plane leaves the option of PSC as
   you say, and also leaves the option of the management plane. Indeed,
   the PSC communication channel is probably a special case of the in-
   band OAM channel.

2. This statement does not appear to have anything to do with swapping 
   the priorities of FS and SF-P.

I suggest that if you can show how this statement is related to the swap
of priorities you should add it. In that case you should also say...
   This means
   that external switch commands, e.g., FS, can be transferred to the
   remote Label Edge Router (LER) only by using the management plane or
   the in-band OAM channel and should not rely on the presence of a 
   control plane.  The use of the OAM channel as used by PSC messages is
   considered more appropriate for coherence with other PSC messages.
If, on the other hand, you can't show how this statement is relevant for
the swapping of the priorities, I suggest removing the paragraph.

---

Section 4.1

I think this one is mainly political :-)

OLD
   As the priority of SF-P has been higher than FS in other transport
   networks, such as SDH, OTN and Ethernet transport networks, for
   network operators it is important that the MPLS-TP protection
   switching preserves the network operation behavior to which network
   operators have become accustomed.
NEW
   In other transport networks (such as SDH, OTN, and Ethernet transport
   networks) the priority of SF-P is been higher than FS. It is 
   therefore important to offer network operators the option of having 
   the same behavior in their MPLS-TP network so that they can have the
   same operational protection switching behavior to which they have
   become accustomed.
END

---

Section 4.3

s/broken, the Freeze command,/broken. The Freeze command,/

---

Section 4.4

As you have already established earlier in this document, the
modifications to 6378 are the addition of the capabilities 
advertisement. So I think the language used in this section is too
strong. I suggest...

OLD
4.4.  Modifications to RFC 6378

   The list of local requests in order of priority SHALL be modified as
   follows:

      (from higher to lower)

   o  Clear Signal Fail

   o  Signal Fail on Protection path

   o  Forced Switch

   o  Signal Fail on Working path

   The change of the PSC Control logic including the state machine due
   to this priority modification is incorporated in the PSC Control
   logic description in Section 10 and Section 11 when all the
   capabilities are enabled.
NEW
4.4.  Procedures in Support of Capability 1

   When this capability is in use the list of local requests in order of
   priority SHALL be as follows:

      (from highest to lowest)

   o  Clear Signal Fail

   o  Signal Fail on Protection path

   o  Forced Switch

   o  Signal Fail on Working path

   This requires different PSC control logic (including the state 
   machine) compared to that shown in [RFC6378]. Sections 10 and 11
   show the PSC control logic and state machine when all of the 
   capabilities in APS mode are enabled.
END

---

Section 5

Similar changes to those proposed for Sections 4.1 and 4.4.

OLD
   However, PSC protocol defined in RFC 6378 [RFC6378] supports this
   operation only when recovering from a defect condition, but does not
   operate as non-revertive when an operator's switch-over command such
   as FS or Manual Switch (MS) is cleared.  To be aligned with legacy
   transport network behavior and RFC 4427, a node should go into the
   Do-not-Revert (DNR) state not only when a failure condition on the
   working path is cleared but also when an operator command requesting
   switch-over is cleared.

   The change of the PSC Control logic including the state machine due
   to the modification of non-revertive operation is incorporated into
   the PSC Control logic description in Section 10 and Section 11 when
   all the capabilities are enabled.
NEW
   However, the PSC protocol defined in RFC 6378 [RFC6378] supports this
   operation only when recovering from a defect condition: it does not
   support the non-revertive function when an operator's switch-over 
   command, such as FS or Manual Switch (MS), is cleared.  To be aligned
   with the behaviour in other transport networks and to be consistent 
   with RFC 4427, a node should go into the Do-not-Revert (DNR) state 
   not only when a failure condition on the working path is cleared, but
   also when an operator command that requested switch-over is cleared.

   This requires different PSC control logic (including the state 
   machine) compared to that shown in [RFC6378]. Sections 10 and 11
   show the PSC control logic and state machine when all of the 
   capabilities in APS mode are enabled.
END

---

Section 6.1

OLD
Changing the non-revertive operation
NEW
Changing the non-revertive operation as described in Section 5
END

---

Section 6.1

OLD
   Manual Switch-over for recovery LSP/span command, defined in RFC 4427
   [RFC4427] and also defined in RFC 5654 [RFC5654], Requirement 83, as
   one of the mandatory external commands, should be used for this
   purpose, but is not included in RFC 6378.  Note that the "Manual
   Switch-over for recovery LSP/span" command is the same as MS-W
   command.
NEW
   Manual Switch-over for recovery LSP/span command is defined in RFC
   4427 [RFC4427]. Requirement 83 in RFC 5654 [RFC5654] states that the
   external commands defined in RFC 4427 must be supported.  No such
   command is supported in PSC as defined in RFC 6378 so there is a need
   to provide support for that feature.  Note that the "Manual Switch-
   over for recovery LSP/span" command is the same as the MS-W command.
END

---

In Section 6.2, when you say "replaced" it implies that this is making a
specific update to 6378. But I don't think you need to do that (and 
possibly that wasn't the intention. I think it is enough to define the
terms you use in this document. So I suggest...

OLD
   The term "Manual Switch" and its acronym "MS" used in RFC 6378 are
   replaced respectively by "Manual Switch to Protection path" and
   "MS-P" by this document to avoid confusion with "Manual Switch to
   Working path" and its acronym "MS-W".

   Also, the term "Protecting administrative state" used in RFC 6378 is
   replaced by "Switching administrative state" by this document to
   include the case where traffic is switched back to the working path
   by administrative MS-W command.
NEW
   RFC 6378 uses the term "Manual Switch" and its acronym "MS".  This 
   document uses the term "Manual Switch to Protection path" and
   "MS-P" to have the same meaning, but avoid confusion with "Manual 
   Switch to Working path" and its acronym "MS-W".

   Similarly, RFC 6378 uses the term "Protecting administrative state",
   and this document uses "Switching administrative state" to cover the
   same concept but also include the case where traffic is switched back
   to the working path by administrative MS-W command.
END

In keeping with this, it might be better to change the section title to

   6.2.  Terminology to support MS-W

---

Section 6.3

There is a slight discrepancy in the text because it says that the MS-P
and MS-W commands have the same priority, but also gives an example of
when they don't have the same priority.

I think all the relevant material is present, so it is just a matter of
re-ordering it....

OLD
   The MS-P and MS-W commands SHALL have the same priority.  If one of
   these commands is already issued and accepted, then the other command
   that is issued afterwards SHALL be ignored.  If two LERs are
   requesting opposite operations simultaneously, i.e. one LER is
   sending MS-P while the other LER is sending MS-W, the MS-W SHALL be
   considered to have a higher priority than MS-P, and MS-P SHALL be
   ignored and cancelled.
NEW
   If one of the MS-P and MS-W commands is received and processed after
   the other, the two commands SHALL have the same priority such that if
   one of the commands is already issued and accepted, the command that
   is issued afterwards SHALL be ignored.  However, if two LERs request
   opposite operations simultaneously (i.e., one LER sends MS-P and the
   other sends MS-W), the MS-W SHALL be considered to have a higher 
   priority than MS-P, and MS-P SHALL NOT be accepted and SHALL be
   cancelled.
END

---

Section 6.4

Just as 4.1, 4.4, and 5...

OLD
   The change of the PSC Control logic including the state machine due
   to the support of MS-W command is incorporated into the PSC Control
   logic description in Section 10 and Section 11 when all the
   capabilities are enabled
NEW
   Support or this function requires changes to the PSC control logic
   (including the state machine) compared to that shown in [RFC6378].
   Sections 10 and 11 show the PSC control logic and state machine when
   all of the capabilities in APS mode are enabled.
END

---

Section 7.1

   The PSC protocol associated with SD is covered in this document, and
   the specifics for the method of identifying SD is out of the scope of
   the protection protocol similar to the facts that how SF is detect
   and how MS and FS commands are initiated in a management system and
   signaled to protection switching are out of its scope.

s/and the specifics/but the specifics/

It is OK to include the "similar to..." but it is not necessary to give
this reasoning.

---

Section 7.2

Just like Section 6.2 the word "replaced" may be misinterpreted.

So I suggest naming the section...
   7.2.  Terminology to support SD

... and replacing

OLD
   Instead of SFc, Clear Signal Fail or Degrade (SFDc) is used to
   indicate the clearance of either a degraded condition or a failure
   condition.
NEW
   In this document the term Clear Signal Fail or Degrade (SFDc) is used
   to indicate the clearance of either a degraded condition or a failure
   condition.
END

---

Section 7.3

Again, just a small political change...

OLD
   In order to maintain the network operation behavior to which
   transport network operators have become accustomed, the priorities of
   SD-P and SD-W are defined to be equal as in other transport networks,
   such as SDH, OTN and Ethernet transport networks.
NEW
   In order to make the behavior of MPLS-TP networks consistent with 
   that of other transport networks (such as SDH, OTN and Ethernet
   transport networks), the priorities of SD-P and SD-W are defined to
   be equal.
END

---

In Section 7.4, for clarity, I think you should intend and bullet the
two paragraphs beginning

   When MS-W and MS-P...
and
   When SD-W and SD-P...

---

And, as now is becoming familiar, at the end of 7.4

OLD
   The change of the PSC Control logic including the state machine due
   to the support of protection against SD is incorporated into the PSC
   Control logic description in Section 10 and Section 11 when all the
   capabilities are enabled.
NEW
   The addition of support for protection against SD requires different
   PSC control logic (including the state machine) compared to that
   shown in [RFC6378]. Sections 10 and 11 show the PSC control logic and
   state machine when all of the capabilities in APS mode are enabled.
END

---

Section 8 is unclear in the race logic. You have...

   When Exercise commands are input at both ends, an EXER, instead of
   RR, SHALL be transmitted from both ends.

I think that this means that EXER shall be taken as a valid response to
EXER and that if an LER that has issued an EXER and has not received an
RR then, if it receives an EXER it does not need to (SHOULD NOT) send 
RR.

We could capture this as...

OLD
   When Exercise commands are input at both ends, an EXER, instead of
   RR, SHALL be transmitted from both ends.
NEW
   If Exercise commands are input at both ends, then a race condition
   may arrise.  This is resolved as follows:

   o If an LER has issued EXER and receives EXER before receiving RR, it

      o MUST treat the received EXER as it would an RR.

      o SHOULD NOT respond with RR.
END

---

Section 8

   The following PSC Requests SHALL be added to PSC Request field to
   support Exercise:

We don't need to use RFC 2119 language here. You can just say...

   The following PSC Requests are added to the PSC Request field to
   support the Exercise command (see also Section 14.1):

---

Section 8

   The priority of Exercise SHALL be inserted between the priorities of
   WTR Expires and No Request.

For the avoidance of doubt it is nice to actually give the ordering. And
since we end up with Exercise not being immediately adjacent to No 
Request, I suggest this is best handled by a forward reference to
Section 10.2.

   The relative priority of Exercise is shown in the table in Section
   10.2.

---

Section 9.1.1

We do not design protocols to make them resilient against bugs in 
implementations of the protocol. This is because any tick you come up
with will, itself, be vulnerable to a bug in the implementation.

Thus, when you say...

   PSC sends messages in response to external events and in periodic
   retransmission of current status.  It may be expensive to send and to
   parse an Capabilities TLV attached to a packet intended to trigger a
   protection switch or other real-time behavior.  However, if a node
   does not periodically send its Capabilities TLV, the receiving node
   cannot discriminate a deliberate omission of the Capabilities TLV for
   performance reasons from an accidental omission due to an
   implementation issue.  To guard against this, a node MUST include its
   Capabilities TLV in every PSC message that it sends.

...you are neglecting to consider that each and very omission of a 
capability might be due to an implementation issue. So requiring 
inclusion in every PSC message does not resolve this.

However, you *do* still need to include the Capabilities TLV in every
PSC message (if the implementation supports the Capabilities TLV) if
and only if, an LER is allowed to change the capabilities it supports
during the lifetime of an LSP. The reason for this is that the 
absence of the Capabilities TLV is valid for backward compatibility 
reasons: therefore there is no way to distinguish "I have stopped
supporting all of the capabilities" from "I have left out the
Capabilities TLV because nothing has changed."

(...but see my comment on 9.1.3.2 wrt the final paragraph of 9.3).

So, you must decide: can an LER change the capabilities it supports?
If yes, then you make *that* the reason for requiring the TLV to be
present in every message.
If no, then the TLV does not need to be present in each message, but you
do need to make sure the message that was carrying it got delivered.

It looks to me, from 9.1.2 that you are set on requiring retransmission
and continual checking of Capabilities even though it would be 
impossible to achieve a synchronised change (that is, if one end were to
change its capabilities this would automatically result in an error
conditions).  So it really seems to me that this is unnecessary
processing.

---

Section 9.1.2

This comment only applies if you don't make any change as a result of
my comment on the previous section.

I think you should advise an implementation on whether it should 
compare the Capabilities TLV as described in this section before or
after acting on the received PSC message. That is (for example), should
a protection switch be triggered before or after the Capabilities have
been checked?

---

Section 9.1.3.2

I think the final paragraph of Section 9.3 is very relevant here. You 
should either copy the text here or you should provide a forward pointer
to Section 9.3.


---

Section 9.1.3.3

Several things are not clear in the description of the error handling:
- if a mismatch (9.1.3.2) is received, should the timer be stopped?
- if there is a timeout (9.1.3.1) and then a PSC message is received,
  should the capabilities be compared?
- if a mismatch (9.1.3.2) is received and then a PSC message is 
  received, should the capabilities be compared?
- how should alerts to be the operator be handled in the event of 
  continued mismatches or timeouts?
- what actions are available to an operator to resolve capabilities
  mismatches?

---

Section 9.2.1

OLD
   A node can send a
   Capabilities TLV of 0x0
NEW
   A node can send a
   Capabilities TLV with Flags value set to 0x0
END

---

Similarly in 9.2.3

OLD
   Capabilities TLV of 0x0
NEW
   Capabilities TLV with Flags value set to 0x0
END

---

Section 10.1

You really scared me!

   The values of "Request" field in PSC protocol message, which is shown
   in Figure 2 of RFC 6378 [RFC6378], are redefined as follows:

Fortunately you are not redefining the Request types from RFC 6378. You 
are just defining two new ones. Phew!

So you can replace the whole of Section 10.1 with...

NEW
   This document defines two new values for the "Request" field in the
   PSC protocol message that is shown in Figure 2 of RFC 6378 [RFC6378]
   as follows:

      (3) Exercise
      (2) Reverse Request

   See also Section 14.1 of this document.
END

---

Either fill in or delete Section 15.

---

I think that [I-D.ietf-mpls-psc-updates] can be moved from normative to
informative reference. This doesn't decrease its value, but I don't 
think that *this* document requires [I-D.ietf-mpls-psc-updates].

> -----Original Message-----
> From: mpls [mailto:mpls-bounces@ietf.org] On Behalf Of Loa Andersson
> Sent: 20 January 2014 02:28
> To: mpls@ietf.org
> Cc: <mpls-ads@tools.ietf.org>; mpls-chairs@tools.ietf.org; draft-ietf-mpls-tp-
> psc-itu@tools.ietf.org
> Subject: [mpls] working group last call draft-ietf-mpls-tp-psc-itu-01
> 
> Working Group,
> 
> This is to start a two week working group last call on
> draft-ietf-mpls-tp-psc-itu.

[mpls] AD review : working group last call draft-… Adrian Farrel
Re: [mpls] AD review : working group last call dr… Ryoo, Jeong-dong
Re: [mpls] AD review : working group last call dr… Adrian Farrel
Re: [mpls] AD review : working group last call dr… Ryoo, Jeong-dong
Re: [mpls] AD review : working group last call dr… Adrian Farrel