Re: [mpls] Working Group Last Call for draft-ietf-mpls-tp-shared-ring-protection-02

Eric Gray <eric.gray@ericsson.com> Tue, 16 August 2016 19:01 UTC

Below are my comments (as document shepherd) for the Working Group Last Call on 
this draft.

Comments:

I support progression of this document after addressing a number of issues.

I have already raised the issue with the (large) number of Authors listed on the first page.

====

As a general question, that affects a number of portions of the draft, is it the intention that 
all ring nodes in a ring operate in the _same_ protection switching mode (wrapping, short
wrapping or steering)?

One reason why this is important is that operating in inconsistent modes may result in 
unpredictable behavior - particularly in multiple failure scenarios.

Another reason why this is important, is that - if consistency is a requirement - then it is 
quite likely to be useful to have some way for ring nodes to verify that this is the case.

And one more reason why this is important is that - if consistency is required, and can be
verified by ring nodes participating in a ring - then RPS currently only seems to be needed
in a limited set of scenarios (to detect a unidirectional link failure in wrapping mode, or 
when using steering mode).  RPS is not otherwise mentioned in connection with modes of 
operation.  Note that this may be in error (it seems likely that RPS may be needed to detect 
a unidirectional link failure in all modes, but the current draft seems to only discuss this in 
connection with the wrapping protection switch mode).

Note that - if RPS is required only to provide support for a unidirectional link failure, there 
are other ways to do this.

The answer to this general question may affect some of my comments below.

Section 5.1 seems to indicate that the protection switching mode is a ring characteristic - 
which strongly supports the need for consistency in protection switching mode in a ring.
But I did not see an explicit statement to this effect elsewhere in the document.  

I would expect any such statement to occur at the point where the modes are introduced 
(i.e. - in section 4.3).

====

In the Introduction for this draft, you summarize optimization features from RFC 5654 as 
follows:
   a.  Minimize the number of OAM entities for protection
   b.  Minimize the number of elements of recovery
   c.  Minimize the required label number
   d.  Minimize the amount of control and management-plane transactions
       during maintenance operation
   e.  Minimize the impact on information exchange during protection if
       a control plane is supported

From RFC 5654, the actual "optimization criteria" are:

   a.  Minimize the number of OAM entities that are needed to trigger
       the recovery operation, such that it is less than is required by
       other recovery mechanisms.
   b.  Minimize the number of elements of recovery in the ring, such
       that it is less than is required by other recovery mechanisms.
   c.  Minimize the number of labels required for the protection paths
       across the ring, such that it is less than is required by other
       recovery mechanisms.
   d.  Minimize the amount of control and management-plane transactions
       during a maintenance operation (e.g., ring upgrade), such that it
       is less than the amount required by other recovery mechanisms.
   e.  When a control plane is supported, minimize the impact on
       signaling and routing information exchange during protection,
       such that it is less than the impact caused by other recovery
       mechanisms.

The draft does not identify the list provided as a summary of the correlated "optimization
criteria" from RFC 5654, and the summarized versions leave a lot of information out.  

This could be misleading, and it would be useful to suggest to readers that they look
at the actual criteria in RFC 5654 for more information.

As an example of "missing information," the summarized versions do not indicate to what 
we would compare the minimizations in this draft to determine that they achieve their
(ambiguous) goals (i.e. "such that [the characteristic being minimized] is less than [that
same or similar characteristic] for other recovery mechanisms").

Additional "missing" information includes:
  a. "needed to trigger recovery" as opposed to "for protection."
  b. "in the ring" (possibly not important).
  c. "number of labels required for the protection paths across the ring" as opposed to 
      "required label number" (remember that - to some people - labels are numbers).
  d. an example is given of a maintenance operation (possibly not important).
  e. "signaling and routing information exchange" as opposed to "information exchange."

Minimally, some wording improvements are required to align the text in this draft with the
corresponding text in RFC 5654.

====

In the second paragraph after bullet "e" on page 4, you state that "the solution for 
point-to-multipoint transport paths is under study and will be presented in a separate 
document."  It is generally not a good idea for an Internet Draft to attempt to predict 
what may or may not happen in documents that are as yet unwritten.  It is sufficient 
to say that "a solution for anything other than point-to-point transport paths is out of 
scope in this document"  (specifically - replace "under study and will be presented in 
a separate" with "out of scope in this").

Note that this means this document does not satisfy Requirement 95 from RFC 5654.

====

When I see a statement along the lines of "This document elaborates on the requirements 
in detail" - my immediate concern is that what might actually happen is that requirements
are going to be "restated" in some way that supports a new (or different) interpretation or
agenda.

Please make it clearer what the intentions are for this document with respect to RFC 5654.

The Introduction claims that this document addresses ring protection requirements in
RFC 5654, yet - once we get into section 2, we jump right over requirements 92-104A
to address the requirement that the protection mechanisms SHOULD protect against 
multiple failures.  Why is that?  

Perhaps the authors are already aware of an obvious set of mechanisms that satisfy 
these earlier requirements (many of which are ones that MUST be met) and it is then 
only necessary to delve more deeply into mechanisms that further satisfy this 
"SHOULD" be met requirement?  

If so, say so.  

If not, then provide some justification for why you want to address this somewhat soft 
requirement first.

====

Where - in RFC 5654 - does it state "recovery mechanisms which are optimized for ring 
topologies could be further developed if it can provide the following features" (I couldn't
discover that the word "developed" was used at all in RFC 5654, for instance)?  What it
seems to say is that the recovery mechanisms applicable to generic recovery may be 
optimized for specific topologies provided the optimizations meet the stated criteria,
and satisfy somewhat vague "cost-benefit" considerations.  Perhaps you meant to say:
"recovery mechanisms could be further optimized for ring topologies if those further
optimizations meet the following criteria" (or something along these lines)?

====

The first paragraph after bullet "e" on page 4 is awkward; I suggest replacing "those" with
"the."  Also note that the wording could use some minor improvement to avoid giving the
impression that the preceding list of "criteria" is not what you are referring to here as
"requirements on ring protection listed in RFC 5654."  Perhaps state the actual number
range (i.e. - 92-94 and 96-109) of requirements this document addresses; or is the list of 
requirements this draft aims to address shorter than this?

====

Which requirements from RFC 5654 are addressed in section 2.2?  Can you identify them
as part of this section?
 
====

Which requirements from RFC 5654 are addressed in section 2.3?  Can you identify them
as part of this section?

====

Figure 1 (or at least part of the text describing it) appears to apply only to the ingress port
for Ring Tunnels; at an egress port, it is not a question of whether the port "can carry" multiple ring 
tunnels, because it _does_ carry exactly 4 (per description in section 4.1.1).

====

Having defined a notation for "label stack" in section 3, section 4 then falls back to a more
traditional notation.  Is there a reason why you do not use the same notation consistently?
Should you maybe include the possibility that the more traditional notation might also be 
used when defining a different notation in section 3?

The way I understand the notation defined in section 3, the label stack in figure 2 would be
shown as -

	[ Ring Tunnel Label | LSP Label | PW Label ] Payload

- which seems simpler than the way it is currently depicted.
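
A minimal Python sketch of the bracketed notation as it is used in the comment above.  The function name and the outermost-label-first ordering are assumptions made for illustration; they are not taken from the draft.

```python
def render_label_stack(labels, payload="Payload"):
    """Render labels (outermost first, by assumption) as '[ A | B | C ] Payload'."""
    if not labels:
        # No labels at all: show the bare payload.
        return payload
    return "[ " + " | ".join(labels) + " ] " + payload


print(render_label_stack(["Ring Tunnel Label", "LSP Label", "PW Label"]))
```

Under these assumptions the call prints the same one-line form shown above for the label stack in figure 2.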

====

Perhaps you should look again at the notation text in section 3.  One notation that you appear 
to be actually using is for an egress "<X>" to define 4 tunnels as:
	RcW_<X>
	RaP_<X>
	RaW_<X>
	RcP_<X>

For the first "reverse hop" in each tunnel, there is a label assigned by the egress corresponding
to that tunnel.

If you then construct a label-stack along the lines of the one above, the "Ring Tunnel Label"
would be whatever label corresponds to one of the above 4 tunnels (for that "hop" it might 
be unnecessary to use parentheses to add the "Node Name" of the assigning node to the 
label stack).

For each subsequent "reverse hop", the label could presumably be assigned by the downstream
end of the "hop" and the assigning node would be the same for RcW_<X> and RcP_<X> (the node
that is one "hop" clockwise from <X>), and a (probably different) assigning node would be the 
same for RaW_<X> and RaP_<X> (the node that is one "hop" counter-clockwise from <X>).

In any case, the notation used to name Ring tunnels is not described in section 3.
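
The naming pattern that appears to be in use can be sketched in a few lines of Python.  The reading of "c"/"a" as clockwise/anticlockwise and "W"/"P" as working/protection is inferred from the names above and is an assumption, as is the function name.

```python
def ring_tunnel_names(egress):
    """Return the four ring tunnel names for one egress node.

    Assumed reading of the name: R = ring tunnel, c/a = clockwise or
    anticlockwise, W/P = working or protection, then the egress node.
    """
    # Same order as the four tunnels listed in the text above.
    variants = (("c", "W"), ("a", "P"), ("a", "W"), ("c", "P"))
    return [f"R{direction}{role}_{egress}" for direction, role in variants]


print(ring_tunnel_names("D"))
```

The sketch covers only the names; it says nothing about which node assigns the label for each hop.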

I assume that how LSP labels and PW labels are assigned is out of scope for this draft.

====

The wording in section 4.1.3 is unclear; because you have not described the mapping for Ring 
Tunnel Labels at each hop, it is not clear what it means when the "ingress node on the ring 
pushes the working ring tunnel label according to the egress node."  Chances are pretty good that the
actual label pushed on by the Ring Tunnel ingress is not the same label that was assigned 
for the Ring Tunnel by the tunnel egress node (unless the particular tunnel is one "hop" from the
ingress to the egress node, or the labels are configured that way for all tunnels at all ring nodes 
- which may not be possible).

Section 4.1.2 should probably say a little bit more about assignment of Ring Tunnel Labels at
each transit node in the ring.

====

What is meant in section 4.2 by "Two end ports of a link"?  I assume this means the ports that are
used to connect two adjacent nodes on a ring, but "end ports" makes this unclear.

====

There are issues with use of (or non-use of) normative language in section 4.3.  Is it the case that
every node MUST obtain the ring topology (currently it says "can")?  

This seems important because it is not at all clear that ring-topology awareness is required to 
support either wrapping protection mode (this assumes that protection mode is required to be 
consistent in a ring), since it seems that wrapping occurs at points where the ring is broken, 
irrespective of the ring topology otherwise.

Where is there a statement to the effect that the protection switching mode in a single ring MUST
be consistent?

====

For Steering (section 4.3.3), there are a few places where the description might be easier to follow
if there was a term (or expression) for "the ring node on which an LSP enters the ring" - since this is 
(as I understand it) the ONLY ring node where the location of the failure determines which ring the 
LSP traffic will be forwarded on.

There may be an unwritten assumption associated with the use of Steering protection switching 
mode - that is particularly obvious in the 2nd paragraph.  As I understand it, Steering mode is the 
only mode where ring topology information is needed by every ring node.  

This seems to be the case because steering mode is the only mode where the decision as to which
ring to use in forwarding LSP traffic is made at each node depending on where that node is in the
ring and where in the ring the failure has occurred.  For the wrapping modes, the traffic is placed 
on the working ring by each node and only moved to a protection ring by the ring node directly 
adjacent to the link or node failure.
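
To make the dependence on topology concrete, here is a rough Python sketch of the steering decision as described above: the node where the LSP enters the ring picks a direction based on where the failure is.  The ring representation, the function names, and the choice of clockwise as the default direction are all assumptions for illustration, not behavior specified by the draft.

```python
def clockwise_path(ring, src, dst):
    """Nodes visited going clockwise from src to dst (both included)."""
    if dst not in ring:
        raise ValueError("destination is not on the ring")
    i = ring.index(src)
    path = [src]
    while path[-1] != dst:
        i = (i + 1) % len(ring)
        path.append(ring[i])
    return path


def crosses(path, failed_link):
    """True if the path traverses the failed link in either direction."""
    hops = set(zip(path, path[1:]))
    a, b = failed_link
    return (a, b) in hops or (b, a) in hops


def steer(ring, ingress, egress, failed_link):
    """Direction chosen at the node where the LSP enters the ring."""
    if not crosses(clockwise_path(ring, ingress, egress), failed_link):
        return "clockwise"
    return "anticlockwise"


ring = ["A", "B", "C", "D", "E", "F"]  # listed in clockwise order (assumed)
print(steer(ring, "A", "D", ("B", "C")))  # clockwise path A-B-C-D is broken
print(steer(ring, "A", "D", ("E", "F")))  # clockwise path A-B-C-D is intact
```

Note that steer() needs the full node order of the ring, whereas a wrapping node adjacent to the failure would only need to know that its own link is down.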

However, with respect to knowledge of ring-topology information, the draft says only that this 
information is either configured or learned "via some topology discovery mechanism" (in the 2nd 
paragraph, section 4.3).

If it is the case that ring-topology awareness is only explicitly required for steering mode, then the
assumption that this information is known should be explicitly stated here (if - as does not seem 
very likely, or possibly not necessary - this information is explicitly required for all modes, then that
should be made explicit in the use of normative language in section 4.3).

====

In section 5, I strongly suspect that - no matter how hard (or how often) we look at this section - 
there will very likely be issues that will only be discovered by implementers (especially those not
already involved in writing this section, as they will be the ones that discover hidden assumptions
or missing pieces).

====

Depending on a number of factors - chief among them is whether or not RPS is required to know
about unidirectional link failures in every protection switching mode - it may be the case that RPS
is NOT necessarily required to support MSRP.  Should that turn out to be the case, then you need 
to reconsider certain uses of normative language in section 5.1.

If the RPS process does not communicate between ring nodes via the G-ACh channel defined in this 
document, how would it do so?  Note that - in this document - you currently seem to require use of
RPS and define one mechanism (use of a G-ACh channel) for doing this.  Assuming both these facts
are true, it seems reasonable to mandate use of the defined mechanism.

Note that text in section 7 (Security Considerations) does explicitly state that G-ACh is used for RPS
protocol.

What does it mean to have no protection switching active on a ring (perhaps you mean no protection
switching in effect)?  Or is this something that is a common understanding among operators (i.e. - if
protection switching is "active" this means that the protection switching mechanisms enabled are in
use as a result of a failure)?

What exactly is a "failed span?"

What does it mean to say that "Ring switches MUST be preempted by higher priority RPS requests?"
I assume this means that it must be possible to change the protection switching state as a result of a
higher priority RPS request - but this is not necessarily obvious.

In the last paragraph of section 5.1 (immediately before section 5.1.1), you say that "nodes do not
preempt existing RPS requests unless they have a higher-priority RPS request."  Further, you imply
that "knowledge of the state of the ring" is required to make this determination.  

Exactly how does a ring node make this determination?  A ring node can make direct comparisons 
between an incoming RPS message type/priority and the RPS message type/priority it would make
based on local knowledge of the ring - but this does not require "knowledge of the state of the ring"
only knowledge of local ring-state.
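
The direct comparison described above can be sketched as follows.  The request names and numeric priorities are placeholders chosen for illustration (the real request codes and their ordering are whatever the draft defines); the point is only that the check uses the incoming message and local state, nothing more.

```python
# Hypothetical priorities: a higher number outranks a lower one.
# These values are placeholders, not the code points from the draft.
PRIORITY = {"no_request": 0, "manual_switch": 1, "signal_fail": 2}


def should_preempt(local_request, incoming_request):
    """True if the incoming RPS request outranks the locally held one."""
    return PRIORITY[incoming_request] > PRIORITY[local_request]


print(should_preempt("manual_switch", "signal_fail"))
print(should_preempt("signal_fail", "manual_switch"))
```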

Moreover, "knowledge of the state of the ring" is often less than perfect, if meant to apply to remote
portions of the ring (an incoming RPS message is likely to reflect better knowledge of the ring-state
than the ring node would otherwise likely have).

====

For section 5.1.1 - exactly how is the recommended interval between RPS messages reflecting new 
information determined?  I can see that having an interval of 3.3 milliseconds would result in having
sent 3 of them in slightly less than 10 milliseconds, but this may not be particularly relevant in the 
goal of a 50 millisecond protection switch completion if the mechanism is relying on RPS propagation 
around the ring the long way in order to detect a unidirectional link failure.
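
The arithmetic behind this concern can be written out as a small Python sketch.  The 3.3 millisecond interval and the 50 millisecond goal come from the text above; the per-hop delay is a purely hypothetical figure, and the model (send three messages, then propagate hop by hop) is a deliberate simplification.

```python
INTERVAL_US = 3_300   # 3.3 ms interval between RPS messages, from the text
BUDGET_US = 50_000    # 50 ms protection switch goal, from the text


def time_for_messages(count, interval_us=INTERVAL_US):
    """Elapsed time to send `count` messages, one per interval."""
    return count * interval_us


def max_hops_within_budget(per_hop_us, count=3):
    """Ring hops that fit in what is left of the budget.

    per_hop_us is a hypothetical per-hop forwarding and processing delay.
    """
    remaining = BUDGET_US - time_for_messages(count)
    return remaining // per_hop_us


print(time_for_messages(3))            # microseconds, just under 10 ms
print(max_hops_within_budget(2_000))   # assuming 2 ms per hop
```

Integer microseconds are used to avoid floating-point rounding in the 3 x 3.3 product.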

====

On page 28, what does this paragraph mean?

"When multiple MS RPS requests over different spans exist at the same time, no switch SHOULD be 
executed and existing switches MUST be dropped.  The nodes MUST signal, anyway, the MS RPS 
request code."

After looking a bit, "MS" probably means "Manual Switch" and this would imply that the effect of
having multiple manual switch commands should be that they nullify each other.  But what is the 
expected behavior if there are multiple manual switch commands in a ring and a ring node sends 
a "Signal Fail"  RPS message?

====

Section 6 - in effect, this section determines the value assignments that will be used in the rest of 
the document.  Therefore, the statement that "new values" are "defined in this document and 
summarized in this section" is not quite correct (because it implies that they are defined elsewhere
in this document and then summarized here).  I suggest replacing " and summarized in this section"
with "listed in the sections below" so that it would now read:

"IANA is requested to administer the assignments of new values defined in this document and listed
in the sections below."

I will grant you that this seems a minor point; the code points are defined here and their meaning
and use is defined elsewhere in the document.  But this section is about the code points and the 
need for IANA to create registries as needed and record these assignments.  Everything that IANA 
needs MUST be _included_ (not summarized) here.

=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=

NITs:
Section 3 - "()" are called "parentheses" - "[]" are "brackets" and "{}" are "braces."

----

Section 4.2 - "Three consecutive CC packets losses ..." doesn't make sense - probably should be
"three consecutive lost CC packets ..."

----

Section 4.3 - in the last sentence of the 2nd paragraph, "use" should be "uses" and "peform" should 
be "perform."

Also, "(original data traffic carried by LSP1)" and "(data traffic carried by LSP1)" should both just be 
"Payload" to be consistent with Figure 2.  The same also applies (to "Original data traffic") later in 
section 4.3.1.1, (to "original data traffic carried by LSP1" and "data traffic carried by LSP1") in 
section 4.3.1.2 and anywhere else where you use similar phrasing.

"Then at node D ..." should start a separate paragraph.

"... node which detects a ..." should be "... the node which detects a ..."

"... and reach the egress node" should be "... and reaches the egress node"  (though "... and arrives
at the egress node" may be a slightly better wording).

"At the egress node, the traffic leave the ring ..." - "leave" should be "leaves" (even though "traffic"
may be (and often is) plural, it is a group name (like "herd") so if it is in the singular form, that is what
the verb ("leave" in this case) has to match).

Similar for the case with "... following sections describes ..." (should be "... following sections describe 
...").

----

Section 4.3.1 - "... tunnel can protect both the link failure and the node failure" should probably be "...
tunnel can protect against either a link failure or a node failure."

----

Section 4.3.2.1 - in the 4th sentence, "Rap_D" should be "RaP_D."

----

Section 4.3.3 - 1st sentence, "perform" should be "performs."

In the last sentence, "needs" should be "need" and "it" should be "the ring node."

----

Top of page 26, 2nd paragraph - "resulting the ring being ..." should probably be "resulting in the
ring being ..."

----

In section 5.1.3.2, "... to adjacent node ..." should probably be "... to adjacent nodes ..."

----

There are a great many places where "Pass-trough" should probably be "Pass-through" in sections
under 5.2 (i.e. 5.2.2 through 5.2.5).

----

On the last line on page 28, "Node" should probably be "The node."

----

Section 6.2 - in the section title, "RSP" probably was meant to be "RPS."

--
Eric Gray

PS - As a general (but mostly irrelevant) observation, I found it amusing that the acronym chosen 
for this specification (MSRP) just happens to be the common acronym in use for decades in 
the US to mean "Manufacturer's Suggested Retail Price."

:-)

_________________________________________________________________________________
From: mpls [mailto:mpls-bounces@ietf.org] On Behalf Of Eric Gray
Sent: Monday, August 15, 2016 10:32 AM
To: mpls@ietf.org
Cc: draft-ietf-mpls-tp-shared-ring-protection@ietf.org; mpls-chairs@ietf.org
Subject: [mpls] Working Group Last Call for draft-ietf-mpls-tp-shared-ring-protection-02

MPLS Working Group,

I am the document shepherd for the following draft:

"MPLS-TP Shared-Ring protection (MSRP) mechanism for ring topology" 
(https://datatracker.ietf.org/doc/draft-ietf-mpls-tp-shared-ring-protection  or
https://tools.ietf.org/html/draft-ietf-mpls-tp-shared-ring-protection) - draft
version 02. 

This E-Mail is to initiate a two week working group last call on the above draft.

Please send comments to the MPLS working group mailing list (mailto:mpls@ietf.org).

There are IPR disclosures for the individual draft that this document replaced.
They can be found here:

https://datatracker.ietf.org/ipr/search/?submit=draft&id=draft-ietf-mpls-tp-shared-ring-protection 

All the authors and contributors have stated that they are not aware of any 
additional IPR related to this draft.

The document shepherd and working group chairs are frequently asked about 
working group discussion related to IPR disclosures.

Please remember that discussion on the content and validity of IPR disclosures 
should not take place on IETF mailing lists.

However we are looking for simple statements as to whether or not you support
continued working group effort to progress the document, regardless of existing 
IPR disclosures. 

Please include this information when making comments, or indicating your 
"support/do not support" when responding to this working group last call.

Note that the IETF works on the basis of "rough consensus" - hence, unless you 
are convinced that a significant number of other participants will have similar 
objections, it may be in your best interest to provide any comments you may have 
on this Internet Draft, even if you do not support progressing this draft further.

This working group last call ends Monday, August 29, 2016.

--
Eric Gray