Review of draft-ietf-teas-gmpls-resource-sharing-proc-06

Dale Worley <worley@ariadne.com> Thu, 12 January 2017 21:29 UTC

From: Dale Worley <worley@ariadne.com>
To: <gen-art@ietf.org>
Subject: Review of draft-ietf-teas-gmpls-resource-sharing-proc-06
Cc: draft-ietf-teas-gmpls-resource-sharing-proc.all@ietf.org, ietf@ietf.org, teas@ietf.org

Reviewer: Dale Worley
Review result: Ready with Nits

I am the assigned Gen-ART reviewer for this draft.  The General Area
Review Team (Gen-ART) reviews all IETF documents being processed
by the IESG for the IETF Chair.  Please treat these comments just
like any other last call comments.

For more information, please see the FAQ at
<http://wiki.tools.ietf.org/area/gen/trac/wiki/GenArtfaq>.

Document:  draft-ietf-teas-gmpls-resource-sharing-proc-06
Reviewer:  Dale R. Worley
Review Date:  12 Jan 2017
IETF LC End Date:  17 Jan 2017
IESG Telechat date:  2 Feb 2017

Summary:

       This draft is basically ready for publication, but has nits
       that should be fixed before publication.

There are various places where the wording of the draft is unclear.
The draft would benefit from a careful editing for clarity.
Particularly, there are a considerable number of places where the use
of "the" and "a" and of plurals is not standard or leaves the text
somewhat uncertain.

There are various references to ASSOCIATION objects, SESSION_ATTRIBUTE
objects, etc.  The text leaves it unclear where these objects live; it
talks as if they exist in an abstract sense.  I think I managed to
track down what is going on in RFC 4872, which is that the Path
message that sets up an LSP contains an array of objects and all of
the objects described are parts of the respective LSP setup Path
messages.

I also suspect that the Path message objects are retained by the
various nodes as permanent information about the LSPs that they have
configured, so one can speak unambiguously of "the ASSOCIATION object
of the LSP" long after the LSP is set up.

If all of this is correct, it would help the naive reader if this was
spelled out at the beginning of the document and/or the wording was
changed in places to provide this context.  E.g.,

   GMPLS LSPs can share resources during LSP setup if they have Shared
   Explicit (SE) flag set in their SESSION_ATTRIBUTE objects and:

could be clarified as

   GMPLS LSPs can share resources during LSP setup if they have Shared
   Explicit (SE) flag set in the SESSION_ATTRIBUTE objects in the Path
   messages that create them and:
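
To make the "where do these objects live" point concrete, here is a
minimal Python sketch, assuming the objects are carried in the Path
message that sets up each LSP.  The PathMessage class and its
dictionary layout are hypothetical simplifications, not an RSVP-TE
encoding.  The flag value 0x04 is the "SE Style desired" bit given for
the SESSION_ATTRIBUTE object in RFC 3209.  Only the SE-flag
precondition is modeled; the further conditions that follow "and:" in
the draft are not.

```python
from dataclasses import dataclass, field

# "SE Style desired" flag bit of SESSION_ATTRIBUTE, as given in RFC 3209.
SE_STYLE_DESIRED = 0x04


@dataclass
class PathMessage:
    """Hypothetical, simplified view of the Path message that creates an LSP."""
    lsp_name: str
    # Maps an object name (e.g. "SESSION_ATTRIBUTE") to a dict of its fields.
    objects: dict = field(default_factory=dict)


def se_flag_set(path_msg: PathMessage) -> bool:
    """Return True if the SE flag is set in the message's SESSION_ATTRIBUTE."""
    attrs = path_msg.objects.get("SESSION_ATTRIBUTE", {})
    return bool(attrs.get("flags", 0) & SE_STYLE_DESIRED)


working = PathMessage("working", {"SESSION_ATTRIBUTE": {"flags": 0x04}})
plain = PathMessage("plain", {"SESSION_ATTRIBUTE": {"flags": 0x00}})

print(se_flag_set(working), se_flag_set(plain))
```

Phrasing the condition as a property of the Path message, rather than
of the LSP in the abstract, is what the suggested rewording above asks
for.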

There are a number of terms that are unclear to me.  It's possible
that they have very standard meanings in GMPLS or traffic engineering,
though.  Is there a terminology section in a referenced RFC that could
be pointed to to define these various words?

1.  Introduction

   to setup Label Switched Paths (LSPs) in non-packet transport

The form "set up" is a verb, whereas "setup" is a noun (naming an
instance of the action of setting up) or an adjective (specifying that
something has to do with setting up).  So in this instance, the
wording should be "set up".  Other uses of "setup/set up" should be
checked also.

   As described in [RFC6689], an ASSOCIATION object can be
   used to identify the LSPs for restoration using Association Type set
   to "Recovery" [RFC4872] and also identify the LSPs for resource
   sharing using Association Type set to "Resource Sharing" [RFC4873].

The ordering of the phrases in this sentence is somewhat confusing
because "using Association Type set to xxx" is a qualifier of "an
ASSOCIATION object", yet the phrase "can be used to yyy" is between
them.  Clearer to say:

   As described in [RFC6689], an ASSOCIATION object with Association
   Type "Recovery" [RFC4872] can be used to identify the LSPs for
   restoration.  Also, an ASSOCIATION object with Association Type
   "Resource Sharing" [RFC4873] can be used to identify the LSPs for
   resource sharing.

--

   Generally GMPLS end-to-end recovery schemes have the restoration LSP
   signaled after the failure has been detected and notified on the
   working LSP.

Is "signaled" used here in a standard way for GMPLS?  It seems that
"the LSP is signaled" is to mean "the LSP is set up", but it took me
some time to realize that.  I am used to "X is signaled" meaning "a
signal is sent to X".  (There are many instances of this usage.)

It would also be useful for the reader to know the difference between
"protection", "restoration", and "recovery".  I think that
"protection" is anti-failure paths set up *before* any failure,
"restoration" is anti-failure paths set up *after* a failure, and
"recovery" includes both "protection" and "restoration".  Is this
standard terminology within GMPLS, or should the reader be warned
about it?

   In non-packet transport networks, as
   working LSPs are typically signaled over a nominal path, 

What is the meaning of "nominal" here?  ("nominal" has a number of
meanings, some of which are largely contradictory.)

   can be reverted to the nominal path when the failure is repaired

In this context, the meaning of "reverted" is made clear by the clause
"when the failure is repaired..." -- as opposed to other uses of
"reverted".

   In this document, procedures are reviewed for

It's probably better to say "we review procedures for...".

   o  When using end-to-end recovery with revertive mode, methods for
      LSP reversion and resource sharing are summarized in this
      document.

A definition of "revert/revertive/reversion" would be useful.

2.  Overview

   The GMPLS end-to-end recovery scheme, as defined in [RFC4872] and
   being considered in this document, "fully dynamic rerouting switches
   normal traffic to an alternate LSP that is not even partially
   established only after the working LSP failure occurs.  The new
   alternate route is selected at the LSP head-end node, it may reuse
   resources of the failed LSP at intermediate nodes and may include
   additional intermediate nodes and/or links".

It is awkward to visually coordinate the quotation marks in this
paragraph.  If it is important that the text is quoted from RFC 4872,
given its length, it should be presented as a block-quote.  If not,
the quotation marks should be omitted and just the reference given.

If the intention is to quote this text, it should be corrected so that
it matches the passage from RFC 4872.  In particular, the difference
between "fully dynamic rerouting" (in the draft) and "Full LSP
rerouting (or restoration)" needs to be resolved, as there might be a
difference in meaning.

The grammar does not join "The GMPLS end-to-end recovery scheme ..."
and "... fully dynamic rerouting switches normal traffic".

Perhaps something like:

   The GMPLS end-to-end recovery scheme, as defined in [RFC4872] and
   being considered in this document, switches
   normal traffic to an alternate LSP that is not even partially
   established only after the working LSP failure occurs.  The new
   alternate route is selected at the LSP head-end node, it may reuse
   resources of the failed LSP at intermediate nodes and may include
   additional intermediate nodes and/or links.

--

   Two examples, 1+R and 1+1+R are described in the following sections.

At this point in the text, it's not clear what category these items
are examples *of*.  They aren't single recovery situations, as one
would expect of something labeled "example".  They seem to be
sub-categories of "The GMPLS end-to-end recovery scheme".  So it would
be better to use phrasing like "Two forms of end-to-end recovery, ...,
are described in the following sections." or "Two end-to-end recovery
schemes/situations ...".

I assume that other variants of end-to-end recovery exist, and this
draft is applicable to some/many/all of them.  To guard against
misunderstanding, it would be worth saying so by adding something like
"Many other forms of end-to-end recovery exist, many of which [or
whatever] can use these RSVP-TE signaling techniques."

Given that sections 2.1 and 2.2 form a pair of examples, it might be
useful to distinguish them from "Resource Sharing By Restoration LSP"
(which is not an example, and is not somehow an alternative to 1+R and
1+1+R) by renumbering the sections to:

    2.  Overview
    2.1.  Examples
    2.1.1.  1+R Restoration
    2.1.2.  1+1+R Restoration
    2.2.  Resource Sharing By Restoration LSP

In that case, the introductory sentence "Two examples..." would move
to the new section 2.1.

Where do the names "1+R" and "1+1+R" come from and do they have
meaning beyond being arbitrary labels?

Also, given that the 1+1+R case is split into four sub-cases, it's not
clear that the split between 1+R and 1+1+R is fundamental.  It seems
that there is an array of semi-independent choices:  whether there is
an ongoing protection LSP, how many restoration LSPs may be
established (no more than the number of ongoing LSPs), how many
failures of original LSPs must happen before restoration LSPs are
established; various combinations of these choices yield various
restoration techniques.

Looked at that way, it might be worth combining both examples into
one.  But that has the problem that figure 2 looks considerably
different from figure 1.

OTOH, figure 2 isn't particularly accurate for the situation with two
restoration LSPs, and perhaps those two cases should be split into
another section with its own figure.

2.1.  1+R Restoration

   Unlike a protection LSP, a restoration LSP is signaled per need
   basis.

Is "restoration" a standard word in this field?  If not, there should
be some sort of terminology section that states clearly the difference
between "protection" and "restoration".

2.2.  1+1+R Restoration

This paragraph could use rewording to be clearer:

   After a failure detection and
   notification on a working LSP or protecting LSP, a third LSP on path
   A-H-I-J-Z is established as a restoration LSP.

Since the working LSP has already been described, this should be "the
working LSP".

   The restoration LSP
   in this case provides protection against a second order failure. 

It would probably be better to explain what the "second order failure"
is:

   The restoration LSP in this case provides protection against
   failure of both the working and protecting LSPs.

--

   During failure switchover with 1+1+R recovery scheme, in general,
   failed LSP resources are not released so that working, protecting and
   restoration LSPs coexist in the network.  Nonetheless, a restoration
   LSP with the working LSP it is restoring as well as a restoration LSP
   with the protecting LSP it is restoring can share network resources.

For ease of reading, better to split the two cases apart, and not use
"it is restoring" as we haven't introduced "restore" as a transitive
verb:

   The restoration LSP can share network resources with the working
   LSP, and it can share network resources with the protecting LSP.

--

   Typically, restoration LSP is torn down when the failure on the
   original (working or protecting) LSP is repaired and the traffic is
   reverted to the original LSP.

Strictly,

   Typically, the restoration LSP is torn down when both the working
   and protecting LSPs are repaired and the traffic is reverted to the
   original LSP.

Except that's not correct, either.  Probably the practice is that a
restoration LSP is torn down when enough original LSPs are repaired to
bring the failure count below the threshold that triggered the setting
up of the restoration LSP (which varies among the four models).  But
that's awkward to write, even though that is the correct statement.
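
The threshold rule is easier to state as a predicate than as a
sentence.  The following Python sketch is only an illustration of the
guess above: the function name and the trigger_threshold parameter are
hypothetical, and the draft does not define the rule in this form.

```python
def restoration_lsp_needed(failed_originals: int, trigger_threshold: int) -> bool:
    """Return True while a restoration LSP should exist.

    failed_originals:  number of original (working/protecting) LSPs
                       that are currently failed.
    trigger_threshold: number of failures that triggers setting up the
                       restoration LSP (assumed to vary per model).
    """
    return failed_originals >= trigger_threshold


# A model that restores only after both original LSPs have failed.
print(restoration_lsp_needed(2, trigger_threshold=2))  # set up / keep
print(restoration_lsp_needed(1, trigger_threshold=2))  # one repaired: tear down
```

With a threshold of 1, the same predicate describes a model that
restores on the first failure and tears down once no original LSP is
failed.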

--

   In all models discussed, if the restoration LSP also fails, it is
   torn down and a new restoration LSP is signaled.

You can't say "the restoration LSP" because some of the models have
more than one.  Better

   In all these models, if a restoration LSP also fails, it is torn
   down and a new restoration LSP is signaled.

2.3.  Resource Sharing By Restoration LSP

   it allows for resource sharing when the LSP
   traffic is dynamically restored after the link failure

The significance of this phrase isn't clear to me.  One possible sense
is that since the failure that is being discussed is the C-D link
failure, then necessarily the resources from A to C can be reused.
But that meaning doesn't work well here, because we haven't introduced
what the failure is.  (Also, you use the phrase "the link failure"
before introducing what the link failure is.)

It seems like the potential for resource sharing is a property of the
LSP that it might not have, but the text doesn't point that out
clearly as an assumption of the example.  Perhaps

   Using the network shown in Figure 3 as an example, LSP1 (A-B-C-D-E)
   is the working LSP, and assume it allows for resource sharing when
   the LSP traffic is dynamically restored.

--

   In this case, A-B-C-F-G-E is
   chosen as the restoration LSP path and the resources on the path
   segment A-B-C are re-used by this LSP when the working LSP is not
   torn down (e.g. in 1+R recovery scheme).

"when" isn't the right word here, because re-using the resources
doesn't wait for the working LSP to be not torn down.  Perhaps:

   In this case, A-B-C-F-G-E is
   chosen as the restoration LSP path and the resources on the path
   segment A-B-C are re-used by this LSP.  The working LSP is not
   torn down.
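
As a quick check of which resources are re-used in this example, here
is a small Python sketch.  It assumes that a "resource" can be
approximated by a link between two adjacent nodes and that paths are
written as dash-separated node names, as in the draft's figures; the
function names are hypothetical.

```python
def links(path: str) -> list:
    """Split a path such as "A-B-C" into its links [("A", "B"), ("B", "C")]."""
    nodes = path.split("-")
    return list(zip(nodes, nodes[1:]))


def shared_links(working: str, restoration: str) -> list:
    """Return the links of the restoration path that the working path also uses."""
    working_links = set(links(working))
    return [link for link in links(restoration) if link in working_links]


reused = shared_links("A-B-C-D-E", "A-B-C-F-G-E")
print(reused)  # [('A', 'B'), ('B', 'C')]
```

The result is the segment A-B-C, which matches the text quoted above.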

3.1.  Restoration LSP Association

   For example, when a restoration
   LSP is signaled for a failed working LSP, the ASSOCIATION object in
   the restoration LSP contains the Association ID and Association
   Source set to the Association ID and Association Source signaled in
   the working LSP for the "Recovery" Association Type.

As a general question, where does the association object live?
Clearly it isn't "in the restoration LSP".  It would be useful to
mention this for readers who aren't fully familiar with the
background:

   For example, when a restoration LSP is signaled for a failed
   working LSP, the ASSOCIATION object in the Path message that
   establishes the restoration LSP contains ...
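
A minimal Python sketch of that reading follows.  It assumes the
ASSOCIATION objects are held in the list of objects of each LSP's Path
message; the Association class and the helper function are
hypothetical.  The type values 1 ("Recovery", RFC 4872) and 2
("Resource Sharing", RFC 4873) are quoted from memory and should be
treated as illustrative.

```python
from dataclasses import dataclass

RECOVERY = 1          # Association Type "Recovery" (RFC 4872)
RESOURCE_SHARING = 2  # Association Type "Resource Sharing" (RFC 4873)


@dataclass(frozen=True)
class Association:
    """Hypothetical, simplified ASSOCIATION object."""
    assoc_type: int
    assoc_id: int
    assoc_source: str


def association_for_restoration(working_path_objects: list) -> Association:
    """Build the Recovery ASSOCIATION for the restoration LSP's Path message.

    The Association ID and Association Source are copied from the
    Recovery ASSOCIATION found in the working LSP's Path message.
    """
    for obj in working_path_objects:
        if isinstance(obj, Association) and obj.assoc_type == RECOVERY:
            return Association(RECOVERY, obj.assoc_id, obj.assoc_source)
    raise ValueError("working LSP Path message has no Recovery ASSOCIATION")


working_objects = [
    Association(RECOVERY, 42, "192.0.2.1"),
    Association(RESOURCE_SHARING, 7, "192.0.2.1"),
]
restoration_assoc = association_for_restoration(working_objects)
print(restoration_assoc)
```

Because the two objects compare equal, a node that sees both Path
messages can associate the restoration LSP with the working LSP.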

3.2.  Resource Sharing-based Restoration LSP Setup

   As described in [RFC3209], Section 2.5, the purpose of make-before-
   break is "not to disrupt traffic, or adversely impact network
   operations while TE tunnel rerouting is in progress".  In non-packet
   transport networks, the label has a mapping into the data plane
   resource used and the nodes along the LSP need to send triggering
   commands to data plane for setting up cross-connections accordingly
   during the RSVP-TE signaling procedure.  Due to the nature of the
   non-packet transport networks, a node may not be able to fulfill this
   purpose when sharing resources in some scenarios.

I can understand this paragraph, but I think it could benefit from a
number of edits.  The first is to remove the quotation marks, since
the purpose is not to emphasize that RFC 3209 said those words, but
rather that 3209 stated the same concept.  And I think some of the
explanation can be omitted without losing clarity.

   As described in [RFC3209], Section 2.5, the purpose of make-before-
   break is not to disrupt traffic, or adversely impact network
   operations while TE tunnel rerouting is in progress.  In non-packet
   transport networks during the RSVP-TE setup procedure, the
   nodes along the LSP set up cross-connections accordingly.  Because a
   cross-connection cannot simultaneously connect a shared resource to
   different resources in two alternative LSPs, nodes may not be able to
   fulfill this promise when LSPs share resources.

--

  
   ---------+---------------------------------------------------------
   Category |       Node Behavior during Restoration LSP Setup
   ---------+---------------------------------------------------------
      C1    + Reusing existing resource on both input and output
            + interfaces (nodes A & B in Figure 3).
            +
            + This type of node needs to book the existing
            + resources and no cross-connection setup
            + command is needed.
   ---------+---------------------------------------------------------

This would be prettier if most of the +'s were turned into |'s:

  
   ---------+---------------------------------------------------------
   Category |       Node Behavior during Restoration LSP Setup
   ---------+---------------------------------------------------------
      C1    | Reusing existing resource on both input and output
            | interfaces (nodes A & B in Figure 3).
            |
            | This type of node needs to book the existing
            | resources and no cross-connection setup
            | command is needed.
   ---------+---------------------------------------------------------

Note that the items in the second column of the table are composed of
two parts:  The first part is the condition that defines which nodes
are in that category, and the second part is the actions that will be
taken by such nodes.  Ideally, these would be broken out as separate
columns.  (The current first column provides the labels C1, C2, and
C3, but those aren't referenced anywhere in the document, and could be
omitted to save space.)  That revises the table to look like this:

  
   ------------------------------------+----------------------------------
       Situation                       |     Actions
   ------------------------------------+----------------------------------
   Reusing existing resources          | Book the existing resources.
   on both input and output interfaces | No cross-connection setup is
   (nodes A & B in Figure 3).          | needed.
   ------------------------------------+----------------------------------
   Reusing existing resource only on   | Book the resources.
   one of the interfaces (either input | Re-configure the cross-connection
   or output) and uses new resource on | to connect the re-used resource
   the other interface.                | to the new resource.
   (nodes C & E in Figure 3).          |
   ------------------------------------+----------------------------------
   Using new resources on both         | Book the new resources.
   interfaces.                         | Send the cross-connection setup
   (nodes F & G in Figure 3).          | command on both interfaces.
   ------------------------------------+----------------------------------

Is the meaning of "book" well-known?  I find no use of it elsewhere in
this document or in any of the references.
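
The situation/action split in the revised table can also be sketched
in Python.  This is an approximation under stated assumptions: an
interface counts as re-used when the neighbour on that side is the
same node in the working and restoration paths, and the client-facing
side of the head-end and tail-end nodes counts as re-used.  The
function names and the action strings are hypothetical.

```python
ACTIONS = {
    2: "book existing resources; no cross-connection setup",
    1: "book resources; re-configure the cross-connection",
    0: "book new resources; cross-connection setup on both interfaces",
}


def neighbours(path: str) -> dict:
    """Map each node of a path to its (upstream, downstream) neighbours."""
    nodes = path.split("-")
    padded = [None] + nodes + [None]  # None marks the client-facing side
    return {node: (padded[i], padded[i + 2]) for i, node in enumerate(nodes)}


def classify(working: str, restoration: str) -> dict:
    """Return the action for every node on the restoration path."""
    on_working = neighbours(working)
    result = {}
    for node, (up, down) in neighbours(restoration).items():
        if node not in on_working:
            reused = 0  # node is new to this LSP: nothing to re-use
        else:
            w_up, w_down = on_working[node]
            reused = int(up == w_up) + int(down == w_down)
        result[node] = ACTIONS[reused]
    return result


for node, action in classify("A-B-C-D-E", "A-B-C-F-G-E").items():
    print(node, "->", action)
```

For the paths of Figure 3 this groups the nodes as A and B, C and E,
and F and G, which agrees with the three rows of the table.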

   Depending on whether the resource is re-used or not, the node
   behaviors differ.

Of course, the different behavior is only because we are here
optimizing the establishment of the new LSP.  A node could send a
command to cross-connect two resources that are already connected.

   This deviates from normal LSP setup since some
   nodes do not need to re-configure the cross-connection, and it should
   not be viewed as an error.

Why would this (not sending a command to connect things that are
already connected) be considered an error under any circumstances?

3.3.  LSP Reversion

Is "reversion" a standard term?

   If the end-to-end LSP recovery is revertive, as described in
   Section 2 ...

I'm not sure how the phrase "If the end-to-end LSP recovery is
revertive" works.  "Recovery" seems to be a general term for
techniques to recover from link failures and the like.  Is this
describing a "revertive" recovery method, or is it describing an
instance of recovery which is somehow "revertive"?

Compare to "revert", which seems to be the action of putting the
traffic back on the original/protection LSP once its functionality is
restored.  I would expect that behavior to be universal.

   1. Make-while-break Reversion, where resources associated with a
      working or protecting LSP are reconfigured while removing
      reservations for the restoration LSP.

It's not clear to me what sort of reconfiguring is being discussed.
Assuming that "reversion" means "when the working/protecting LSP
starts working again, traffic is restored to that path", it's not
clear what sort of reconfiguration would be needed, as the
working/protecting LSP already exists.

I suspect that this issue shows up when the working/protecting LSP
shares resources with the restoration LSP, and moving traffic to the
restoration LSP may require reconfiguring resources, and so moving
traffic back to working/protecting LSP may require reversing that
reconfiguration.  But the initial reconfiguration has not been
mentioned.  Should some sort of general description be put in
"Resource Sharing By Restoration LSP" of the possible need to
reconfigure when moving traffic to or from a restoration LSP?

(This is all rather obvious, but it would help if it was clearly
described.)

3.3.1.  Make-while-break Reversion

   Removing reservations for restoration LSP
   triggers reconfiguration of resources associated with a working or
   protecting LSP on every node where resources are shared.

Could you add an explanation or pointer why this is so?  It seems that
for this to be true, the reservation process must broadcast an
explicit prioritization between the new (restorative) reservation and
the old (working) reservation, because the node that is reconfigured
has to remember both reservations, and revert to the working one when
the restorative one is deleted.  It'd be useful for the naive reader
to know where in RSVP-TE that information is broadcast and/or how
RSVP-TE specified that nodes have to remember that information.

   Deletion of restoration LSPs is not a revertive process.

What is the meaning of "revertive process" here?  It doesn't seem to
match the sense of "revertive" as used elsewhere.

   In
   particular, if RSVP packets are lost due to nodal or DCN failures it
   is possible for an LSP to be only partially deleted.

"nodal" should probably be "node".

What is "DCN"?  I can't find it in any of the referenced RFCs.  Does
"link" work as a replacement?

3.3.2.  Make-before-break Reversion

   Instead of relying on deletion of
   restoration LSP, the head-end chooses to establish a new LSP to
   reconfigure resources on the working or protection LSP path, and uses
   identical ASSOCIATION and PROTECTION objects from the LSP it is
   replacing.

This could be made clearer by consistently labeling the new LSP as the
"reversion" LSP.  Also, state explicitly that its resources exactly
duplicate the resources of the working/protection LSP that is being
reverted:

   Instead of relying on deletion of the
   restoration LSP, the head-end chooses to establish a new
   "reversion" LSP that duplicates the configuration of the
   resources on the working or protection LSP, and uses
   identical ASSOCIATION and PROTECTION objects for that LSP.

--

   Reversion LSP is sharing resources both with working and
   restoration LSPs.

Better

   The reversion LSP shares all of the resources of the
   working/protection LSP and may share resources with the
   restoration LSP.

--

   Hence, after reversion LSP
   is created, data plane configuration essentially reflects working or
   protecting LSP reservations.

It seems like "essentially" is not needed, because the data plane
configuration will *exactly* reflect the working/protecting LSP
reservations.  Or are there minor variations in how reservations are
done that may not be exactly duplicated by the reversion LSP?

   After "make" part is finished, working and restoration LSPs are torn
   down.

Perhaps emphasize "the original working/protection and restoration
LSPs are torn down", as the reversion LSP becomes the new
working/protection LSP.

   o  Rollback

   If "make" part fails, (existing) restoration LSP will still be used
   to carry existing traffic.  Same logic applies here as for any MBB
   operation failure.

The reasoning here is not clear to me.  If the "make" operation fails,
some of the nodes may be configured for the reversion LSP, while
others will be configured for the restoration LSP.  Or is it implicit
that creating LSPs is an atomic operation network-wide, that
incomplete LSP creations will be completely purged from the network?

If the latter is true, then the core of this discussion is that
creating LSPs is atomic across the network, but *deleting* LSPs is not
(and so make-while-break can fail to work).  If that difference is
true, it should be said explicitly somewhere near the beginning of
section 3.3, as that fact is what is driving the whole discussion.

[END]