Re: [Teas] WG Last Call on draft-ietf-teas-gmpls-lsp-fastreroute-05

Hi Adrian,

We appreciate you taking the time to do a thorough review and provide detailed comments.  We have uploaded a new revision (06) that addresses your comments.

https://tools.ietf.org/html/draft-ietf-teas-gmpls-lsp-fastreroute-06

Please see our replies to your comments inline, marked with <RG>.

On Wed, Jun 15, 2016 at 2:04 PM, Adrian Farrel <adrian@olddog.co.uk> wrote:
Hi, I reviewed this document as part of last call, having not paid attention to it for some considerable time.

This document describes what is essentially a simple and useful feature, but it over-complicates life (such as in 4.5.2), includes confusing text (such as in 1. and 4.5.1), and seems to miss some details.

I think the document could use more work before publication.

(Caveat - I do not have an implementation of this function that I am working on.)

Thanks,
Adrian

---

I found the Introduction particularly heavy to read. This is not an
uncommon problem because it is often the oldest text, used to exist to
justify the work, and is rarely updated except to add to the catalogue of
issues being addressed.

<RG> We have updated the Introduction section. Hope it reads better now.

One of the problems described in the Introduction is unclear to me. The
text says

   When using FRR procedures with bidirectional co-routed GMPLS LSPs, it
   is possible in some cases for the RSVP signaling refreshes to stop
   reaching some nodes along the primary LSP path after the PLRs finish
   rerouting signaling onto the bypass tunnels.  This may occur when
   using node protection bypass tunnels after a link failure event and
   when RSVP signaling is sent in-fiber and in-band with data.  This is
   caused by the asymmetry of paths that may be taken by the
   bidirectional LSP's signaling in the forward and reverse directions
   after FRR reroute.  In such cases, the RSVP soft-state timeout
   causes the protected bidirectional LSP to be destroyed, with
   subsequent traffic loss after FRR.

Firstly a minor point: you can strike "in-fiber and" since it is
automatically covered by "in-band with data".

<RG> Text corrected.

Now the main point. I think the problem you describe specifically arises
when the "asymmetry of paths that may be taken" extends to asymmetry of
PLR/MP pairs. That is, the choice of path is not relevant because the
bypass tunnel appears as a single hop, but if there is some mismatch of
PLR/MP choice then one direction of the protected LSP may pop out of its
protection tunnel at a different point from where the other enters its
protection tunnel.

<RG> Modified Introduction section to clarify the problem.

It may be more helpful to express the Introduction in terms of
objectives and desires rather than complaints about deficiencies in
4090. Thus...

1. You want the same PLR/MP pairs to be selected in each direction.
2. You want both PLRs to select the same bidirectional bypass tunnel.
3. You need next-hop-label and next-next-hop label exchanges to work
   for both directions of the protected LSP.

<RG>  Modified Introduction section with above suggestion.  Hopefully it is easier to read now.
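
As an illustration of objectives 1 and 2 above, here is a minimal Python sketch of the co-routing check. It is not taken from the draft: the BypassAssignment record, its field names, and the node names R3/R4/R5 are hypothetical, and objective 3 (label exchange) is not modeled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BypassAssignment:
    """Hypothetical record of one direction's FRR choice for a protected LSP."""
    plr: str        # node that reroutes onto the bypass tunnel
    mp: str         # node where traffic merges back onto the protected LSP
    tunnel_id: int  # identifier of the bidirectional bypass tunnel


def is_corouted(forward: BypassAssignment, reverse: BypassAssignment) -> bool:
    """Return True when both directions use the same node pair and tunnel.

    The forward PLR must be the reverse MP (and vice versa), and both
    directions must use the same bidirectional bypass tunnel.
    """
    return (
        forward.plr == reverse.mp
        and forward.mp == reverse.plr
        and forward.tunnel_id == reverse.tunnel_id
    )


fwd = BypassAssignment(plr="R3", mp="R5", tunnel_id=100)
rev_ok = BypassAssignment(plr="R5", mp="R3", tunnel_id=100)
rev_bad = BypassAssignment(plr="R5", mp="R4", tunnel_id=100)  # mismatched pair

print(is_corouted(fwd, rev_ok))   # True
print(is_corouted(fwd, rev_bad))  # False
```

The second case is the PLR/MP mismatch described in the review comment: the reverse direction leaves its protection tunnel at a different node from where the forward direction entered.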

Now, assuming you do all of these, doesn't the soft-state timeout
problem go away? Or are you describing a different problem where you
use node protection in the case of a link failure leaving a downstream
node up but not receiving refresh messages? I think that is a 4090
problem that is not specific to this draft and is generally solved by
not doing node protection for link failure!

<RG> Even when the same bidirectional tunnel and the same PLR/MP pair are used, the soft-state timeout problem still exists when using node protection bypass tunnels, as shown in Figure 2.
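
The node protection case discussed above can be sketched roughly as follows. This is only an illustration under stated assumptions: the six-node path and the node names are hypothetical (they are not claimed to match Figure 2), and the function simply lists the nodes that a bypass tunnel skips.

```python
from typing import List


def nodes_missing_refresh(path: List[str], plr: str, mp: str) -> List[str]:
    """Return the nodes of the protected path that the bypass tunnel skips.

    Signaling rerouted onto the bypass enters at the PLR and leaves at the MP,
    so nodes strictly between them no longer see refreshes in that direction.
    """
    i, j = path.index(plr), path.index(mp)
    if i >= j:
        raise ValueError("PLR must precede MP along the path")
    return path[i + 1 : j]


path = ["R1", "R2", "R3", "R4", "R5", "R6"]  # hypothetical protected LSP path

# Link R3-R4 fails; R3 uses a node protection bypass that merges at R5.
print(nodes_missing_refresh(path, plr="R3", mp="R5"))  # ['R4']

# A link protection bypass that merges at R4 skips no node.
print(nodes_missing_refresh(path, plr="R3", mp="R4"))  # []
```

In the first case R4 is still up but no longer receives refreshes in that direction, which is the situation in which its soft state can time out.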

---

The term "primary LSP" seems to be introduced in this document.

Maybe you should define it or replace it with "protected LSP" which is
what you probably mean.

In other protection work (in MPLS and CCAMP) the term "primary" is used
interchangeably with "working", and along with "secondary" and "backup".
But, that doesn't seem appropriate here because you don't really have a
primary/secondary concept.

<RG> Corrected.

---

In 2.2 you define upstream/downstream PLR. You might do similar for MPs
because the definitions are not intuitive or consistent with previous
work.

<RG> Added definitions for downstream MP and upstream MP.

Normally upstream and downstream are relative positional terms ("LSR A is
upstream of LSR B" or "the upstream LSR"), but you are using them in a
directional sense where we normally use "forward" and "reverse".

Thus, when you say "downstream PLR" you mean "the node upstream of the
fault (i.e., between the ingress and the fault) that performs PLR
function on the forward path". When used in your sense, we have
typically said something far more longwinded but carefully clear, such
as "the PLR for the downstream direction of traffic flow."

I think you should think about whether it would be helpful to change the
terms you use especially in view of the definition of MP in 4090 (and
reproduced in 2.2).

<RG> Terminology updated in the revised document. Removed definitions defined in RFC4090.

---

2.2

Is no familiarity with 3471, 3473, and 4090 assumed?
I wonder why you redefine (restate definitions of) terms from 4090.
(Expanding abbreviations is a fine thing to do.)

<RG> Updated section 2.2 with above suggestion.

---

2.2

I think...

   LSR: An MPLS Label Switching Router.
   LSP: An MPLS Label Switched Path.

<RG> Corrected, added a new Abbreviations sub-section.

---

2.2

I don't really think PRR is the most helpful name you could have given
to what is actually the "PLR on the forward path of the bidirectional
LSP." From what is the PRR remote?

Furthermore, in 6.2 you have...

   The downstream MP R5 that receives rerouted protected LSP RSVP Path
   message through the bypass tunnel, in addition to the regular MP
   processing defined in [RFC4090], gets promoted to a Point of Remote
   Repair (PRR) role and performs the following actions to re-coroute
   signaling and data traffic over the same path in both directions:

So the downstream MP is a PRR.
But using the definition from 2.2 the PRR is "an upstream PLR".
Meaning that the upstream PLR is the downstream MP?

<RG> Updated the definition of the PRR. The term PRR identifies specific FRR functions in addition to those of the PLR and MP. Also, these functions are performed by the remote node, hence the term.

---

3.

To be completely clear, where you have "These FRR procedures" I think
you mean "Those FRR procedures". That is, you mean that the FRR
procedures of 4090 apply to bidirectional associated GMPLS LSPs, and
not that the procedures of this document apply to bidirectional
associated GMPLS LSPs.

<RG> Corrected.

---

In section 4.5 I found myself asking why you didn't use RFC 5750. The
function is the same, I suppose, so maybe it is about codepoints.

I think I have a preference for keeping as few ERO/RRO subobjects
having different presence rules as possible.

<RG> Ok. Hope revised rules are ok in Section 4.5.1.

---

In 4.5.1 you have...

   When the BYPASS_ASSIGNMENT subobject is added in the RECORD_ROUTE
   Object:

     o The BYPASS_ASSIGNMENT subobject MUST be added prior to the
       Node-ID subobject containing the node's address.

     o The Node-ID subobject MUST also be added.

     o The IPv4 or IPv6 subobject MUST also be added.

     o The Label subobject MUST also be added.

You'll recall that there is no such thing as  "Node-ID subobject" per se
(see http://www.iana.org/assignments/rsvp-parameters/rsvp-parameters.xhtml#rsvp-parameters-24)
What you have available is IPv4 address subobjects and IPv6 address
subobjects that can contain addresses of interfaces or addresses of
nodes and flags you can set to define the context per RFC 4561. You
should rewrite in that context.

<RG> The term "Node-ID subobject" is used and defined in RFC4561. However, we corrected the text to reflect the IANA terms.

You might go to 6.1.3 of RFC 4990 and state which options are allowed
and which cannot work with the BYPASS_ASSIGNMENT subobject.

<RG> Added the text.

BTW, does this not work with unnumbered interfaces, or did you forget?

<RG> Added.

I'm surprised that you put the BYPASS_ASSIGNMENT subobject before the IPv4/6
address subobject. This is counter to the way label subobjects are
placed after the IPv4/6 subobjects. Furthermore, how do I tell the
difference between node protection and link protection in this scheme?
Seems to me that you want to say that the location of the
BYPASS_ASSIGNMENT object tells you whether it is the node or the link
being protected, and that would work best by putting it after the thing
it protects.

<RG> Added the BYPASS_ASSIGNMENT subobject after the Node-ID subobject. We (authors) did not see much value in "inferring" node protection versus link protection from the location of the subobject. We agree with you on keeping only a few RRO presence rules.

It's worth noting (per RFC 3209) that labels are assigned in Resv
messages so that in the first Path message setting up an LSP it is not
possible to include the Label subobject (contrary to your MUST?). This
means that the BYPASS_ASS subobject cannot be present on the Path that
sets up the LSP, but must be added later.

<RG> The downstream PLR can assign the bypass tunnel when it receives the first Path message, but it may not be able to program forwarding with the downstream MP label because it has not yet received the Resv. It can update forwarding when the Resv message with the Label subobjects is received. Added text for this in Section 4.5.1.
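
The two-step behavior described in this reply can be sketched as a small state holder. This is a minimal sketch, not the draft's procedure: the class, method names, and example values are hypothetical, and real RSVP message handling is not modeled.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ProtectedLspState:
    """Hypothetical per-LSP state held by a downstream PLR."""
    bypass_tunnel_id: Optional[int] = None
    mp_label: Optional[int] = None

    def on_path(self, bypass_tunnel_id: int) -> None:
        # The bypass tunnel can be chosen as soon as the first Path is seen.
        self.bypass_tunnel_id = bypass_tunnel_id

    def on_resv(self, mp_label: int) -> None:
        # The downstream MP label only becomes known once the Resv arrives.
        self.mp_label = mp_label

    @property
    def forwarding_ready(self) -> bool:
        # Backup forwarding needs both the tunnel and the MP label.
        return self.bypass_tunnel_id is not None and self.mp_label is not None


state = ProtectedLspState()
state.on_path(bypass_tunnel_id=100)
print(state.forwarding_ready)  # False: tunnel assigned, label still unknown
state.on_resv(mp_label=3001)
print(state.forwarding_ready)  # True: forwarding can now be programmed
```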

---

Surely you need a new error message for "BYPASS_ASSIGNMENT unknown"?

<RG> Added a new Notify message for Error-Spec: Bypass Assignment Cannot Be Used and Not Found. The destination address was also added to the BYPASS_ASSIGNMENT subobject so that the upstream PLR knows that a reverse bypass tunnel should be present on the node, which lets it generate this error message.

---

Hiding here, I think is the fact that the address of the node present in
an address subobject is used to identify the tunnel along with the
tunnel ID. You need to be really careful because:
- a node may use multiple addresses to identify itself in different RROs
- a node may use multiple addresses to initiate signaling of different
  tunnels

You need to call this out more clearly.

<RG> Added text to clarify in Section 4.5.1.

---

In 4.5.1

   In the absence of BYPASS_ASSIGNMENT subobject, the upstream PLR
   (downstream MP) SHOULD NOT assign a bypass tunnel in the reverse
   direction.  This allows the downstream PLR to always initiate the
   bypass assignment and upstream PLR (downstream MP) to simply reflect
   the bypass assignment.

Doesn't this cause problems if only node protection is in use and it is
the link downstream of the protected node that fails? In this case only
the "upstream PLR" detects the failure, but it cannot act because the
BYPASS_ASSIGNMENT subobject wasn't present.

Perhaps your answer is "serves you right for not doing the right thing"
which would seem reasonable!

On the other hand, why do you create this problem for yourselves? When
you say...
   The BYPASS_ASSIGNMENT subobject SHOULD be added by each downstream
   PLR in the RSVP Path RECORD_ROUTE message of the GMPLS signaled
   bidirectional primary LSP to record the downstream bidirectional
   bypass tunnel assignment.
...you could instead say...
   When the procedures defined in this document are in use, the
   BYPASS_ASSIGNMENT subobject MUST be added by each downstream PLR in
   the RSVP Path RECORD_ROUTE message of the GMPLS signaled
   bidirectional primary LSP to record the downstream bidirectional
   bypass tunnel assignment.

Then you could say that the absence of the subobject means that the
relevant node/link is not protected by a bidirectional bypass tunnel.

<RG> Agree, updated the text.
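
Under the rule agreed above (absence of the subobject means no bidirectional bypass protection), the lookup on the upstream PLR (downstream MP) might look like the following sketch. It assumes, as stated elsewhere in this thread, that the BYPASS_ASSIGNMENT subobject follows the assigning node's Node-ID subobject and carries a destination address. The class names, field names, and list ordering are hypothetical; the sketch returns only the first match and does not model the choice among several candidate tunnels.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple


@dataclass(frozen=True)
class NodeId:
    """Hypothetical stand-in for an address subobject identifying a node."""
    address: str


@dataclass(frozen=True)
class BypassAssignmentSub:
    """Hypothetical stand-in for a BYPASS_ASSIGNMENT subobject."""
    tunnel_id: int
    destination: str


def find_reverse_bypass(
    rro: Sequence[object], local_address: str
) -> Optional[Tuple[str, int]]:
    """Return (assigning node, tunnel ID) for a bypass ending on this node.

    Returns None when no BYPASS_ASSIGNMENT names this node, which is read as
    "not protected by a bidirectional bypass tunnel".
    """
    last_node: Optional[str] = None
    for sub in rro:
        if isinstance(sub, NodeId):
            last_node = sub.address
        elif isinstance(sub, BypassAssignmentSub):
            if sub.destination == local_address and last_node is not None:
                return (last_node, sub.tunnel_id)
        # Any other subobject type (labels, unknown types) is stepped over.
    return None


rro = [
    NodeId("R1"),
    NodeId("R2"),
    NodeId("R3"),
    BypassAssignmentSub(tunnel_id=100, destination="R5"),
    "label-subobject",  # placeholder for a subobject this sketch ignores
    NodeId("R4"),
]

print(find_reverse_bypass(rro, "R5"))  # ('R3', 100)
print(find_reverse_bypass(rro, "R4"))  # None
```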

---

In 4.5.1 you say...

   An upstream PLR (downstream MP) SHOULD examine the entire Path RRO
   and look at all BYPASS_ASSIGNMENT subobjects in order to assign a
   reverse bypass tunnel.  The choice of a reverse bypass tunnel (if
   multiple bypass tunnels exist) is based on the local policy on the
   downstream MP and is discussed in Section 4.5.2 of this document.

Naively, this conflicts with the previous paragraph that seems to say:
find a sub-object and use it. Maybe you should merge the paragraphs so
it is clear that you do *this* paragraph first, then apply 4.5.2, and
then apply the previous paragraph.

<RG> Merged the two paragraphs to make this clear.

But I think you are making a rod for your own back! Parsing the whole
RRO is pretty ugly because of the amount of processing required, and
will require the ability to step over unknown subobjects. But more on
this in 4.5.2.

<RG> Revised Section 4.5.2 to highlight the issue and the solution, using the new Notify message.

---

Finally for 4.5.1 you have...

   The bypass assignment co-ordination procedure described in this
   Section can be used for both one-to-one backup described in Section
   3.1 of [RFC4090] and facility backup described in Section 3.2 of
   [RFC4090].

This is true, but it is not so simple in a proper implementation. That
is, it would be really neat if the upstream PLR could tell whether to do
one-to-one or facility backup without having to be globally configured.
And it may be necessary (OK, it is necessary) to have an error code to
report when the BYPASS_ASSIGNMENT identifies a bypass tunnel that is
already in use for one-to-one protection.

<RG> Added text stating that the DETOUR object from RFC4090 can be used to identify one-to-one backup. Also added an error code as suggested above.

---

I think 4.5.2 is just wrong :-(

The objective you have voiced is that the forward and reverse protection
paths should be the same. That means that the same pair of PLRs/MPs must
be selected, and they must use the same tunnel as well.

In this section you appear to say that the upstream PLR (i.e., the PLR
for the reverse path) has freedom to choose which protection tunnel to
use to carry the reverse path traffic, with the result that forward and
reverse protection may be on different tunnels.

Somehow (and I don't think this I-D does it) the two PLRs for any
failure must agree which tunnel they are using. Hopefully (!) that
decision is made before the error is detected.

<RG> Understood. Please see the updated text in Section 4.5.2, which highlights the issue. We also added a Notify message to handle it.

---

4.5.3

"MUST NOT be added to a Resv RRO"

Fair enough. Add a forward pointer to section 7.
But in section 7, please reference 3209 not 2205 (EROs/RROs did not
exist in standard RSVP until RSVP-TE came along.)

<RG> Corrected.

---

In section 5 I wasn't clear what happens if the error is only detected
in one direction. Is it acceptable for only one of the Resv/Path to be
rerouted over the tunnel and for traffic in one direction only to use
the tunnel? Or is the PLR that did not detect the error expected to
see the rerouted message (or sniff the rerouted data) and switch
accordingly in its turn?

<RG> Added section 7 for unidirectional link failure (moved the corresponding text from Section 6.2). Section 4.6 covers the general link failure case.

The same question applies to reversion. Does this need to be
coordinated?

<RG> Updated the text on revertive behavior in Sections 5.2 and 6.3.

Many thanks,
Rakesh (for authors).

From: Teas [mailto:teas-bounces@ietf.org] On Behalf Of Vishnu Pavan Beeram
Sent: 13 June 2016 05:32
To: teas@ietf.org
Subject: [Teas] WG Last Call on draft-ietf-teas-gmpls-lsp-fastreroute-05

All,

This starts a two-week working group last call on
draft-ietf-teas-gmpls-lsp-fastreroute-05.

The working group last call ends on Monday, June 27th. Please
send your comments to the TEAS mailing list.

As is always the case, positive comments, e.g., "I've reviewed this
document and believe it is ready for publication", are welcome!
This is useful and important, even from authors.
Note, IPR has been disclosed on this draft.

Thanks,
Pavan (and Lou)
