Re: [Teas] WG Last Call on draft-ietf-teas-gmpls-lsp-fastreroute-05

"Adrian Farrel" <adrian@olddog.co.uk> Wed, 15 June 2016 18:05 UTC

From: Adrian Farrel <adrian@olddog.co.uk>
To: 'Vishnu Pavan Beeram' <vishnupavan@gmail.com>, teas@ietf.org
References: <CA+YzgTvxwGgKimyOa==VTVWfhoGwr_r1oYEh7Mz_6WhNvqAxEg@mail.gmail.com>
In-Reply-To: <CA+YzgTvxwGgKimyOa==VTVWfhoGwr_r1oYEh7Mz_6WhNvqAxEg@mail.gmail.com>
Date: Wed, 15 Jun 2016 19:04:57 +0100
Message-ID: <06e301d1c730$700cf4f0$5026ded0$@olddog.co.uk>
Archived-At: <https://mailarchive.ietf.org/arch/msg/teas/EEuJp3LSQlOcTozYpE0mcoqDUC8>
Subject: Re: [Teas] WG Last Call on draft-ietf-teas-gmpls-lsp-fastreroute-05
Precedence: list
Reply-To: adrian@olddog.co.uk

Hi, I reviewed this document as part of last call, having not paid attention to it for some considerable time.
 
This document describes what is essentially a simple and useful feature, but it over-complicates life (such as in 4.5.2), includes confusing text (such as in 1. and 4.5.1), and seems to miss some details.
 
I think the document could use more work before publication.
 
(Caveat - I do not have an implementation of this function that I am working on.)
 
Thanks,
Adrian
 
---
 
I found the Introduction particularly heavy to read. This is not an
uncommon problem: the Introduction is often the oldest text, was
written to justify the work, and is rarely updated except to add to the
catalogue of issues being addressed.
 
One of the problems described in the Introduction is unclear to me. The
text says
 
   When using FRR procedures with bidirectional co-routed GMPLS LSPs, it
   is possible in some cases for the RSVP signaling refreshes to stop
   reaching some nodes along the primary LSP path after the PLRs finish
   rerouting signaling onto the bypass tunnels.  This may occur when
   using node protection bypass tunnels after a link failure event and
   when RSVP signaling is sent in-fiber and in-band with data.  This is
   caused by the asymmetry of paths that may be taken by the
   bidirectional LSP's signaling in the forward and reverse directions
   after FRR reroute.  In such cases, the RSVP soft-state timeout 
   causes the protected bidirectional LSP to be destroyed, with
   subsequent traffic loss after FRR.
 
Firstly a minor point: you can strike "in-fiber and" since it is 
automatically covered by "in-band with data".
 
Now the main point. I think the problem you describe specifically arises
when the "asymmetry of paths that may be taken" extends to asymmetry of
PLR/MP pairs. That is, the choice of path is not relevant because the
bypass tunnel appears as a single hop, but if there is some mismatch of
PLR/MP choice then one direction of the protected LSP may pop out of its
protection tunnel at a different point from where the other enters its
protection tunnel.
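
To make sure I am describing the same failure you are, here is a minimal
sketch of my reading. It assumes a five-node protected path R1..R5, a
failure of link R2-R3, and that each bypass tunnel looks like a single
hop to signaling. The node names and the helper functions are mine, not
from the draft.

```python
from typing import List, Set, Tuple


def nodes_reached(path: List[str], plr: str, mp: str) -> Set[str]:
    """Nodes on the protected path that still see signaling in one direction.

    The bypass tunnel is modelled as a single hop between plr and mp, so
    every node strictly between them on the protected path is skipped.
    """
    i, j = sorted((path.index(plr), path.index(mp)))
    return set(path[: i + 1]) | set(path[j:])


def starved_nodes(path: List[str],
                  fwd: Tuple[str, str],
                  rev: Tuple[str, str]) -> Set[str]:
    """Nodes refreshed in exactly one direction (candidates for soft-state timeout).

    fwd and rev are (PLR, MP) pairs for the forward and reverse directions.
    """
    return nodes_reached(path, *fwd) ^ nodes_reached(path, *rev)


PATH = ["R1", "R2", "R3", "R4", "R5"]

# Forward: R2 uses a node-protection bypass to R4.
# Reverse: R3 detects the link failure and uses a link-protection bypass to R2.
mismatched = starved_nodes(PATH, ("R2", "R4"), ("R3", "R2"))

# Both directions use the same PLR/MP pair (R2 and R4, roles swapped).
matched = starved_nodes(PATH, ("R2", "R4"), ("R4", "R2"))

print(mismatched, matched)  # {'R3'} set()
```

In the mismatched case R3 still sees the reverse direction but no longer
sees forward refreshes, which is how I read the timeout you describe.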
 
It may be more helpful to express the Introduction in terms of
objectives and desires rather than complaints about deficiencies in 
4090. Thus...
 
1. You want the same PLR/MP pairs to be selected in each direction.
2. You want both PLRs to select the same bidirectional bypass tunnel.
3. You need next-hop label and next-next-hop label exchanges to work
   for both directions of the protected LSP.
 
Now, assuming you do all of these, doesn't the soft-state timeout
problem go away? Or are you describing a different problem where you 
use node protection in the case of a link failure leaving a downstream
node up but not receiving refresh messages? I think that is a 4090
problem that is not specific to this draft and is generally solved by
not doing node protection for link failure!
 
---
 
The term "primary LSP" seems to be introduced in this document.
 
Maybe you should define it or replace it with "protected LSP" which is
what you probably mean.
 
In other protection work (in MPLS and CCAMP) the term "primary" is used
interchangeably with "working", alongside "secondary" and "backup".
But, that doesn't seem appropriate here because you don't really have a
primary/secondary concept.
 
---
 
In 2.2 you define upstream/downstream PLR. You might do similar for MPs
because the definitions are not intuitive or consistent with previous
work.
 
Normally upstream and downstream are relative positional terms ("LSR A is
upstream of LSR B" or "the upstream LSR"), but you are using them in a
directional sense where we normally use "forward" and "reverse".
 
Thus, when you say "downstream PLR" you mean "the node upstream of the 
fault (i.e., between the ingress and the fault) that performs PLR 
function on the forward path". When used in your sense, we have 
typically said something far more longwinded but carefully clear, such 
as "the PLR for the downstream direction of traffic flow."
 
I think you should think about whether it would be helpful to change the
terms you use especially in view of the definition of MP in 4090 (and
reproduced in 2.2).
 
---
 
2.2
 
Is no familiarity with 3471, 3473, and 4090 assumed?
I wonder why you redefine (restate definitions of) terms from 4090.
(Expanding abbreviations is a fine thing to do.)
 
---
 
2.2
 
I think...
 
   LSR: An MPLS Label Switching Router.
   LSP: An MPLS Label Switched Path. 
 
---
 
2.2
 
I don't really think PRR is the most helpful name you could have given
to what is actually the "PLR on the forward path of the bidirectional 
LSP." From what is the PRR remote? 
 
Furthermore, in 6.2 you have...
 
   The downstream MP R5 that receives rerouted protected LSP RSVP Path
   message through the bypass tunnel, in addition to the regular MP
   processing defined in [RFC4090], gets promoted to a Point of Remote
   Repair (PRR) role and performs the following actions to re-coroute
   signaling and data traffic over the same path in both directions:
 
So the downstream MP is a PRR.
But using the definition from 2.2 the PRR is "an upstream PLR".
Meaning that the upstream PLR is the downstream MP?
 
---
 
3.
 
To be completely clear, where you have "These FRR procedures" I think
you mean "Those FRR procedures". That is, you mean that the FRR
procedures of 4090 apply to bidirectional associated GMPLS LSPs, and
not that the procedures of this document apply to bidirectional
associated GMPLS LSPs.
 
---
 
In section 4.5 I found myself asking why you didn't use RFC 5750. The
function is the same, I suppose, so maybe it is about codepoints. 
 
I think I have a preference for keeping as few ERO/RRO subobjects 
having different presence rules as possible.
 
---
 
In 4.5.1 you have...
 
   When the BYPASS_ASSIGNMENT subobject is added in the RECORD_ROUTE
   Object:
 
     o The BYPASS_ASSIGNMENT subobject MUST be added prior to the
       Node-ID subobject containing the node's address.
 
     o The Node-ID subobject MUST also be added.
 
     o The IPv4 or IPv6 subobject MUST also be added.
 
     o The Label subobject MUST also be added.
 
You'll recall that there is no such thing as a "Node-ID subobject" per se
(see http://www.iana.org/assignments/rsvp-parameters/rsvp-parameters.xhtml#rsvp-parameters-24)
What you have available is IPv4 address subobjects and IPv6 address
subobjects that can contain addresses of interfaces or addresses of 
nodes and flags you can set to define the context per RFC 4561. You 
should rewrite in that context.
 
You might go to 6.1.3 of RFC 4990 and state which options are allowed 
and which cannot work with the BYPASS_ASSIGNMENT subobject.
 
BTW, does this not work with unnumbered interfaces, or did you forget?
 
I'm surprised that you put the BYPASS_ASSIGNMENT subobject before the
IPv4/6 address subobject. This is counter to the way label subobjects
are placed after the IPv4/6 subobjects. Furthermore, how do I tell the
difference between node protection and link protection in this scheme?
Seems to me that you want to say that the location of the
BYPASS_ASSIGNMENT subobject tells you whether it is the node or the
link being protected, and that would work best by putting it after the
thing it protects.
 
It's worth noting (per RFC 3209) that labels are assigned in Resv
messages so that in the first Path message setting up an LSP it is not
possible to include the Label subobject (contrary to your MUST?). This
means that the BYPASS_ASSIGNMENT subobject cannot be present on the
Path that sets up the LSP, but must be added later.
 
---
 
Surely you need a new error message for "BYPASS_ASSIGNMENT unknown"?
 
---
 
Hiding here, I think is the fact that the address of the node present in
an address subobject is used to identify the tunnel along with the 
tunnel ID. You need to be really careful because:
- a node may use multiple addresses to identify itself in different RROs
- a node may use multiple addresses to initiate signaling of different
  tunnels
 
You need to call this out more clearly.
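
To illustrate the hazard, here is a small sketch. It assumes the MP
keeps a table of bypass tunnels keyed by (address, tunnel ID) and, as a
possible mitigation, an address-to-router mapping learned from the TE
database. The table contents, addresses, and function names are all
hypothetical.

```python
from typing import Dict, Optional, Tuple

# Bypass tunnels known at the MP, keyed by (source address, tunnel ID).
bypass_by_key: Dict[Tuple[str, int], str] = {
    ("192.0.2.2", 7): "bypass-R2-R4",
}

# Hypothetical mapping from any address a router owns to that router.
router_of: Dict[str, str] = {
    "192.0.2.2": "R2",      # address R2 used when signaling the bypass
    "198.51.100.2": "R2",   # address R2 recorded in the protected LSP's RRO
}


def lookup_naive(addr: str, tunnel_id: int) -> Optional[str]:
    """Match on the literal address found in the RRO."""
    return bypass_by_key.get((addr, tunnel_id))


def lookup_by_router(addr: str, tunnel_id: int) -> Optional[str]:
    """Match on the router that owns the address, not the address itself."""
    router = router_of.get(addr)
    if router is None:
        return None
    for (key_addr, key_id), name in bypass_by_key.items():
        if key_id == tunnel_id and router_of.get(key_addr) == router:
            return name
    return None


print(lookup_naive("198.51.100.2", 7))      # None
print(lookup_by_router("198.51.100.2", 7))  # bypass-R2-R4
```

The naive lookup fails even though both addresses belong to the same
node, which is the case I think the text needs to address.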
 
---
 
In 4.5.1
 
   In the absence of BYPASS_ASSIGNMENT subobject, the upstream PLR
   (downstream MP) SHOULD NOT assign a bypass tunnel in the reverse
   direction.  This allows the downstream PLR to always initiate the
   bypass assignment and upstream PLR (downstream MP) to simply reflect
   the bypass assignment.
 
Doesn't this cause problems if only node protection is in use and it is 
the link downstream of the protected node that fails? In this case only 
the "upstream PLR" detects the failure, but it cannot act because the
BYPASS_ASSIGNMENT subobject wasn't present.
 
Perhaps your answer is "serves you right for not doing the right thing"
which would seem reasonable!
 
On the other hand, why do you create this problem for yourselves? When 
you say...
   The BYPASS_ASSIGNMENT subobject SHOULD be added by each downstream
   PLR in the RSVP Path RECORD_ROUTE message of the GMPLS signaled
   bidirectional primary LSP to record the downstream bidirectional
   bypass tunnel assignment.
...you could instead say...
   When the procedures defined in this document are in use, the
   BYPASS_ASSIGNMENT subobject MUST be added by each downstream PLR in
   the RSVP Path RECORD_ROUTE message of the GMPLS signaled 
   bidirectional primary LSP to record the downstream bidirectional
   bypass tunnel assignment.
 
Then you could say that the absence of the subobject means that the
relevant node/link is not protected by a bidirectional bypass tunnel.
 
---
 
In 4.5.1 you say...
 
   An upstream PLR (downstream MP) SHOULD examine the entire Path RRO
   and look at all BYPASS_ASSIGNMENT subobjects in order to assign a
   reverse bypass tunnel.  The choice of a reverse bypass tunnel (if
   multiple bypass tunnels exist) is based on the local policy on the
   downstream MP and is discussed in Section 4.5.2 of this document.
 
Naively, this conflicts with the previous paragraph that seems to say:
find a sub-object and use it. Maybe you should merge the paragraphs so
it is clear that you do *this* paragraph first, then apply 4.5.2, and
then apply the previous paragraph.
 
But I think you are making a rod for your own back! Parsing the whole
RRO is pretty ugly because of the amount of processing required, and 
will require the ability to step over unknown subobjects. But more on 
this in 4.5.2.
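
For a sense of what "parse the whole RRO" means, here is a minimal
sketch. It relies on the RFC 3209 subobject layout (one byte of type,
one byte of length that counts the two header bytes). The
BYPASS_ASSIGNMENT type value is a placeholder I made up, and the
subobject payload is left opaque.

```python
from typing import Iterator, List, Tuple

# Placeholder only: NOT an assigned codepoint.
BYPASS_ASSIGNMENT_TYPE = 0xFE


def walk_rro(body: bytes) -> Iterator[Tuple[int, bytes]]:
    """Yield (type, payload) for each subobject in an RRO body.

    Unknown types are yielded like any other, so the caller can step
    over them using the length field alone.
    """
    offset = 0
    while offset < len(body):
        if offset + 2 > len(body):
            raise ValueError(f"truncated subobject header at offset {offset}")
        sub_type, length = body[offset], body[offset + 1]
        if length < 2 or offset + length > len(body):
            raise ValueError(f"malformed subobject at offset {offset}")
        yield sub_type, body[offset + 2 : offset + length]
        offset += length


def bypass_assignments(body: bytes) -> List[bytes]:
    """Collect the payload of every BYPASS_ASSIGNMENT subobject in the RRO."""
    return [payload for sub_type, payload in walk_rro(body)
            if sub_type == BYPASS_ASSIGNMENT_TYPE]


rro = (
    bytes([0x01, 8, 192, 0, 2, 1, 32, 0])           # IPv4 address subobject
    + bytes([0x7F, 4, 0, 0])                        # a type this node does not know
    + bytes([BYPASS_ASSIGNMENT_TYPE, 6, 0, 7, 0, 0])
)
print(bypass_assignments(rro))
```

Every MP has to do this walk over every hop's subobjects on every Path
message it processes, which is the cost I am worried about.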
 
---
 
Finally for 4.5.1 you have...
 
   The bypass assignment co-ordination procedure described in this
   Section can be used for both one-to-one backup described in Section
   3.1 of [RFC4090] and facility backup described in Section 3.2 of
   [RFC4090].
 
This is true, but it is not so simple in a proper implementation. That 
is, it would be really neat if the upstream PLR could tell whether to do
one-to-one or facility backup without having to be globally configured.
And it may be necessary (OK, it is necessary) to have an error code to
report when the BYPASS_ASSIGNMENT identifies a bypass tunnel that is
already in use for one-to-one protection.
 
---
 
I think 4.5.2 is just wrong :-(
 
The objective you have voiced is that the forward and reverse protection
paths should be the same. That means that the same pair of PLRs/MPs must
be selected, and they must use the same tunnel as well.
 
In this section you appear to say that the upstream PLR (i.e., the PLR
for the reverse path) has freedom to choose which protection tunnel to
use to carry the reverse path traffic, with the result that forward and
reverse protection may be on different tunnels.
 
Somehow (and I don't think this I-D does it) the two PLRs for any 
failure must agree which tunnel they are using. Hopefully (!) that
decision is made before the error is detected.
 
---
  
4.5.3
 
"MUST NOT be added to a Resv RRO"
 
Fair enough. Add a forward pointer to section 7.
But in section 7, please reference 3209 not 2205 (EROs/RROs did not
exist in standard RSVP until RSVP-TE came along.)
 
---
 
In section 5 I wasn't clear what happens if the error is only detected 
in one direction. Is it acceptable for only one of the Resv/Path to be
rerouted over the tunnel and for traffic in one direction only to use 
the tunnel? Or is the PLR that did not detect the error expected to 
see the rerouted message (or sniff the rerouted data) and switch 
accordingly in its turn?
 
The same question applies to reversion. Does this need to be 
coordinated?
 
From: Teas [mailto:teas-bounces@ietf.org] On Behalf Of Vishnu Pavan Beeram
Sent: 13 June 2016 05:32
To: teas@ietf.org
Subject: [Teas] WG Last Call on draft-ietf-teas-gmpls-lsp-fastreroute-05
 
All,
This starts a two week working group last call on
draft-ietf-teas-gmpls-lsp-fastreroute-05.

The working group last call ends on Monday, June 27th. Please
send your comments to the TEAS mailing list.

As is always the case, positive comments, e.g., "I've reviewed this
document and believe it is ready for publication", are welcome!
This is useful and important, even from authors.
Note, IPR has been disclosed on this draft.

Thanks,
Pavan (and Lou)
