Re: [mpls] Adoption of draft-shen-mpls-egress-protection-framework

Yimin Shen <yshen@juniper.net> Fri, 08 September 2017 19:22 UTC

From: Yimin Shen <yshen@juniper.net>
To: Alexander Vainshtein <Alexander.Vainshtein@ecitele.com>, "mpls@ietf.org" <mpls@ietf.org>
CC: "draft-shen-mpls-egress-protection-framework@ietf.org" <draft-shen-mpls-egress-protection-framework@ietf.org>, Stewart Bryant <stewart.bryant@gmail.com>
Thread-Topic: Adoption of draft-shen-mpls-egress-protection-framework
Thread-Index: AdMn4OLyXKADRfUKRCC1lUEYeszmrQA1XPmA
Date: Fri, 08 Sep 2017 19:22:53 +0000
Message-ID: <CFB8BC8D-06FF-4579-8CCB-06AAF03457FD@juniper.net>
References: <AM4PR03MB1713772A4348FC535728B6059D940@AM4PR03MB1713.eurprd03.prod.outlook.com>
In-Reply-To: <AM4PR03MB1713772A4348FC535728B6059D940@AM4PR03MB1713.eurprd03.prod.outlook.com>
Archived-At: <https://mailarchive.ietf.org/arch/msg/mpls/8bLaRIm8fduCZFvPvc1Zo4Tu5C8>

Hi Sasha,

Thanks for your detailed review, kind suggestions, and support for this draft!

Please see our responses inline. We will incorporate these changes in the next revision.

Thanks,

Yimin Shen
Juniper Networks

--------------------------
Dear colleagues,

I’ve read the draft in question and I support its adoption as a WG document by the MPLS WG.

[yshen] Thanks again!

At the same time I have several issues with the current text that, from my POV, should be resolved as the document is progressed. I would like to emphasize that I do not see these issues as blockers for the document adoption.

Here is the list:

The Requirements Language:

I have seen several occurrences of mixed usage of the IETF capitalized “MUST” and the non-normative “must” in the same sentence or in multiple sentences forming (from my POV) a common context within the document. Here are just two examples (offending words are highlighted):

· In section 5.10: “The context ID must be advertised in such a manner that any egress-protected tunnels MUST have E as tailend, and any egress-protection bypass tunnels MUST have P as tailend while avoiding E”

· In section 5.13: “In order to achieve this, the protector MUST maintain such kind of service labels in dedicated label spaces on a per protected egress {E, P} basis, i.e. one label space for each egress router that it protects. Also, there must be a session of service label distribution protocol between each egress router and the protector”

I have also found at least one text fragment where the IETF capitalized “MUST” appears very close to a non-normative “should” making it very difficult to understand what is and what is not required (again, the offending words are highlighted):

· In section 5.8: “The bypass tunnel MUST have the property that it should not be affected by any topology change caused by an egress node failure”

I have found at least one text fragment that looks to me like specification of OPTIONAL behavior, but the IETF capitalized “MAY” is not used:

· In section 5.2: “In a case where a PLR does not have a fast and reliable mechanism to detect a node failure or distinguish between a link failure and a node failure, it may conservatively treat a link failure as a node failure and trigger egress node protection”.

(Please note that the quoted fragments are just examples, and there may be more such fragments in the draft).

[yshen] Will go through the document and fix all the cases.
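
As a rough aid for that pass, a small script along the following lines could flag sentences that mix capitalized RFC 2119 keywords with their lowercase forms. This is only a sketch: the function name `find_mixed_usage`, the keyword subset, and the naive sentence splitting are assumptions made for illustration, and are not part of the draft or of any IETF tooling.

```python
import re

# Illustrative subset of RFC 2119 keywords. Multi-word forms such as
# "MUST NOT" are caught through their first word.
KEYWORDS = ("MUST", "SHOULD", "MAY", "SHALL")

UPPER = re.compile(r"\b(?:%s)\b" % "|".join(KEYWORDS))
LOWER = re.compile(r"\b(?:%s)\b" % "|".join(k.lower() for k in KEYWORDS))


def find_mixed_usage(text):
    """Return the sentences that contain both a capitalized keyword
    and a lowercase keyword (case-sensitive matching)."""
    # Naive split on '.', '!' or '?' followed by whitespace. Abbreviations
    # such as "i.e." will split a sentence early, so treat hits as hints.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    return [s for s in sentences if UPPER.search(s) and LOWER.search(s)]
```

Running it on the Section 5.8 sentence quoted above would report that sentence, since it contains both “MUST” and “should”. Mixed usage spread over several sentences, as in the Section 5.13 example, would still need a manual read.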

Last but not least, RFC 2119 is quoted in Section 2, but is not listed as a normative reference.

[yshen] Will fix this.

Egress Link Failure:

The draft proposes a common framework for local repair of both egress link failures and egress node failures. From my POV, these two cases are different. In particular:

· Local repair of egress link failures can be achieved by many different mechanisms, including some that do not require support of context-specific label spaces in the protector. One such mechanism is even mentioned as “Option 2” for egress link protection in Section 6 of the draft, and, AFAIK, similar mechanisms have been successfully deployed by some vendors.

[yshen] Yes, this draft does consider this kind of egress link protection mechanism, and hence lists it as an option in the above section. Will clarify this further.

· While alternative reasonably fast mechanisms for repair of egress node failures exist (e.g., see Section 6.2.1 of the BGP PIC draft), these mechanisms are applied at the service ingress PE and therefore should be classified as global repair mechanisms.

I think that clear differentiation between the two cases (with Informative reference to the BGP PIC draft etc.) would be most helpful.

[yshen] Sure. Will mention BGP PIC as an example of global repair (as it is driven by ingress routers), and refer to the relevant draft.

Egress Node Failure:

· The draft is somewhat vague about detection of egress node failure as opposed to detection of failure of the link between the penultimate LSR and the egress node. Section 5.2 only says that “All the local failure detection mechanisms used by PLRs in transit link/node protection are applicable to egress node failure detection” leaving it to the reader/implementer to guess whether some other methods (e.g., monitoring the state of a multi-hop IP BFD session between stable IP addresses owned by these two devices) can be used for reasonably fast and reliable detection of egress node failures.

· One well-known method for reliable (but slow) detection of the node failure is to wait for IGP to converge and to check whether the node is still present in the updated LSDB. AFAIK, there are deployed BGP PIC implementations that use this method to trigger BGP Edge PIC repair. However, I believe that this method is definitely not suitable as a trigger for local repair mechanisms, and I think that the authors should clarify their position on this point, one way or another.

· The draft states in section 5.2 that “In a case where a PLR does not have a fast and reliable mechanism to detect a node failure or distinguish between a link failure and a node failure, it may conservatively treat a link failure as a node failure and trigger egress node protection”. This leaves open the question of what a PLR should do when it can differentiate between these two cases (link failure and egress failure) reliably and reasonably fast. In particular, the draft should describe the possible modes of behavior that the user can select for the PLR, and the ways to avoid race conditions between multiple protection schemes being applied simultaneously.

[yshen] Good point! For clarity, I will add a new section for failure detection, to list common detection methods and address applicability, like the following:

1) If the PLR has reasonably fast mechanisms (i.e. faster than control plane failure detection) to detect and differentiate between a link failure and an egress node failure, it may set up both link protection and egress node protection, and trigger one and only one protection upon a corresponding failure. There should be no race condition between the two protections.

2) The PLR may have fast mechanisms to detect a link failure and an egress node failure, but cannot distinguish them. Or, the PLR may have a fast mechanism to detect a link failure only, but not an egress node failure. In these cases, the PLR has two options:

2.1) It may set up link protection only, and leave the egress node failure to global repair and control protocol convergence.

2.2) It may set up egress node protection only, and treat a link failure as a trigger for the egress node protection. However, the assumption is that treating a link failure as an egress node failure does not have a negative impact on services. Otherwise, it should instead set up link protection only, as in 2.1, and leave the egress node failure to global repair and control protocol convergence.
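
The cases above can be sketched as a small decision function. This is a minimal illustration under stated assumptions, not text from the draft: the names `Protection` and `plan_protection`, the boolean inputs, and the `prefer_node` knob (standing in for the operator's choice between 2.1 and 2.2) are all hypothetical. The final branch, where the PLR has no fast link failure detection at all, is not covered by the cases above and is assumed here to mean no local repair.

```python
from enum import Enum


class Protection(Enum):
    LINK = "link"
    EGRESS_NODE = "egress-node"


def plan_protection(fast_link, fast_node, can_distinguish,
                    link_as_node_is_harmless, prefer_node=False):
    """Return the set of protections a PLR would set up.

    fast_link / fast_node: fast detection exists for that failure type.
    can_distinguish: the PLR can tell the two failure types apart.
    link_as_node_is_harmless: treating a link failure as an egress node
        failure has no negative impact on services.
    prefer_node: hypothetical operator preference for option 2.2.
    """
    # Case 1: both failures are detected quickly and can be told apart.
    if fast_link and fast_node and can_distinguish:
        return {Protection.LINK, Protection.EGRESS_NODE}
    # Case 2: fast link detection, but node failure is either not
    # detected quickly or cannot be distinguished from a link failure.
    if fast_link:
        if prefer_node and link_as_node_is_harmless:
            return {Protection.EGRESS_NODE}  # option 2.2
        return {Protection.LINK}             # option 2.1
    # Assumption: not covered by the cases above, so no local repair.
    return set()
```

For example, `plan_protection(True, False, False, False, prefer_node=True)` falls back to link protection only, because option 2.2 is conditional on the link-as-node treatment being harmless.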

Context ID Visibility:

Section 5.10 discusses various ways in which reachability of the Context ID can be advertised in IGP, i.e., within a specific AS. But I have not found any mention of inter-AS visibility of the Context ID. As a consequence, it is not clear to me whether the framework defined in the draft is applicable to MPLS services that cross multiple AS, e.g., with inter-AS IP VPN option C. From my POV, if the draft is only applicable to intra-AS services, this should be explicitly stated in the document (preferably, in the Applicability Statement section).

[yshen] Both inter-AS and inter-area are supported. We will describe how context IDs are propagated across multiple AS/areas. Essentially, a context ID is an IP address. Hence, its inter-AS/area visibility can be achieved in the same manner as that of a regular IP address.

The role of traffic engineering in setup of bypass tunnels:

The draft states that “any egress-protection bypass tunnels MUST have P as tailend while avoiding E” (Section 5.9). It also states that “The bypass tunnel MUST have the property that it should not be affected by any topology change caused by an egress node failure” (Section 5.8). These statements look to me as implicit requirements to use some traffic-engineering technique for setup of the bypass tunnels and should be clarified in the draft. In particular, from my POV addressing these requirements with LDP used for setup of the bypass tunnels as described in Section 5.11, item [2], is non-trivial and therefore deserves detailed explanation. Depending on the details, some references (e.g., references to RFC 5286 and RFC 7490), that currently appear as Informational, may have to be promoted to Normative.

[yshen] The path computation for a bypass tunnel does not necessarily require TE. All we need is an algorithm which can take an avoided node (i.e. the egress node, in this case) as a constraint. Of course, this kind of algorithm is commonly used for TE, but it is not inherently tied to TE. In the case of LDP, we can set up the bypass tunnel by using various LFA mechanisms, which are good non-TE examples of this kind of path computation algorithm.
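
To illustrate what "take an avoided node as a constraint" can mean, here is a minimal sketch of a shortest-path computation that simply skips one node. It is a plain Dijkstra search with an exclusion, not an implementation of LFA or of any mechanism in the draft. The function name `shortest_path_avoiding`, the dictionary-based graph format, and the router name R1 in the usage note are assumptions made for the example.

```python
import heapq


def shortest_path_avoiding(graph, src, dst, avoid):
    """Return the least-cost path from src to dst that never visits
    `avoid`, or None if no such path exists.

    graph: {node: {neighbor: cost}} with non-negative costs.
    """
    if avoid in (src, dst):
        return None
    dist = {src: 0}
    prev = {}
    heap = [(0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == dst:
            break
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry
        for v, cost in graph.get(u, {}).items():
            if v == avoid:
                continue  # the constraint: never route through this node
            nd = d + cost
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                prev[v] = u
                heapq.heappush(heap, (nd, v))
    if dst not in dist:
        return None
    # Walk the predecessor chain back from dst to src.
    path = [dst]
    while path[-1] != src:
        path.append(prev[path[-1]])
    return path[::-1]
```

With a PLR, an egress E, a protector P, and one other router R1, where the cheapest PLR-to-P path runs through E, calling the function with `avoid="E"` yields the detour through R1, which is the property a bypass tunnel needs.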

Bypass tunnels and segment routing:

I may have missed something important, but it seems that the technique described in Section 5.11, item 4, is just a variation of the hierarchical LSP technique when the context segment is used as a single hop LSP on top of a bypass LSP set up using SR. If so, it makes sense to merge items 3 and 4 of Section 5.11.

[yshen] Agreed. Will combine them.

Hopefully these comments will be useful.

Regards,

Sasha

Office: +972-39266302

Cell: +972-549266302

Email: Alexander.Vainshtein at ecitele.com
