[Pals] draft-ietf-pals-endpoint-fast-protection
Stewart Bryant <stbryant@cisco.com> Fri, 07 August 2015 16:19 UTC
From: Stewart Bryant <stbryant@cisco.com>
To: draft-ietf-pals-endpoint-fast-protection@tools.ietf.org
Message-ID: <55C4DAB5.40103@cisco.com>
Date: Fri, 07 Aug 2015 17:20:05 +0100
Archived-At: <http://mailarchive.ietf.org/arch/msg/pals/Qlyf_d_VdUEm1YBTYwCwi5AHJCQ>
Cc: "pals-chairs@tools.ietf.org" <pals-chairs@tools.ietf.org>, "pals@ietf.org" <pals@ietf.org>
Subject: [Pals] draft-ietf-pals-endpoint-fast-protection
Reply-To: stbryant@cisco.com

Hi Authors,

I have some more comments on draft-ietf-pals-endpoint-fast-protection-00.

There will be another WGLC on the draft, so please consider these comments along with any others you receive.

This document needs a careful look at the 2119 language. I commented on some of the SHOULDs, but I think there are a whole lot more where you need to s/SHOULD/MUST/. In each case, when I ask myself "what would happen if the implementer ignored the suggestion?", the system would break, which leads me to conclude that it needs to be a MUST.

=========

1. Introduction

   Today, fast protection against ingress AC failure and ingress (T-)PE failure can be achievable by using a multi-homed CE and redundant ACs, such as multi-chassis link aggregation group (MC-LAG). Fast protection against failure of intermediate router of transport tunnel

SB> nit: "of an intermediate"

========

   This document is intended to serve the above need. It specifies a fast protection mechanism based on local repair to protect PWs against the following egress endpoint failures.

   a. Egress AC failure.

   b. Egress PE failure: Link or node failure of an egress PE of an SS-PW, or a T-PE of an MS-PW.

   c. Switching PE failure: Link or node failure of an S-PE of an MS-PW.

SB> Suggest "Switching PE (S-PE) failure" or "S-PE failure".

===========

   Primary egress AC: CE2-PE2

SB> Egress is PE2-CE2 - it is a vector.

   Backup ingress AC: CE1-PE3
   Backup ingress PE: PE3
   Backup PW: PW2
   Backup egress PE: PE4
   Backup egress AC: CE2-PE4

SB> As above.

   Based on this schema, this document describes egress endpoint failures and the fast protection mechanism on the per-active-path and per-direction basis. In this case, an egress AC failure refers to the failure of the AC CE2-PE2, and an egress node failure refers to the failure of PE2.

SB> Shouldn't that be PE2-CE2? That is the egress packet direction.

   The ultimate goal is that when a failure occurs, the traffic should be locally repaired, so that it can eventually reach CE2 via the backup egress PE (PE4) and the backup egress AC (CE2-PE4).

SB> Again PE4-CE2 would be a more natural direction for an egress

   Subsequent to the local repair, either the active path should heal after control plane converges on the new topology, or the ingress CE should switch traffic from the primary path to the backup path, depending on the failure scenario. In the later case, the ingress CE may perform such switchover based on end-to-end OAM (in-band or out-band), PW status notification, CE-PE control protocols (e.g. LACP), etc. In the active-standby mode, this will promote the standby path to new active path. In the active-active mode, it will make the other active path carry all the traffic.

SB> Surely in active-active it was already carrying all the traffic?

========

   In this document, the following primary and backup roles are assigned for the traffic going from CE1 to CE2:

   Primary ingress AC: CE1-TPE1
   Primary ingress T-PE: TPE1
   Primary PW: PW1
   Primary S-PE: SPE1
   Primary egress T-PE: TPE2
   Primary egress AC: CE2-TPE2

SB> Same comment as for SS-PW concerning directionality.

   Backup ingress AC: CE1-TPE3
   Backup ingress T-PE: TPE3
   Backup PW: PW2
   Backup S-PE: SPE2
   Backup egress T-PE: TPE4
   Backup egress AC: CE2-TPE4

SB> Same comment as for SS-PW concerning directionality.

=========

4.1. Applicability

   The mechanism is applicable to LDP signaled PWs. It is applicable to an environment where an egress CE is multi-homed to a primary PE and a backup PE and there exists a backup PW in the network. In S-PE node protection, it also assumes a backup S-PE on the backup PW.

   The mechanism assumes IP/MPLS transport tunnels for PWs. If transport tunnels are LDP and there is a possibility of EMCP to a primary PE, it is recommended to enable control word for PWs.

SB> Suggest: it is recommended that the PW control word (CW) is used.
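
To make the control-word point concrete, here is a minimal Python sketch of the ECMP behaviour behind the quoted recommendation. It models a hypothetical LSR that, as described in RFC 4928, peeks past the bottom of the label stack: if the first payload nibble is 4 or 6, it hashes on what it takes to be IP addresses; otherwise it hashes on the label stack alone. Because the PW control word (RFC 4385) starts with a 0 nibble, a PW using the CW stays on one branch, while an Ethernet PW without it can be spread across branches whenever the destination MAC happens to start with a 4 or 6 nibble. The hash, the field offsets and the names `MplsPacket`, `ecmp_branch` and `eth_frame` are illustrative assumptions, not any real router's algorithm.

```python
from __future__ import annotations

import hashlib
from dataclasses import dataclass


@dataclass
class MplsPacket:
    labels: list[int]  # label stack, top first; the last entry is the PW label
    payload: bytes     # everything after the bottom-of-stack label


def ecmp_branch(pkt: MplsPacket, n_branches: int) -> int:
    """Pick an ECMP branch the way a hypothetical 'deep-inspecting' LSR might."""
    key = b"".join(label.to_bytes(4, "big") for label in pkt.labels)
    first_nibble = pkt.payload[0] >> 4 if pkt.payload else None
    if first_nibble == 4 and len(pkt.payload) >= 20:
        key += pkt.payload[12:20]  # would be IPv4 source/destination addresses
    elif first_nibble == 6 and len(pkt.payload) >= 40:
        key += pkt.payload[8:40]   # would be IPv6 source/destination addresses
    # Any other first nibble (0 for the PW control word): label stack only.
    digest = hashlib.sha256(key).digest()
    return int.from_bytes(digest[:4], "big") % n_branches


def eth_frame(i: int) -> bytes:
    """Hypothetical Ethernet frame whose destination MAC starts with a 0x4 nibble."""
    dst_mac = bytes([0x40, 0, 0, 0, 0, 1])
    src_mac = bytes([0x02, 0, 0, 0, 0, 2])
    # Bytes 12..19 vary with i, standing in for per-flow header/payload differences.
    return dst_mac + src_mac + i.to_bytes(8, "big") + bytes(26)


if __name__ == "__main__":
    labels = [16001, 299776]  # transport label, PW label (illustrative values)
    control_word = bytes(4)   # first nibble 0, per RFC 4385
    without_cw = {ecmp_branch(MplsPacket(labels, eth_frame(i)), 4) for i in range(50)}
    with_cw = {ecmp_branch(MplsPacket(labels, control_word + eth_frame(i)), 4) for i in range(50)}
    print("branches used without CW:", sorted(without_cw))
    print("branches used with CW:   ", sorted(with_cw))
```

With the CW, every packet of the PW maps to the same branch, which is the property the quoted text relies on to avoid splitting one PW across the failed link and the surviving ECMP branches. The FAT PW case (RFC 6391) differs by design, since the flow label is there precisely to spread a PW's flows across branches.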

   Imagine a scenario where an LDP tunnel traverse a router with ECMP to the primary PE, and the ECMP includes a direct link to the primary PE. If a PW does not have control word, its traffic may be forwarded in a load balance fashion over multiple branches of the ECMP, including this link.

SB> Suggest: If a PW is not using the CW...

   When the link fails, the router will treat it as an egress PE failure and reroute the portion of traffic traversing the link. Meanwhile, the rest of traffic will remain on the other ECMP branches to the primary PE. This will create a situation where the egress CE will receive traffic from both primary PE and backup PE, which may not be desirable for a service sensitive to packet misordering.

SB> I am not sure yet how you are detecting failure, but if it is VCCV, then ECMP may mean that you get the CC packet but do not get all the data traffic.
SB>
SB> You should also consider the FAT PW case.

   The mechanism is also assumed to be used in conjunction with global repair and control plane repair, in such a manner that the mechanism temporarily repairs traffic by using a bypass tunnel, and global repair and control plane repair eventually move traffic to a fully functional path.

=========

   A PLR can realize its role based on configuration or the signaling of transport tunnel. For example, in the case where the transport tunnel is signaled by RSVP, the penultimate hop router can realize that it is the PLR for egress (T-)PE or S-PE failure based on the RRO in Resv message, which should indicate that the router is one hop away from the PE. The detail of how this could be achieved on a per-protocol basis is out of the scope of this document.

SB> I don't see how the above works, particularly in the LDP case, which from earlier text seems to be in scope.
SB> P4 is just a P router and knows nothing about the traffic it carries over its LSP.
SB> How does it know to send this traffic to SPE2, but other FRR traffic to some other node to try to bypass the failed link/node?

   In all scenarios, when a PLR reroutes traffic through a bypass tunnel to a protector during local repair, it MUST keep the label of the primary PW intact in the packets. This obviates the need for the PLR to maintain bypass routes on a per-PW basis, and allows a bypass tunnel to be shared by multiple PWs.

   The procedure also requires that the protector SHOULD be able to forward the traffic based on a PW label that is assigned by the primary PE, and ensure the traffic to eventually reach the target CE.

SB> Surely that is a MUST?
SB> Suggest: and ensure that the traffic eventually reaches the target CE.

   From the protector's perspective, this PW label is an upstream assigned label (RFC 5331).

SB> [RFC5331] is the normal reference style in an RFC.

   To accomplish this, the protector SHOULD learning the PW label from the primary PE prior to the failure, and install proper forwarding state for the PW label in a dedicated label space associated with the primary PE.

SB> Again, surely it is a MUST?
SB> s/learning/learn/

   During local repair, the protector SHOULD perform PW label lookup in this label space.

SB> Surely s/SHOULD/MUST/

============

4.3.1. Semantics

   The semantics of a context identifier is twofold.

   o  It identifies a primary PE and an associated protector. In other words, it identifies a primary PE on a per protector basis. A given primary PE may be protected by multiple protectors, each for a subset of the primary PWs terminated on the primary PE. A distinct context identifier MUST be assigned to the primary PE and each protector. For each primary PW, its ingress PE MUST set up or resolve a transport tunnel with destination as the context identifier of the {primary PE, protector}, rather than a private IP address of the primary PE.
      This not only allows the transport tunnel to reach the primary PE, but also conveys the identity of the protector to the PLR(s) along the transport tunnel. Each PLR can in turn use this information to set up a bypass tunnel to the protector without relying on local configuration.

   o  It indentifies the primary PE's label space on the protector. The protector may protect PWs for multiple primary PEs. For each primary PE, it MUST maintain a separate label space to store the

SB> What is "it"? Surely not the context identifier, which is the subject of this section - a CA is an identifier, not an object that holds a label space.

========

4.4.2. Centralized Protector

   In this model, the protector is a dedicated P router or PE router that serves the role. In egress AC protection and egress PE node protection, the protector MAY or MAY NOT be a backup PE with a direct connection to the target CE.

SB> Nits objects to the MAY or MAY NOT, and in any case it is not RFC2119 material, since you are stating a fact, not permission to the implementer.

   In S-PE node protection, the protector MAY or MAY NOT be a backup S-PE on the backup PW.

SB> As above.

==========

6.1. Egress Protection Capability TLV

   A protector MUST advertise the Egress Protection Capability TLV in its Initialization message and Capability message, over the LDP session with a primary PE. In the centralized protector model, the protector MUST also advertise the TLV over the LDP session with a backup PE. The TLV carries one or multiple context identifiers. To

Yimin Shen, et al.       Expires November 7, 2015             [Page 21]

Internet-Draft     PW Endpoint Fast Failure Protection          May 2015

   the primary PE, the TLV SHOULD carry the context identifier of the {primary PE, protector}. In the centralized protector model, the TLV SHOULD carry to the backup PE multiple context identifiers, one for each {primary PE, protector} where the backup PE serves as a backup for the primary PE.
   This TLV SHOULD NOT be advertised by the primary PE or the backup PE to the protector.

SB> Shouldn't the SHOULDs all be MUSTs? Same for the following text:

   The processing of the Egress Protection Capability TLV by a receiving router SHOULD follow the procedures defined in RFC 5561. In particular, the router SHOULD advertise PW information to the protector by using the Protection FEC Element TLV, only after it has received the Egress Protection Capability TLV from the protector. It SHOULD validate each context identifier included in the TLV, and advertise the information of only those PWs that are associated with the context identifier. It SHOULD withdraw previously advertised Protection FEC TLVs, when the protector has withdrawn a previously advertised context identifier or the entire Egress Protection Capability TLV via Capability message.

==========

7. IANA Considerations

   This document defines the encoding of the Capability Parameter TLV for the new "Egress Protection Capability" in Section 6. This would require IANA to assign a TLV Code Point to it.

SB> You need to specify this exactly as IANA would lay it out in their registry.

   This document defines a new LDP Protection FEC Element TLV in Section 6. IANA has assigned the type value 0x83 to it.

SB> You should specify the registry in which IANA has made this assignment.

8. Security Considerations

   The security considerations discussed in RFC 5036, RFC 5331, RFC 3209, and RFC 4090 apply to this document.

SB> Are you sure that there are no new security considerations? If there are none, make that an explicit statement.

========

- Stewart
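
The following is a minimal Python sketch of how the quoted Section 6.1 procedures fit together with the per-primary-PE label space described in the quoted Section 4.3.1 text. It also illustrates why those SHOULDs behave like MUSTs. The primary PE advertises PW labels only after it receives the protector's capability, and only for PWs whose context identifier the protector offered. It withdraws them when that identifier (or the whole capability) is withdrawn. The protector looks up the unchanged primary PW label in the label space selected by the context identifier. A `None` result is where repaired traffic would be dropped. The classes, method names, addresses (taken from the 192.0.2.0/24 documentation range) and forwarding targets are hypothetical, and LDP message encoding is ignored entirely.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Protector:
    """Hypothetical protector: one label space per context identifier."""
    context_ids: set[str]
    # context identifier -> {PW label assigned by the primary PE -> repair target}
    label_spaces: dict[str, dict[int, str]] = field(default_factory=dict)

    def capability_tlv(self) -> set[str]:
        # Context identifiers carried in the Egress Protection Capability TLV.
        return set(self.context_ids)

    def install(self, context_id: str, pw_label: int, target: str) -> None:
        self.label_spaces.setdefault(context_id, {})[pw_label] = target

    def uninstall(self, context_id: str, pw_label: int) -> None:
        self.label_spaces.get(context_id, {}).pop(pw_label, None)

    def repair_lookup(self, context_id: str, pw_label: int) -> str | None:
        # The PLR left the primary PW label intact; look it up in the label
        # space of the primary PE identified by the context identifier.
        return self.label_spaces.get(context_id, {}).get(pw_label)


@dataclass
class PrimaryPE:
    """Hypothetical primary PE that advertises Protection FEC information."""
    # PW label -> (associated context identifier, repair target on the protector)
    pws: dict[int, tuple[str, str]]
    advertised: set[int] = field(default_factory=set)

    def on_capability(self, protector: Protector, context_ids: set[str]) -> None:
        """Handle a (re)advertised capability; an empty set models its withdrawal."""
        for label, (ctx, target) in self.pws.items():
            if ctx in context_ids and label not in self.advertised:
                protector.install(ctx, label, target)  # advertise
                self.advertised.add(label)
            elif ctx not in context_ids and label in self.advertised:
                protector.uninstall(ctx, label)        # withdraw
                self.advertised.discard(label)


if __name__ == "__main__":
    prot = Protector({"192.0.2.1"})
    pe2 = PrimaryPE({100: ("192.0.2.1", "AC PE4-CE2"), 200: ("192.0.2.9", "AC PE4-CE3")})
    print(prot.repair_lookup("192.0.2.1", 100))  # None: nothing learned yet
    pe2.on_capability(prot, prot.capability_tlv())
    print(prot.repair_lookup("192.0.2.1", 100))  # AC PE4-CE2
    print(prot.repair_lookup("192.0.2.9", 200))  # None: context id not offered
    pe2.on_capability(prot, set())               # capability withdrawn
    print(prot.repair_lookup("192.0.2.1", 100))  # None again
```

Keying the label spaces by context identifier keeps PW labels from different primary PEs from colliding on the protector. This matches the reviewer's point that it is the protector, not the context identifier itself, that holds the label space.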