[Pals] Shepherds review of draft-ietf-pals-endpoint-fast-protection-02
Stewart Bryant <stewart.bryant@gmail.com> Thu, 09 June 2016 11:59 UTC
From: Stewart Bryant <stewart.bryant@gmail.com>
To: draft-ietf-pals-endpoint-fast-protection@tools.ietf.org
Message-ID: <0b99da76-f9b5-7855-fabc-6a91c449bc8d@gmail.com>
Date: Thu, 09 Jun 2016 12:59:30 +0100
Archived-At: <https://mailarchive.ietf.org/arch/msg/pals/lzsGYsNLMvQTRNI87Sr32xUx-ms>
Cc: "pals-chairs@tools.ietf.org" <pals-chairs@tools.ietf.org>, "pals@ietf.org" <pals@ietf.org>
Subject: [Pals] Shepherds review of draft-ietf-pals-endpoint-fast-protection-02

Hi All,

Sorry this has taken a long time, but I took over as shepherd a few
weeks ago and have done the customary review of this draft.

There are no fundamental issues with the text, but there are a lot of
must-fix issues that we need to address before we can take this
forward.

Please see below for details.

Regards

- Stewart

Firstly, I ran idnits and got a lot of output. I am not sure what is
going on with the references, because idnits is flagging a lot of them
up. I think that it is because it cannot find them in square brackets
in the text. Annoying as it is, we need to fix them rather than having
lots of people work through the nits list.

  == Unused Reference: 'RFC3985' is defined on line 1280, but no explicit
     reference was found in the text

  == Unused Reference: 'RFC5659' is defined on line 1285, but no explicit
     reference was found in the text

  == Unused Reference: 'RFC5036' is defined on line 1301, but no explicit
     reference was found in the text

  == Unused Reference: 'RFC2205' is defined on line 1310, but no explicit
     reference was found in the text

  == Unused Reference: 'RFC3209' is defined on line 1315, but no explicit
     reference was found in the text

  == Unused Reference: 'RFC3031' is defined on line 1345, but no explicit
     reference was found in the text

  == Unused Reference: 'RFC2328' is defined on line 1350, but no explicit
     reference was found in the text

  == Unused Reference: 'RFC5920' is defined on line 1375, but no explicit
     reference was found in the text

These definitely need to be fixed if we can.

  ** Downref: Normative reference to an Informational RFC: RFC 3985

RFC3985 can be safely moved to info.

  ** Downref: Normative reference to an Informational RFC: RFC 5659

RFC5659 can be safely moved to info.

  ** Downref: Normative reference to an Informational RFC: RFC 5714

RFC5714 can be safely moved to info.

     Summary: 3 errors (**), 0 flaws (~~), 8 warnings (==), 1 comment (--).
     Run idnits with the --verbose option for more detailed information
     about the items above

-----------------

1. Introduction

   In order to protect the layer-2 service against network failures, it
SB> PWs can carry other than L2, so I think this should be "the PW service"

---------------

   The bypass path has the property that it can guide traffic around the
   failure, while remaining unaffected by the topology changes resulting
   from the failure. When the failure occurs, the router can invoke the
   bypass path to achieve fast
SB> which router is "the router"
   restoration for the service.

   Today, fast protection against ingress AC failure and ingress (T-)PE
   failure can be achievable by using a multi-homed CE and redundant
SB> nit s/achievable/achieved/
   ACs, such as multi-chassis link aggregation group (MC-LAG). Fast
   protection against failure of intermediate router of transport tunnel
SB> nit should be "against the failure of an intermediate"
   can be achievable through RSVP fast-reroute [RFC4090] or IP/LDP fast-
SB> s/achievable/achieved/
   reroute [RFC5714, RFC5286]. However, there is a lack of equivalent
SB> "of an equivalent"
   mechanism against egress AC failure, egress (T-)PE failure, and S-PE
SB> Maybe you should say: "However, there is no equivalent mechanism that
SB> can be used against an egress AC failure, an egress (T-)PE failure,
SB> or an S-PE failure."
   failure. For these failures, service restoration has to rely on global
   repair or control plane repair. Global repair normally involves
   ingress CE or ingress (T-)PE switching traffic to another
SB> "involves the ingress CE or the ingress"
SB> You cannot say "another fully functional path" since the first one
SB> failed and thus was not functioning.
SB> I would say "to an alternative path"
   fully functional path, based on remote failure detection via PW status
   notification, end-to-end OAM, etc.

   The mechanism is applicable to LDP signaled PWs. It is relevant to
   networks with redundant PWs and multi-homed CEs.
   It is designed on the basis of MPLS upstream label assignment and
   context-specific label switching [RFC5331]. Fast protection refers to
   its ability to restore traffic in the order of tens of milliseconds.
   Compared with global repair and control plane repair, this mechanism
   can provide faster service restoration. However, it is intended to
   complement those mechanisms, rather than replacing them in any
   fashion.
SB> d/in any fashion/

============

3.1. Single-Segment PW

               |<-------------- PW1 --------------->|
             - PE1 -------------- P1 ---------------- PE2 -
           /                                                \
          /                                                  \
        CE1                                                  CE2
          \                                                  /
           \                                                /
             - PE3 -------------- P2 ---------------- PE4 -
               |<-------------- PW2 --------------->|

                              Figure 1

   In Figure 1, the IP/MPLS network consists of PE and P routers. It
   provides an emulation of a layer-2 service between CE1 and CE2. Each
SB> It may or may not be L2. Why not say "it provides a PW service between
SB> CE1 and CE2"?

==============

   Note that an egress endpoint failure of the traffic of a given
   direction may be detected by the egress CE as an ingress endpoint
   failure for the traffic of the reverse direction, except when the
   failure is on a link of the primary egress PE within the PSN, or when
   the traffic of the reverse direction takes a different active path.
SB> The text above needs clarification - it is not immediately obvious
SB> what you mean.
   If the CE can detect the failure, it may protect the traffic of the
   reverse direction by switching it to the backup path. However, this
   is categorized as ingress endpoint failure protection, and hence is
   not handled by this mechanism.
SB> This is presumably "the mechanism described in this document"

   Figure 2 shows another possible scenario, where CE1 is single-homed
   to PE1, while CE2 remains multi-homed to PE2 and PE4. From the
   perspective of egress endpoint protection for the traffic going from
   CE1 to CE2 over PW1, this scenario is not much different than
   Figure 1.
SB> Do you mean "not much different", in which case you need to explain
SB> the difference, or (as I suspect is the case) do you mean it is the
SB> same?

================

4.1. Applicability

   The mechanism is applicable to LDP signaled PWs in an environment
   where an egress CE is multi-homed to a primary PE and a backup PE and
   there exists a backup PW, as described in Section 3. The procedure
   for S-PE node protection is applicable when there exists a backup
   S-PE on the backup PW.

   The mechanism assumes IP/MPLS transport tunnels. In a network where
   transport tunnels may provide ECMP to primary PEs, care should be
   taken to prevent misordered packet delivery during local repair.
   Imagine a scenario where the transport tunnel of a PW traverses a
   router with ECMP to a primary PE, and the ECMP include a direct link
   to the primary PE. Normally the router will attempt to forward PW
   packets in a load balance fashion over the ECMP, including this link.
   In this document, when the link fails, the router will treat the
   event as an egress PE failure, and reroute the portion of traffic on
   the link towards a backup PE. Meanwhile, the rest of the traffic will
   remain on the other ECMP branches to the primary PE. This will create
   a situation where the egress CE receives traffic from both the
   primary PE and the backup PE, which is undesirable if the PW or flows
   within the PW are sensitive to packet misordering. Therefore, the
   mechanism assumes that Control Word (CW) SHOULD be used for PWs and
   flow labels [RFC6391] SHOULD be used for flows within a PW, whenever
   applicable.
SB> Unless you are saying use the sequence number in the PW (which
SB> is hardly deployed at all), some form of misordering is inevitable,
SB> since the path/queue lengths in the old and the new path (in both
SB> repair and restore) will be different. The mechanisms that you
SB> cite prevent misorder during normal operation, but not during
SB> a transient.

==============

4.2. Local Repair and Protector

   The fast protection ability of the mechanism comes from local repair
   performed by routers upstream adjacent to failures. Each of these
   routers is referred to as a "point of local repair" (PLR). A PLR MUST
   be able to detect a failure by using a rapid mechanism, such as
   physical layer failure detection, Bidirectional Failure Detection
   (BFD) [RFC5880], etc. In anticipation of the failure, the PLR MUST
   also pre-establish a bypass tunnel to a "protector", and pre-install
   a bypass route in the data plane. The bypass tunnel MUST have the
SB> I know what you mean, but unfortunately unanticipated SRLGs
SB> are a problem. I think you mean "to be sure protection will work
SB> ....". However, a word about unexpected SRLGs is probably warranted.

4.3. Context Identifier

   A protector may protect multiple primary PEs. The protector MUST
   maintain a separate label space for each primary PE. Likewise, the PWs
   terminated on a primary PE may be protected by multiple protectors,
   each for a subset of the PWs. In any case, a given PW MUST be
   associated with one and only one pair of {primary PE, protector}.

   This document introduces the notion of "context identifier" to
   facilitate protection establishment. A context identifier is an
   IPv4/v6 address assigned to an ordered pair of {primary PE,
   protector}. The address MUST be globally unique, or unique in the
   address space of the network where the primary PE and the protector
   reside.
SB> Maybe we will get to it, but there can of course be many protectors
SB> depending on the available ECMP paths between the x-PEs, and
SB> all of them need to be included in this mechanism and assigned
SB> one of these context addresses.
SB>
SB> Can you get a PLR to PLR loop?
SB> A section that you will need to include, or at least provide text on,
SB> is manageability.

================

   o  A context identifier indicates the primary PE's label space on the
      protector. The protector may protect PWs for multiple primary PEs.
      For each primary PE, it MUST maintain a separate label space to
      store the PW labels assigned by that primary PE. It associates a PW
      label with a label space via the context identifier of the {primary
      PE, protector}, as below.

      In addition to the normal LDP PW signaling, the primary PE MUST
      have a targeted LDP session with the protector, and advertise PW
      labels to the protector via LDP Label Mapping messages (Section 6).
      The primary PE MUST attach the context identifier to each message.
      Upon receiving the message, the protector MUST install the
      advertised PW label in the label space identified by the context
      identifier.

      When a PLR sets up or resolves a bypass tunnel to the protector, it
      MUST use the context identifier rather than a private IP address of
      the protector as destination. The protector MUST use the bypass
      tunnel, either the MPLS tunnel label or IP tunnel destination
      address, as the pointer to the corresponding label space. The
      protector MUST forwards PW packets received on the bypass tunnel
      based on label lookup in that label space.
SB> Some label diagrams could be really helpful, and it would be useful
SB> to go carefully through the text in this section and edit it for
SB> readability. I found it quite difficult.

===================

4.4.1. Co-located Protector

   In this model, the protector is a backup PE that is directly
   connected to the target CE via a backup AC, or it is a backup S-PE on
   a backup PW. That is, the protector is co-located with the backup
   (S-)PE. Examples of this model have been shown in Figure 4, Figure 5
   and Figure 6 in Section 4.2.

   In egress AC protection and egress PE node protection, when a
   protector receives traffic from the PLR, it forwards the traffic to
   the CE via the backup AC. This is shown in Figure 7, where PE2 is the
   PLR for egress AC failure, P3 is the PLR for PE2 failure, and PE4
   (the backup PE) is the protector.

               |<-------------- PW1 --------------->|
             - PE1 -------------- P1 ------- P3 ----- PE2 ----
           /                                PLR \     PLR      \
          /                                      \     |        \
        CE1                                 bypass\    |bypass CE2
          \                                        \   |        /
           \                                        \  |       /
             - PE3 -------------- P2 ---------------- PE4 ----
                                                   protector
               |<-------------- PW2 --------------->|

                              Figure 7

   In S-PE node protection, when a protector receives traffic from the
   PLR, it forwards the traffic over the next segment of the backup PW.
   The T-PE of the backup PW in turn forwards the traffic to the CE via
   a backup AC. This is shown in Figure 8, where P4 is the PLR for SPE1
   failure, and SPE2 (the backup S-PE) is the protector for SPE1.
SB> I really would like to know what the label stacks look like.

==============

4.4.2. Centralized Protector

   In this model, the protector is a dedicated P router or PE router
   that serves the role. In egress AC protection and egress PE node
   protection, the protector may or may not be a backup PE with a direct
   connection to the target CE. In S-PE node protection, the protector
   may or may not be a backup S-PE on the backup PW.

   In egress AC protection and egress PE node protection, when the
   protector receives traffic from the PLR, if the protector has a
   direct connection (i.e. backup AC) to the CE, it forwards the traffic
   to the CE via the backup AC, which is similar to Figure 7. Otherwise,
   it forwards the traffic to a backup PE, which then forwards the
   traffic to the CE via a backup AC. This is shown in Figure 9, where
   the protector receives traffic from P3 (the PLR for egress PE
   failure) or PE2 (the PLR for egress AC failure) and forwards the
   traffic to PE4 (the backup PE). The protector may be protecting other
   PWs and other primary PEs as well, which is not shown in this figure
   for clarity.
SB> The above seems a bit fluffy in terms of who does what to the stack.

=================

4.6. Bypass Tunnel

   A PLR may protect multiple PWs associated with one or multiple pairs
   of {primary PE, protector}.
   The PLR MUST establish a bypass tunnel to each protector for each
   context identifier associated with that protector. The destination
   of the bypass tunnel MUST be the context

Yimin Shen, et al.        Expires July 30, 2016                [Page 18]

Internet-Draft    PW Endpoint Fast Failure Protection       January 2016

   identifier (Section 4.3.1). Since the PLR is a transit router of the
   transport tunnel, it SHOULD derive the context identifier from the
   destination of the transport tunnel.

   For examples, in Figure 7 and Figure 9, a bypass tunnel is
   established from PE2 (PLR for egress AC failure) to the protector,
   and another bypass tunnel is established from P3 (PLR for egress node
   failure) to the protector. In Figure 8 and Figure 10, a bypass tunnel
   is established from P4 (PLR for S-PE failure) to the protector.

   In local repair, a PLR reroutes traffic to the protector through a
   bypass tunnel, with PW label intact in the packets. This normally
   involves pushing a label to the label stack, if the bypass tunnel is
   an MPLS tunnel, or pushing an IP header to the packets, if the bypass
SB> There are some assumptions about the IP protection path.
SB> Do we need to consider the IP case at all in practice?
   tunnel is an IP tunnel. Upon receipt of the packets, the protector
   forwards them based on the PW label. Specifically, the protector uses
   the bypass tunnel as a context to determine the primary PE's label
   space. If the bypass tunnel is an MPLS tunnel, the protector should
   have assigned a non-reserved label to the bypass tunnel, and hence
   this label can serve as the context. If the bypass tunnel is an IP
   tunnel, the context identifier should be the destination address of
   IP header.

   To be useful for local repair, a bypass tunnel MUST have the property
   that it is not affected by any topology changes caused by the
   failure. It should remain effective during local repair, until the
   traffic is moved to another fully functional path, i.e.
   either the same PW over a fully functional transport tunnel, or
   another fully functional PW.
SB> To avoid repair loops, you need to not repair repairs - this
SB> should be explicitly stated.

===============

4.7.1. Examples of Co-located Protector

   e In Figure 8, SPE2 is a co-located protector that protects PW1
SB> Typo in line above (I think)
   against S-PE failure. It maintains a label space for SPE1, which is
   identified by the context identifier of {SPE1, SPE2}. It learns
   SEG1's label from SPE1, and installs a forwarding entry in the label
   space. The nexthop of the forwarding entry indicates a label swap to
   SEG4's label.

==============

9. Acknowledgements

   Thanks to Nischal Sheth and Bhupesh Kothari for their contribution.
   Thanks to John E Drake, Andrew G Malis, Alexander Vainshtein, Steward
SB> Please s/Steward/Stewart/
   Bryant, and Mach Chen for valuable comments that helped shape this
   document and improve its clarity.

==========================
==========================
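Several of the comments above ask what the label stacks actually look like. The Python sketch below traces a PW1 packet through Figure 7 when PE2 fails, following one possible reading of the quoted text of Sections 4.3 and 4.6. P3 acts as the PLR and pushes a bypass tunnel label while leaving the PW label intact. PE4 is the protector: it uses the bypass label to select PE2's label space and then looks up the PW label in that space only.

Everything specific in the sketch is hypothetical:

- the label values;
- the context identifier addresses;
- the names `Protector` and `plr_local_repair`;
- the way P3 disposes of the transport label toward PE2, which the quoted text does not spell out.

```python
from dataclasses import dataclass, field

# Hypothetical values for Figure 7; the draft does not specify any of them.
CONTEXT_ID = "192.0.2.1"   # assumed context identifier of {PE2, PE4}
TRANSPORT_TO_PE2 = 100     # assumed transport tunnel label toward PE2 at P3
PW1_LABEL = 2001           # assumed PW1 label assigned by PE2
BYPASS_LABEL = 300         # assumed label PE4 assigned to the P3 -> PE4 bypass


@dataclass
class Protector:
    """Toy model of a protector (PE4 in Figure 7); stacks are listed top first."""
    # context identifier -> {PW label advertised by that primary PE: action}
    label_spaces: dict = field(default_factory=dict)
    # label the protector assigned to a bypass tunnel -> context identifier
    bypass_contexts: dict = field(default_factory=dict)

    def learn_pw_label(self, context_id, pw_label, action):
        # Stands in for a targeted-LDP Label Mapping that carries the context
        # identifier: install the PW label in that primary PE's label space.
        self.label_spaces.setdefault(context_id, {})[pw_label] = action

    def bind_bypass_label(self, bypass_label, context_id):
        self.bypass_contexts[bypass_label] = context_id

    def forward(self, stack):
        # The bypass label selects the label space; the PW label under it
        # is then looked up in that space only.
        bypass_label, pw_label, *payload = stack
        context_id = self.bypass_contexts[bypass_label]
        return self.label_spaces[context_id][pw_label], payload


def plr_local_repair(stack, bypass_label):
    """Assumed behaviour of P3 as PLR for PE2 failure: remove the transport
    label toward PE2 and push the bypass label, leaving the PW label intact."""
    _transport_label, *rest = stack
    return [bypass_label] + rest


pe4 = Protector()
pe4.learn_pw_label(CONTEXT_ID, PW1_LABEL, "send to CE2 on backup AC")
pe4.bind_bypass_label(BYPASS_LABEL, CONTEXT_ID)

# A second (hypothetical) primary PE may assign the same label value 2001;
# its separate label space keeps the two entries apart.
pe4.learn_pw_label("192.0.2.2", PW1_LABEL, "entry for another primary PE")
pe4.bind_bypass_label(301, "192.0.2.2")

at_p3 = [TRANSPORT_TO_PE2, PW1_LABEL, "payload"]
at_pe4 = plr_local_repair(at_p3, BYPASS_LABEL)   # [300, 2001, 'payload']
action, payload = pe4.forward(at_pe4)
print(at_pe4, "->", action, payload)
```

Only the MPLS bypass case is modelled. For an IP bypass tunnel, the quoted text says the destination address of the IP header selects the label space instead of a label. Because the PW label is left untouched until the protector, a single bypass tunnel per context identifier can carry every protected PW of that primary PE.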