Re: [Last-Call] Secdir last call review of draft-ietf-bess-mvpn-fast-failover-11
Greg Mirsky <gregimirsky@gmail.com> Thu, 12 November 2020 21:43 UTC
From: Greg Mirsky <gregimirsky@gmail.com>
Date: Thu, 12 Nov 2020 13:43:18 -0800
Message-ID: <CA+RyBmWrwHMk8f-oZaW52sjGZuJ9KHeNM-8EBpOrc5KRZZm9og@mail.gmail.com>
To: Daniel Migault <daniel.migault@ericsson.com>
Cc: "secdir@ietf.org" <secdir@ietf.org>, BESS <bess@ietf.org>, "last-call@ietf.org" <last-call@ietf.org>, "draft-ietf-bess-mvpn-fast-failover.all@ietf.org" <draft-ietf-bess-mvpn-fast-failover.all@ietf.org>
Archived-At: <https://mailarchive.ietf.org/arch/msg/last-call/iKcBZPOP-mb14-dg_RIINBg0oXA>
Subject: Re: [Last-Call] Secdir last call review of draft-ietf-bess-mvpn-fast-failover-11
Hi Daniel,
thank you for spotting it, fixed. I enjoyed our discussion and, again,
thank you for the review and your thoughtful comments.

Best regards,
Greg

On Thu, Nov 12, 2020 at 1:32 PM Daniel Migault <daniel.migault@ericsson.com> wrote:

> Thanks for the response and explanation. I am fine with the text you
> proposed and I consider all my concerns addressed.
>
> I am reading your text as implicitly suggesting the following lines of
> your response, which seems reasonable.
>
> """
> it is difficult to make any assumptions on how the convergence in the
> control plane may impact the forwarding plane and what effect that will
> have on a multicast flow. I think that very much depends on the
> implementation and the HW capabilities of the platform used.
> """
>
> There is one additional nit s/sectionSection/Section/ which I think comes
> from the conversion.
>
> Thanks for all your clarifications!
>
> Yours,
> Daniel
>
> ------------------------------
> *From:* Greg Mirsky <gregimirsky@gmail.com>
> *Sent:* Thursday, November 12, 2020 3:14 PM
> *To:* Daniel Migault <daniel.migault@ericsson.com>
> *Cc:* secdir@ietf.org <secdir@ietf.org>; BESS <bess@ietf.org>;
> last-call@ietf.org <last-call@ietf.org>;
> draft-ietf-bess-mvpn-fast-failover.all@ietf.org <
> draft-ietf-bess-mvpn-fast-failover.all@ietf.org>
> *Subject:* Re: Secdir last call review of
> draft-ietf-bess-mvpn-fast-failover-11
>
> Hi Daniel,
> thank you for the additional information. I understand your concerns and
> agree that it is helpful to provide implementors and operators with useful
> information about the potential impact the new functionality may
> demonstrate in the network and how to mitigate the risks. I believe it is
> important to recognize that this draft proposes mechanisms that expedite
> the failure detection in a P-tunnel from the perspective of a downstream
> PE. And the detection directly impacts the control plane, not the data
> plane.
> I believe that it is difficult to make any assumptions on how the
> convergence in the control plane may impact the forwarding plane and what
> effect that will have on a multicast flow. I think that very much depends
> on the implementation and the HW capabilities of the platform used. I've
> moved the new text from Section 3.1 into the Security Considerations
> section. Please let me know if you think that is a more appropriate place
> for that paragraph.
> Also, I've realized that the text I've proposed earlier that refers to 1:N
> and M:N protection might be the source of questions and arguments and would
> like to withdraw it. I hope you'll agree.
>
> I've attached the new diff reflecting the changes in the working version.
>
> Regards,
> Greg
>
>
> On Thu, Nov 12, 2020 at 9:02 AM Daniel Migault <
> daniel.migault@ericsson.com> wrote:
>
> Hi Greg,
>
> Thanks for the response Greg. This seems to go in the right direction, but
> I think it would be nice to detail a bit the negative impact that may
> result from the fast failover.
>
> """
> unnecessary failover negatively impacting the multicast service
> """
>
> I apologize for appearing maybe a bit picky, but, at least to me, the
> security consideration section is the place to point out specific impacts
> that an operator may not have thought of and the text appears to me a bit
> too vague on what can negatively impact the multicast service.
>
> Let me dig a bit on what I mean and probably what information I would have
> expected to find. Maybe it would have been useful had I provided those
> earlier. Again, not being an expert in this area, please take my
> following recommendations with a pinch of salt.
>
> What I would like, for example, to understand is whether having a
> fast-failover between nodes that work properly results in packet loss or
> not.
> I also envision that in some cases, this will result in packet re-ordering
> which might result in packets being rejected by the end node.
> In IPsec VPNs, we have specific counters and keys that make fail-over
> relatively complex as a context has to be maintained between the old and
> the new node to pass anti-replay protection and enable appropriate
> encryption/decryption. It would be good to clarify if any parameters need -
> or not - to be synchronized between the two nodes as their transfer
> represents a risk of disrupting the traffic, and thus may be mentioned.
> There are probably other points I am missing due to my lack of expertise -
> especially those due to operational practices.
> I believe that any information that you could think of that would
> encourage you to double check/validate that a network outage is present
> before performing the fast failover might be useful information.
> Similarly, it would be good to mention cases where an operator may choose
> not to deploy such a mechanism.
>
> Yours,
> Daniel
>
> ------------------------------
> *From:* Greg Mirsky <gregimirsky@gmail.com>
> *Sent:* Thursday, November 12, 2020 10:47 AM
> *To:* Daniel Migault <daniel.migault@ericsson.com>
> *Cc:* secdir@ietf.org <secdir@ietf.org>; BESS <bess@ietf.org>;
> last-call@ietf.org <last-call@ietf.org>;
> draft-ietf-bess-mvpn-fast-failover.all@ietf.org <
> draft-ietf-bess-mvpn-fast-failover.all@ietf.org>
> *Subject:* Re: Secdir last call review of
> draft-ietf-bess-mvpn-fast-failover-11
>
> Hi Daniel,
> thank you for your kind consideration of my notes. I've top-copied what
> appeared to me as the remaining open issues. I hope I've not missed any of
> your questions. Please find my notes in-line below tagged GIM>>. Attached
> are the updated working version and the new diff.
>
> Regards,
> Greg
>
> <mglt>
> sure. If you know the network is down, then fast fail-over is definitely
> a plus. What I think could be useful is to evaluate the cost associated with
> a fast-fail-over without any network failure.
> This would be useful for an
> operator to evaluate whether it should spend more time in diagnosing a
> network failure versus performing a fast-fail-over.
> Typically, if a fast failover comes at no cost at all, one operator would
> maybe use one exchange to test the liveness of a node rather than 3.
>
> At that point, it seems to me that additional text could be added to
> characterize the impact. These could be high level and indicative, but it
> seems to me that knowing these impacts presents some value to the
> operators.
> </mglt>
> GIM>> I would like to add a new paragraph in Section 3.1:
> NEW TEXT:
>    All methods described in this section may produce false-negative
>    state changes that can be the trigger for an unnecessary failover
>    negatively impacting the multicast service provided by the VPN. An
>    operator is expected to consider the network environment and use
>    available controls of the mechanism used to determine the status of a
>    P-tunnel.
>
> Would the new text be helpful?
>
> <mglt>
> Thanks for the feedback. It seems to me important to mention it is not
> recommended that these two mechanisms co-exist.
> How to avoid false negative transition might be out of scope of the draft
> I agree, but it seems to me worth being mentioned especially in relation to
> the impacts associated with a fail-over. In case the fast-failover comes
> with no impact this becomes less of a problem for operators deploying it.
>
> </mglt>
> GIM>> I hope that the new text presented above addresses this concern.
>
> <mglt>
> I understand the document is addressing a 1:N scenario. That said, if M:N
> scenarios leverage 1:N protection it seems to me worth raising the
> issue.
> </mglt>
> GIM>> I propose adding the clarification of the use of the Standby PE in
> Section 4:
> OLD TEXT:
>    The procedures described below are limited to the case where the site
>    that contains C-S is connected to two or more PEs, though, to
>    simplify the description, the case of dual-homing is described.
> NEW TEXT:
>    The procedures described below are limited to the case where the site
>    that contains C-S is connected to two or more PEs, though, to
>    simplify the description, the case of dual-homing is described. Such
>    a redundancy protection scheme, referred to as 1:N protection, is the
>    special case of M:N protection, where M working instances are sharing
>    protection of the N standby instances. In addition to a network
>    failure detection mechanism, the latter scheme requires using a
>    mechanism to coordinate the failover among working instances. For
>    that reason, M:N protection is outside the scope of this
>    specification.
>
> On Wed, Nov 11, 2020 at 8:48 AM Daniel Migault <
> daniel.migault@ericsson.com> wrote:
>
> Hi Greg,
>
> Thanks for the response and clarifications. Most of my comments have been
> addressed/answered. However, it seems to me that some additional text might
> be added to the security consideration section to document the impact on the
> network of a fast-failover operation. The knowledge of these impacts might
> be useful for an operator to determine when the trigger can be done.
>
> Please see more comments inline.
>
> Yours,
> Daniel
>
> ------------------------------
> *From:* Greg Mirsky <gregimirsky@gmail.com>
> *Sent:* Tuesday, November 10, 2020 9:13 PM
> *To:* Daniel Migault <daniel.migault@ericsson.com>
> *Cc:* secdir@ietf.org <secdir@ietf.org>; BESS <bess@ietf.org>;
> last-call@ietf.org <last-call@ietf.org>;
> draft-ietf-bess-mvpn-fast-failover.all@ietf.org <
> draft-ietf-bess-mvpn-fast-failover.all@ietf.org>
> *Subject:* Re: Secdir last call review of
> draft-ietf-bess-mvpn-fast-failover-11
>
> Hi Daniel,
> many thanks for the review, thoughtful comments, and questions, all are
> much appreciated. Also, my apologies for the long delay to respond to your
> comments. Please find my answers and notes in-line below tagged by GIM>>.
> Attached are the new working version and the diff to -12.
>
> Regards,
> Greg
>
> On Fri, Oct 23, 2020 at 5:36 AM Daniel Migault via Datatracker <
> noreply@ietf.org> wrote:
>
> Reviewer: Daniel Migault
> Review result: Has Nits
>
> Hi,
>
>
> I reviewed this document as part of the Security Directorate's ongoing
> effort to review all IETF documents being processed by the IESG. These
> comments were written primarily for the benefit of the Security Area
> Directors. Document authors, document editors, and WG chairs should treat
> these comments just like any other IETF Last Call comments. Please note
> also that my expertise in BGP is limited, so feel free to take these
> comments with a pinch of salt.
>
> Review Results: Has Nits
>
> Please find my comments below.
>
> Yours,
> Daniel
>
>
> Multicast VPN Fast Upstream Failover
> draft-ietf-bess-mvpn-fast-failover-11
>
> Abstract
>
>    This document defines multicast VPN extensions and procedures that
>    allow fast failover for upstream failures, by allowing downstream PEs
>    to take into account the status of Provider-Tunnels (P-tunnels) when
>    selecting the Upstream PE for a VPN multicast flow, and extending BGP
>    MVPN routing so that a C-multicast route can be advertised toward a
>    Standby Upstream PE.
>
> <mglt>
> Though it might be just a nit, if MVPN
> designates multicast VPN, it might be
> clarifying to specify the acronym in the
> first sentence. This would later make
> the correlation with BGP MVPN clearer.
>
> </mglt>
>
> GIM>> I've updated s/BGP MVPN/BGP multicast VPN/. Also, s/mVPN/MVPN/
> throughout the document.
>
>
>
> 1. Introduction
>
>    In the context of multicast in BGP/MPLS VPNs, it is desirable to
>    provide mechanisms allowing fast recovery of connectivity on
>    different types of failures. This document addresses failures of
>    elements in the provider network that are upstream of PEs connected
>    to VPN sites with receivers.
>
> <mglt>
> Well, I am not familiar with either BGP
> or MPLS.
> It seems that BGP/MPLS IP VPNs
> and MPLS/BGP IP VPNs are both used. I am
> wondering if there is a distinction
> between the two and a preferred way to
> designate these VPNs. My understanding
> is that the VPN-IPv4 characterizes the
> VPN while MPLS is used by the backbone
> for the transport. Since the PEs are
> connected to the backbone the VPN-IPv4
> needs to be labeled.
>
> </mglt>
>
> GIM>> I understand that this document often sends the reader to check RFC
> 6513 and/or RFC 6514. BGP/MPLS MVPN identifies the case of providing a
> multicast service over an IP VPN that is overlaid on the MPLS data plane
> using the BGP control plane.
>
>
>    Section 3 describes local procedures allowing an egress PE (a PE
>    connected to a receiver site) to take into account the status of
>    P-tunnels to determine the Upstream Multicast Hop (UMH) for a given
>    (C-S, C-G). This method does not provide a "fast failover" solution
> <mglt>
> I understand the limitation is due to
> BGP convergence.
>
> </mglt>
>
> GIM>> Yes, a dynamic routing protocol, BGP in this case, provides the
> service restoration functionality but the restoration time is significant
> and affects the experience of a client.
>
>    when used alone, but can be used together with the mechanism
>    described in Section 4 for a "fast failover" solution.
>
>    Section 4 describes protocol extensions that can speed up failover by
>    not requiring any multicast VPN routing message exchange at recovery
>    time.
>
>    Moreover, section 5 describes a "hot leaf standby" mechanism, that
>    uses a combination of these two mechanisms. This approach has
>    similarities with the solution described in [RFC7431] to improve
>    failover times when PIM routing is used in a network given some
>    topology and metric constraints.
>
>
> [...]
>
> 3.1.1. mVPN Tunnel Root Tracking
>
>    A condition to consider that the status of a P-tunnel is up is that
>    the root of the tunnel, as determined in the x-PMSI Tunnel attribute,
>    is reachable through unicast routing tables. In this case, the
>    downstream PE can immediately update its UMH when the reachability
>    condition changes.
>
>    That is similar to BGP next-hop tracking for VPN routes, except that
>    the address considered is not the BGP next-hop address, but the root
>    address in the x-PMSI Tunnel attribute.
>
>    If BGP next-hop tracking is done for VPN routes and the root address
>    of a given tunnel happens to be the same as the next-hop address in
>    the BGP A-D Route advertising the tunnel, then checking, in unicast
>    routing tables, whether the tunnel root is reachable, will be
>    unnecessary duplication and thus will not bring any specific benefit.
>
> <mglt>
> It seems to me that x-PMSI address
> designates a different interface than
> the one used by the Tunnel itself. If
> that is correct, such mechanisms seem
> to assume that one equipment up on one
> interface will be up on the other
> interfaces. I have the impression that a
> configuration change in a PE may end up
> in the P-tunnel being down, while the PE
> is still reachable through the x-PMSI
> Tunnel attribute. If that is a possible
> scenario, the current mechanisms may not
> provide a more efficient mechanism than
> those of standard BGP.
>
> GIM>> That is a very interesting angle, thank you. Yes, in OAM, and in the
> Fault Management (FM) OAM in particular, we have to make some
> assumptions about the state of the remote system based on a single event or
> change of state. Usually, AFAIK, operators use not a physical interface but
> a loopback to associate with a tunnel. With a fast IGP convergence, a
> loopback interface is reachable as long as there's a path through the
> network between two nodes.
> <mglt>
> Thanks for the clarification
> </mglt>
>
>
> Similarly, it is assumed the tunnel is
> either up or down and the determination
> of not being up if being down. I am not
> convinced that these are the only two states.
> Typically services under DDoS may be
> down for a small amount of time. While
> this affects the network, there is not
> always a clear cut between the PE being
> up or down.
> </mglt>
>
> GIM>> In defect detection a system often has some hysteresis, i.e., time
> that the system has to wait to change its state. For example, BFD changes
> state from Up to Down after the system does not receive N consecutive
> packets (usually 3). As a result, in some cases, the system can be tuned to
> detect relatively short outages while in others be slower and miss
> short-lived outages.
>
>
>
> [...]
>
> 3.1.6. BFD Discriminator Attribute
>
>    P-tunnel status may be derived from the status of a multipoint BFD
>    session [RFC8562] whose discriminator is advertised along with an
>    x-PMSI A-D Route.
>
>    This document defines the format and ways of using a new BGP
>    attribute called the "BFD Discriminator". It is an optional
>    transitive BGP attribute. In Section 7.2, IANA is requested to
>    allocate the codepoint value (TBA2). The format of this attribute is
>    shown in Figure 1.
>
> <mglt>
> I feel that the sentence "In Section ...
> TBA2)." should be removed.
>
> </mglt>
>
> GIM>> We use this to mark where to note the allocated value. Usually, this
> text is replaced by the RFC Editor to read
>
>    In Section 7.2 IANA allocated codepoint XXX.
>
>
>     0                   1                   2                   3
>     0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
>    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
>    |    BFD Mode   |                  Reserved                     |
>    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
>    |                       BFD Discriminator                       |
>    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
>    ~                         Optional TLVs                         ~
>    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
>
>
>          Figure 1: Format of the BFD Discriminator Attribute
>
>    Where:
>
>       BFD Mode field is the one octet long. This specification defines
>       the P2MP BFD Session as value 1 Section 7.2.
>
>       Reserved field is three octets long, and the value MUST be zeroed
>       on transmission and ignored on receipt.
>
>       BFD Discriminator field is four octets long.
>
>
>
> Morin, et al.             Expires April 5, 2021                 [Page 7]
>
> Internet-Draft        mVPN Fast Upstream Failover           October 2020
>
>
>       Optional TLVs is the optional variable-length field that MAY be
>       used in the BFD Discriminator attribute for future extensions.
>       TLVs MAY be included in a sequential or nested manner. To allow
>       for TLV nesting, it is advised to define a new TLV as a variable-
>       length object. Figure 2 presents the Optional TLV format TLV that
>       consists of:
>
>       * one octet-long field of TLV 's Type value (Section 7.3)
>
>       * one octet-long field of the length of the Value field in octets
>
>       * variable length Value field.
>
>       The length of a TLV MUST be multiple of four octets.
> <mglt>
> I am wondering why the constraint on the
> length is not mentioned in the paragraph
> associated to the field - as opposed to
> a separate paragraph.
>
> </mglt>
>
> GIM>> There might be a slight confusion due to the use of Length and
> length. Capitalized - the name of the field which value is the length of
> the Value field. The last sentence refers to the overall length of a TLV,
> including lengths of Type, Length and Value fields.
>
> <mglt>
> you are correct that might have confused me.
> </mglt>
>
>
> [..]
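
For readers who want to see the layout of Figure 1 concretely, here is a minimal Python sketch; it is an illustration only and not part of the draft or of this thread. It packs and unpacks the fixed eight octets quoted above and builds a single Optional TLV. The function names and the TLV Type value used in the usage note are hypothetical. Only the field widths, the zeroed Reserved field, the P2MP mode value 1, and the multiple-of-four rule on the overall TLV length are taken from the quoted text.

```python
import struct

BFD_MODE_P2MP = 1  # "P2MP BFD Session as value 1" per the quoted draft text

# Network byte order: 1-octet BFD Mode, 3 Reserved octets, 4-octet Discriminator.
_FIXED = struct.Struct("!B3xI")


def encode_bfd_discriminator(mode, discriminator, tlvs=b""):
    """Build the attribute value; 'x' pad bytes are written as zeros (Reserved)."""
    return _FIXED.pack(mode, discriminator) + tlvs


def decode_bfd_discriminator(data):
    """Return (mode, discriminator, raw_tlvs); Reserved octets are skipped, i.e. ignored."""
    if len(data) < _FIXED.size:
        raise ValueError("attribute value shorter than 8 octets")
    mode, discriminator = _FIXED.unpack(data[:_FIXED.size])
    return mode, discriminator, data[_FIXED.size:]


def encode_tlv(tlv_type, value):
    """Type (1 octet), Length of the Value field (1 octet), then Value.

    The overall TLV (Type + Length + Value) must be a multiple of four
    octets, which is the reading Greg gives in his reply above.
    """
    if len(value) > 255:
        raise ValueError("Value does not fit a one-octet Length field")
    if (2 + len(value)) % 4 != 0:
        raise ValueError("overall TLV length must be a multiple of four octets")
    return struct.pack("!BB", tlv_type, len(value)) + value
```

As a usage example, `encode_bfd_discriminator(BFD_MODE_P2MP, 0x01020304, encode_tlv(5, b"\xaa\xbb"))` yields twelve octets: the eight fixed octets followed by one four-octet TLV (Type 5 is a made-up value here).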
>
> 8. Security Considerations
>
>    This document describes procedures based on [RFC6513] and [RFC6514]
>    and hence shares the security considerations respectively represented
>    in these specifications.
>
>    This document uses p2mp BFD, as defined in [RFC8562], which, in turn,
>    is based on [RFC5880]. Security considerations relevant to each
>    protocol are discussed in the respective protocol specifications. An
>    implementation that supports this specification MUST use a mechanism
>    to control the maximum number of p2mp BFD sessions that can be active
>    at the same time.
>
> <mglt>
> At a high level view - or at least my
> interpretation of it - the document
> proposes a mechanism based on BFD to
> detect fault in the path. Upon a fault
> detection a fail-over operation is
> instructed using BGP. This procedure is
> expected to perform a faster fail-over
> than traditional BGP convergence on
> maintaining routing tables. Once the
> fail-over has been performed, BFD
> confirms the new path is "legitimate"
> and works.
>
> It seems correct to me that the current
> protocol relies on BGP / BFD security.
> That said, having BFD authentication
> based on MD5 or SHA1 may suggest that
> stronger primitives be recommended.
> While this does not concern the current
> document, it seems to me that the
> information might be relayed to routing
> ADs.
>
> What remains unclear to me - and I
> assume this might be due to my lack of
> expertise in routing area - is the impact
> associated to performing a fail-over
> both on 1) the data plane and 2) the
> standard BGP way to establish routing
> tables.
>
> Regarding the data plane, I am wondering
> if fail-over results in a loss of
> packets for example - I suppose for
> example that at least the packets in the
> process of being forwarded might be
> lost. I believe that providing details
> on this may be good.
>
> GIM>> You bring up a very good topic for discussion, thank you.
> With network failure detection in place, the fail-over can be viewed as the
> reaction to a network failure. If that is the case, then packet loss
> experienced by service due to the fail-over is the result of the network
> failure. Would you agree with that view? A shorter failure detection
> interval and faster fail-over should minimize the packet loss and, as a
> result, the negative impact on the service itself.
>
> <mglt>
> sure. If you know the network is down, then fast fail-over is definitely
> a plus. What I think could be useful is to evaluate the cost associated with
> a fast-fail-over without any network failure. This would be useful for an
> operator to evaluate whether it should spend more time in diagnosing a
> network failure versus performing a fast-fail-over.
> Typically, if a fast failover comes at no cost at all, one operator would
> maybe use one exchange to test the liveness of a node rather than 3.
>
> At that point, it seems to me that additional text could be added to
> characterize the impact. These could be high level and indicative, but it
> seems to me that knowing these impacts presents some value to the
> operators.
> </mglt>
>
> If there are any impacts I would like to
> understand also in which cases the
> decision to perform a failover operation
> may result in more harm than the event
> that has been over-interpreted. A
> hypothetical scenario could be that the
> non reception of a BFD packet is
> interpreted as a PE being down while it
> may not be correct and the PE might have
> been simply under stress. A "too fast" fail-over
> may over-interpret it and perform a
> fail-over. If such things could happen,
> an attacker could leverage a micro event
> to perform network operations that are
> not negligible. Another way to see that
> is that an attacker might not have
> direct access to the control plane, but
> could use the data plane to generate a
> stress and sort of control the
> fail-over.
> It seems to me that some text
> might be welcome to prevent such cases
> from happening. This could be guidance for
> declaring a tunnel down for example.
>
> GIM>> I agree with your scenario. An over-short detection interval may
> produce a false-negative transition to the Down state in BFD and thus
> trigger the fail-over. I think that that is more an operational issue,
> something that an operator will consider when deploying the mechanism
> specified in this draft. As a result of addressing the RtgDir review, the
> draft was updated to provide more guidance:
>    In many cases, it is not practical to use both protection
>    methods at the same time because uncorrelated timers might cause
>    unnecessary switchovers and destabilize the network.
> <mglt>
> Thanks for the feedback. It seems to me important to mention it is not
> recommended that these two mechanisms co-exist.
> How to avoid false negative transition might be out of scope of the draft
> I agree, but it seems to me worth being mentioned especially in relation to
> the impacts associated with a fail-over. In case the fast-failover comes
> with no impact this becomes less of a problem for operators deploying it.
>
> </mglt>
> Though the text above might not be general, I think that it also applies
> to the scenario you've presented.
>
>
> Similarly, it would be good to add some
> text regarding the interferences with
> the non-fast forwarding fail-over when
> performed by the standard BGP.
> Typically, my impression is that the
> fast fail-over mechanism is a local
> decision versus the BGP convergence that
> is more global. As a result, even with
> more time these two mechanisms may come
> with different outcomes. One such
> example to illustrate my purpose could
> be the following. Note that this is only
> illustrative of my purpose, and I let
> you find and pick one that is more
> appropriate. I am thinking of a case
> where a standby PE is shared among
> multiple PEs - supposing this situation
> could occur.
> Typically, if PE_1, PE_2
> are shared by PE_a, ..., PE_z. In case
> PE_a and PE_b are down, we expect PE_a
> to switch to PE_1 and PE_b to switch to
> PE_2. It seems to me that BGP would end
> up in such situation while a local
> decision may end up in PE_a and PE_b both
> switching to PE_1.
>
> </mglt>
>
> GIM>> Thank you for the scenario that is very common in deploying
> protection based on the shared redundant resources. Such schemes, referred
> to as M:N protection, in addition to using a mechanism detecting a network
> failure, e.g., BFD, require a protocol to coordinate the switchover. This
> specification applies to a more special deployment scenario where one
> working PE is protected by one or more standby PEs, i.e., 1:N protection.
>
> <mglt>
> I understand the document is addressing a 1:N scenario. That said, if M:N
> scenarios leverage 1:N protection it seems to me worth raising the
> issue.
> </mglt>
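
The detection trade-off discussed in this thread (declaring Down after N consecutive missed packets, "usually 3", versus testing liveness with a single exchange) can be sketched in a few lines of Python. This is an illustration under simplifying assumptions and not part of the thread or the draft: the detector is polled once per expected packet interval, and a single received packet returns it to Up, whereas the real BFD state machine in RFC 5880 has more states and a handshake. The class, method, and function names are hypothetical.

```python
class LivenessDetector:
    """Toy Up/Down detector: Down after `detect_mult` consecutive misses."""

    def __init__(self, detect_mult=3):
        if detect_mult < 1:
            raise ValueError("detect_mult must be at least 1")
        self.detect_mult = detect_mult
        self.missed = 0
        self.state = "Up"

    def on_interval(self, packet_received):
        """Call once per expected packet interval; returns the current state."""
        if packet_received:
            # Simplification: any received packet clears the miss counter
            # and restores Up immediately.
            self.missed = 0
            self.state = "Up"
        else:
            self.missed += 1
            if self.missed >= self.detect_mult:
                self.state = "Down"
        return self.state


def detection_time_ms(interval_ms, detect_mult):
    """Worst-case time to declare Down in this model: multiplier times interval."""
    return interval_ms * detect_mult


# A two-interval glitch (for example, a PE briefly under stress) followed by recovery.
glitch = [True, False, False, True, True]

tolerant = LivenessDetector(detect_mult=3)
eager = LivenessDetector(detect_mult=1)

tolerant_states = [tolerant.on_interval(p) for p in glitch]
eager_states = [eager.on_interval(p) for p in glitch]
```

With a multiplier of 3 the short glitch never produces a Down state, so no failover would be triggered; with a multiplier of 1 the first missed packet already does. That is the false-negative transition Greg describes and the reason the cost of an unnecessary failover matters when an operator tunes the detection interval.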