Re: [Last-Call] Secdir last call review of draft-ietf-bess-mvpn-fast-failover-11

Greg Mirsky <gregimirsky@gmail.com> Thu, 12 November 2020 21:43 UTC

From: Greg Mirsky <gregimirsky@gmail.com>
Date: Thu, 12 Nov 2020 13:43:18 -0800
Message-ID: <CA+RyBmWrwHMk8f-oZaW52sjGZuJ9KHeNM-8EBpOrc5KRZZm9og@mail.gmail.com>
To: Daniel Migault <daniel.migault@ericsson.com>
Cc: "secdir@ietf.org" <secdir@ietf.org>, BESS <bess@ietf.org>, "last-call@ietf.org" <last-call@ietf.org>, "draft-ietf-bess-mvpn-fast-failover.all@ietf.org" <draft-ietf-bess-mvpn-fast-failover.all@ietf.org>
Archived-At: <https://mailarchive.ietf.org/arch/msg/last-call/iKcBZPOP-mb14-dg_RIINBg0oXA>

Hi Daniel,
thank you for spotting it; fixed.
I enjoyed our discussion and, again, thank you for the review and your
thoughtful comments.

Best regards,
Greg

On Thu, Nov 12, 2020 at 1:32 PM Daniel Migault <daniel.migault@ericsson.com>
wrote:

> Thanks for the response and explanation. I am fine with the text you
> proposed, and I consider all my concerns addressed.
>
> I read your text as implicitly reflecting the following lines from
> your response, which seems reasonable.
>
> """
>  it is difficult to make any assumptions on how the convergence in the
> control plane may impact the forwarding plane and what effect that will
> have on a multicast flow. I think that very much depends on the
> implementation and the HW capabilities of the platform used.
> """
>
> There is one additional nit, s/sectionSection/Section/, which I think comes
> from the conversion.
>
> Thanks for all your clarifications!
>
> Yours,
> Daniel
>
> ------------------------------
> *From:* Greg Mirsky <gregimirsky@gmail.com>
> *Sent:* Thursday, November 12, 2020 3:14 PM
> *To:* Daniel Migault <daniel.migault@ericsson.com>
> *Cc:* secdir@ietf.org <secdir@ietf.org>; BESS <bess@ietf.org>;
> last-call@ietf.org <last-call@ietf.org>;
> draft-ietf-bess-mvpn-fast-failover.all@ietf.org <
> draft-ietf-bess-mvpn-fast-failover.all@ietf.org>
> *Subject:* Re: Secdir last call review of
> draft-ietf-bess-mvpn-fast-failover-11
>
> Hi Daniel,
> thank you for the additional information. I understand your concerns and
> agree that it is helpful to provide implementors and operators with useful
> information about the potential impact the new functionality may have
> on the network and how to mitigate the risks. I believe it is
> important to recognize that this draft proposes mechanisms that expedite
> the failure detection in a P-tunnel from the perspective of a downstream
> PE. And the detection directly impacts the control plane, not the data
> plane. I believe that it is difficult to make any assumptions on how the
> convergence in the control plane may impact the forwarding plane and what
> effect that will have on a multicast flow. I think that very much depends
> on the implementation and the HW capabilities of the platform used. I've
> moved the new text from Section 3.1 into the Security Considerations
> section. Please let me know if you think that is a more appropriate place
> for that paragraph.
> Also, I've realized that the text I've proposed earlier that refers to 1:N
> and N:M protection might be the source of questions and arguments and would
> like to withdraw it. I hope you'll agree.
>
> I've attached the new diff reflecting the changes in the working version.
>
> Regards,
> Greg
>
>
> On Thu, Nov 12, 2020 at 9:02 AM Daniel Migault <
> daniel.migault@ericsson.com> wrote:
>
> Hi Greg,
>
> Thanks for the response Greg. This seems to go in the right direction, but
> I think it would be nice to detail a bit more the negative impacts that may
> result from a fast failover.
>
> """
> unnecessary failover negatively impacting the multicast service
> """
>
> I apologize if I appear to be a bit picky, but, at least to me, the
> security considerations section is the place to point out specific impacts
> that an operator may not have thought of, and the text appears to me a bit
> too vague on what can negatively impact the multicast service.
>
> Let me expand a bit on what I mean and on what information I would have
> expected to find. It would probably have been useful had I provided this
> earlier. Again, not being an expert in this area, please take my
> following recommendations with a pinch of salt.
>
> What I would like, for example, to understand is whether a
> fast failover between nodes that work properly results in packet loss or
> not.
> I also envision that, in some cases, this will result in packet reordering,
> which might result in packets being rejected by the end node.
> In IPsec VPNs, we have specific counters and keys that make failover
> relatively complex, as a context has to be maintained between the old and
> the new node to pass anti-replay protection and enable appropriate
> encryption/decryption. It would be good to clarify whether any parameters
> need - or not - to be synchronized between the two nodes, as their
> transfer represents a risk of disrupting the traffic and thus may be worth
> mentioning.
> There are probably other points I am missing due to my lack of expertise -
> especially those related to operational practices.
> I believe that any information you could think of that would
> encourage an operator to double-check/validate that a network outage is
> actually present before performing the fast failover would be useful.
> Similarly, it would be good to mention cases where an operator may choose
> not to deploy such a mechanism.
>
> Yours,
> Daniel
>
> ------------------------------
> *From:* Greg Mirsky <gregimirsky@gmail.com>
> *Sent:* Thursday, November 12, 2020 10:47 AM
> *To:* Daniel Migault <daniel.migault@ericsson.com>
> *Cc:* secdir@ietf.org <secdir@ietf.org>; BESS <bess@ietf.org>;
> last-call@ietf.org <last-call@ietf.org>;
> draft-ietf-bess-mvpn-fast-failover.all@ietf.org <
> draft-ietf-bess-mvpn-fast-failover.all@ietf.org>
> *Subject:* Re: Secdir last call review of
> draft-ietf-bess-mvpn-fast-failover-11
>
> Hi Daniel,
> thank you for your kind consideration of my notes. I've top-copied what
> appeared to me as the remaining open issues. I hope I've not missed any of
> your questions. Please find my notes in-line below tagged GIM>>. Attached
> are the updated working version and the new diff.
>
> Regards,
> Greg
>
> <mglt>
> sure. If you know the network is down, then fast failover is definitely
> a plus. What I think could be useful is to evaluate the cost associated
> with a fast failover without any network failure.  This would be useful for
> an operator to evaluate whether it should spend more time diagnosing a
> network failure versus performing a fast failover.
> Typically, if a fast failover comes at no cost at all, an operator would
> maybe use one exchange to test the liveness of a node rather than 3.
>
> At that point, it seems to me that additional text could be added to
> characterize the impact. This could be high level and indicative, but it
> seems to me that knowing these impacts presents some value to the
> operators.
> </mglt>
> GIM>> I would like to add a new paragraph in Section 3.1:
> NEW TEXT:
>    All methods described in this section may produce false-negative
>    state changes that can be the trigger for an unnecessary failover
>    negatively impacting the multicast service provided by the VPN.  An
>    operator is expected to consider the network environment and use
>    available controls of the mechanism used to determine the status of a
>    P-tunnel.
>
> Would the new text be helpful?
>
> <mglt>
> Thanks for the feedback. It seems to me important to mention that it is not
> recommended that these two mechanisms coexist.
> How to avoid false-negative transitions might be out of scope of the draft,
> I agree, but it seems to me worth mentioning, especially in relation to
> the impacts associated with a failover.  In case the fast failover comes
> with no impact, this becomes less of a problem for operators deploying it.
>
> </mglt>
> GIM>> I hope that the new text presented above addresses this concern.
>
> <mglt>
> I understand the document is addressing a 1:N scenario. That said, if the
> M:N scenario leverages 1:N protection, it seems to me worth raising the
> issue.
> </mglt>
> GIM>> I propose adding a clarification of the use of the Standby PE in
> Section 4:
> OLD TEXT:
>    The procedures described below are limited to the case where the site
>    that contains C-S is connected to two or more PEs, though, to
>    simplify the description, the case of dual-homing is described.
> NEW TEXT:
>    The procedures described below are limited to the case where the site
>    that contains C-S is connected to two or more PEs, though, to
>    simplify the description, the case of dual-homing is described.  Such
>    a redundancy protection scheme, referred to as 1:N protection, is the
>    special case of M:N protection, where M working instances are sharing
>    protection of the N standby instances.  In addition to a network
>    failure detection mechanism, the latter scheme requires using a
>    mechanism to coordinate the failover among working instances.  For
>    that reason, M:N protection is outside the scope of this
>    specification.
>
> On Wed, Nov 11, 2020 at 8:48 AM Daniel Migault <
> daniel.migault@ericsson.com> wrote:
>
> Hi Greg,
>
> Thanks for the response and clarifications. Most of my comments have been
> addressed/answered. However, it seems to me that some additional text might
> be added to the security consideration section document the impact on the
> network of a fast-failover operation. The knowledge of these impact might
> be useful for an operator to determine when the trigger can be done.
>
> Please see more comments inline.
>
> Yours,
> Daniel
>
> ------------------------------
> *From:* Greg Mirsky <gregimirsky@gmail.com>
> *Sent:* Tuesday, November 10, 2020 9:13 PM
> *To:* Daniel Migault <daniel.migault@ericsson.com>
> *Cc:* secdir@ietf.org <secdir@ietf.org>; BESS <bess@ietf.org>;
> last-call@ietf.org <last-call@ietf.org>;
> draft-ietf-bess-mvpn-fast-failover.all@ietf.org <
> draft-ietf-bess-mvpn-fast-failover.all@ietf.org>
> *Subject:* Re: Secdir last call review of
> draft-ietf-bess-mvpn-fast-failover-11
>
> Hi Daniel,
> many thanks for the review, thoughtful comments, and questions, all are
> much appreciated. Also, my apologies for the long delay to respond to your
> comments. Please find my answers and notes in-line below tagged by GIM>>.
> Attached are the new working version and the diff to -12.
>
> Regards,
> Greg
>
> On Fri, Oct 23, 2020 at 5:36 AM Daniel Migault via Datatracker <
> noreply@ietf.org> wrote:
>
> Reviewer: Daniel Migault
> Review result: Has Nits
>
> Hi,
>
>
> I reviewed this document as part of the Security Directorate's ongoing
> effort to
> review all IETF documents being processed by the IESG.  These comments were
> written primarily for the benefit of the Security Area Directors.  Document
> authors, document editors, and WG chairs should treat these comments just
> like
> any other IETF Last Call comments.  Please note also that my expertise in
> BGP is
> limited, so feel free to take these comments with a pinch of salt.
>
> Review Results: Has Nits
>
> Please find my comments below.
>
> Yours,
> Daniel
>
>
>                   Multicast VPN Fast Upstream Failover
>                  draft-ietf-bess-mvpn-fast-failover-11
>
> Abstract
>
>    This document defines multicast VPN extensions and procedures that
>    allow fast failover for upstream failures, by allowing downstream PEs
>    to take into account the status of Provider-Tunnels (P-tunnels) when
>    selecting the Upstream PE for a VPN multicast flow, and extending BGP
>    MVPN routing so that a C-multicast route can be advertised toward a
>    Standby Upstream PE.
>
> <mglt>
> Though it might be just a nit, if MVPN
> designates multicast VPN, it might be
> clarifying to specify the acronym in the
> first sentence. This would later make
> the correlation with BGP MVPN clearer.
>
> </mglt>
>
> GIM>> I've updated s/BGP MVPN/BGP multicast VPN/. Also, s/mVPN/MVPN/
> throughout the document.
>
>
>
> 1.  Introduction
>
>    In the context of multicast in BGP/MPLS VPNs, it is desirable to
>    provide mechanisms allowing fast recovery of connectivity on
>    different types of failures.  This document addresses failures of
>    elements in the provider network that are upstream of PEs connected
>    to VPN sites with receivers.
>
> <mglt>
> Well, I am not familiar with either BGP
> or MPLS. It seems that BGP/MPLS IP VPNs
> and MPLS/BGP IP VPNs are both used. I am
> wondering if there is a distinction
> between the two and a preferred way to
> designate these VPNs.  My understanding
> is that VPN-IPv4 characterizes the
> VPN while MPLS is used by the backbone
> for transport.  Since the PEs are
> connected to the backbone, the VPN-IPv4
> routes need to be labeled.
>
> </mglt>
>
> GIM>> I understand that this document often sends the reader to check RFC
> 6513 and/or RFC 6514. BGP/MPLS MVPN identifies the case of providing a
> multicast service over an IP VPN that is overlaid on the MPLS data plane
> using the BGP control plane.
>
>
>    Section 3 describes local procedures allowing an egress PE (a PE
>    connected to a receiver site) to take into account the status of
>    P-tunnels to determine the Upstream Multicast Hop (UMH) for a given
>    (C-S, C-G).  This method does not provide a "fast failover" solution
> <mglt>
> I understand the limitation is due to
> BGP convergence.
>
> </mglt>
>
> GIM>> Yes, a dynamic routing protocol, BGP in this case, provides the
> service restoration functionality but the restoration time is significant
> and affects the experience of a client.
>
>    when used alone, but can be used together with the mechanism
>    described in Section 4 for a "fast failover" solution.
>
>    Section 4 describes protocol extensions that can speed up failover by
>    not requiring any multicast VPN routing message exchange at recovery
>    time.
>
>    Moreover, section 5 describes a "hot leaf standby" mechanism, that
>    uses a combination of these two mechanisms.  This approach has
>    similarities with the solution described in [RFC7431] to improve
>    failover times when PIM routing is used in a network given some
>    topology and metric constraints.
>
>
> [...]
>
> 3.1.1.  mVPN Tunnel Root Tracking
>
>    A condition to consider that the status of a P-tunnel is up is that
>    the root of the tunnel, as determined in the x-PMSI Tunnel attribute,
>    is reachable through unicast routing tables.  In this case, the
>    downstream PE can immediately update its UMH when the reachability
>    condition changes.
>
>    That is similar to BGP next-hop tracking for VPN routes, except that
>    the address considered is not the BGP next-hop address, but the root
>    address in the x-PMSI Tunnel attribute.
>
>    If BGP next-hop tracking is done for VPN routes and the root address
>    of a given tunnel happens to be the same as the next-hop address in
>    the BGP A-D Route advertising the tunnel, then checking, in unicast
>    routing tables, whether the tunnel root is reachable, will be
>    unnecessary duplication and thus will not bring any specific benefit.
>
> <mglt>
> It seems to me that the x-PMSI address
> designates a different interface than
> the one used by the tunnel itself. If
> that is correct, such a mechanism seems
> to assume that equipment up on one
> interface will be up on the other
> interfaces. I have the impression that a
> configuration change in a PE may end up
> with the P-tunnel being down while the PE
> is still reachable through the x-PMSI
> Tunnel attribute. If that is a possible
> scenario, the current mechanism may not
> be more efficient than
> standard BGP.
>
> GIM>> That is a very interesting angle, thank you. Yes, in OAM, and in the
> Fault Management (FM) OAM in particular, we have to make some
> assumptions about the state of the remote system based on a single event or
> change of state. Usually, AFAIK, operators use not a physical interface but
> a loopback to associate with a tunnel. With a fast IGP convergence, a
> loopback interface is reachable as long as there's a path through the
> network between two nodes.
> <mglt>
> Thanks for the clarification
> </mglt>
>
>
> Similarly, it is assumed the tunnel is
> either up or down, and that the
> determination of not being up means
> being down.  I am not convinced those
> are the only two states.
> Typically, services under DDoS may be
> down for a small amount of time. While
> this affects the network, there is not
> always a clear cut between the PE being
> up or down.
> </mglt>
>
> GIM>> In defect detection, a system often has some hysteresis, i.e., a
> time that the system has to wait before changing its state. For example,
> BFD changes state from Up to Down after the system does not receive N
> consecutive packets (usually 3). As a result, in some cases, the system
> can be tuned to detect relatively short outages, while in others it can
> be slower and miss short-lived outages.
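The hysteresis Greg describes (declare Down only after N consecutive packets go missing) can be sketched as a toy calculation. This is my own illustration of an RFC 5880-style Detection Time, with hypothetical function names, not code from the draft:

```python
# Toy model of BFD detection hysteresis (illustrative only): the session is
# declared Down after the Detection Time expires, i.e. after detect_mult
# consecutive expected packets are missed.

def detection_time(detect_mult: int, rx_interval_ms: int) -> int:
    """Detection Time = Detect Mult x negotiated receive interval."""
    return detect_mult * rx_interval_ms

def session_state(ms_since_last_packet: int, detect_mult: int,
                  rx_interval_ms: int) -> str:
    """Return 'Up' until the Detection Time has elapsed with no packet."""
    if ms_since_last_packet >= detection_time(detect_mult, rx_interval_ms):
        return "Down"
    return "Up"

# With the usual multiplier of 3 and a 300 ms interval, a 700 ms outage
# goes unnoticed, while a 1000 ms outage triggers a Down transition.
print(session_state(700, 3, 300))   # Up
print(session_state(1000, 3, 300))  # Down
```

Tuning `detect_mult` and the interval is exactly the trade-off discussed above: shorter values catch short outages but risk false-negative transitions.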
>
>
>
> [...]
>
> 3.1.6.  BFD Discriminator Attribute
>
>    P-tunnel status may be derived from the status of a multipoint BFD
>    session [RFC8562] whose discriminator is advertised along with an
>    x-PMSI A-D Route.
>
>    This document defines the format and ways of using a new BGP
>    attribute called the "BFD Discriminator".  It is an optional
>    transitive BGP attribute.  In Section 7.2, IANA is requested to
>    allocate the codepoint value (TBA2).  The format of this attribute is
>    shown in Figure 1.
>
> <mglt>
> I feel that the sentence "In Section ...
> TBA2)." should be removed.
>
> </mglt>
>
> GIM>> We use this to mark where to note the allocated value. Usually, this
> text is replaced by the RFC Editor to read
>
> In Section 7.2 IANA allocated codepoint XXX.
>
>
>
>
>        0                   1                   2                   3
>        0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
>       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
>       |    BFD Mode   |                  Reserved                     |
>       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
>       |                       BFD Discriminator                       |
>       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
>       ~                         Optional TLVs                         ~
>       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
>
>
>             Figure 1: Format of the BFD Discriminator Attribute
>
>    Where:
>
>       The BFD Mode field is one octet long.  This specification defines
>       the P2MP BFD Session as value 1 (Section 7.2).
>
>       Reserved field is three octets long, and the value MUST be zeroed
>       on transmission and ignored on receipt.
>
>       BFD Discriminator field is four octets long.
>
>       Optional TLVs is an optional variable-length field that MAY be
>       used in the BFD Discriminator attribute for future extensions.
>       TLVs MAY be included in a sequential or nested manner.  To allow
>       for TLV nesting, it is advised to define a new TLV as a variable-
>       length object.  Figure 2 presents the Optional TLV format, which
>       consists of:
>
>       *  a one-octet field for the TLV's Type value (Section 7.3)
>
>       *  a one-octet field for the length of the Value field in octets
>
>       *  a variable-length Value field.
>
>       The length of a TLV MUST be a multiple of four octets.
> <mglt>
> I am wondering why the constraint on the
> length is not mentioned in the paragraph
> associated with the field - as opposed to
> a separate paragraph.
>
> </mglt>
>
> GIM>> There might be a slight confusion due to the use of Length and
> length. Capitalized, it is the name of the field whose value is the length
> of the Value field. The last sentence refers to the overall length of a
> TLV, including the lengths of the Type, Length, and Value fields.
>
> <mglt>
> you are correct that might have confused me.
> </mglt>
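As an aside, the fixed part of the attribute in Figure 1 (Mode, Reserved, Discriminator, without the Optional TLVs) can be illustrated with a short encoding sketch. The helper names are hypothetical; the layout simply follows the figure:

```python
# Illustrative encoding of the BFD Discriminator attribute value from
# Figure 1 (sketch only, helper names hypothetical): one octet BFD Mode,
# three reserved octets (zeroed on transmission), and a four-octet
# BFD Discriminator.
import struct

P2MP_BFD_SESSION = 1  # BFD Mode value defined by the draft

def encode_bfd_discriminator(mode: int, discriminator: int) -> bytes:
    # "!B3sI" = 1-octet Mode, 3 reserved octets, 4-octet Discriminator,
    # network byte order with no padding.
    return struct.pack("!B3sI", mode, b"\x00\x00\x00", discriminator)

def decode_bfd_discriminator(data: bytes) -> tuple[int, int]:
    mode, _reserved, disc = struct.unpack("!B3sI", data[:8])
    # Reserved MUST be ignored on receipt, so it is discarded here.
    return mode, disc

attr = encode_bfd_discriminator(P2MP_BFD_SESSION, 0x0A0B0C0D)
assert len(attr) == 8
assert decode_bfd_discriminator(attr) == (1, 0x0A0B0C0D)
```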
>
>
> [..]
>
> 8.  Security Considerations
>
>    This document describes procedures based on [RFC6513] and [RFC6514]
>    and hence shares the security considerations respectively represented
>    in these specifications.
>
>    This document uses p2mp BFD, as defined in [RFC8562], which, in turn,
>    is based on [RFC5880].  Security considerations relevant to each
>    protocol are discussed in the respective protocol specifications.  An
>    implementation that supports this specification MUST use a mechanism
>    to control the maximum number of p2mp BFD sessions that can be active
>    at the same time.
>
> <mglt>
> At a high level view - or at least my
> interpretation of it - the document
> proposes a mechanism based on BFD to
> detect faults in the path.  Upon fault
> detection, a failover operation is
> instructed using BGP. This procedure is
> expected to perform a faster failover
> than traditional BGP convergence when
> maintaining routing tables. Once the
> failover has been performed, BFD
> confirms the new path is "legitimate"
> and works.
>
> It seems correct to me that the current
> protocol relies on BGP / BFD security.
> That said, having BFD authentication
> based on MD5 or SHA1 may suggest that
> stronger primitives be recommended.
> While this does not concern the current
> document, it seems to me that the
> information might be relayed to routing
> ADs.
>
> What remains unclear to me - and I
> assume this might be due to my lack of
> expertise in the routing area - is the impact
> associated with performing a failover
> both on 1) the data plane and 2) the
> standard BGP way to establish routing
> tables.
>
> Regarding the data plane, I am wondering
> if a failover results in a loss of
> packets for example - I suppose for
> example that at least the packets in the
> process of being forwarded might be
> lost. I believe that providing details
> on this may be good.
>
> GIM>> You bring up a very good topic for discussion, thank you. With
> network failure detection in place, the failover can be viewed as the
> reaction to a network failure.  If that is the case, then the packet loss
> experienced by the service due to the failover is the result of the network
> failure. Would you agree with that view? A shorter failure detection
> interval and faster fail-over should minimize the packet loss and, as a
> result, the negative impact on the service itself.
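Greg's point that a shorter detection interval minimizes loss can be made concrete with a back-of-the-envelope sketch (my own illustration with assumed numbers, not figures from the draft):

```python
# Rough sketch: packets lost on failover are approximately the visible
# outage window (detection time plus switchover time) times the flow rate,
# so shrinking the detection interval directly shrinks the loss.

def approx_packets_lost(detect_time_ms: float, switchover_ms: float,
                        pkts_per_sec: float) -> float:
    return (detect_time_ms + switchover_ms) / 1000.0 * pkts_per_sec

# Assumed example: 3 x 300 ms BFD detection plus a 50 ms switchover on a
# 1000 pkt/s multicast flow.
print(approx_packets_lost(900, 50, 1000))  # 950.0
```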
>
> <mglt>
> sure. If you know the network is down, then fast failover is definitely
> a plus. What I think could be useful is to evaluate the cost associated
> with a fast failover without any network failure.  This would be useful for
> an operator to evaluate whether it should spend more time diagnosing a
> network failure versus performing a fast failover.
> Typically, if a fast failover comes at no cost at all, an operator would
> maybe use one exchange to test the liveness of a node rather than 3.
>
> At that point, it seems to me that additional text could be added to
> characterize the impact. This could be high level and indicative, but it
> seems to me that knowing these impacts presents some value to the
> operators.
> </mglt>
>
> If there are any impacts, I would like to
> understand also in which cases the
> decision to perform a failover operation
> may result in more harm than the event
> that has been over-interpreted. A
> hypothetical scenario could be that the
> non-reception of a BFD packet is
> interpreted as a PE being down, while that
> may not be correct and the PE might have
> simply been under stress. A "too fast"
> mechanism may over-interpret this and
> perform a failover. If such things could
> happen, an attacker could leverage a
> micro event to perform network operations
> that are not negligible. Another way to
> see it is that an attacker might not have
> direct access to the control plane, but
> could use the data plane to generate
> stress and, in effect, control the
> failover. It seems to me that some text
> might be welcome to prevent such cases
> from happening. This could be guidance for
> declaring a tunnel down, for example.
>
> GIM>> I agree with your scenario. An over-short detection interval may
> produce a false-negative transition to the Down state in BFD and thus
> trigger the failover. I think that is more of an operational issue,
> something that an operator will consider when deploying the mechanism
> specified in this draft. As a result of addressing the RtgDir review, the
> draft was updated to provide more guidance:
>    In many cases, it is not practical to use both protection
>    methods at the same time because uncorrelated timers might cause
>    unnecessary switchovers and destabilize the network.
> <mglt>
> Thanks for the feedback. It seems to me important to mention that it is not
> recommended that these two mechanisms coexist.
> How to avoid false-negative transitions might be out of scope of the draft,
> I agree, but it seems to me worth mentioning, especially in relation to
> the impacts associated with a failover.  In case the fast failover comes
> with no impact, this becomes less of a problem for operators deploying it.
>
> </mglt>
> Though the text above might not be general, I think that it also applies
> to the scenario you've presented.
>
>
> Similarly, it would be good to add some
> text regarding interference with
> the non-fast failover
> performed by standard BGP.
> Typically, my impression is that the
> fast failover mechanism is a local
> decision, versus the BGP convergence
> that is more global. As a result, even
> with more time, these two mechanisms may
> come to different outcomes. One such
> example to illustrate my point could
> be the following. Note that this is only
> illustrative of my point, and I let
> you find and pick one that is more
> appropriate.  I am thinking of a case
> where standby PEs are shared among
> multiple PEs - supposing this situation
> could occur.  Typically, suppose PE_1
> and PE_2 are shared by PE_a, ..., PE_z.
> In case PE_a and PE_b are down, we
> expect PE_a to switch to PE_1 and PE_b
> to switch to PE_2. It seems to me that
> BGP would end up in such a situation,
> while a local decision may end up with
> PE_a and PE_b both switching to PE_1.
>
> </mglt>
>
> GIM>> Thank you for the scenario; it is very common in deploying
> protection based on shared redundant resources. Such schemes, referred
> to as M:N protection, in addition to using a mechanism for detecting a
> network failure, e.g., BFD, require a protocol to coordinate the
> switchover. This specification applies to a more specific deployment
> scenario where one working PE is protected by one or more standby PEs,
> i.e., 1:N protection.
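The distinction between 1:N and M:N protection discussed here can be illustrated with a toy model of uncoordinated versus coordinated standby selection (all names and selection logic are hypothetical, for illustration only):

```python
# Toy illustration of why M:N protection needs coordination: with purely
# local decisions, every failed working PE independently picks the "best"
# standby, so two simultaneous failures can pile onto the same standby PE.

def local_choice(failed_pes, standbys):
    # Each PE independently prefers the same (e.g., lowest-address) standby.
    return {pe: standbys[0] for pe in failed_pes}

def coordinated_choice(failed_pes, standbys):
    # A coordination protocol could spread failures across standbys.
    return {pe: standbys[i % len(standbys)]
            for i, pe in enumerate(failed_pes)}

standbys = ["PE_1", "PE_2"]
failed = ["PE_a", "PE_b"]
print(local_choice(failed, standbys))        # both pick PE_1
print(coordinated_choice(failed, standbys))  # PE_a -> PE_1, PE_b -> PE_2
```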
>
> <mglt>
> I understand the document is addressing a 1:N scenario. That said, if the
> M:N scenario leverages 1:N protection, it seems to me worth raising the
> issue.
> </mglt>
>
>