Re: [secdir] Secdir last call review of draft-ietf-bess-mvpn-fast-failover-11
Daniel Migault <daniel.migault@ericsson.com> Thu, 12 November 2020 21:33 UTC
From: Daniel Migault <daniel.migault@ericsson.com>
To: Greg Mirsky <gregimirsky@gmail.com>
CC: "secdir@ietf.org" <secdir@ietf.org>, BESS <bess@ietf.org>, "last-call@ietf.org" <last-call@ietf.org>, "draft-ietf-bess-mvpn-fast-failover.all@ietf.org" <draft-ietf-bess-mvpn-fast-failover.all@ietf.org>
Thread-Topic: Secdir last call review of draft-ietf-bess-mvpn-fast-failover-11
Thread-Index: AQHWt9C5emGUONFgKUKxrX3Gy2SfzqnDA2lugAGjIICAAAkkm4AAQX+AgAASbnU=
Date: Thu, 12 Nov 2020 21:32:51 +0000
Message-ID: <DM6PR15MB2379A3F41D9EDE7A22BC7DBAE3E70@DM6PR15MB2379.namprd15.prod.outlook.com>
References: <160345656094.22100.7057001737682109381@ietfa.amsl.com> <CA+RyBmVXOrQu2Efs9nojMTOWyy09Cd4XEYS8a5HF+18C+_X1Nw@mail.gmail.com> <DM6PR15MB237965003540C4F0679A45DEE3E80@DM6PR15MB2379.namprd15.prod.outlook.com> <CA+RyBmVpBzJOCjs-QFxV6RTNeduQxuBta+FKpeGbtWq_zX=T0w@mail.gmail.com> <DM6PR15MB2379F1203196C3015C583ECBE3E70@DM6PR15MB2379.namprd15.prod.outlook.com>, <CA+RyBmUeVu7f2a7tB6SJBwnYEUsrtr5MtFwqTkJ8J++X=3XVow@mail.gmail.com>
In-Reply-To: <CA+RyBmUeVu7f2a7tB6SJBwnYEUsrtr5MtFwqTkJ8J++X=3XVow@mail.gmail.com>
Archived-At: <https://mailarchive.ietf.org/arch/msg/secdir/1ZnBc19UNSEcoaM2rBIxN-Aj9X0>
Subject: Re: [secdir] Secdir last call review of draft-ietf-bess-mvpn-fast-failover-11
Thanks for the response and explanation. I am fine with the text you proposed, and I consider all my concerns addressed. I read your text as implicitly conveying the following lines of your response, which seems reasonable.

"""
it is difficult to make any assumptions on how the convergence in the control plane may impact the forwarding plane and what effect that will have on a multicast flow. I think that very much depends on the implementation and the HW capabilities of the platform used.
"""

There is one additional nit, s/sectionSection/Section/, which I think comes from the conversion.

Thanks for all your clarifications!

Yours,
Daniel

________________________________
From: Greg Mirsky <gregimirsky@gmail.com>
Sent: Thursday, November 12, 2020 3:14 PM
To: Daniel Migault <daniel.migault@ericsson.com>
Cc: secdir@ietf.org <secdir@ietf.org>; BESS <bess@ietf.org>; last-call@ietf.org <last-call@ietf.org>; draft-ietf-bess-mvpn-fast-failover.all@ietf.org <draft-ietf-bess-mvpn-fast-failover.all@ietf.org>
Subject: Re: Secdir last call review of draft-ietf-bess-mvpn-fast-failover-11

Hi Daniel,

thank you for the additional information. I understand your concerns and agree that it is helpful to provide implementors and operators with useful information about the potential impact the new functionality may have in the network and how to mitigate the risks. I believe it is important to recognize that this draft proposes mechanisms that expedite failure detection in a P-tunnel from the perspective of a downstream PE. The detection directly impacts the control plane, not the data plane. I believe that it is difficult to make any assumptions on how the convergence in the control plane may impact the forwarding plane and what effect that will have on a multicast flow. I think that very much depends on the implementation and the HW capabilities of the platform used. I've moved the new text from Section 3.1 into the Security Considerations section.
Please let me know if you think that is a more appropriate place for that paragraph. Also, I've realized that the text I proposed earlier, which refers to 1:N and N:M protection, might become a source of questions and arguments, and I would like to withdraw it. I hope you'll agree. I've attached the new diff reflecting the changes in the working version.

Regards,
Greg

On Thu, Nov 12, 2020 at 9:02 AM Daniel Migault <daniel.migault@ericsson.com> wrote:

Hi Greg,

Thanks for the response. This seems to go in the right direction, but I think it would be nice to add some detail on the negative impact that may result from the fast fail-over.

"""
unnecessary failover negatively impacting the multicast service
"""

I apologize if I appear a bit picky, but, at least to me, the security considerations section is the place to point out specific impacts that an operator may not have thought of, and the text appears to me a bit too vague on what can negatively impact the multicast service. Let me dig a bit into what I mean and what information I would have expected to find. Maybe it would have been useful if I had provided this earlier. Again, not being an expert in this area, please take my following recommendations with a pinch of salt.

What I would like, for example, to understand is whether a fast failover between nodes that work properly results in packet loss or not. I also envision that in some cases this will result in packet reordering, which might result in packets being rejected by the end node. In IPsec VPNs, we have specific counters and keys that make fail-over relatively complex, as a context has to be maintained between the old and the new node to pass anti-replay protection and enable appropriate encryption/decryption. It would be good to clarify whether any parameters need - or do not need - to be synchronized between the two nodes, as their transfer represents a risk of disrupting the traffic and thus may be worth mentioning.
There are probably other points I am missing due to my lack of expertise - especially those due to operational practices. I believe that any information you can think of that would encourage one to double-check/validate that a network outage is present before performing the fast failover might be useful. Similarly, it would be good to mention cases where an operator may choose not to deploy such a mechanism.

Yours,
Daniel

________________________________
From: Greg Mirsky <gregimirsky@gmail.com>
Sent: Thursday, November 12, 2020 10:47 AM
To: Daniel Migault <daniel.migault@ericsson.com>
Cc: secdir@ietf.org <secdir@ietf.org>; BESS <bess@ietf.org>; last-call@ietf.org <last-call@ietf.org>; draft-ietf-bess-mvpn-fast-failover.all@ietf.org <draft-ietf-bess-mvpn-fast-failover.all@ietf.org>
Subject: Re: Secdir last call review of draft-ietf-bess-mvpn-fast-failover-11

Hi Daniel,

thank you for your kind consideration of my notes. I've top-copied what appeared to me as the remaining open issues. I hope I've not missed any of your questions. Please find my notes in-line below tagged GIM>>. Attached are the updated working version and the new diff.

Regards,
Greg

<mglt>
sure. If you know the network is down, then fast fail-over is definitely a plus. What I think could be useful is to evaluate the cost associated with a fast fail-over without any network failure. This would be useful for an operator to evaluate whether it should spend more time diagnosing a network failure versus performing a fast fail-over. Typically, if a fast failover comes at no cost at all, an operator would maybe use one exchange to test the liveness of a node rather than 3.
At that point, it seems to me that additional text could be added to characterize the impact. This could be high level and indicative, but it seems to me that knowing these impacts presents some value to operators.
</mglt>

GIM>> I would like to add a new paragraph in Section 3.1:

NEW TEXT:
All methods described in this section may produce false-negative state changes that can be the trigger for an unnecessary failover negatively impacting the multicast service provided by the VPN. An operator is expected to consider the network environment and use available controls of the mechanism used to determine the status of a P-tunnel.

Would the new text be helpful?

<mglt>
Thanks for the feedback. It seems to me important to mention that it is not recommended that these two mechanisms co-exist. How to avoid false-negative transitions might be out of scope of the draft, I agree, but it seems to me worth mentioning, especially in relation to the impacts associated with a fail-over. In case the fast failover comes with no impact, this becomes less of a problem for operators deploying it.
</mglt>

GIM>> I hope that the new text presented above addresses this concern.

<mglt>
I understand the document is addressing a 1:N scenario. That said, if an M:N scenario leverages 1:N protection, it seems to me worth raising the issue.
</mglt>

GIM>> I propose adding a clarification of the use of the Standby PE in Section 4:

OLD TEXT:
The procedures described below are limited to the case where the site that contains C-S is connected to two or more PEs, though, to simplify the description, the case of dual-homing is described.

NEW TEXT:
The procedures described below are limited to the case where the site that contains C-S is connected to two or more PEs, though, to simplify the description, the case of dual-homing is described.
Such a redundancy protection scheme, referred to as 1:N protection, is the special case of M:N protection, where M working instances are sharing protection of the N standby instances. In addition to a network failure detection mechanism, the latter scheme requires using a mechanism to coordinate the failover among working instances. For that reason, M:N protection is outside the scope of this specification.

On Wed, Nov 11, 2020 at 8:48 AM Daniel Migault <daniel.migault@ericsson.com> wrote:

Hi Greg,

Thanks for the response and clarifications. Most of my comments have been addressed/answered. However, it seems to me that some additional text might be added to the security considerations section to document the impact on the network of a fast-failover operation. Knowledge of these impacts might be useful for an operator to determine when the failover can be triggered. Please see more comments inline.

Yours,
Daniel

________________________________
From: Greg Mirsky <gregimirsky@gmail.com>
Sent: Tuesday, November 10, 2020 9:13 PM
To: Daniel Migault <daniel.migault@ericsson.com>
Cc: secdir@ietf.org <secdir@ietf.org>; BESS <bess@ietf.org>; last-call@ietf.org <last-call@ietf.org>; draft-ietf-bess-mvpn-fast-failover.all@ietf.org <draft-ietf-bess-mvpn-fast-failover.all@ietf.org>
Subject: Re: Secdir last call review of draft-ietf-bess-mvpn-fast-failover-11

Hi Daniel,

many thanks for the review, thoughtful comments, and questions; all are much appreciated. Also, my apologies for the long delay in responding to your comments. Please find my answers and notes in-line below tagged by GIM>>. Attached are the new working version and the diff to -12.
Regards,
Greg

On Fri, Oct 23, 2020 at 5:36 AM Daniel Migault via Datatracker <noreply@ietf.org> wrote:

Reviewer: Daniel Migault
Review result: Has Nits

Hi,

I reviewed this document as part of the Security Directorate's ongoing effort to review all IETF documents being processed by the IESG. These comments were written primarily for the benefit of the Security Area Directors. Document authors, document editors, and WG chairs should treat these comments just like any other IETF Last Call comments. Please note also that my expertise in BGP is limited, so feel free to take these comments with a pinch of salt.

Review Results: Has Nits

Please find my comments below.

Yours,
Daniel

Multicast VPN Fast Upstream Failover
draft-ietf-bess-mvpn-fast-failover-11

Abstract

This document defines multicast VPN extensions and procedures that allow fast failover for upstream failures, by allowing downstream PEs to take into account the status of Provider-Tunnels (P-tunnels) when selecting the Upstream PE for a VPN multicast flow, and extending BGP MVPN routing so that a C-multicast route can be advertised toward a Standby Upstream PE.

<mglt>
Though it might be just a nit, if MVPN designates multicast VPN, it might be clearer to introduce the acronym in the first sentence. This would later make the correlation with BGP MVPN clearer.
</mglt>

GIM>> I've updated s/BGP MVPN/BGP multicast VPN/. Also, s/mVPN/MVPN/ throughout the document.

1. Introduction

In the context of multicast in BGP/MPLS VPNs, it is desirable to provide mechanisms allowing fast recovery of connectivity on different types of failures. This document addresses failures of elements in the provider network that are upstream of PEs connected to VPN sites with receivers.

<mglt>
Well, I am not familiar with either BGP or MPLS. It seems that BGP/MPLS IP VPNs and MPLS/BGP IP VPNs are both used. I am wondering if there is a distinction between the two and a preferred way to designate these VPNs.
My understanding is that VPN-IPv4 characterizes the VPN, while MPLS is used by the backbone for transport. Since the PEs are connected to the backbone, the VPN-IPv4 needs to be labeled.
</mglt>

GIM>> I understand that this document often sends the reader to check RFC 6513 and/or RFC 6514. BGP/MPLS MVPN identifies the case of providing a multicast service over an IP VPN that is overlaid on the MPLS data plane using the BGP control plane.

Section 3 describes local procedures allowing an egress PE (a PE connected to a receiver site) to take into account the status of P-tunnels to determine the Upstream Multicast Hop (UMH) for a given (C-S, C-G). This method does not provide a "fast failover" solution

<mglt>
I understand the limitation is due to BGP convergence.
</mglt>

GIM>> Yes, a dynamic routing protocol, BGP in this case, provides the service restoration functionality, but the restoration time is significant and affects the experience of a client.

when used alone, but can be used together with the mechanism described in Section 4 for a "fast failover" solution.

Section 4 describes protocol extensions that can speed up failover by not requiring any multicast VPN routing message exchange at recovery time.

Moreover, section 5 describes a "hot leaf standby" mechanism, that uses a combination of these two mechanisms. This approach has similarities with the solution described in [RFC7431] to improve failover times when PIM routing is used in a network given some topology and metric constraints.

[...]

3.1.1. mVPN Tunnel Root Tracking

A condition to consider that the status of a P-tunnel is up is that the root of the tunnel, as determined in the x-PMSI Tunnel attribute, is reachable through unicast routing tables. In this case, the downstream PE can immediately update its UMH when the reachability condition changes.
That is similar to BGP next-hop tracking for VPN routes, except that the address considered is not the BGP next-hop address, but the root address in the x-PMSI Tunnel attribute.

If BGP next-hop tracking is done for VPN routes and the root address of a given tunnel happens to be the same as the next-hop address in the BGP A-D Route advertising the tunnel, then checking, in unicast routing tables, whether the tunnel root is reachable, will be unnecessary duplication and thus will not bring any specific benefit.

<mglt>
It seems to me that the x-PMSI address designates a different interface than the one used by the tunnel itself. If that is correct, such mechanisms seem to assume that equipment that is up on one interface will be up on the other interfaces. I have the impression that a configuration change in a PE may end up with the P-tunnel being down while the PE is still reachable through the x-PMSI Tunnel attribute. If that is a possible scenario, the current mechanisms may not be more efficient than those of standard BGP.

GIM>> That is a very interesting angle, thank you. Yes, in OAM, and in the Fault Management (FM) OAM in particular, we have to make some assumptions about the state of the remote system based on a single event or change of state. Usually, AFAIK, operators use not a physical interface but a loopback to associate with a tunnel. With a fast IGP convergence, a loopback interface is reachable as long as there's a path through the network between two nodes.

<mglt>
Thanks for the clarification
</mglt>

Similarly, it is assumed the tunnel is either up or down, and not being up is taken to mean being down. I am not convinced that these are the only two states. Typically, services under DDoS may be down for a small amount of time. While this affects the network, there is not always a clear-cut distinction between the PE being up or down.
</mglt>

GIM>> In defect detection a system often has some hysteresis, i.e., time that the system has to wait before changing its state. For example, BFD changes state from Up to Down after the system does not receive N consecutive packets (usually 3). As a result, in some cases, the system can be tuned to detect relatively short outages, while in others it can be slower and miss short-lived outages.

[...]

3.1.6. BFD Discriminator Attribute

P-tunnel status may be derived from the status of a multipoint BFD session [RFC8562] whose discriminator is advertised along with an x-PMSI A-D Route.

This document defines the format and ways of using a new BGP attribute called the "BFD Discriminator". It is an optional transitive BGP attribute. In Section 7.2, IANA is requested to allocate the codepoint value (TBA2). The format of this attribute is shown in Figure 1.

<mglt>
I feel that the sentence "In Section ... TBA2)." should be removed.
</mglt>

GIM>> We use this to mark where to note the allocated value. Usually, this text is replaced by the RFC Editor to read "In Section 7.2 IANA allocated codepoint XXX."

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |    BFD Mode   |                   Reserved                    |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                       BFD Discriminator                       |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   ~                         Optional TLVs                         ~
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

          Figure 1: Format of the BFD Discriminator Attribute

Where:

BFD Mode field is one octet long. This specification defines the P2MP BFD Session as value 1 (Section 7.2).

Reserved field is three octets long, and the value MUST be zeroed on transmission and ignored on receipt.

BFD Discriminator field is four octets long.

Optional TLVs is the optional variable-length field that MAY be used in the BFD Discriminator attribute for future extensions. TLVs MAY be included in a sequential or nested manner. To allow for TLV nesting, it is advised to define a new TLV as a variable-length object. Figure 2 presents the Optional TLV format TLV that consists of:

* one octet-long field of TLV's Type value (Section 7.3)
* one octet-long field of the length of the Value field in octets
* variable length Value field.

The length of a TLV MUST be multiple of four octets.

<mglt>
I am wondering why the constraint on the length is not mentioned in the paragraph associated with the field - as opposed to a separate paragraph.
</mglt>

GIM>> There might be a slight confusion due to the use of Length and length. Capitalized, it is the name of the field whose value is the length of the Value field. The last sentence refers to the overall length of a TLV, including the lengths of the Type, Length, and Value fields.

<mglt>
You are correct, that might have confused me.
</mglt>

[..]

8. Security Considerations

This document describes procedures based on [RFC6513] and [RFC6514] and hence shares the security considerations respectively represented in these specifications.

This document uses p2mp BFD, as defined in [RFC8562], which, in turn, is based on [RFC5880]. Security considerations relevant to each protocol are discussed in the respective protocol specifications. An implementation that supports this specification MUST use a mechanism to control the maximum number of p2mp BFD sessions that can be active at the same time.

<mglt>
At a high level view - or at least my interpretation of it - the document proposes a mechanism based on BFD to detect a fault in the path. Upon fault detection, a fail-over operation is instructed using BGP.
This procedure is expected to perform a faster fail-over than traditional BGP convergence on maintaining routing tables. Once the fail-over has been performed, BFD confirms the new path is "legitimate" and works.

It seems correct to me that the current protocol relies on BGP / BFD security. That said, BFD authentication being based on MD5 or SHA1 may suggest that stronger primitives be recommended. While this does not concern the current document, it seems to me that the information might be relayed to the routing ADs.

What remains unclear to me - and I assume this might be due to my lack of expertise in the routing area - is the impact associated with performing a fail-over on both 1) the data plane and 2) the standard BGP way to establish routing tables.

Regarding the data plane, I am wondering if fail-over results in a loss of packets, for example - I suppose that at least the packets in the process of being forwarded might be lost. I believe that providing details on this may be good.

GIM>> You bring up a very good topic for the discussion, thank you. With network failure detection in place, the fail-over can be viewed as the reaction to a network failure. If that is the case, then packet loss experienced by the service due to the fail-over is the result of the network failure. Would you agree with that view? A shorter failure detection interval and faster fail-over should minimize the packet loss and, as a result, the negative impact on the service itself.

<mglt>
sure. If you know the network is down, then fast fail-over is definitely a plus. What I think could be useful is to evaluate the cost associated with a fast fail-over without any network failure. This would be useful for an operator to evaluate whether it should spend more time diagnosing a network failure versus performing a fast fail-over. Typically, if a fast failover comes at no cost at all, an operator would maybe use one exchange to test the liveness of a node rather than 3.
At that point, it seems to me that additional text could be added to characterize the impact. This could be high level and indicative, but it seems to me that knowing these impacts presents some value to operators.
</mglt>

If there are any impacts, I would also like to understand in which cases the decision to perform a failover operation may result in more harm than the event that has been over-interpreted. A hypothetical scenario could be that the non-reception of a BFD packet is interpreted as a PE being down, while this may not be correct and the PE might simply have been under stress. A "too fast" fail-over mechanism may over-interpret it and perform a fail-over. If such things could happen, an attacker could leverage a micro event to trigger network operations that are not negligible. Another way to see this is that an attacker might not have direct access to the control plane, but could use the data plane to generate stress and in a sense control the fail-over. It seems to me that some text might be welcome to prevent such cases from happening. This could be guidance for declaring a tunnel down, for example.

GIM>> I agree with your scenario. An overly short detection interval may produce a false-negative transition to the Down state in BFD, thus triggering the fail-over. I think that is more of an operational issue, something that an operator will consider when deploying the mechanism specified in this draft. As a result of addressing the RtgDir review, the draft was updated to provide more guidance:

In many cases, it is not practical to use both protection methods at the same time because uncorrelated timers might cause unnecessary switchovers and destabilize the network.

<mglt>
Thanks for the feedback. It seems to me important to mention that it is not recommended that these two mechanisms co-exist. How to avoid false-negative transitions might be out of scope of the draft, I agree, but it seems to me worth mentioning, especially in relation to the impacts associated with a fail-over.
In case the fast failover comes with no impact, this becomes less of a problem for operators deploying it.
</mglt>

Though the text above might not be general, I think that it also applies to the scenario you've presented.

Similarly, it would be good to add some text regarding the interference with the non-fast fail-over performed by standard BGP. Typically, my impression is that the fast fail-over mechanism is a local decision, versus BGP convergence, which is more global. As a result, even with more time, these two mechanisms may come to different outcomes. One example to illustrate my point could be the following. Note that this is only illustrative, and I let you find and pick one that is more appropriate. I am thinking of a case where a standby PE is shared among multiple PEs - supposing this situation could occur. Typically, suppose PE_1 and PE_2 are shared by PE_a, ..., PE_z. In case PE_a and PE_b are down, we expect PE_a to switch to PE_1 and PE_b to switch to PE_2. It seems to me that BGP would end up in such a situation, while a local decision may end up with PE_a and PE_b both switching to PE_1.
</mglt>

GIM>> Thank you for the scenario, which is very common when deploying protection based on shared redundant resources. Such schemes, referred to as M:N protection, in addition to using a mechanism detecting a network failure, e.g., BFD, require a protocol to coordinate the switchover. This specification applies to a more specific deployment scenario where one working PE is protected by one or more standby PEs, i.e., 1:N protection.

<mglt>
I understand the document is addressing a 1:N scenario. That said, if an M:N scenario leverages 1:N protection, it seems to me worth raising the issue.
</mglt>
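
The hysteresis point made earlier in this thread (BFD declaring a session Down only after N consecutive missed packets, usually 3) can be made concrete with a minimal sketch. It assumes a deliberately simplified model, not the actual BFD state machine of RFC 5880: time is cut into fixed transmit intervals, and the session goes Down once `detect_mult` consecutive intervals pass without a packet. The function name `first_down_interval`, the parameter `detect_mult`, and the sample trace are all hypothetical, chosen for illustration only.

```python
from typing import Optional, Sequence


def first_down_interval(received: Sequence[bool], detect_mult: int) -> Optional[int]:
    """Return the index of the interval at which a session would be declared
    Down, or None if it stays Up for the whole trace.

    Simplified model (an assumption, not the RFC 5880 state machine):
    received[i] is True when a BFD packet arrived during interval i, and
    the session goes Down after detect_mult consecutive misses.
    """
    if detect_mult < 1:
        raise ValueError("detect_mult must be at least 1")
    missed = 0
    for i, got_packet in enumerate(received):
        if got_packet:
            missed = 0  # any received packet resets the miss counter
        else:
            missed += 1
            if missed >= detect_mult:
                return i
    return None


# Hypothetical trace: an upstream PE under stress drops two consecutive
# packets and then recovers on its own.
trace = [True, True, False, False, True, True]

print(first_down_interval(trace, detect_mult=1))  # 2: failover on the first miss
print(first_down_interval(trace, detect_mult=3))  # None: the glitch is ridden out
```

With this trace, a multiplier of 1 declares the session Down during a two-packet glitch, while a multiplier of 3 rides it out, at the cost of detecting a real failure two intervals later. That is the trade-off an operator would weigh against the cost of an unnecessary failover.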