RE: WGLC for draft-rtgwg-mrt-frr-architecture

Rob Shakir <rjs@rob.sh> Mon, 21 December 2015 15:46 UTC

Date: Mon, 21 Dec 2015 15:46:39 +0000
From: Rob Shakir <rjs@rob.sh>
To: Chris Bowers <cbowers@juniper.net>, "rtgwg@ietf.org" <rtgwg@ietf.org>, Stewart Bryant <stewart.bryant@gmail.com>, "draft-rtgwg-mrt-frr-architecture@tools.ietf.org" <draft-rtgwg-mrt-frr-architecture@tools.ietf.org>, Alvaro Retana <aretana@cisco.com>
Message-ID: <etPan.56781edf.6b1fb64.10d@latte>
In-Reply-To: <CO2PR05MB619B671A75A6BB2075DD25DA9E40@CO2PR05MB619.namprd05.prod.outlook.com>
References: <566083D0.1020607@gmail.com> <etPan.566efcf6.5e5c4ea5.122@piccolo.rob.sh> <CO2PR05MB619B671A75A6BB2075DD25DA9E40@CO2PR05MB619.namprd05.prod.outlook.com>
Subject: RE: WGLC for draft-rtgwg-mrt-frr-architecture
Archived-At: <http://mailarchive.ietf.org/arch/msg/rtgwg/nlLYCyquApfs3N0SHh1dFuF-v3I>

Chris,

Thanks for clarifying that MRT can be run as part of a multi-technology approach. I wouldn’t endorse such an approach operationally (why run multiple technologies alongside each other, each of which one must understand, rather than a single one that could meet multiple requirements?), but some operators may find it useful.

On 21 December, 2015 at 2:47:20 PM, Chris Bowers (cbowers@juniper.net) wrote:

> I also want to comment on the fact that remote LFA produces multiple alternates to choose from. With respect to determining if an alternate provides node protection or not, the fact that remote LFA computes many possible alternate paths could be viewed as a drawback, as opposed to an advantage. For a given PLR and failure mode and destination, in general it will be the case that many nodes qualify as PQ nodes. In order to determine if the complete repair path from PLR to PQ-node and PQ-node to destination is node-protecting, additional computation is needed. The most efficient approach seems to be to run a forward SPF from the PQ-node being evaluated. In some topologies, it is not uncommon for many nodes to qualify as PQ nodes. In order to avoid spending too much time churning away at running forward SPFs rooted at PQ-nodes, some implementations may find it useful to limit the number of PQ nodes evaluated for node-protection.
>
> By comparison, for roughly the computational cost of evaluating three PQ nodes for node-protection, MRT produces a path which is guaranteed to be node-protecting, if node protection is possible. In cases where node-protection and maximum coverage are important, it seems reasonable to give operators the option of having an efficient means of generating a node-protecting path as opposed to the trial and error approach of evaluating large numbers of PQ nodes, which may or may not ultimately provide a node-protecting path.
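
For readers who want the quoted computation made concrete, here is a minimal Python sketch of the per-PQ-node check. It is an illustration only, not taken from the draft or from any implementation. The topology, the node names, the candidate list and the `max_evaluated` cap are all hypothetical, and the sketch checks only the PQ-node-to-destination half of the repair path, using the usual distance inequality dist(PQ, D) < dist(PQ, F) + dist(F, D), where F is the node assumed to have failed.

```python
import heapq

INF = float("inf")


def spf(graph, root):
    """Forward SPF (Dijkstra) rooted at `root`; returns {node: cost}."""
    dist = {root: 0}
    heap = [(0, root)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, INF):
            continue  # stale heap entry
        for v, w in graph[u].items():
            nd = d + w
            if nd < dist.get(v, INF):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist


def node_protecting_pq(graph, pq_nodes, failed_node, dest, max_evaluated=None):
    """Return (candidates whose shortest path to dest avoids failed_node,
    number of per-candidate SPF runs). `max_evaluated` is a hypothetical
    knob that caps how many candidates are examined."""
    from_failed = spf(graph, failed_node)  # computed once, shared by all candidates
    protecting, spf_runs = [], 0
    for pq in pq_nodes[:max_evaluated]:
        from_pq = spf(graph, pq)  # one forward SPF per candidate
        spf_runs += 1
        via_failed = from_pq.get(failed_node, INF) + from_failed.get(dest, INF)
        if from_pq.get(dest, INF) < via_failed:
            protecting.append(pq)
    return protecting, spf_runs


# Hypothetical topology: S is the PLR, E the protected next hop, D the destination.
LINKS = [("S", "E", 1), ("E", "D", 1), ("S", "A", 1), ("A", "B", 1),
         ("B", "C", 1), ("C", "D", 1), ("B", "E", 1)]
GRAPH = {}
for u, v, w in LINKS:
    GRAPH.setdefault(u, {})[v] = w
    GRAPH.setdefault(v, {})[u] = w

# B and C are assumed to have been identified as PQ candidates already.
print(node_protecting_pq(GRAPH, ["B", "C"], "E", "D"))
```

In this toy topology B fails the check because one of its equal-cost paths to D runs through E, while C passes. Each candidate costs one SPF, which is the per-candidate cost being weighed in this thread.
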

I feel this analysis misses a fundamental point: ‘cost’ is not only the number of cycles that we must spend to find an alternate. We need to consider the whole picture. The real question is whether CPU cycles are the cost we want to optimise for, rather than the cost of investing in operational expertise and tooling to ensure manageability and capacity.

Much of the work on LFA manageability and TI-LFA aims to make flows align with what is “expected” to happen in the network, so that behaviour is easy for operational personnel to understand and does not drive investment in new capacity that is used *only* in repair scenarios (such investment is generally required, IMHO, in order not to congest and degrade all application traffic during that repair).

In my experience (and of course, YMMV), optimising for these operational concerns is well worth spending more cycles in the control plane, especially now that more resources tend to be available there. Characterisations such as the one above are academically interesting, but I find they have less relevance when it comes to actually operating a network.

I think there’s interesting work here, but I’ll continue to struggle with whether it really is usable operationally.

Regards,

r.