Re: WGLC for draft-rtgwg-mrt-frr-architecture

Rob Shakir <rjs@rob.sh> Mon, 14 December 2015 17:33 UTC

Date: Mon, 14 Dec 2015 10:31:34 -0700
From: Rob Shakir <rjs@rob.sh>
To: Alvaro Retana <aretana@cisco.com>, "rtgwg@ietf.org" <rtgwg@ietf.org>, draft-rtgwg-mrt-frr-architecture@tools.ietf.org, Stewart Bryant <stewart.bryant@gmail.com>
Message-ID: <etPan.566efcf6.5e5c4ea5.122@piccolo.rob.sh>
In-Reply-To: <566083D0.1020607@gmail.com>
References: <566083D0.1020607@gmail.com>
Subject: Re: WGLC for draft-rtgwg-mrt-frr-architecture
Archived-At: <http://mailarchive.ietf.org/arch/msg/rtgwg/R4Jkcz10dr52lmS6rKphhj3MtL0>

Hi Stewart, Authors,
 
On 3 December 2015 at 11:03:27, Stewart Bryant (stewart.bryant@gmail.com) wrote:
> Firstly my high order bit, and I suspect I will be in the rough on this. I am not convinced that 
> this is the best solution that we can produce to this problem, and I am concerned about the 
> operational issues that result from the non-intuitive repair paths when compared to other
> methods. I am also concerned about our ability to get a solution as complicated as this right 
> first time in all of the corner cases. Thus I think that rather than recommend this to the 
> industry as a standard, we should issue it as informational and see how it works out at scale.  

Whilst I’m not sure that I can comment on whether this is the ‘best’ solution, I have (previously) spent some time thinking about the problem of disjointness and the applicability of this technique to some of the operational problems that I have had to deal with.

I have a couple of operationally-focused considerations that I think any operator considering deploying MRTs would need to think about. Firstly, we must be happy with having a single set of disjoint topologies (the MRT red/blue), which is utilised during the failure case. This might mean that node A and node B, which are topologically very close to one another, forward via a more indirect path because the single ‘red’ or ‘blue’ tree that was selected is less optimal than a local set of disjoint paths that could be found. We therefore need to be able to say that the traffic carried via this network is tolerant of that sub-optimality between those two nodes. In a world of fewer networks, and more varied traffic being carried on them, I am not sure that we can.

The second concern that I have with this approach is around operational influence over path selection. I have not re-reviewed the architecture in great detail before sending this e-mail, so I apologise in advance if there are nuances that I have not understood. We have seen from LFA that whilst there may be N ways for traffic to get from A to B during a repair, from an operational perspective, not all paths are created equal (and hence we have the LFA manageability work). For example, paths that conform to ‘normal’ traffic routing within the network are generally preferred for manageability and capacity planning reasons. I would agree that we should leave this draft as informational until such time as we have demonstrated, in live networks, whether these operational challenges can be met by the MRT approach.

The reason for my reticence is that in the networks that I have operated that have tried to have single network-wide distribution trees, or single sets of repair paths, we have always ended up splitting things out (even when, from a service perspective, they may have looked the same) to meet the myriad of business and operational concerns that we need to meet. To this end, I would encourage any operator that is thinking of using MRT for disjointness, or for repair, to consider what happens when they end up needing different treatment per-application, per-customer, or per-traffic class. At this point, one would want to weigh the complexity of extending MRTs to these applications (that is to say, whether a set of network-wide “global” repair paths will be suitable) against whether they would really be better off laying the foundation for deploying local repair paths, and building their operational processes around that.

At the end of the day, in my opinion, it is the operational process around these technologies that is the most complex thing to change, and I’m concerned that if I recommended building this process around MRT, I would later have to establish new processes around something that offered more ability to do locally- rather than globally-optimal repair paths.

Just my $0.02. There’s some interesting work here, but I share some of Stewart’s concerns.

Kind regards,
r.