RE: AD Review of draft-ietf-rtgwg-mrt-frr-architecture-08

Chris Bowers <cbowers@juniper.net> Sun, 10 January 2016 17:17 UTC

From: Chris Bowers <cbowers@juniper.net>
To: "Alvaro Retana (aretana)" <aretana@cisco.com>, "draft-ietf-rtgwg-mrt-frr-architecture@ietf.org" <draft-ietf-rtgwg-mrt-frr-architecture@ietf.org>
Subject: RE: AD Review of draft-ietf-rtgwg-mrt-frr-architecture-08
Date: Sun, 10 Jan 2016 17:16:43 +0000
Message-ID: <BY2PR05MB61465BA37FE49CCECF3B1DDA9C80@BY2PR05MB614.namprd05.prod.outlook.com>
Archived-At: <http://mailarchive.ietf.org/arch/msg/rtgwg/3Pua_URZw6w8y7bu1500FbZ8QUs>
Cc: "rtgwg-chairs@ietf.org" <rtgwg-chairs@ietf.org>, "rtgwg@ietf.org" <rtgwg@ietf.org>
List-Id: Routing Area Working Group <rtgwg.ietf.org>

Alvaro,

Thanks for your detailed feedback.  I've posted a new revision incorporating your comments as well as some of the outstanding comments from Stewart Bryant and Rob Shakir related to the introduction.

The new revision is here.
https://www.ietf.org/internet-drafts/draft-ietf-rtgwg-mrt-frr-architecture-09.txt
The diff is here.
https://www.ietf.org/rfcdiff?url2=draft-ietf-rtgwg-mrt-frr-architecture-09

See specific responses and text inline with [CB].

I also included links to diffs on github containing the XML edits addressing particular issues or sets of issues.  They are intended to help with reviewing the edits, but please refer to the version submitted for the latest definitive revision.

Thanks,
Chris

From: Alvaro Retana (aretana) [mailto:aretana@cisco.com]
Sent: Saturday, January 02, 2016 7:09 AM
To: draft-ietf-rtgwg-mrt-frr-architecture@ietf.org
Cc: rtgwg-chairs@ietf.org; rtgwg@ietf.org; Janos Farkas <janos.farkas@ericsson.com>
Subject: AD Review of draft-ietf-rtgwg-mrt-frr-architecture-08

Hi!

Happy New Year!

I just finished reviewing this document.  I have several comments (please see below) that I want to see addressed before starting the IETF Last Call.

Out of the items marked as Major, the one that concerns me the most is the one related to Operations/Management Considerations.  It is surprising to me that such extensive work in the Standards Track didn't have any Operations/Management (or even Security!) Considerations until the current version.  My comments below echo some of the opinions on the list, but I can't claim that they are all-inclusive.  As I point out below, I am not looking for a dissertation on the topic, but much more than what's there should be included.

Thanks!

Alvaro.


Major:
1.      In general, I feel uncomfortable with documents making value statements related, for example, to how they perform against different solutions.  The purpose of this document should be to describe the technology, not to compare against other solutions - that work (if wanted/needed) should be done in a different document.  Please remove comparisons to other technology and relative statements.  Some examples:
o       Abstract: "MRT is also extremely computationally efficient...computation is less than the LFA computation..."  As was expressed on the list, maybe CPU cycles are not that important if compared to other aspects...
o       Introduction: "Other existing or proposed solutions are partial solutions or have significant issues, as described below."  It is ok to describe other solutions, but please limit the description to the facts.
*       About the table: there are obviously other columns that could have been included, which means that the table is not complete.

==========
[CB]  The abstract and introduction have been significantly revised to try to remove value statements.  I modified the comparison table and added several columns, trying to provide a factual comparison of tradeoffs based on different metrics without making value judgements.   (See the revised document or main diff above.)  However, I am also willing to simply remove the table and descriptions, if that makes more sense.
==========
o       Section 4: "Modeling results comparing the alternate path lengths obtained with MRT to other approaches are described in [I-D.ietf-rtgwg-mrt-frr-algorithm]."  I am also including the corresponding comment in my review of that other document.

[CB] I removed that sentence.
o       Section 5: "is an advantage of using MRT"  In this case, at least a reference to Section 15. (Applying Policy to Select from Multiple Possible Alternates for FRR) might be in order.

[CB] I removed that sentence.
2.      References to Extensions.  This document being where the architecture for MRT is defined should set the stage/define requirements for extensions that are to be defined elsewhere, and not concern itself with the solutions themselves.  In other words, please remove references to where solutions (in the form of extensions) are being specified.  Some pointers:
o       All references to I-D.ietf-ospf-mrt, I-D.ietf-isis-mrt and I-D.ietf-mpls-ldp-mrt, except maybe the ones in Section 13. (Implementation Status).
o       There are two places where it is mentioned that the capabilities to advertise additional loopbacks "have not been defined".
o       Section 7. (MRT Island Formation) starts by talking about the "purpose of communicating support for MRT in the IGP", which is the first time in the document that this is mentioned.  While distribution with an IGP may be the obvious mechanism, please describe the requirement.
o       Another example of the same occurs in Section 8.2. (Router-specific MRT parameters) where it says that "additional router-specific MRT parameters may need to be distributed via the IGP", when I think the requirement is that these additional parameters need to be known by all routers in the MRT Island. [Again, distribution using the IGP may be the obvious choice.]

==========
[CB] The changes removing references to extensions are reflected in the following diff.
https://github.com/cbowers/draft-ietf-rtgwg-mrt-frr-architecture/commit/c01aaedf02124512ff2bc057ace6846c59f6733a
==========
3.      Algorithm.  draft-ietf-rtgwg-mrt-frr-algorithm says that it "defines the...algorithm that is used in the default MRT profile".  Please make the text in this document consistent with that when referring to draft-ietf-rtgwg-mrt-frr-algorithm.  Some descriptions used in the text: "the exact MRT algorithm", "the algorithm to compute MRTs", "Example algorithm"

==========
[CB] These changes are shown in this diff:
https://github.com/cbowers/draft-ietf-rtgwg-mrt-frr-architecture/commit/1555f58f0195a0f417d4b81bbaec9361edb01df6
==========
4.      Operations/Management Considerations
o       Given that the MRT paths don't follow the shortest paths, or even potentially planned backup paths in the network, I think you should include something about the potential impact related to capacity planning, congestion, stretch, etc.
o       What about address management?  Are there considerations about assignment and management for the additional loopbacks required for IP tunnels?
o       Section 15. (Applying Policy to Select from Multiple Possible Alternates for FRR) basically says that policy can be applied "to select the best alternate from those provided by MRT and other FRR technologies".  You're right to point out that "[I-D.ietf-rtgwg-lfa-manageability] discusses many of the potential criteria that one might take into account when evaluating different alternates for selection".  What are the considerations that should be taken into account when comparing between MRT and others?  Are the criteria and requirements outlined in I-D.ietf-rtgwg-lfa-manageability applicable?  Even though I-D.ietf-rtgwg-lfa-manageability is intended to be LFA-specific, should it be a Normative reference? [Note that there's a similar comment related to I-D.ietf-rtgwg-lfa-manageability in my review of draft-ietf-rtgwg-mrt-frr-algorithm.]
o       Applicability/Guidance for Operators
*       Section 4. (Maximally Redundant Trees (MRT)) clearly explains the impact of not having a 2-connected network on the applicability of MRT.  Section 11.3. (MRT Alternates for Destinations Outside the MRT Island) talks about partial implementation in an area.
*       I think it would be important to consolidate some of that guidance (there's probably more) in a single place.  Note that I'm not looking for a 30 page extension (a la RFC6571), just some general guidance.
o       Given that both Sections 14 and 15 were added just in the latest version of the document, please consider taking a look at RFC5706.
o       [Nit] Consider making Section 15. (Applying Policy to Select from Multiple Possible Alternates for FRR) a sub-section of Section 14. (Operations and Management Considerations).

==========
[CB] These changes are shown in this diff:
https://github.com/cbowers/draft-ietf-rtgwg-mrt-frr-architecture/commit/c361f37638390a1b0364d67726367aabef203976
Revised section 14 is also reproduced below.
14.  Operational Considerations

   The following aspects of MRT-FRR are useful to consider when
   deploying the technology in different operational environments and
   network topologies.

14.1.  Verifying Forwarding on MRT Paths

   The forwarding paths created by MRT-FRR are not used by normal (non-
   FRR) traffic.  They are only used to carry FRR traffic for a short
   period of time after a failure has been detected.  It is RECOMMENDED
   that an operator proactively monitor the MRT forwarding paths in
   order to be certain that the paths will be able to carry FRR traffic
   when needed.  Therefore, an implementation SHOULD provide an operator
   with the ability to test MRT paths with OAM traffic.  For example,
   when MRT paths are realized using LDP labels distributed for
   topology-scoped FECs, an implementation can use MPLS ping and
   traceroute as defined in [RFC4379] and extended in [RFC7307] for
   topology-scoped FECs.

14.2.  Traffic Capacity on Backup Paths

   During a fast-reroute event initiated by a PLR in response to a
   network failure, the flow of traffic in the network will generally
   not be identical to the flow of traffic after the IGP forwarding
   state has converged, taking the failure into account.  Therefore,
   even if a network has been engineered to have enough capacity on the
   appropriate links to carry all traffic after the IGP has converged
   after the failure, the network may still not have enough capacity on
   the appropriate links to carry the flow of traffic during a fast-
   reroute event.  This can result in more traffic loss during the fast-
   reroute event than might otherwise be expected.

   Note that there are two somewhat distinct aspects to this phenomenon.
   The first is that the path from the PLR to the destination during the
   fast-reroute event may be different from the path after the IGP
   converges.  In this case, any traffic for the destination that
   reaches the PLR during the fast-reroute event will follow a different
   path from the PLR to the destination than will be followed after IGP
   convergence.

   The second aspect is that the amount of traffic arriving at the PLR
   for affected destinations during the fast-reroute event may be larger
   than the amount of traffic arriving at the PLR for affected
   destinations after IGP convergence.  Immediately after a failure, any
   non-PLR routers that were sending traffic to the PLR before the
   failure will continue sending traffic to the PLR, and that traffic
   will be carried over backup paths from the PLR to the destinations.
   After IGP convergence, upstream non-PLR routers may direct some
   traffic away from the PLR.

   In order to reduce or eliminate the potential for transient traffic
   loss due to inadequate capacity during fast-reroute events, an
   operator can model the amount of traffic taking different paths
   during a fast-reroute event.  If it is determined that there is not
   enough capacity to support a given fast-reroute event, the operator
   can address the issue either by augmenting capacity on certain links
   or modifying the backup paths themselves.
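
A minimal Python sketch of such a model follows, purely as an illustration. It assumes the operator already knows, for each flow, its rate and the path it takes during the fast-reroute event and after IGP convergence; the topology, flow names, rates, and function names are hypothetical. The example reproduces the second aspect described above: traffic from an upstream router still reaches the PLR during the event and overloads a link that is adequate after convergence.

```python
from collections import defaultdict


def link_loads(demands, paths):
    """Sum the modeled traffic carried on each directed link.

    demands: {flow_id: rate}, in any consistent unit (Gb/s assumed here)
    paths:   {flow_id: [node, node, ...]}, the path each flow takes
    """
    loads = defaultdict(float)
    for flow, rate in demands.items():
        hops = paths[flow]
        for a, b in zip(hops, hops[1:]):
            loads[(a, b)] += rate
    return dict(loads)


def overloaded_links(demands, paths, capacity):
    """Return {link: excess} for every link whose load exceeds its capacity.

    Links missing from `capacity` are treated as having zero capacity.
    """
    overloaded = {}
    for link, load in link_loads(demands, paths).items():
        cap = capacity.get(link, 0.0)
        if load > cap:
            overloaded[link] = load - cap
    return overloaded


# Hypothetical example: PLR "P" has lost its primary link to destination "D".
demands = {"from_U": 6.0, "from_P": 3.0}  # traffic toward D, Gb/s
capacity = {("U", "P"): 10.0, ("P", "A"): 10.0, ("A", "D"): 5.0,
            ("U", "B"): 10.0, ("B", "D"): 10.0}

# During the fast-reroute event, upstream router U still sends via the PLR,
# which repairs all of it over its backup path P -> A -> D.
frr_paths = {"from_U": ["U", "P", "A", "D"], "from_P": ["P", "A", "D"]}
# After IGP convergence, U shifts its traffic away from the PLR.
converged_paths = {"from_U": ["U", "B", "D"], "from_P": ["P", "A", "D"]}

print(overloaded_links(demands, frr_paths, capacity))        # {('A', 'D'): 4.0}
print(overloaded_links(demands, converged_paths, capacity))  # {}
```

The same check can be rerun after marking a link MRT-Ineligible or changing the alternate selection policy, to see whether the modeled overload disappears.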

   The MRT Lowpoint algorithm produces a pair of diverse paths to each
   destination.  These paths are generated by following the directed
   links on a common GADAG.  MRT-FRR allows an operator to exclude a
   link from the MRT Island, and thus the GADAG, by advertising it as
   MRT-Ineligible.  Such a link will not be used on the MRT forwarding
   path for any destination.  Advertising links as MRT-Ineligible is the
   main tool provided by MRT-FRR for keeping backup traffic off of lower
   bandwidth links during fast-reroute events.

   Note that all of the backup paths produced by the MRT Lowpoint
   algorithm are closely tied to the common GADAG computed as part of
   that algorithm.  Therefore, it is generally not possible to modify a
   subset of paths without affecting other paths.  This precludes more
   fine-grained modification of individual backup paths when using only
   paths computed by the MRT Lowpoint algorithm.

   However, it may be desirable to allow an operator to use MRT-FRR
   alternates together with alternates provided by other FRR
   technologies.  A policy-based alternate selection process can allow
   an operator to select the best alternate from those provided by MRT
   and other FRR technologies.  As a concrete example, it may be
   desirable to implement a policy where a downstream LFA (if it exists
   for a given failure mode and destination) is preferred over a given
   MRT alternate.  This combination gives the operator the ability to
   affect where traffic flows during a fast-reroute event, while still
   producing backup paths that use no additional labels for LDP traffic
   and will not loop under multiple failures.  This and other choices of
   alternate selection policy can be evaluated in the context of their
   effect on fast-reroute traffic flow and available capacity, as well
   as other deployment considerations.
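
The example policy above (prefer a downstream LFA when one exists, otherwise use the MRT alternate) might be sketched in Python as follows. The Alternate record, its kind labels, and the node-protection requirement are hypothetical and chosen for illustration; a real selection process would weigh many more criteria.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Alternate:
    kind: str              # hypothetical labels: "downstream-lfa" or "mrt"
    next_hop: str
    node_protecting: bool  # does it avoid the primary next-hop node?


# Example policy only: lower rank is preferred, so a downstream LFA
# (when one exists) wins over the MRT alternate.
PREFERENCE = {"downstream-lfa": 0, "mrt": 1}


def select_alternate(candidates: Sequence[Alternate],
                     need_node_protection: bool) -> Optional[Alternate]:
    """Pick the most preferred alternate that meets the protection requirement."""
    eligible = [c for c in candidates
                if c.kind in PREFERENCE
                and (c.node_protecting or not need_node_protection)]
    if not eligible:
        return None
    return min(eligible, key=lambda c: PREFERENCE[c.kind])


lfa = Alternate("downstream-lfa", "R2", node_protecting=False)
mrt = Alternate("mrt", "R3", node_protecting=True)
print(select_alternate([mrt, lfa], need_node_protection=False).next_hop)  # R2
print(select_alternate([mrt, lfa], need_node_protection=True).next_hop)   # R3
```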

   Note that future documents may define MRT profiles in addition to the
   default profile defined here.  Different MRT profiles will generally
   produce alternate paths with different properties.  An implementation
   may allow an operator to use different MRT profiles instead of or in
   addition to the default profile.

14.3.  MRT IP Tunnel Loopback Address Management

   As described in Section 6.1.2, if an implementation uses IP tunneling
   as the mechanism to realize MRT forwarding paths, each node must
   advertise an MRT-Red and an MRT-Blue loopback address.  These IP
   addresses must be unique within the routing domain to the extent that
   they do not overlap with each other or with any other routing table
   entries.  It is expected that operators will use existing tools and
   processes for managing infrastructure IP addresses to manage these
   additional MRT-related loopback addresses.
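
As one example of such tooling, the following Python sketch uses the standard ipaddress module to flag MRT-Red and MRT-Blue loopbacks that collide with each other or with other routing table entries. The input format and function name are hypothetical, and the sketch assumes a strict reading of "overlap" in which a loopback falling inside any existing prefix is reported.

```python
import ipaddress
from itertools import combinations


def find_overlaps(mrt_loopbacks, other_prefixes):
    """Report MRT loopbacks that overlap each other or other routing entries.

    mrt_loopbacks:  {(router, color): address}
    other_prefixes: iterable of prefixes already in the routing table
    """
    nets = {key: ipaddress.ip_network(addr) for key, addr in mrt_loopbacks.items()}
    others = [ipaddress.ip_network(p) for p in other_prefixes]
    problems = []
    # MRT loopbacks must not collide with one another.
    for (k1, n1), (k2, n2) in combinations(nets.items(), 2):
        if n1.version == n2.version and n1.overlaps(n2):
            problems.append((k1, k2))
    # ...nor with any other routing table entry (strict interpretation:
    # falling inside a covering prefix also counts as an overlap).
    for key, net in nets.items():
        for other in others:
            if net.version == other.version and net.overlaps(other):
                problems.append((key, str(other)))
    return problems


loopbacks = {
    ("R1", "red"): "192.0.2.11", ("R1", "blue"): "192.0.2.12",
    ("R2", "red"): "198.51.100.7", ("R2", "blue"): "192.0.2.11",  # duplicate
}
routing_table = ["192.0.2.1/32", "198.51.100.0/24", "2001:db8::/32"]
for problem in find_overlaps(loopbacks, routing_table):
    print(problem)
# (('R1', 'red'), ('R2', 'blue'))
# (('R2', 'red'), '198.51.100.0/24')
```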

14.4.  MRT-FRR in a Network with Degraded Connectivity

   Ideally, routers in a service provider network using MRT-FRR will be
   initially deployed in a 2-connected topology, allowing MRT-FRR to
   find completely diverse paths to all destinations.  However, a
   network can differ from an ideal 2-connected topology for many
   possible reasons, including network failures and planned maintenance
   events.

   MRT-FRR is designed to continue to function properly when network
   connectivity is degraded.  When a network contains cut-vertices or
   cut-links dividing the network into different 2-connected blocks,
   MRT-FRR will continue to provide completely diverse paths for
   destinations within the same block as the PLR.  For a destination in
   a different block from the PLR, the redundant paths created by MRT-
   FRR will be link and node diverse within each block, and the paths
   will only share links and nodes that are cut-links or cut-vertices in
   the topology.
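
Cut-vertices and cut-links can be found with the classic depth-first-search lowpoint technique (Tarjan's articulation-point and bridge algorithm). The Python sketch below is that generic graph algorithm, not the MRT Lowpoint algorithm itself; it assumes an undirected topology without parallel links, given as a hypothetical adjacency map. The example shows how taking one link out of service (for instance, for maintenance) adds cut-vertices and cut-links to a topology of two triangles joined by a single link.

```python
def cut_vertices_and_links(adj):
    """Find cut-vertices and cut-links of an undirected graph.

    adj: {node: set of neighbors}; assumes no parallel links.
    Returns (set of cut-vertices, set of cut-links as frozensets).
    """
    disc, low = {}, {}  # DFS discovery order and lowpoint values
    cut_vertices, cut_links = set(), set()

    def dfs(u, parent):
        disc[u] = low[u] = len(disc)
        children = 0
        for v in sorted(adj[u]):
            if v not in disc:  # tree edge
                children += 1
                dfs(v, u)
                low[u] = min(low[u], low[v])
                # v's subtree cannot reach above u without passing through u
                if parent is not None and low[v] >= disc[u]:
                    cut_vertices.add(u)
                # v's subtree cannot reach u or its ancestors except via u-v
                if low[v] > disc[u]:
                    cut_links.add(frozenset((u, v)))
            elif v != parent:  # back edge
                low[u] = min(low[u], disc[v])
        # the DFS root is a cut-vertex only if it has more than one child
        if parent is None and children > 1:
            cut_vertices.add(u)

    for node in sorted(adj):
        if node not in disc:
            dfs(node, None)
    return cut_vertices, cut_links


def undirected(links):
    adj = {}
    for a, b in links:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj


# Two triangles joined by link C-D (hypothetical topology).
links = [("A", "B"), ("B", "C"), ("C", "A"),
         ("D", "E"), ("E", "F"), ("F", "D"), ("C", "D")]
topology = undirected(links)
cv, cl = cut_vertices_and_links(topology)
print(sorted(cv), sorted(tuple(sorted(link)) for link in cl))
# ['C', 'D'] [('C', 'D')]

# Take link D-F out of service: E becomes a cut-vertex as well.
degraded = undirected([link for link in links if set(link) != {"D", "F"}])
cv, cl = cut_vertices_and_links(degraded)
print(sorted(cv), sorted(tuple(sorted(link)) for link in cl))
# ['C', 'D', 'E'] [('C', 'D'), ('D', 'E'), ('E', 'F')]
```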

   If a network becomes partitioned with one set of routers having no
   connectivity to another set of routers, MRT-FRR will function
   independently in each set of connected routers, providing redundant
   paths to destinations in the same set of connected routers as a given
   PLR.

14.5.  Partial Deployment of MRT-FRR in a Network

   A network operator may choose to deploy MRT-FRR only on a subset of
   routers in an IGP area.  MRT-FRR is designed to accommodate this
   partial deployment scenario.  Only routers that advertise support for
   a given MRT profile will be included in a given MRT Island.  For a
   PLR within the MRT Island, MRT-FRR will create redundant forwarding
   paths to all destinations within the MRT Island using maximally
   redundant trees all the way to those destinations.  For destinations
   outside of the MRT Island, MRT-FRR creates paths to the destination
   which use forwarding state created by MRT-FRR within the MRT Island
   and shortest path forwarding state outside of the MRT Island.  The
   paths created by MRT-FRR to non-Island destinations are guaranteed to
   be diverse within the MRT Island (if topologically possible).
   However, the part of the paths outside of the MRT Island may not be
   diverse.

==========
5.      MRT Profile Selection and Algorithm Transition:
o       From Section 7.2. (Support for a specific MRT profile): "All routers in an MRT Island MUST support the same MRT profile"...and..."A given router can support multiple MRT profiles and participate in multiple MRT Islands.".  If I understand this correctly, routers can support multiple MRT profiles for the same area/level, right?  If so, how do the routers in the area/level agree on which profile will be the one supported?
o       Also, Section 8.1. (MRT Profile Options) says that "If a router advertises support for multiple MRT profiles, then it MUST create the transit forwarding topologies for each..."  Are these multiple profiles inside the same area/level?  These two pieces of text don't seem to be in sync.  [But I may be missing something somewhere.]
o       MRT Algorithm Transition: How is it done?  Section 8.1. (MRT Profile Options) says that "Algorithm transitions can be managed by advertising multiple MRT profiles", but there's no explanation of how.  This comment is related to the one above about MRT Profile Selection.
o       [Minor] I may have missed this somewhere.
*       The MRT MPLS MT-ID value is associated with the MRT profile, so that (for example) the MRT-Red MPLS MT-ID for the default profile is 3997, right?  If so, how does one introduce a new profile?  I'm guessing that by registering new MT-ID values.
1.      What happens if the MT-ID for the Red and Blue MRTs don't correspond to the same profile?

==========
[CB] The question may already be answered by the edits, but I will try to answer here as well.  All routers are expected to be using the same MT-ID values for the same profile.  But if a router receives LDP advertisements with MT-ID values it doesn't understand, then, based on RFC7307, it should respond with an "Invalid Topology ID" status code in the LDP Notification message.
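
A minimal sketch of that receive-side check, in Python. The MT-ID values and profile names are hypothetical placeholders (real values would come from the IANA MPLS Multi-Topology Identifiers registry), and treating an MT-ID that belongs to a profile the router does not support as "not understood" is an assumption.

```python
from typing import Dict, Set, Tuple

# Hypothetical MT-ID assignments, for illustration only.
MT_ID_TABLE: Dict[int, Tuple[str, str]] = {
    1001: ("profile-A", "MRT-Red"),
    1002: ("profile-A", "MRT-Blue"),
    1003: ("profile-B", "MRT-Red"),
    1004: ("profile-B", "MRT-Blue"),
}

INVALID_TOPOLOGY_ID = "Invalid Topology ID"  # status named in RFC 7307


def handle_mt_id(mt_id: int, supported_profiles: Set[str]) -> Tuple[str, ...]:
    """Map a received topology-scoped FEC's MT-ID to (profile, color), or
    signal that the MT-ID is not understood."""
    entry = MT_ID_TABLE.get(mt_id)
    if entry is None or entry[0] not in supported_profiles:
        # Not understood: reply with an LDP Notification carrying this status.
        return ("notification", INVALID_TOPOLOGY_ID)
    profile, color = entry
    return ("accept", profile, color)


print(handle_mt_id(1002, {"profile-A"}))  # ('accept', 'profile-A', 'MRT-Blue')
print(handle_mt_id(1003, {"profile-A"}))  # ('notification', 'Invalid Topology ID')
```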
[CB] Diff for this group of comments:
https://github.com/cbowers/draft-ietf-rtgwg-mrt-frr-architecture/commit/22689df3b7b311ea34f1751f123ff4f161516943
I modified the text in section 7.2 to better explain simultaneous support for multiple profiles (included below.)

   Note that a router may advertise support for multiple different MRT
   profiles.  The process of MRT Island formation takes place
   independently for each MRT profile advertised by a given router.  For
   example, consider a network with 40 connected routers in the same
   area advertising support for MRT Profile A and MRT Profile B.  Two
   distinct MRT Islands will be formed corresponding to Profile A and
   Profile B, with each island containing all 40 routers.  A complete
   set of maximally redundant trees will be computed for each island
   following the rules defined for each profile.  If we add a third MRT
   Profile to this example, with Profile C being advertised by a
   connected subset of 30 routers, there will be a third MRT Island
   formed corresponding to those 30 routers, and a third set of
   maximally redundant trees will be computed.  In this example, 40
   routers would compute and install two sets of MRT transit forwarding
   entries corresponding to Profiles A and B, while 30 routers would
   compute and install three sets of MRT transit forwarding entries
   corresponding to Profiles A, B, and C.
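
The per-profile island computation in this example can be sketched as a breadth-first search that only traverses routers advertising the profile. This Python sketch ignores the other island-formation rules (such as MRT-Ineligible links), and its function name and topology are hypothetical: a 40-router ring in which Profile C is advertised by 30 consecutive routers.

```python
from collections import deque


def mrt_island(adj, supports, profile, start):
    """Return the MRT Island for `profile` that contains router `start`.

    adj:      {router: set of neighboring routers}
    supports: {router: set of MRT profiles the router advertises}
    """
    if profile not in supports.get(start, set()):
        return set()
    island, queue = {start}, deque([start])
    while queue:
        router = queue.popleft()
        for neighbor in adj[router]:
            if neighbor not in island and profile in supports.get(neighbor, set()):
                island.add(neighbor)
                queue.append(neighbor)
    return island


# Hypothetical topology: 40 routers in a ring; all advertise Profiles A and B,
# and routers R0..R29 (a connected subset) also advertise Profile C.
N = 40
adj = {f"R{i}": {f"R{(i - 1) % N}", f"R{(i + 1) % N}"} for i in range(N)}
supports = {f"R{i}": {"A", "B"} | ({"C"} if i < 30 else set()) for i in range(N)}

# Each router installs transit forwarding entries for every island it is in.
installed = {r: {p for p in ("A", "B", "C") if r in mrt_island(adj, supports, p, r)}
             for r in adj}
print(sum(len(p) == 3 for p in installed.values()))  # 30
print(sum(len(p) == 2 for p in installed.values()))  # 10
```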

Text explaining profile (or algorithm) transition:
   The ability of MRT-FRR to support transit forwarding entries for
   multiple profiles can be used to facilitate a smooth transition from
   an existing deployed MRT Profile to a new MRT Profile.  The new
   profile can be activated in parallel with the existing profile,
   installing the transit forwarding entries for the new profile without
   affecting the transit forwarding entries for the existing profile.
   Once the new transit forwarding state has been verified, the router
   can be configured to use the alternates computed by the new profile
   in the event of a failure.


I also added text to clarify how a new MRT profile is defined with new MT-ID values.
   When a new MRT Profile is defined, new values should be allocated
   from the MPLS Multi-Topology Identifiers Registry, corresponding to
   the MRT-Red and MRT-Blue MT-ID values for the new MRT Profile.

===============
6.      Section 17. (IANA Considerations) talks about the "MRT Profile TLV", which is not defined anywhere in this document.  I think that asking for the registry creation is enough.
7.      Section 18. (Security Considerations)  Even though I don't think this document should make explicit references to extensions, clearly there will be a transport that needs to be secured: authentication, privacy, etc.
8.      References: RFC2119 should be Normative.
================
[CB] Major issues 6-8 are addressed in this diff.

https://github.com/cbowers/draft-ietf-rtgwg-mrt-frr-architecture/commit/6ce1afdf59f9fe3f9037c0f76b3a00c0db932998

================
Minor:
1.      Section 1. (Introduction) says that: "Once traffic has been moved to one of MRTs, it is not subject to further repair actions. Thus, the traffic will not loop even if a worse failure (e.g. node) occurs when protection was only available for a simpler failure (e.g. link)."  I'm sure you mean that the worse failure occurs in the original topology (not in the MRT).  Please clarify.

=================
[CB] new text to clarify.
Once traffic has been
   moved to one of the MRTs by one PLR, that traffic is not subject to
   further repair actions by another PLR, even in the event of multiple
   simultaneous failures.  Therefore, traffic repaired by MRT-FRR will
   not loop between different PLRs responding to different simultaneous
   failures.
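
The per-packet consequence can be sketched as a forwarding decision in which only traffic still on the default (shortest-path) topology is eligible for repair. This Python fragment is a hypothetical model rather than an implementation; its names are illustrative, and it assumes that MRT traffic whose MRT next hop has failed is simply discarded.

```python
from typing import Optional, Tuple

DEFAULT = "default"  # normal shortest-path (non-MRT) topology


def forwarding_action(topology: str, next_hop_up: bool,
                      alternate: Optional[str]) -> Tuple[str, str]:
    """Decide what a router does with a packet on `topology`.

    alternate: "MRT-Red" or "MRT-Blue" if this router, acting as PLR,
    has an MRT alternate for the destination; None otherwise.
    """
    if next_hop_up:
        return ("forward", topology)
    if topology == DEFAULT and alternate is not None:
        # Local repair: move the packet onto the selected MRT.
        return ("forward", alternate)
    # Packets already on an MRT are never repaired again, so they cannot
    # be bounced between PLRs reacting to different failures.
    return ("drop", topology)


# Two simultaneous failures: PLR1 repairs onto MRT-Blue, and the packet then
# reaches PLR2, whose MRT-Blue next hop has also failed.
print(forwarding_action(DEFAULT, False, "MRT-Blue"))    # ('forward', 'MRT-Blue')
print(forwarding_action("MRT-Blue", False, "MRT-Red"))  # ('drop', 'MRT-Blue')
```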
==============
2.      Multicast protection is out of scope, right?  There's a reference to I-D.atlas-rtgwg-mrt-mc-arch in the Introduction (which is fine), but no explicit indication of the scope.  Section 8.1. (MRT Profile Options) also talks about multicast when describing "MRT Forwarding Mechanism" ("The None option in may be useful for multicast global protection.").
o       In Section 8.1. (MRT Profile Options), is the "Area/Level Border Behavior" specific to multicast?  BTW, please avoid describing the options with questions.

================
[CB]  Area Border behavior is not specific to multicast.  I removed the confusing reference to [I-D.atlas-rtgwg-mrt-mc-arch] in section 8.1
[CB] I made several edits throughout to clarify that multicast is out of scope.
Diff for minor issues 1 and 2 is found here.
https://github.com/cbowers/draft-ietf-rtgwg-mrt-frr-architecture/commit/542ead0a83f4e3572b57d16f01abca86b0fe31c2
================
3.      Section 1.1. (Importance of 100% Coverage) talks about how micro-loop prevention is something that can be achieved with complete coverage, and Section 12. (Network Convergence and Preparing for the Next Failure) says it is something that needs attention after a failure, but then the document doesn't say how MRT can be used:  the use of MRT (to support Farside Tunneling) is declared out of scope, and an "orphan" statement (no references and no solution) about micro-loop mitigation is made ("Micro-loop mitigation mechanisms can also work when combined with MRT.").
o       Section 12.1. (Micro-forwarding loop prevention and MRTs) does say that "Managing micro-loops is an orthogonal issue to having alternates for local repair...", but I think you need to explain some more about how micro-loops may not be an issue or how MRT helps.

================
[CB] I rewrote section 12.1 to clarify, and added small edits elsewhere.  Complete diff below.
https://github.com/cbowers/draft-ietf-rtgwg-mrt-frr-architecture/commit/673580abf1ae331bf3964c4f7128163e77b85edf
12.1.  Micro-loop prevention and MRTs

   A micro-loop is a transient packet forwarding loop among two or more
   routers that can occur during IGP convergence.  [RFC5715] discusses
   several techniques for preventing micro-loops.  This section
   discusses how MRT-FRR relates to two of the micro-loop prevention
   techniques discussed in [RFC5715], Nearside Tunneling and Farside
   Tunneling.

   In Nearside Tunneling, a router (PLR) adjacent to a failure performs
   local repair and informs remote routers of the failure.  The remote
   routers initially tunnel affected traffic to the nearest PLR, using
   tunnels which are unaffected by the failure.  Once the forwarding
   state for normal shortest path routing has converged, the remote
   routers return the traffic to shortest path forwarding.  MRT-FRR is
   relevant for Nearside Tunneling for the following reason.  The
   process of tunneling traffic to the PLRs and waiting a sufficient
   amount of time for IGP forwarding state convergence with Nearside
   Tunneling means that traffic will generally be relying on the local
   repair at the PLR for longer than it would in the absence of Nearside
   Tunneling.  Since MRT-FRR provides 100% coverage for single link and
   node failure, it may be an attractive option to provide the local
   repair paths when Nearside Tunneling is deployed.

   MRT-FRR is also relevant for the Farside Tunneling micro-loop
   prevention technique.  In Farside Tunneling, remote routers tunnel
   traffic affected by a failure to a node downstream of the failure
   with respect to traffic destination.  This node can be viewed as
   being on the farside of the failure with respect to the node
   initiating the tunnel.  Note that the discussion of Farside Tunneling
   in [RFC5715] focuses on the case where the farside node is
   immediately adjacent to a failed link or node.  However, the farside
   node may be any node downstream of the failure with respect to
   traffic destination, including the destination itself.  The tunneling
   mechanism used to reach the farside node must be unaffected by the
   failure.  The alternative forwarding paths created by MRT-FRR have
   the potential to be used to forward traffic from the remote routers
   upstream of the failure all the way to the destination.  In the event
   of failure, either the MRT-Red or MRT-Blue path from the remote
   upstream router to the destination is guaranteed to avoid a link
   failure or inferred node failure.  The MRT forwarding paths are also
   guaranteed to not be subject to micro-loops because they are locked
   to the topology before the failure.

   We note that the computations in [I-D.ietf-rtgwg-mrt-frr-algorithm]
   address the case of a PLR adjacent to a failure determining which
   choice of MRT-Red or MRT-Blue will avoid a failed link or node.  More
   computation may be required for an arbitrary remote upstream router
   to determine whether to choose MRT-Red or MRT-Blue for a given
   destination and failure.
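As a rough illustration of only the simplest case, the selection can be sketched in Python, assuming a remote router already knows the complete MRT-Blue and MRT-Red node sequences toward the destination and exactly which single link or node has failed. The names `path_avoids` and `choose_mrt`, and the path-as-list representation, are hypothetical and do not come from either draft; the sketch deliberately skips the harder problem noted above of how an arbitrary remote router would obtain those paths and identify the failure.

```python
from typing import Optional, Sequence, Tuple, Union

# Hypothetical encoding: a failed node is its name; a failed link is a pair.
Failure = Union[str, Tuple[str, str]]


def path_avoids(path: Sequence[str], failure: Failure) -> bool:
    """True if the path neither visits a failed node nor traverses a failed link."""
    if isinstance(failure, str):
        # A failed node anywhere on the path (even the destination) breaks it.
        return failure not in path
    a, b = failure
    hops = set(zip(path, path[1:]))
    # Links are bidirectional, so check both orientations.
    return (a, b) not in hops and (b, a) not in hops


def choose_mrt(blue: Sequence[str], red: Sequence[str],
               failure: Failure) -> Optional[str]:
    """Pick an MRT color whose path avoids the failure, or None if neither does."""
    if path_avoids(blue, failure):
        return "MRT-Blue"
    if path_avoids(red, failure):
        return "MRT-Red"
    # Possible when the failure is a cut-vertex, a cut-link, or the destination.
    return None


blue = ["S", "A", "B", "D"]
red = ["S", "C", "D"]
print(choose_mrt(blue, red, ("B", "A")))  # MRT-Red
print(choose_mrt(blue, red, "C"))         # MRT-Blue
```

Preferring MRT-Blue when both paths avoid the failure is an arbitrary choice made for this sketch, not a rule from the drafts.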

==================
4.      Section 6.3. (Forwarding IP Unicast Traffic over MRT Paths) says that, for IP forwarding "consistency with LDP is RECOMMENDED".  Why?  I'm guessing it might be simpler to be consistent if both LDP and IP traffic are being repaired, but what about IP-only networks?

[CB] I deleted the recommendation.
5.      Section 7.3.1. (Existing IGP exclusion mechanisms)
o       "In OSPF...a metric of 2^16-1 (0xFFFF)..."   RFC6987 defined a constant called MaxLinkMetric.
o       "...to prevent transit traffic from using a particular router...[RFC6987] specifies setting all outgoing interface metrics to 0xFFFF" -- that won't prevent traffic through the router if it's the only path.  Look at the R-bit in OSPFv3; for OSPFv2, I think the latest attempt is draft-ietf-ospf-ospfv2-hbit.
o       All this doesn't result in an incorrect behavior per the rules at the end of this section.

===================
[CB] I added text about the OSPFv3 V6-bit and R-bit and clarified how MaxLinkMetric only discourages transit traffic.
   For OSPFv2 and OSPFv3, [RFC6987] specifies setting all outgoing
   interface metrics to 0xFFFF to discourage transit traffic from using
   a router.  ([RFC6987] defines the metric value 0xFFFF as
   MaxLinkMetric, a fixed architectural value for OSPF.)  For OSPFv3,
   [RFC5340] specifies that a router be excluded from the intra-area
   shortest path tree computation if the V6-bit or R-bit of the LSA
   options is not set in the Router LSA.

   The following rules for MRT Island formation ensure that MRT FRR
   protection traffic does not use a link or router that is discouraged
   or prevented from carrying traffic by existing IGP mechanisms.

   1.  A bidirectional link MUST be excluded from an MRT Island if
       either the forward or reverse cost on the link is 0xFFFFFE (for
       ISIS) or 0xFFFF (for OSPF).

   2.  A router MUST be excluded from an MRT Island if it is advertised
       with the overload bit set (for ISIS), or it is advertised with
       metric values of 0xFFFF on all of its outgoing interfaces (for
       OSPFv2 and OSPFv3).

   3.  A router MUST be excluded from an MRT Island if it is advertised
       with either the V6-bit or R-bit of the LSA options not set in the
       Router LSA (for OSPFv3).
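The three exclusion rules can be read as simple predicates. The following Python sketch assumes a hypothetical, already-decoded view of each link and router (the `LinkInfo` and `RouterInfo` classes and the protocol labels are illustrative, not IGP encodings). It applies the cost values and bits as the rules state them and leaves out how an implementation would extract them from LSPs and LSAs.

```python
from dataclasses import dataclass, field
from typing import List

ISIS_EXCLUDED_COST = 0xFFFFFE   # link cost named in rule 1 for ISIS
OSPF_MAX_LINK_METRIC = 0xFFFF   # MaxLinkMetric [RFC6987]


@dataclass
class LinkInfo:
    """Hypothetical decoded view of one bidirectional link."""
    protocol: str        # "isis", "ospfv2" or "ospfv3" (illustrative labels)
    forward_cost: int
    reverse_cost: int


@dataclass
class RouterInfo:
    """Hypothetical decoded view of one router's advertisement."""
    protocol: str
    outgoing_metrics: List[int] = field(default_factory=list)
    overload_bit: bool = False  # ISIS only
    v6_bit: bool = True         # OSPFv3 Router LSA options only
    r_bit: bool = True          # OSPFv3 Router LSA options only


def link_excluded(link: LinkInfo) -> bool:
    """Rule 1: either direction carries the protocol's exclusion cost."""
    limit = ISIS_EXCLUDED_COST if link.protocol == "isis" else OSPF_MAX_LINK_METRIC
    return limit in (link.forward_cost, link.reverse_cost)


def router_excluded(router: RouterInfo) -> bool:
    """Rules 2 and 3 for a single router."""
    if router.protocol == "isis":
        return router.overload_bit                      # rule 2 (ISIS)
    metrics = router.outgoing_metrics
    if metrics and all(m == OSPF_MAX_LINK_METRIC for m in metrics):
        return True                                     # rule 2 (OSPFv2/OSPFv3)
    if router.protocol == "ospfv3" and not (router.v6_bit and router.r_bit):
        return True                                     # rule 3 (OSPFv3)
    return False


print(link_excluded(LinkInfo("ospfv2", 10, 0xFFFF)))             # True
print(router_excluded(RouterInfo("ospfv3", [10], r_bit=False)))  # True
print(router_excluded(RouterInfo("ospfv2", [0xFFFF, 10])))       # False
```

Because rule 2 for OSPF requires 0xFFFF on all outgoing interfaces, a router that discourages transit on only some of its links stays in the island in this sketch; those individual links are still excluded by rule 1.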

=====================
[CB] Diff for minor comments 4-5:
https://github.com/cbowers/draft-ietf-rtgwg-mrt-frr-architecture/commit/da088b9b7749c10627e625759c6b4728b63acef9?diff=split
======================
6.      Section 8.3. (Default MRT profile): s/priority/GADAG Root Selection Priority

[CB]  Done.
7.      Section 10. (Inter-area Forwarding Behavior)  Are there cases where it is ok (or even desirable) to keep the traffic in MRT-Red/Blue?  The circumstantial case where independent failures occur in different areas sounds like one where the traffic shouldn't be forwarded onto the default topology by the ABR - but this is a case where the ABR is the entry point to a new repair.  Are there others?  I'm just wondering why this Section doesn't commit (s/should/SHOULD or even MUST) to saying that the traffic has to be taken off an MRT at an ABR.

========================
[CB] For unicast FRR, I can't think of a scenario where one would want to stay on the red or blue MRT after crossing area borders.    For multicast distribution on MRTs, it may be useful to stay on red or blue across borders.  The default MRT profile specifies ABRs/LBRs SHOULD ensure that traffic leaving the area also exits the MRT-Red or MRT-Blue forwarding topology.  I think the draft can be somewhat less strict about this requirement because different behavior by some ABRs won't cause routing loops or lost traffic, but I'm open to changes here if you think it makes sense.
==========================
8.      Section 12.2. (MRT Recalculation for the Default MRT Profile) includes in the MRT recalculation sequence "a configured (or advertised) period".  Even though this Section talks only about the Default MRT Profile, it seems to me that a "recalculation timer" might be a nice thing to have as part of any MRT Profile.
o       I peeked at the algorithm ID and didn't find a timer defined there either.

========================
[CB]  Extensions are defined in draft-ietf-ospf-mrt and draft-ietf-isis-mrt so I added this text.
   New protocol extensions for advertising the time needed to recompute
   shortest path routes and install them in the FIB will be defined
   elsewhere.
=====================

Nits:
1.      Please put a reference to Section 7 (?) in 1.2  when talking about MRT Islands.
2.      "Any RT is an MRT but many MRTs are not RTs."  Counterintuitive since it sounds like an MRT is the maximal version of an RT.

=========================
[CB]  Rewrote these definitions to clarify.
   Redundant Trees (RT):   A pair of trees where the path from any node
      X to the root R along the first tree is node-disjoint with the
      path from the same node X to the root along the second tree.
      Redundant trees can always be computed in 2-connected graphs.

   Maximally Redundant Trees (MRT):   A pair of trees where the path
      from any node X to the root R along the first tree and the path
      from the same node X to the root along the second tree share the
      minimum number of nodes and the minimum number of links.  Each
      such shared node is a cut-vertex.  Any shared links are cut-links.
      In graphs that are not 2-connected, it is not possible to compute
      RTs.  However, it is possible to compute MRTs.  MRTs are maximally
      redundant in the sense that they are as redundant as possible
      given the constraints of the network graph.
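The difference between the two definitions can be illustrated with a small Python sketch. It is not the MRT computation of [I-D.ietf-rtgwg-mrt-frr-algorithm]; it only checks a given pair of X-to-R paths against the definitions. Cut-vertices and cut-links are found with the standard Hopcroft-Tarjan depth-first search. An RT pair shares no node or link other than the endpoints, while an MRT pair may share only cut-vertices and cut-links. The function names, the adjacency-list representation, and the example topology are assumptions made for illustration.

```python
from typing import Dict, FrozenSet, List, Optional, Sequence, Set, Tuple

Graph = Dict[str, List[str]]  # undirected graph as an adjacency list
Link = FrozenSet[str]         # an undirected link {u, v}


def cut_vertices_and_links(g: Graph) -> Tuple[Set[str], Set[Link]]:
    """Find cut-vertices (articulation points) and cut-links (bridges)
    with one depth-first search using Hopcroft-Tarjan lowpoint values."""
    disc: Dict[str, int] = {}
    low: Dict[str, int] = {}
    cut_v: Set[str] = set()
    cut_l: Set[Link] = set()

    def dfs(u: str, parent: Optional[str]) -> None:
        disc[u] = low[u] = len(disc)  # discovery order of u
        children = 0
        for v in g[u]:
            if v not in disc:
                children += 1
                dfs(v, u)
                low[u] = min(low[u], low[v])
                if parent is not None and low[v] >= disc[u]:
                    cut_v.add(u)  # v's subtree cannot bypass u
                if low[v] > disc[u]:
                    cut_l.add(frozenset((u, v)))  # no back edge spans u-v
            elif v != parent:
                low[u] = min(low[u], disc[v])
        if parent is None and children > 1:
            cut_v.add(u)  # DFS root with several subtrees

    for node in g:
        if node not in disc:
            dfs(node, None)
    return cut_v, cut_l


def shared_elements(p1: Sequence[str], p2: Sequence[str]) -> Tuple[Set[str], Set[Link]]:
    """Nodes (excluding the endpoints X and R) and links used by both paths."""
    nodes = set(p1[1:-1]) & set(p2[1:-1])
    links1 = {frozenset(hop) for hop in zip(p1, p1[1:])}
    links2 = {frozenset(hop) for hop in zip(p2, p2[1:])}
    return nodes, links1 & links2


def is_redundant_pair(p1: Sequence[str], p2: Sequence[str]) -> bool:
    """RT property for one node X: the two paths to R share no node or link."""
    nodes, links = shared_elements(p1, p2)
    return not nodes and not links


def shares_only_cut_elements(g: Graph, p1: Sequence[str], p2: Sequence[str]) -> bool:
    """MRT property from the definition: every shared node is a cut-vertex
    and every shared link is a cut-link."""
    cut_v, cut_l = cut_vertices_and_links(g)
    nodes, links = shared_elements(p1, p2)
    return nodes <= cut_v and links <= cut_l


# Two rings joined at C: C is the only cut-vertex, and there are no cut-links.
g = {
    "X": ["A", "B"], "A": ["X", "C"], "B": ["X", "C"],
    "C": ["A", "B", "D", "E"], "D": ["C", "R"], "E": ["C", "R"],
    "R": ["D", "E"],
}
blue = ["X", "A", "C", "D", "R"]
red = ["X", "B", "C", "E", "R"]
print(cut_vertices_and_links(g))                # ({'C'}, set())
print(is_redundant_pair(blue, red))             # False: both paths need C
print(shares_only_cut_elements(g, blue, red))   # True
```

Since this graph is not 2-connected, no RT pair exists from X to R, but the two paths above are maximally redundant: the only element they share is the cut-vertex C.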

[CB] Also took the opportunity to re-order the terminology section.

==========================
3.      Please expand on first use: SPT, PLR, MT-ID

===============================
[CB] Diff for nits 1-3:
https://github.com/cbowers/draft-ietf-rtgwg-mrt-frr-architecture/commit/52068aff5dfc9fb5ff2b42dfb2a19ce5447023fa
===========================
4.      Some substitutions:
o       s/it is used to described/it is used to describe
o       s/choice of tunnel egress MAY be flexible/choice of tunnel egress is flexible
o       s/either IPv4 and IPv6/either IPv4 or IPv6
o       s/Forwarding Mechanisms( MRT/Forwarding Mechanisms (MRT
o       s/The key difference is whether the traffic, once out of the MRT Island, remains in the same area/level and.../The key difference is whether the traffic, once out of the MRT Island, remains in the same area/level or...
5.      "...we will use the terms area and ABR to indicate either an OSPF area and OSPF ABR or ISIS level and ISIS LBR", but then you use ABR/LBR and area/level anyway.  Suggestion: generalize to domain and DBR, and define it in the terminology.
6.      Shouldn't there be a reference to Appendix A somewhere in Section 11?

==================================
[CB] https://github.com/cbowers/draft-ietf-rtgwg-mrt-frr-architecture/commit/94a247c426104b06862118149be9254fc553bb25
==================================