AD Review of draft-ietf-rtgwg-mrt-frr-architecture-08

"Alvaro Retana (aretana)" <aretana@cisco.com> Sat, 02 January 2016 13:08 UTC

From: "Alvaro Retana (aretana)" <aretana@cisco.com>
To: "draft-ietf-rtgwg-mrt-frr-architecture@ietf.org" <draft-ietf-rtgwg-mrt-frr-architecture@ietf.org>
Subject: AD Review of draft-ietf-rtgwg-mrt-frr-architecture-08
Date: Sat, 02 Jan 2016 13:08:51 +0000
Message-ID: <D29F3CC9.F4D4A%aretana@cisco.com>
Archived-At: <http://mailarchive.ietf.org/arch/msg/rtgwg/zEpqxhoj4qSUoldQhVA5aVHGCpc>
Cc: "rtgwg-chairs@ietf.org" <rtgwg-chairs@ietf.org>, "rtgwg@ietf.org" <rtgwg@ietf.org>

Hi!

Happy New Year!

I just finished reviewing this document.  I have several comments (please see below) that I want to see addressed before starting the IETF Last Call.

Out of the items marked as Major, the one that concerns me the most is the one related to Operations/Management Considerations.  It is surprising to me that such extensive work in the Standards Track didn't have any Operations/Management (or even Security!) Considerations until the current version.  My comments below echo some of the opinions on the list, but I can't claim that they are all-inclusive.  As I point out below, I am not looking for a dissertation on the topic, but much more than what's there should be included.

Thanks!

Alvaro.


Major:

  1.  In general, I feel uncomfortable with documents making value statements related, for example, to how they perform against different solutions.  The purpose of this document should be to describe the technology, not to compare against other solutions — that work (if wanted/needed) should be done in a different document.  Please remove comparisons to other technology and relative statements.  Some examples:
     *   Abstract: "MRT is also extremely computationally efficient…computation is less than the LFA computation..."  As was expressed on the list, maybe CPU cycles are not that important if compared to other aspects…
     *   Introduction: "Other existing or proposed solutions are partial solutions or have significant issues, as described below."  It is ok to describe other solutions, but please limit the description to the facts.
        *   About the table: there are obviously other columns that could have been included, which means that the table is not complete.
     *   Section 4: "Modeling results comparing the alternate path lengths obtained with MRT to other approaches are described in [I-D.ietf-rtgwg-mrt-frr-algorithm]."  I am also including the corresponding comment in my review of that other document.
     *   Section 5: "is an advantage of using MRT"  In this case, at least a reference to Section 15. (Applying Policy to Select from Multiple Possible Alternates for FRR) might be in order.
  2.  References to Extensions.  This document being where the architecture for MRT is defined should set the stage/define requirements for extensions that are to be defined elsewhere, and not concern itself with the solutions themselves.  In other words, please remove references to where solutions (in the form of extensions) are being specified.  Some pointers:
     *   All references to I-D.ietf-ospf-mrt, I-D.ietf-isis-mrt and I-D.ietf-mpls-ldp-mrt, except maybe the ones in Section 13. (Implementation Status).
     *   There are two places where it is mentioned that the capabilities to advertise additional loopbacks "have not been defined".
     *   Section 7. (MRT Island Formation) starts by talking about the "purpose of communicating support for MRT in the IGP" which is the first time in the document that is mentioned.  While distribution with an IGP may be the obvious mechanism, please describe the requirement.
     *   Another example of the same occurs in Section 8.2. (Router-specific MRT parameters) where it says that "additional router-specific MRT parameters may need to be distributed via the IGP", when I think the requirement is that these additional parameters need to be known by all routers in the MRT Island. [Again, distribution using the IGP may be the obvious choice.]
  3.  Algorithm.  draft-ietf-rtgwg-mrt-frr-algorithm says that it "defines the…algorithm that is used in the default MRT profile".  Please make the text in this document consistent with that when referring to draft-ietf-rtgwg-mrt-frr-algorithm.  Some descriptions used in the text: "the exact MRT algorithm", "the algorithm to compute MRTs", "Example algorithm"
  4.  Operations/Management Considerations
     *   Given that the MRT paths don't follow the shortest paths, or even potentially planned backup paths in the network, I think you should include something about the potential impact related to capacity planning, congestion, stretch, etc.
     *   What about address management?  Are there considerations about assignment and management for the additional loopbacks required for IP tunnels?
     *   Section 15. (Applying Policy to Select from Multiple Possible Alternates for FRR) basically says that policy can be applied "to select the best alternate from those provided by MRT and other FRR technologies".  You're right to point out that "[I-D.ietf-rtgwg-lfa-manageability] discusses many of the potential criteria that one might take into account when evaluating different alternates for selection".  What are the considerations that should be taken into account when comparing between MRT and others?  Are the criteria and requirements outlined in I-D.ietf-rtgwg-lfa-manageability applicable?  Even though I-D.ietf-rtgwg-lfa-manageability is intended to be LFA-specific, should it be a Normative reference? [Note that there's a similar comment related to  I-D.ietf-rtgwg-lfa-manageability in my review of draft-ietf-rtgwg-mrt-frr-algorithm.]
     *   Applicability/Guidance for Operators
        *   Section 4. (Maximally Redundant Trees (MRT)) clearly explains about the impact of not having a 2-connected network for MRT to be applicable.  Section 11.3. (MRT Alternates for Destinations Outside the MRT Island) talks about partial implementation in an area.
        *   I think it would be important to consolidate some of that guidance (there's probably more) in a single place.  Note that I'm not looking for a 30 page extension (a la RFC6571), just some general guidance.
     *   Given that both Sections 14 and 15 were added just in the latest version of the document, please consider taking a look at RFC5706.
     *   [Nit] Consider making Section 15. (Applying Policy to Select from Multiple Possible Alternates for FRR) a sub-section of Section 14. (Operations and Management Considerations).
  5.  MRT Profile Selection and Algorithm Transition:
     *   From Section 7.2. (Support for a specific MRT profile): "All routers in an MRT Island MUST support the same MRT profile"…and…"A given router can support multiple MRT profiles and participate in multiple MRT Islands.".  If I understand this correctly, routers can support multiple MRT profiles for the same area/level, right?  If so, how do the routers in the area/level agree on which profile will be the one supported?
     *   Also, Section 8.1. (MRT Profile Options) says that "If a router advertises support for multiple MRT profiles, then it MUST create the transit forwarding topologies for each…"  Are these multiple profiles inside the same area/level?  These two pieces of text don't seem to be in sync.  [But I may be missing something somewhere.]
     *   MRT Algorithm Transition: How is it done?  Section 8.1. (MRT Profile Options) says that "Algorithm transitions can be managed by advertising multiple MRT profiles", but there's no explanation of how.  This comment is related to the one above about MRT Profile Selection.
     *   [Minor] I may have missed this somewhere..
        *   The MRT MPLS MT-ID value is associated with the MRT profile, so that (for example) the MRT-Red MPLS MT-ID for the default profile is 3997, right?  If so, how does one introduce a new profile?  I'm guessing that by registering new MT-ID values.
           *   What happens if the MT-ID for the Red and Blue MRTs don't correspond to the same profile?
  6.  Section 17. (IANA Considerations) talks about the "MRT Profile TLV", which is not defined anywhere in this document.  I think that asking for the registry creation is enough.
  7.  Section 18. (Security Considerations)  Even though I don't think this document should make explicit references to extensions, clearly there will be a transport that needs to be secured: authentication, privacy, etc.
  8.  References: RFC2119 should be Normative.
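
To make the question in Major item 5 concrete, here is a minimal sketch, not taken from the draft. It assumes a simplified island rule: starting from one router, keep adding neighbors that support the same profile. The topology, the router names, the profile labels 0 and 1, and the helper `mrt_island` are all hypothetical. It shows that routers supporting two profiles end up in two islands in the same area, with nothing in the rule itself picking one.

```python
from collections import deque

def mrt_island(links, profiles, start, profile):
    """Routers reachable from `start` via routers that all support `profile`.

    Simplified, hypothetical island rule for illustration only.
    """
    if profile not in profiles.get(start, set()):
        return set()
    island = {start}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        for v in links.get(u, ()):
            if v not in island and profile in profiles.get(v, set()):
                island.add(v)
                queue.append(v)
    return island

# Hypothetical area: R1 - R2 - R3 - R4 in a line.
links = {"R1": ["R2"], "R2": ["R1", "R3"], "R3": ["R2", "R4"], "R4": ["R3"]}
# Profile labels 0 and 1 are made up; R3 supports only profile 0.
profiles = {"R1": {0, 1}, "R2": {0, 1}, "R3": {0}, "R4": {0, 1}}

island_0 = mrt_island(links, profiles, "R1", 0)  # spans the whole line
island_1 = mrt_island(links, profiles, "R1", 1)  # stops at R3, so R4 is cut off
print(sorted(island_0), sorted(island_1))
```

In this sketch R1 and R2 sit in both islands at once, which is the situation the text in Sections 7.2 and 8.1 would need to address.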

Minor:

  1.  Section 1. (Introduction) says that: "Once traffic has been moved to one of MRTs, it is not subject to further repair actions. Thus, the traffic will not loop even if a worse failure (e.g. node) occurs when protection was only available for a simpler failure (e.g. link)."  I'm sure you mean that the worse failure occurs in the original topology (not in the MRT).  Please clarify.
  2.  Multicast protection is out of scope, right?  There's a reference to I-D.atlas-rtgwg-mrt-mc-arch in the Introduction (which is fine), but no explicit indication of the scope.  Section 8.1. (MRT Profile Options) also talks about multicast when describing "MRT Forwarding Mechanism" ("The None option in may be useful for multicast global protection.").
     *   In Section 8.1. (MRT Profile Options), is the "Area/Level Border Behavior" specific to multicast?  BTW, please avoid describing the options with questions.
  3.  Section 1.1. (Importance of 100% Coverage) talks about how micro-loop prevention is something that can be achieved with complete coverage, and Section 12. (Network Convergence and Preparing for the Next Failure) says it is something that needs attention after a failure, but then the document doesn't say how MRT can be used:  the use of MRT (to support Farside Tunneling) is declared out of scope, and an "orphan" statement (no references and no solution) about micro-loop mitigation is made ("Micro-loop mitigation mechanisms can also work when combined with MRT.").
     *   Section 12.1. (Micro-forwarding loop prevention and MRTs) does say that "Managing micro-loops is an orthogonal issue to having alternates for local repair…", but I think you need to explain some more about how micro-loops may not be an issue or how MRT helps.
  4.  Section 6.3. (Forwarding IP Unicast Traffic over MRT Paths) says that, for IP forwarding "consistency with LDP is RECOMMENDED".  Why?  I'm guessing it might be simpler to be consistent if both LDP and IP traffic are being repaired, but what about IP-only networks?
  5.  Section 7.3.1. (Existing IGP exclusion mechanisms)
     *   "In OSPF…a metric of 2^16-1 (0xFFFF)…"   RFC6987 defined a constant called MaxLinkMetric.
     *   "…to prevent transit traffic from using a particular router…[RFC6987] specifies setting all outgoing interface metrics to 0xFFFF" -- that won't prevent traffic through the router if it's the only path; look at the R-bit in OSPFv3.  For OSPFv2, I think the latest attempt is draft-ietf-ospf-ospfv2-hbit.
     *   All this doesn't result in an incorrect behavior per the rules at the end of this section.
  6.  Section 8.3. (Default MRT profile): s/priority/GADAG Root Selection Priority
  7.  Section 10. (Inter-area Forwarding Behavior)  Are there cases where it is ok (or even desirable) to keep the traffic in MRT-Red/Blue?  The circumstantial case where independent failures occur in different areas sounds like one where the traffic shouldn't be forwarded onto the default topology by the ABR — but this is a case where the ABR is the entry point to a new repair.  Are there others?  I'm just wondering why this Section doesn't commit (s/should/SHOULD or even MUST) to saying that the traffic has to be taken off an MRT at an ABR.
  8.  Section 12.2. (MRT Recalculation for the Default MRT Profile) includes in the MRT recalculation sequence "a configured (or advertised) period".  Even though this Section talks only about the Default MRT Profile, it seems to me that a "recalculation timer" might be a nice thing to have as part of any MRT Profile.
     *   I peeked at the algorithm ID and didn't find a timer defined there either.
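
The point in Minor item 5 about MaxLinkMetric can be sketched as follows. This is a minimal illustration with a plain Dijkstra computation, not an OSPF implementation; the two topologies, the router names, and the helper `shortest_path` are hypothetical. It shows that setting a router's outgoing metrics to 0xFFFF only makes it unattractive: when it is the only path, transit traffic still goes through it.

```python
import heapq

MAX_LINK_METRIC = 0xFFFF  # 2^16 - 1, the MaxLinkMetric constant from RFC 6987

def shortest_path(graph, src, dst):
    """Plain Dijkstra; returns (cost, path), or (None, []) if unreachable."""
    dist = {src: 0}
    prev = {}
    heap = [(0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry
        if u == dst:
            break
        for v, w in graph.get(u, {}).items():
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                prev[v] = u
                heapq.heappush(heap, (nd, v))
    if dst not in dist:
        return None, []
    path = [dst]
    while path[-1] != src:
        path.append(prev[path[-1]])
    return dist[dst], path[::-1]

# Hypothetical line A - B - C; B advertises MaxLinkMetric on its outgoing links.
line = {
    "A": {"B": 10},
    "B": {"A": MAX_LINK_METRIC, "C": MAX_LINK_METRIC},
    "C": {"B": 10},
}
cost_line, path_line = shortest_path(line, "A", "C")

# Same routers plus a hypothetical alternate A - D - C with ordinary metrics.
ring = {
    "A": {"B": 10, "D": 10},
    "B": {"A": MAX_LINK_METRIC, "C": MAX_LINK_METRIC},
    "C": {"B": 10, "D": 10},
    "D": {"A": 10, "C": 10},
}
cost_ring, path_ring = shortest_path(ring, "A", "C")
print(cost_line, path_line, cost_ring, path_ring)
```

With the alternate present the path avoids B; without it, B still carries the traffic at a cost of 10 + 0xFFFF.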

Nits:

  1.  Please put a reference to Section 7 (?) in Section 1.2 when talking about MRT Islands.
  2.  "Any RT is an MRT but many MRTs are not RTs."  Counterintuitive since it sounds like an MRT is the maximal version of an RT.
  3.  Please expand on first use: SPT, PLR, MT-ID
  4.  Some substitutions:
     *   s/it is used to described/it is used to describe
     *   s/choice of tunnel egress MAY be flexible/choice of tunnel egress is flexible
     *   s/either IPv4 and IPv6/either IPv4 or IPv6
     *   s/Forwarding Mechanisms( MRT/Forwarding Mechanisms (MRT
     *   s/The key difference is whether the traffic, once out of the MRT Island, remains in the same area/level and…/The key difference is whether the traffic, once out of the MRT Island, remains in the same area/level or…
  5.  "…we will use the terms area and ABR to indicate either an OSPF area and OSPF ABR or ISIS level and ISIS LBR", but then you use ABR/LBR and area/level anyway.  Suggestion: generalize to domain and DBR, and define them in the terminology.
  6.  Shouldn't there be a reference to Appendix A somewhere in Section 11?