Re: [Rift] RtgDir Early review: draft-ietf-rift-rift-06
Tony Przygienda <tonysietf@gmail.com> Tue, 23 July 2019 20:27 UTC
Return-Path: <tonysietf@gmail.com>
X-Original-To: rift@ietfa.amsl.com
Delivered-To: rift@ietfa.amsl.com
Received: from localhost (localhost [127.0.0.1]) by ietfa.amsl.com (Postfix) with ESMTP id BB10E120950; Tue, 23 Jul 2019 13:27:21 -0700 (PDT)
X-Virus-Scanned: amavisd-new at amsl.com
X-Spam-Flag: NO
X-Spam-Score: -1.997
X-Spam-Level:
X-Spam-Status: No, score=-1.997 tagged_above=-999 required=5 tests=[BAYES_00=-1.9, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, FREEMAIL_FROM=0.001, HTML_MESSAGE=0.001, RCVD_IN_DNSWL_NONE=-0.0001, SPF_HELO_NONE=0.001, SPF_PASS=-0.001, URIBL_BLOCKED=0.001] autolearn=ham autolearn_force=no
Authentication-Results: ietfa.amsl.com (amavisd-new); dkim=pass (2048-bit key) header.d=gmail.com
Received: from mail.ietf.org ([4.31.198.44]) by localhost (ietfa.amsl.com [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id kN5_U66O6vty; Tue, 23 Jul 2019 13:27:17 -0700 (PDT)
Received: from mail-ed1-x52b.google.com (mail-ed1-x52b.google.com [IPv6:2a00:1450:4864:20::52b]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by ietfa.amsl.com (Postfix) with ESMTPS id C82331203C2; Tue, 23 Jul 2019 13:27:13 -0700 (PDT)
Received: by mail-ed1-x52b.google.com with SMTP id s49so10309358edb.1; Tue, 23 Jul 2019 13:27:13 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=mime-version:references:in-reply-to:from:date:message-id:subject:to :cc; bh=GwyPMv1j9avH8Vnfm3Rmt3e1Fpn0qM+URyG5VmS8ZRw=; b=MXAh9Uk/uDFl0zGFzjX9HuNF3HgsVD1uauZHLlCQ5VVtaNVNz/+xIomYbbi+e/hJ27 DXoxPPw+aZpNqb06iApLWOAq8aajEz+McqaLa0tbwHc3px4NCeLXRgk7SlrcAodJa5BF nx06GIcHlycLIfYj1rKPsLQfC174r6MKZX3vCEnzrJEc8ld6MDUABnxhd7XYeWCLMqbW cDrW++MondNQ0UvpII/psj6NLHFBXwOjDE40RCE1uTiCFelHYDc601LeOwckXjejMSrI tRlcL7Dt3eujYTeLeWG9TdcjUdH32Q/LEEwqulTR4yxFbqb8yYM489FeMcY3wyomd59N u9Eg==
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:references:in-reply-to:from:date :message-id:subject:to:cc; bh=GwyPMv1j9avH8Vnfm3Rmt3e1Fpn0qM+URyG5VmS8ZRw=; b=V+MNFQvbjMQgpB/oXNJq6M7VxQasVUCj7gcLxBQy2pb/4qbdsOz8rE4U1IdjYgaXiV bdc2/lJ+csUQ8a03JWI2SEeaUjoUl26ojtWP8HUbvL7mjRo2/PHT+4yDWxxK8hd5WZ7i Tzb/BqVatw34Swrk6HJ1lbgfRZ0hb/1x4dzASHcZPIxReLbi+nmsMcKheW4C6ju8eS7o c71VNuF6/J3cxs5DSTWYwsum0RLT7pt7pMrHgZ5kseksXl//RxIASLfRHJa3pBR1mzSF 3OF+76PxF5sMkEXyJDXUa5wDyYSTZeLUeZIkSNMczqhEU8VRplYL+EcYe65xEO5h+ov1 cNNw==
X-Gm-Message-State: APjAAAWXBmXHc99TVe2x6x6iKiBXbwoJVvwyNxF+YGXaiu/4/KS0ppEk jerpNPwj9ogy5WCcO/zhySKHOuICbnBCgfGYFDqGZQEECwE=
X-Google-Smtp-Source: APXvYqxZ46g02b/rhNnbkW561ppp1yj7jIQ1i0b82DaEKS7fLfQyLIHEGF9gfKQBIEzF1n/kU7H2muf19UY8HUV/U3U=
X-Received: by 2002:a50:b87c:: with SMTP id k57mr66624955ede.226.1563913632155; Tue, 23 Jul 2019 13:27:12 -0700 (PDT)
MIME-Version: 1.0
References: <097e01d5418d$847ee300$8d7ca900$@riw.us>
In-Reply-To: <097e01d5418d$847ee300$8d7ca900$@riw.us>
From: Tony Przygienda <tonysietf@gmail.com>
Date: Tue, 23 Jul 2019 16:26:35 -0400
Message-ID: <CA+wi2hO0SMadg1LHNvORCxD37kkriA9LUJEbevSURbcNTvk3hQ@mail.gmail.com>
To: Russ White <russ@riw.us>
Cc: rtg-wg-chairs@ietf.org, draft-ietf-rift-rift@ietf.org, rtg-dir@ietf.org, rift@ietf.org
Content-Type: multipart/alternative; boundary="000000000000810aab058e5f044a"
Archived-At: <https://mailarchive.ietf.org/arch/msg/rift/WfceSWboY4G1kyJRkp6u1AyZkC8>
Subject: Re: [Rift] RtgDir Early review: draft-ietf-rift-rift-06
X-BeenThere: rift@ietf.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Discussion of Routing in Fat Trees <rift.ietf.org>
List-Unsubscribe: <https://www.ietf.org/mailman/options/rift>, <mailto:rift-request@ietf.org?subject=unsubscribe>
List-Archive: <https://mailarchive.ietf.org/arch/browse/rift/>
List-Post: <mailto:rift@ietf.org>
List-Help: <mailto:rift-request@ietf.org?subject=help>
List-Subscribe: <https://www.ietf.org/mailman/listinfo/rift>, <mailto:rift-request@ietf.org?subject=subscribe>
X-List-Received-Date: Tue, 23 Jul 2019 20:27:22 -0000
thanks for quick review, Russ. Flew over it. most stuff seems clarification and editorials to me. Let's sit with authors in the bar and go over it, I have -07 pending anyway right after IETF due to the v4/v6 router requirement comments and bruno's interop so this is a great time in fact to file off the last rough edges ... --- tony On Tue, Jul 23, 2019 at 3:33 PM <russ@riw.us> wrote: > Hello > > I have been selected to do a routing directorate “early” review of this > draft. > https://datatracker.ietf.org/doc/draft-foo-name/ > > The routing directorate will, on request from the working group chair, > perform an “early” review of a draft before it is submitted for publication > to the IESG. The early review can be performed at any time during the > draft’s lifetime as a working group document. The purpose of the early > review depends on the stage that the document has reached. > > As this document has recently been adopted by the working group, my focus > for the review is on providing a new perspective on the work, with the > intention of catching any issues early on in the document's life cycle. > > For more information about the Routing Directorate, please see > http://trac.tools.ietf.org/area/rtg/trac/wiki/RtgDir > > Document: draft-ietf-rift-rift-06 > Reviewer: Russ White > Review Date: 23 July 2019 > Intended Status: Standards Track > > Summary: > > I have significant concerns about this document. It needs more work before > being submitted to the IESG. > > Comments: > > Overall this draft is very readable, although I have suggested wording > changes below. The protocol described is useful and interesting. I largely > have questions about the structure and wording of the draft, although there > are a couple of technical questions in the comments below (probably just > things I missed the explanation for, however). > > Structurally, this document almost feels like two different documents. The > first specifies a set of use cases and requirements, while the second > specifies a protocol. I wonder if it wouldn't be better to split this > document into two pieces, making each one more specific, and possibly a bit > easier to read? We might too late in the process to make such a change -- > if so, that's okay, just thought it might be worth considering. > > Throughout -- "fat tree," "Clos," and other terms are used in various > places. As a reader, given the different meanings for these terms, I found > this a bit confusing. It might be better to use "spine and leaf" throughout. > > Throughout -- the term "folded" has two very different meanings... The > first is a multistage fabric on which traffic flows bi-directionally, the > second is a way of drawing spine and leaf fabrics in blocks. It might be > useful to clarify this from the beginning somehow, so readers who are > expecting one meaning don't misread things because a different meaning is > intended (?). I think it's always used here in the "way of drawing a spine > and leaf fabric" here. > > == > Page 9 > > The definitions of the various forms of TIE might be better if they > clearly differentiated between topology and reachability information (?). > The OSPF router LSA, for instance, contains both kinds of information. In > IS-IS, adjacencies, which are called Node TIES (I think) are tlv22, while > the Prefix TIE would be a tlv128 and 236. I don't think there is an > equivalent separation between topology (reachable neighbors) and > reachability (reachable prefixes) in OSPF. > > == > REQ13: "Taking a path through the spine in cases where a shorter path is > available is highly undesirable." > > This doesn't seem to be related to the previous statements in the > requirement... Not certain this needs to be included? Or maybe it wants to > be someplace else in the list? > > == > REQ10 and REQ13 both seem to say the same thing -- although REQ13 is > better worded. Maybe both of these are not needed? > > == > REQ15: "...links of a single node having same addresses." -- might be > better as "...links of a single node sharing the same address." > > == > REQ18: "...security mechanisms that allow to restrict nodes, especially > leafs without proper credentials from forming three-way adjacencies." > > Might be better as: > > "...security mechanisms that allow the operator to restrict nodes, > especially leaf nodes without proper credentials, from forming a three-way > adjacency." > > == > PEND2: 500,000 seems way too small for the scale numbers I've heard in the > past -- especially if we are including the underlay protocol running on the > hosts in the mix. If there are 300k ports is at least 1 prefix per edge > port, and potentially 10-15 prefixes per edge port. If you guess around 20% > of the workloads attached to the fabric might be mobile, then the number > here is more likely 1 million at the base, with a top end of around 2-3 > million prefixes. It might be worth adjusting this number a bit larger (?). > > == > Section 5, paragraph after the header > > "...when "pointing north" and path vector [RFC4271] protocol when > "pointing south". While this is an unusual combination,..." > > Elsewhere in the document "distance-vector" is used. It might be better to > be consistent? > > == > Section 5.1.1 > > In the first paragraph ("The most singular property..."), omitting > east-west control plane information flow is mentioned twice; could probably > just be once. > > The first sentence in the second paragraph is a bit of a run-on; might be > best to omit some of this, or break it into two sentences. > > Second paragraph: "The application of those principle lead to RIFT..." > doesn't seem right. Might be better as: "The application of these > principles lead to RIFT..." > > == > Section 5.1.2, paragraph beginning: "Given the difficulty of visualizing > multi plane design which are..." First, this is one long sentence. Second, > what's illustrated looks like a crossbar, but it's not a crossbar. I find > this a little confusing (?). The problem is -- I don't know of another term > to use here. > > == > Section 5.1.2, paragraph beginning: "The typical topology for which RIFT > is defined is built of a number P..." This seems to describe a five stage > butterfly or pod-and-spine fabric, but does not seem to describe a seven > stage (?). > > Overall -- I'm not certain trying to describe these topologies at this > level in a protocol specification draft is all that useful. The terminology > is fluid (used by different people in different ways), and this depth of > explanation does not seem to be useful to describe the protocol operation. > This is one place where splitting the document into two pieces might be > helpful. > > == > Paragraph beginning: "It is critical at this junction that the concept and > the picture of..." What is shown is not really a crossbar fabric... I'm not > certain calling this a crossbar is helpful here (?). > > == > Section 5.1.4: "This happens naturally in single plane design but needs > additional considerations in multiplane fabrics." When aggregation is used, > as well -- but not when no aggregation of reachability information is > deployed in the fabric. It might be worth adding that qualification here. > > == > Section 5.1.4 > > Sentence beginning: "In more detail, by reserving two ports on each > Top-of-Fabric node it is possible to connect them together in an interplane > bi-directional ring as illustrated in Figure 13..." > > I might have missed it, but this document does not seem to address how > traffic will be prevented from flowing through these links. It's probably > important to note either that traffic will flow along these links, hence > they need to be sized appropriately, or how traffic is prevented from > flowing along these links. > > == > Section 5.1.5 > > Sentence beginning: "The effect of a positive disaggregation..." If > positive deaggregation takes place, it could draw all the traffic to the > deaggregated destinations along a small set of links, causing a hot spot in > the fabric. This might be taken care of "naturally" in the way RIFT does > positive deaggregation, but it might be good to explain this in the > document. > > A more general observation: positive deaggregation, as described in the > document, can potentially cause cascading failures where a group of links > or nodes fail, causing a lot of new information to be pushed into lower > levels rather quickly, which could potentially cause those nodes to overrun > their table or processing space and fail in turn, causing more information > to be dumped into the control plane, etc. It may be worth connsidering > having some form of "hysteresis," timer or other mechanism to prevent > cascading failures of this kind. > > == > Section 5.2.2 > > "All RIFT routers MUST support IPv4 forwarding and MAY support IPv6..." > > So an IPv6 only fabric is not possible? > > == > Section 5.2.6 > > "After the SPF is run, it is necessary to attach according prefixes." > > I think this might want to mean -- > > "After SPF is run, it is necessary to attach the resulting reachability > information in the form of prefixes." > > (?) > > == > Section 5.2.6 > > It seems like the first and third paragraphs describe the same procedure? > Are both descriptions necessary? > > == > Section 5.2.6 > > "An exemplary implementation..." should probably be "an example > implementation..." ?? > > == > Section 5.2.6 > > "After the positive prefixes are attached and tie-broken, negative > prefixes are attached and used in case of northbound computation, ideally > from the shortest length to the longest." > > It seems like the prefixes should be "tie-broken" before being attached. > This sentence is generally confusing, perhaps: > > "After the positive prefixes have been attached, the negative prefixes > will be attached in their prefix-length order (from the shortest to the > longest). These negative prefixes will only be used when computing > northbound reachability." > > == > Section 5.2.6 > > "Observe that despite seeming quite computationally expensive the > operations are only necessary if the only available advertisements for a > prefix are negative since tie-breaking always prefers positive information > for the prefix which stops any kind of recursion since positive information > never inherits next hops." > > Just because something is not done all that often does not mean it's not > computationally expensive. :-) Maybe something like this would be better: > > "Although these operations can be computationally expensive, the overall > load on devices in the network is low because these computations are not > run very often, as positive route advertisements are always preferred over > negative ones. This prevents recursion in most cases because positive > reachability information never inherits next hops." > > Or something like this? > > == > Section 5.2.6 > > "Negative 2001:db8::/32 entry inherits from" > > Probably wants to be "The negative 2001:db8::/32 entry..." > > == > Section 5.2.6 > > "Finally, let us look at the case where S3 becomes available again as > default gateway, ..." > > "...the default gateway..." or "...a default gateway..." > > == > Section 5.2.7 > > "Each RIFT node can optionally operate in zero touch provisioning..." > > "optionally" is not needed here > > == > Section 5.2.7.2 > > "RIFT identifies each node via a SystemID which is a 64 bits wide integer. > It is relatively simple to derive a, for all practical purposes collision > free, value for each node on startup. For that purpose, a node MUST use as > system ID EUI-64 MA-L format [EUI64] where the organizationally governed 24 > bits can be used to generate system IDs for multiple RIFT instances running > on the system." > > This seems too wordy -- maybe: > > "RIFT nodes require a 64 bit SystemID in the EUI-64 MA-L format formed as > described in [EUI64]. The organizationally goverened portion of this ID > (24 bits) can be used to generate multiple IDs if required to indicate more > than one RIFT instance." > > == > Section 5.3.1 > > "Overload Bit MUST be respected..." > > should be > > "The overload bit MUST be respected..." > > == > Section 5.3.2 > > "Since the leafs do see only "one hop away" they do not need to run a full > SPF but can simply gather prefix candidates from their neighbors and build > the according routing table." > > Seems like this should be > > "Since leaf nodes only have one hop of topology information, they do not > need to run SPF. Instead, they can gather the available prefix candidates > and build the routing table accordingly." > > == > Section 5.3.3 > > "time stamp: With this method, the infrastructure memorizes the..." > > I think the correct word here is "records," rather than "memorizes" (?) > > == > Section 5.3.3 > > "sequence counter: With this method, a mobile node notifies its point..." > > Another problem here is the node might not "know" it has been moved from > one location to another in the fabric, particularly if a layer 2 only > attached process is moved along with it's ARP cache, and something like the > EVPN default MAC is used to enable the move. I don't know if this is worth > mentioning, but it does seem like a case that needs to be thought about to > make certain it works with the process described. > > == > Section 5.3.3 > > "The leaf node MUST advertise a time stamp of the latest sighting..." > > Is this for all prefixes, or just for prefixes that are somehow marked or > configured as potentially mobile on the fabric? > > == > Section 5.3.3 > > "RIFT also defines an abstract negative clock (ANSC) that compares as less > than any other clock." > > Does this mean lower priority, or always older than any other clock? It > seems like this means "older" based on 5.3.3.1, but the language could > probably be clearer. > > == > Section 5.3.3 > > One thing not covered here is what happens if some workload moves between > two RIFT fabrics interconnected via some other protocol, such as BGP. It > seems the most obvious solution is to have RIFT treat redistributed > destinations the same as directly attached destinations in terms of clocks, > etc., but this is not called out in the document. It might be worth > mentioning. > > == > Section 5.3.3.3 > > This section seems a little ... muddled? You want to both have the ability > to use only the most recently available versions of any anycast > destination, while also not requiring all the anycast advertisements to > have the same time stamp (?). I'm not certain this section entirely solves > that problem. This sentence might be creating the confusion in my reading > of the section, as I don't really understand what it is saying: > > "An anycast prefix does not carry a clock or all clock attributes MUST be > the same under the rules of Section 5.3.3.1." > > MUST be treated the same as prefixes advertised under the rules of > 5.3.3.1? I'm not certain. > > == > Section 5.3.3.4 > > "RIFT is agnostic whether any overlay technology like [MIP, LISP, VxLAN, > NVO3] and the associated signaling is deployed over it. But it is expected > that leaf nodes, and possibly Top-of-Fabric nodes can perform the according > encapsulation." > > "according" should be "correct," I think. > > == > Section 5.3.6.1 > > "All the multiplications and additions are saturating, i.e. when exceeding > range of the bandwidth type are set to highest possible value of the type." > > Maybe better: > > "If a calculation produces a result exceeding the range of the type, the > result is set to the highest possible value for that type." > > > > _______________________________________________ > RIFT mailing list > RIFT@ietf.org > https://www.ietf.org/mailman/listinfo/rift >
- [Rift] RtgDir Early review: draft-ietf-rift-rift-… russ
- Re: [Rift] RtgDir Early review: draft-ietf-rift-r… Tony Przygienda
- Re: [Rift] RtgDir Early review: draft-ietf-rift-r… Tony Przygienda
- Re: [Rift] RtgDir Early review: draft-ietf-rift-r… Tony Przygienda