Re: [RTG-DIR] [Rift] RtgDir Early review: draft-ietf-rift-rift-06
Tony Przygienda <tonysietf@gmail.com> Thu, 25 July 2019 13:41 UTC
Return-Path: <tonysietf@gmail.com>
X-Original-To: rtg-dir@ietfa.amsl.com
Delivered-To: rtg-dir@ietfa.amsl.com
Received: from localhost (localhost [127.0.0.1]) by ietfa.amsl.com (Postfix) with ESMTP id 5B53B120073; Thu, 25 Jul 2019 06:41:52 -0700 (PDT)
X-Virus-Scanned: amavisd-new at amsl.com
X-Spam-Flag: NO
X-Spam-Score: -1.997
X-Spam-Level:
X-Spam-Status: No, score=-1.997 tagged_above=-999 required=5 tests=[BAYES_00=-1.9, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, FREEMAIL_FROM=0.001, HTML_MESSAGE=0.001, RCVD_IN_DNSWL_NONE=-0.0001, SPF_HELO_NONE=0.001, SPF_PASS=-0.001, URIBL_BLOCKED=0.001] autolearn=ham autolearn_force=no
Authentication-Results: ietfa.amsl.com (amavisd-new); dkim=pass (2048-bit key) header.d=gmail.com
Received: from mail.ietf.org ([4.31.198.44]) by localhost (ietfa.amsl.com [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id B9kBIKbf8pY7; Thu, 25 Jul 2019 06:41:47 -0700 (PDT)
Received: from mail-ed1-x52a.google.com (mail-ed1-x52a.google.com [IPv6:2a00:1450:4864:20::52a]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by ietfa.amsl.com (Postfix) with ESMTPS id B6796120261; Thu, 25 Jul 2019 06:41:43 -0700 (PDT)
Received: by mail-ed1-x52a.google.com with SMTP id w20so50251498edd.2; Thu, 25 Jul 2019 06:41:43 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=mime-version:references:in-reply-to:from:date:message-id:subject:to :cc; bh=uDvxWwHakzKd6zOyadV1WAlugRVA33bpwGc2rhaLvMY=; b=uj54h+buiLFQOJqSX4lQ6ULa1n2eHw87D+GU2NGPHZfZW+vc5WoGjY7xxId5PbD19Y VIZFMjyOBZ8rryfumo/NMO+cO8n/6Jqlhb2tDE8ChDyaEIKmd9IbQLYex49ZYIrzooQW rIZ3KVi2G1UgGDrluL1lZ3zVjo0ZQeVx6XR/bKiaR8cHUYUe3aVwIgDha8kKA76blbDJ rh7D63BZtVdq9UOpqFTyMG6A3SvrSaqZA5nGrLxY6UV05R3XzaEfsefJMn+ZX4zC8LTc h2RokByRc6Z3AJVDVg/Dx2uCKTD1G+r++UGiGWYR0MmhzuKOm7qBb27H+jSREV4wFwkI O0eg==
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:references:in-reply-to:from:date :message-id:subject:to:cc; bh=uDvxWwHakzKd6zOyadV1WAlugRVA33bpwGc2rhaLvMY=; b=XGuQFHwDJUS7yCBKgSJmikcfnXkTirxm0nn04sKdTOtt1bxvafbyy2IWNXtNLIPLai d61+tLzM+pz6gAWAEoHuBxY2FyNnFSOd1HgkCvX9HAuvc1dLZRE8xSe5cZKbtSgkMBLq xFlhXdCH6PxzXKabgXHcB8SRyIegDHLHLE09rLj8uHdSJdzMby7KZ+pGsWNPARAcyGjB CE2yhJZUYwDtH0idWWtXCajqpfQKSMLPOg+9KFA6sfN/hRqLiLRHkClTaPkUfCa9RxNS fVZaHrz85uVco1Hr+ZmrGFGil0buLlaMU/BaWvEIY1nPE/yagURBk+O3gGaDOKwUwVyx 41xw==
X-Gm-Message-State: APjAAAWGSWdceXfhYO+GEHzNgPQWEoZE4t6Up1LdXJj3fy/jsDS3zwPI Z7jjO4MtWJK7mSg6rM99ln5apaku0waQeGc6tGI=
X-Google-Smtp-Source: APXvYqxF/145XSDKXvOYJ2HXZ3AmU98UdJbjcJVnOpl1Ubki7fYuaiaJULNWR9UX2JMT8UvDRjC2r4TS+CssN9tyxtk=
X-Received: by 2002:a50:b13b:: with SMTP id k56mr79339278edd.192.1564062102154; Thu, 25 Jul 2019 06:41:42 -0700 (PDT)
MIME-Version: 1.0
References: <097e01d5418d$847ee300$8d7ca900$@riw.us> <CA+wi2hO8_-u64-4vo-64GY7KZtdL4s-qKprvLzcjoSmn2jC7XA@mail.gmail.com>
In-Reply-To: <CA+wi2hO8_-u64-4vo-64GY7KZtdL4s-qKprvLzcjoSmn2jC7XA@mail.gmail.com>
From: Tony Przygienda <tonysietf@gmail.com>
Date: Thu, 25 Jul 2019 09:41:06 -0400
Message-ID: <CA+wi2hOaDevFPN1eSZGSGzr081REp=Oq6=H6r+OxMp3a9hfYcQ@mail.gmail.com>
To: Russ White <russ@riw.us>
Cc: rtg-wg-chairs@ietf.org, draft-ietf-rift-rift@ietf.org, rtg-dir@ietf.org, rift@ietf.org
Content-Type: multipart/alternative; boundary="0000000000000173f6058e819654"
Archived-At: <https://mailarchive.ietf.org/arch/msg/rtg-dir/CWO2jrm4u-1fVO_LPfme9kyscdE>
Subject: Re: [RTG-DIR] [Rift] RtgDir Early review: draft-ietf-rift-rift-06
X-BeenThere: rtg-dir@ietf.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Routing Area Directorate <rtg-dir.ietf.org>
List-Unsubscribe: <https://www.ietf.org/mailman/options/rtg-dir>, <mailto:rtg-dir-request@ietf.org?subject=unsubscribe>
List-Archive: <https://mailarchive.ietf.org/arch/browse/rtg-dir/>
List-Post: <mailto:rtg-dir@ietf.org>
List-Help: <mailto:rtg-dir-request@ietf.org?subject=help>
List-Subscribe: <https://www.ietf.org/mailman/listinfo/rtg-dir>, <mailto:rtg-dir-request@ietf.org?subject=subscribe>
X-List-Received-Date: Thu, 25 Jul 2019 13:41:53 -0000
BTW, if desired recordings of the ssions on my laptop and I can share them if desired. very tedious, very boring ... On Thu, Jul 25, 2019 at 9:40 AM Tony Przygienda <tonysietf@gmail.com> wrote: > Meeting minutes of the author/review meet on the comments > > On Tue, Jul 23, 2019 at 3:33 PM <russ@riw.us> wrote: > >> Hello >> >> I have been selected to do a routing directorate “early” review of this >> draft. >> https://datatracker.ietf.org/doc/draft-foo-name/ >> >> The routing directorate will, on request from the working group chair, >> perform an “early” review of a draft before it is submitted for publication >> to the IESG. The early review can be performed at any time during the >> draft’s lifetime as a working group document. The purpose of the early >> review depends on the stage that the document has reached. >> >> As this document has recently been adopted by the working group, my focus >> for the review is on providing a new perspective on the work, with the >> intention of catching any issues early on in the document's life cycle. >> >> For more information about the Routing Directorate, please see >> http://trac.tools.ietf.org/area/rtg/trac/wiki/RtgDir >> >> Document: draft-ietf-rift-rift-06 >> Reviewer: Russ White >> Review Date: 23 July 2019 >> Intended Status: Standards Track >> >> Summary: >> >> I have significant concerns about this document. It needs more work >> before being submitted to the IESG. >> >> Comments: >> >> Overall this draft is very readable, although I have suggested wording >> changes below. The protocol described is useful and interesting. I largely >> have questions about the structure and wording of the draft, although there >> are a couple of technical questions in the comments below (probably just >> things I missed the explanation for, however). >> >> Structurally, this document almost feels like two different documents. >> The first specifies a set of use cases and requirements, while the second >> specifies a protocol. I wonder if it wouldn't be better to split this >> document into two pieces, making each one more specific, and possibly a bit >> easier to read? We might too late in the process to make such a change -- >> if so, that's okay, just thought it might be worth considering. >> > > On the intro add sentence "this doc encompasses reqs, secs and specs" > > >> >> Throughout -- "fat tree," "Clos," and other terms are used in various >> places. As a reader, given the different meanings for these terms, I found >> this a bit confusing. It might be better to use "spine and leaf" throughout. >> > > Add to glossary: When we use term Clos we mean by it variants of Clos such > as multi-plane and what is loosely called Fat Tree and even more generic > spine-and-leaf designs. > > >> Throughout -- the term "folded" has two very different meanings... The >> first is a multistage fabric on which traffic flows bi-directionally, the >> second is a way of drawing spine and leaf fabrics in blocks. It might be >> useful to clarify this from the beginning somehow, so readers who are >> expecting one meaning don't misread things because a different meaning is >> intended (?). I think it's always used here in the "way of drawing a spine >> and leaf fabric" here. >> > > add to glossary: folded for us is drawing with spines on the top and > implicitly assuming bi-directionality on all leafs. > > >> >> == >> Page 9 >> >> The definitions of the various forms of TIE might be better if they >> clearly differentiated between topology and reachability information (?). >> The OSPF router LSA, for instance, contains both kinds of information. In >> IS-IS, adjacencies, which are called Node TIES (I think) are tlv22, while >> the Prefix TIE would be a tlv128 and 236. I don't think there is an >> equivalent separation between topology (reachable neighbors) and >> reachability (reachable prefixes) in OSPF. >> > > remove the "largely equivalent to OSPF Router LSA" part > > >> >> == >> REQ13: "Taking a path through the spine in cases where a shorter path is >> available is highly undesirable." >> >> This doesn't seem to be related to the previous statements in the >> requirement... Not certain this needs to be included? Or maybe it wants to >> be someplace else in the list? >> > > Split off the sentence > > >> == >> REQ10 and REQ13 both seem to say the same thing -- although REQ13 is >> better worded. Maybe both of these are not needed? >> > > no action > > >> == >> REQ15: "...links of a single node having same addresses." -- might be >> better as "...links of a single node sharing the same address." >> >> > accepted > > >> == >> REQ18: "...security mechanisms that allow to restrict nodes, especially >> leafs without proper credentials from forming three-way adjacencies." >> >> Might be better as: >> >> "...security mechanisms that allow the operator to restrict nodes, >> especially leaf nodes without proper credentials, from forming a three-way >> adjacency." >> >> > accepted > > == >> PEND2: 500,000 seems way too small for the scale numbers I've heard in >> the past -- especially if we are including the underlay protocol running on >> the hosts in the mix. If there are 300k ports is at least 1 prefix per edge >> port, and potentially 10-15 prefixes per edge port. If you guess around 20% >> of the workloads attached to the fabric might be mobile, then the number >> here is more likely 1 million at the base, with a top end of around 2-3 >> million prefixes. It might be worth adjusting this number a bit larger (?). >> >> > PEND2 to be killed > > >> == >> Section 5, paragraph after the header >> >> "...when "pointing north" and path vector [RFC4271] protocol when >> "pointing south". While this is an unusual combination,..." >> >> Elsewhere in the document "distance-vector" is used. It might be better >> to be consistent? >> >> > replace with distance vector > > >> == >> Section 5.1.1 >> >> In the first paragraph ("The most singular property..."), omitting >> east-west control plane information flow is mentioned twice; could probably >> just be once. >> > > Consult with Deborah whether we should tighten the prose or make it > "looser" ? > > >> The first sentence in the second paragraph is a bit of a run-on; might be >> best to omit some of this, or break it into two sentences. >> >> Second paragraph: "The application of those principle lead to RIFT..." >> doesn't seem right. Might be better as: "The application of these >> principles lead to RIFT..." >> > > accepted > > >> >> == >> Section 5.1.2, paragraph beginning: "Given the difficulty of visualizing >> multi plane design which are..." First, this is one long sentence. Second, >> what's illustrated looks like a crossbar, but it's not a crossbar. I find >> this a little confusing (?). The problem is -- I don't know of another term >> to use here. >> > > sentence needs splitting > > crossbar stands > > >> >> == >> Section 5.1.2, paragraph beginning: "The typical topology for which RIFT >> is defined is built of a number P..." This seems to describe a five stage >> butterfly or pod-and-spine fabric, but does not seem to describe a seven >> stage (?). >> > > by a number of "top of fabric" nodes ... > > >> >> Overall -- I'm not certain trying to describe these topologies at this >> level in a protocol specification draft is all that useful. The terminology >> is fluid (used by different people in different ways), and this depth of >> explanation does not seem to be useful to describe the protocol operation. >> This is one place where splitting the document into two pieces might be >> helpful. >> > > Word of caution sentence: please follow the glossary tightly, you get lost > quickly otherwise ... > > >> >> == >> Paragraph beginning: "It is critical at this junction that the concept >> and the picture of..." What is shown is not really a crossbar fabric... I'm >> not certain calling this a crossbar is helpful here (?). >> > > add crossbar to glossary: physical arrangement of ports in a switching > matrix without implying any further scheduling or buffering disciplines > > >> == >> Section 5.1.4: "This happens naturally in single plane design but needs >> additional considerations in multiplane fabrics." When aggregation is used, >> as well -- but not when no aggregation of reachability information is >> deployed in the fabric. It might be worth adding that qualification here. >> > > when aggregation is used > > >> >> == >> Section 5.1.4 >> >> Sentence beginning: "In more detail, by reserving two ports on each >> Top-of-Fabric node it is possible to connect them together in an interplane >> bi-directional ring as illustrated in Figure 13..." >> >> I might have missed it, but this document does not seem to address how >> traffic will be prevented from flowing through these links. It's probably >> important to note either that traffic will flow along these links, hence >> they need to be sized appropriately, or how traffic is prevented from >> flowing along these links. >> >> > defect: spec the computation or "the computation has to prevent" . Enhace > section 5.2.4 by explaining how ToFs MUST not use the ring @ the top to > forward" > > >> == >> Section 5.1.5 >> >> Sentence beginning: "The effect of a positive disaggregation..." If >> positive deaggregation takes place, it could draw all the traffic to the >> deaggregated destinations along a small set of links, causing a hot spot in >> the fabric. This might be taken care of "naturally" in the way RIFT does >> positive deaggregation, but it might be good to explain this in the >> document. >> > > in the doc already > > >> >> A more general observation: positive deaggregation, as described in the >> document, can potentially cause cascading failures where a group of links >> or nodes fail, causing a lot of new information to be pushed into lower >> levels rather quickly, which could potentially cause those nodes to overrun >> their table or processing space and fail in turn, causing more information >> to be dumped into the control plane, etc. It may be worth connsidering >> having some form of "hysteresis," timer or other mechanism to prevent >> cascading failures of this kind. >> > > positive is NOT transitive hence comment needs no addressing. > > Clarify maybe once more the fact. > > >> >> == >> Section 5.2.2 >> >> "All RIFT routers MUST support IPv4 forwarding and MAY support IPv6..." >> >> So an IPv6 only fabric is not possible? >> > > based on multiple thread router requirements are removed > > >> >> == >> Section 5.2.6 >> >> "After the SPF is run, it is necessary to attach according prefixes." >> >> I think this might want to mean -- >> >> "After SPF is run, it is necessary to attach the resulting reachability >> information in the form of prefixes." >> >> > agreed > > >> (?) >> >> == >> Section 5.2.6 >> >> It seems like the first and third paragraphs describe the same procedure? >> Are both descriptions necessary? >> > > need to be merged > > >> == >> Section 5.2.6 >> >> "An exemplary implementation..." should probably be "an example >> implementation..." ?? >> >> == >> Section 5.2.6 >> >> "After the positive prefixes are attached and tie-broken, negative >> prefixes are attached and used in case of northbound computation, ideally >> from the shortest length to the longest." >> > > >> >> It seems like the prefixes should be "tie-broken" before being attached. >> This sentence is generally confusing, perhaps: >> > >> "After the positive prefixes have been attached, the negative prefixes >> will be attached in their prefix-length order (from the shortest to the >> longest). These negative prefixes will only be used when computing >> northbound reachability." >> > > "Attach all prefixes. Then build next-hop per type. Thne within types > tie-break by sequence in the model. if only negative remains run the > according algorithm" > > Special Consideration for securtity Host implementation: suggestion to > throttle aggressively, implementation on the ToR to max-prefix/max-state > knobbing > > >> >> == >> Section 5.2.6 >> >> "Observe that despite seeming quite computationally expensive the >> operations are only necessary if the only available advertisements for a >> prefix are negative since tie-breaking always prefers positive information >> for the prefix which stops any kind of recursion since positive information >> never inherits next hops." >> >> Just because something is not done all that often does not mean it's not >> computationally expensive. :-) Maybe something like this would be better: >> >> "Although these operations can be computationally expensive, the overall >> load on devices in the network is low because these computations are not >> run very often, as positive route advertisements are always preferred over >> negative ones. This prevents recursion in most cases because positive >> reachability information never inherits next hops." >> >> Or something like this? >> > > ack > > >> >> == >> Section 5.2.6 >> >> "Negative 2001:db8::/32 entry inherits from" >> >> Probably wants to be "The negative 2001:db8::/32 entry..." >> > > nit > > >> == >> Section 5.2.6 >> >> "Finally, let us look at the case where S3 becomes available again as >> default gateway, ..." >> >> "...the default gateway..." or "...a default gateway..." >> >> ack > > >> == >> Section 5.2.7 >> >> "Each RIFT node can optionally operate in zero touch provisioning..." >> >> "optionally" is not needed here >> > > "RIFT nodes MAY use ZTP or be configured, both modes being mixed in the > same instance if desired" > > remove optionally > > >> >> == >> Section 5.2.7.2 >> >> "RIFT identifies each node via a SystemID which is a 64 bits wide >> integer. It is relatively simple to derive a, for all practical purposes >> collision free, value for each node on startup. For that purpose, a node >> MUST use as system ID EUI-64 MA-L format [EUI64] where the organizationally >> governed 24 bits can be used to generate system IDs for multiple RIFT >> instances running on the system." >> >> This seems too wordy -- maybe: >> >> "RIFT nodes require a 64 bit SystemID in the EUI-64 MA-L format formed as >> described in [EUI64]. The organizationally goverened portion of this ID >> (24 bits) can be used to generate multiple IDs if required to indicate more >> than one RIFT instance." >> > > "As prescribed by EUI ... RIFT nodes MUST form a 64 bit SID > > >> >> == >> Section 5.3.1 >> >> "Overload Bit MUST be respected..." >> >> should be >> >> "The overload bit MUST be respected..." >> >> > ack > > >> == >> Section 5.3.2 >> >> "Since the leafs do see only "one hop away" they do not need to run a >> full SPF but can simply gather prefix candidates from their neighbors and >> build the according routing table." >> >> Seems like this should be >> >> "Since leaf nodes only have one hop of topology information, they do not >> need to run SPF. Instead, they can gather the available prefix candidates >> and build the routing table accordingly." >> > > ack > > >> >> == >> Section 5.3.3 >> >> "time stamp: With this method, the infrastructure memorizes the..." >> >> I think the correct word here is "records," rather than "memorizes" (?) >> > > ack > > >> >> == >> Section 5.3.3 >> >> "sequence counter: With this method, a mobile node notifies its point..." >> >> Another problem here is the node might not "know" it has been moved from >> one location to another in the fabric, particularly if a layer 2 only >> attached process is moved along with it's ARP cache, and something like the >> EVPN default MAC is used to enable the move. I don't know if this is worth >> mentioning, but it does seem like a case that needs to be thought about to >> make certain it works with the process described. >> >> > no change necessary > > >> == >> Section 5.3.3 >> >> "The leaf node MUST advertise a time stamp of the latest sighting..." >> >> Is this for all prefixes, or just for prefixes that are somehow marked or >> configured as potentially mobile on the fabric? >> >> > A leaf node MAY ... > > >> == >> Section 5.3.3 >> >> "RIFT also defines an abstract negative clock (ANSC) that compares as >> less than any other clock." >> >> Does this mean lower priority, or always older than any other clock? It >> seems like this means "older" based on 5.3.3.1, but the language could >> probably be clearer. >> >> > nothing to do > > >> == >> Section 5.3.3 >> >> One thing not covered here is what happens if some workload moves between >> two RIFT fabrics interconnected via some other protocol, such as BGP. It >> seems the most obvious solution is to have RIFT treat redistributed >> destinations the same as directly attached destinations in terms of clocks, >> etc., but this is not called out in the document. It might be worth >> mentioning. >> >> > no action > > >> == >> Section 5.3.3.3 >> >> This section seems a little ... muddled? You want to both have the >> ability to use only the most recently available versions of any anycast >> destination, while also not requiring all the anycast advertisements to >> have the same time stamp (?). I'm not certain this section entirely solves >> that problem. This sentence might be creating the confusion in my reading >> of the section, as I don't really understand what it is saying: >> >> "An anycast prefix does not carry a clock or all clock attributes MUST be >> the same under the rules of Section 5.3.3.1." >> >> MUST be treated the same as prefixes advertised under the rules of >> 5.3.3.1? I'm not certain. >> > > it's complex for a reason. left as that. > > >> >> == >> Section 5.3.3.4 >> >> "RIFT is agnostic whether any overlay technology like [MIP, LISP, VxLAN, >> NVO3] and the associated signaling is deployed over it. But it is expected >> that leaf nodes, and possibly Top-of-Fabric nodes can perform the according >> encapsulation." >> >> "according" should be "correct," I think. >> > > fine > > >> >> == >> Section 5.3.6.1 >> >> "All the multiplications and additions are saturating, i.e. when >> exceeding range of the bandwidth type are set to highest possible value of >> the type." >> >> Maybe better: >> >> "If a calculation produces a result exceeding the range of the type, the >> result is set to the highest possible value for that type." >> >> > fine > > >> >> _______________________________________________ >> RIFT mailing list >> RIFT@ietf.org >> https://www.ietf.org/mailman/listinfo/rift >> >
- [RTG-DIR] RtgDir Early review: draft-ietf-rift-ri… russ
- Re: [RTG-DIR] [Rift] RtgDir Early review: draft-i… Tony Przygienda
- Re: [RTG-DIR] [Rift] RtgDir Early review: draft-i… Tony Przygienda
- Re: [RTG-DIR] [Rift] RtgDir Early review: draft-i… Tony Przygienda