Re: [Rift] RtgDir Early review: draft-ietf-rift-rift-06

Tony Przygienda <tonysietf@gmail.com> Thu, 25 July 2019 13:41 UTC

Return-Path: <tonysietf@gmail.com>
X-Original-To: rift@ietfa.amsl.com
Delivered-To: rift@ietfa.amsl.com
Received: from localhost (localhost [127.0.0.1]) by ietfa.amsl.com (Postfix) with ESMTP id 5B53B120073; Thu, 25 Jul 2019 06:41:52 -0700 (PDT)
X-Virus-Scanned: amavisd-new at amsl.com
X-Spam-Flag: NO
X-Spam-Score: -1.997
X-Spam-Level:
X-Spam-Status: No, score=-1.997 tagged_above=-999 required=5 tests=[BAYES_00=-1.9, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, FREEMAIL_FROM=0.001, HTML_MESSAGE=0.001, RCVD_IN_DNSWL_NONE=-0.0001, SPF_HELO_NONE=0.001, SPF_PASS=-0.001, URIBL_BLOCKED=0.001] autolearn=ham autolearn_force=no
Authentication-Results: ietfa.amsl.com (amavisd-new); dkim=pass (2048-bit key) header.d=gmail.com
Received: from mail.ietf.org ([4.31.198.44]) by localhost (ietfa.amsl.com [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id B9kBIKbf8pY7; Thu, 25 Jul 2019 06:41:47 -0700 (PDT)
Received: from mail-ed1-x52a.google.com (mail-ed1-x52a.google.com [IPv6:2a00:1450:4864:20::52a]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by ietfa.amsl.com (Postfix) with ESMTPS id B6796120261; Thu, 25 Jul 2019 06:41:43 -0700 (PDT)
Received: by mail-ed1-x52a.google.com with SMTP id w20so50251498edd.2; Thu, 25 Jul 2019 06:41:43 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=mime-version:references:in-reply-to:from:date:message-id:subject:to :cc; bh=uDvxWwHakzKd6zOyadV1WAlugRVA33bpwGc2rhaLvMY=; b=uj54h+buiLFQOJqSX4lQ6ULa1n2eHw87D+GU2NGPHZfZW+vc5WoGjY7xxId5PbD19Y VIZFMjyOBZ8rryfumo/NMO+cO8n/6Jqlhb2tDE8ChDyaEIKmd9IbQLYex49ZYIrzooQW rIZ3KVi2G1UgGDrluL1lZ3zVjo0ZQeVx6XR/bKiaR8cHUYUe3aVwIgDha8kKA76blbDJ rh7D63BZtVdq9UOpqFTyMG6A3SvrSaqZA5nGrLxY6UV05R3XzaEfsefJMn+ZX4zC8LTc h2RokByRc6Z3AJVDVg/Dx2uCKTD1G+r++UGiGWYR0MmhzuKOm7qBb27H+jSREV4wFwkI O0eg==
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:references:in-reply-to:from:date :message-id:subject:to:cc; bh=uDvxWwHakzKd6zOyadV1WAlugRVA33bpwGc2rhaLvMY=; b=XGuQFHwDJUS7yCBKgSJmikcfnXkTirxm0nn04sKdTOtt1bxvafbyy2IWNXtNLIPLai d61+tLzM+pz6gAWAEoHuBxY2FyNnFSOd1HgkCvX9HAuvc1dLZRE8xSe5cZKbtSgkMBLq xFlhXdCH6PxzXKabgXHcB8SRyIegDHLHLE09rLj8uHdSJdzMby7KZ+pGsWNPARAcyGjB CE2yhJZUYwDtH0idWWtXCajqpfQKSMLPOg+9KFA6sfN/hRqLiLRHkClTaPkUfCa9RxNS fVZaHrz85uVco1Hr+ZmrGFGil0buLlaMU/BaWvEIY1nPE/yagURBk+O3gGaDOKwUwVyx 41xw==
X-Gm-Message-State: APjAAAWGSWdceXfhYO+GEHzNgPQWEoZE4t6Up1LdXJj3fy/jsDS3zwPI Z7jjO4MtWJK7mSg6rM99ln5apaku0waQeGc6tGI=
X-Google-Smtp-Source: APXvYqxF/145XSDKXvOYJ2HXZ3AmU98UdJbjcJVnOpl1Ubki7fYuaiaJULNWR9UX2JMT8UvDRjC2r4TS+CssN9tyxtk=
X-Received: by 2002:a50:b13b:: with SMTP id k56mr79339278edd.192.1564062102154; Thu, 25 Jul 2019 06:41:42 -0700 (PDT)
MIME-Version: 1.0
References: <097e01d5418d$847ee300$8d7ca900$@riw.us> <CA+wi2hO8_-u64-4vo-64GY7KZtdL4s-qKprvLzcjoSmn2jC7XA@mail.gmail.com>
In-Reply-To: <CA+wi2hO8_-u64-4vo-64GY7KZtdL4s-qKprvLzcjoSmn2jC7XA@mail.gmail.com>
From: Tony Przygienda <tonysietf@gmail.com>
Date: Thu, 25 Jul 2019 09:41:06 -0400
Message-ID: <CA+wi2hOaDevFPN1eSZGSGzr081REp=Oq6=H6r+OxMp3a9hfYcQ@mail.gmail.com>
To: Russ White <russ@riw.us>
Cc: rtg-wg-chairs@ietf.org, draft-ietf-rift-rift@ietf.org, rtg-dir@ietf.org, rift@ietf.org
Content-Type: multipart/alternative; boundary="0000000000000173f6058e819654"
Archived-At: <https://mailarchive.ietf.org/arch/msg/rift/xmKHWTjJntD7V4k79HGMCBXKJEs>
Subject: Re: [Rift] RtgDir Early review: draft-ietf-rift-rift-06
X-BeenThere: rift@ietf.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Discussion of Routing in Fat Trees <rift.ietf.org>
List-Unsubscribe: <https://www.ietf.org/mailman/options/rift>, <mailto:rift-request@ietf.org?subject=unsubscribe>
List-Archive: <https://mailarchive.ietf.org/arch/browse/rift/>
List-Post: <mailto:rift@ietf.org>
List-Help: <mailto:rift-request@ietf.org?subject=help>
List-Subscribe: <https://www.ietf.org/mailman/listinfo/rift>, <mailto:rift-request@ietf.org?subject=subscribe>
X-List-Received-Date: Thu, 25 Jul 2019 13:41:53 -0000

BTW, if desired recordings of the ssions on my laptop and I can share them
if desired. very tedious, very boring ...

On Thu, Jul 25, 2019 at 9:40 AM Tony Przygienda <tonysietf@gmail.com> wrote:

> Meeting minutes of the author/review meet on the comments
>
> On Tue, Jul 23, 2019 at 3:33 PM <russ@riw.us> wrote:
>
>> Hello
>>
>> I have been selected to do a routing directorate “early” review of this
>> draft.
>> https://datatracker.ietf.org/doc/draft-foo-name/
>>
>> The routing directorate will, on request from the working group chair,
>> perform an “early” review of a draft before it is submitted for publication
>> to the IESG. The early review can be performed at any time during the
>> draft’s lifetime as a working group document. The purpose of the early
>> review depends on the stage that the document has reached.
>>
>> As this document has recently been adopted by the working group, my focus
>> for the review is on providing a new perspective on the work, with the
>> intention of catching any issues early on in the document's life cycle.
>>
>> For more information about the Routing Directorate, please see
>> http://trac.tools.ietf.org/area/rtg/trac/wiki/RtgDir
>>
>> Document: draft-ietf-rift-rift-06
>> Reviewer: Russ White
>> Review Date: 23 July 2019
>> Intended Status: Standards Track
>>
>> Summary:
>>
>> I have significant concerns about this document. It needs more work
>> before being submitted to the IESG.
>>
>> Comments:
>>
>> Overall this draft is very readable, although I have suggested wording
>> changes below. The protocol described is useful and interesting. I largely
>> have questions about the structure and wording of the draft, although there
>> are a couple of technical questions in the comments below (probably just
>> things I missed the explanation for, however).
>>
>> Structurally, this document almost feels like two different documents.
>> The first specifies a set of use cases and requirements, while the second
>> specifies a protocol. I wonder if it wouldn't be better to split this
>> document into two pieces, making each one more specific, and possibly a bit
>> easier to read? We might too late in the process to make such a change --
>> if so, that's okay, just thought it might be worth considering.
>>
>
> On the intro add sentence "this doc encompasses reqs, secs and specs"
>
>
>>
>> Throughout -- "fat tree," "Clos," and other terms are used in various
>> places. As a reader, given the different meanings for these terms, I found
>> this a bit confusing. It might be better to use "spine and leaf" throughout.
>>
>
> Add to glossary: When we use term Clos we mean by it variants of Clos such
> as multi-plane and what is loosely called Fat Tree and even more generic
> spine-and-leaf designs.
>
>
>> Throughout -- the term "folded" has two very different meanings... The
>> first is a multistage fabric on which traffic flows bi-directionally, the
>> second is a way of drawing spine and leaf fabrics in blocks. It might be
>> useful to clarify this from the beginning somehow, so readers who are
>> expecting one meaning don't misread things because a different meaning is
>> intended (?). I think it's always used here in the "way of drawing a spine
>> and leaf fabric" here.
>>
>
> add to glossary: folded for us is drawing with spines on the top and
> implicitly assuming bi-directionality on all leafs.
>
>
>>
>> ==
>> Page 9
>>
>> The definitions of the various forms of TIE might be better if they
>> clearly differentiated between topology and reachability information (?).
>> The OSPF router LSA, for instance, contains both kinds of information. In
>> IS-IS, adjacencies, which are called Node TIES (I think) are tlv22, while
>> the Prefix TIE would be a tlv128 and 236. I don't think there is an
>> equivalent separation between topology (reachable neighbors) and
>> reachability (reachable prefixes) in OSPF.
>>
>
> remove the "largely equivalent to OSPF Router LSA" part
>
>
>>
>> ==
>> REQ13: "Taking a path through the spine in cases where a shorter path is
>> available is highly undesirable."
>>
>> This doesn't seem to be related to the previous statements in the
>> requirement... Not certain this needs to be included? Or maybe it wants to
>> be someplace else in the list?
>>
>
> Split off the sentence
>
>
>> ==
>> REQ10 and REQ13 both seem to say the same thing -- although REQ13 is
>> better worded. Maybe both of these are not needed?
>>
>
> no action
>
>
>> ==
>> REQ15: "...links of a single node having same addresses." -- might be
>> better as "...links of a single node sharing the same address."
>>
>>
> accepted
>
>
>> ==
>> REQ18: "...security mechanisms that allow to restrict nodes, especially
>> leafs without proper credentials from forming three-way adjacencies."
>>
>> Might be better as:
>>
>> "...security mechanisms that allow the operator to restrict nodes,
>> especially leaf nodes without proper credentials, from forming a three-way
>> adjacency."
>>
>>
> accepted
>
> ==
>> PEND2: 500,000 seems way too small for the scale numbers I've heard in
>> the past -- especially if we are including the underlay protocol running on
>> the hosts in the mix. If there are 300k ports is at least 1 prefix per edge
>> port, and potentially 10-15 prefixes per edge port. If you guess around 20%
>> of the workloads attached to the fabric might be mobile, then the number
>> here is more likely 1 million at the base, with a top end of around 2-3
>> million prefixes. It might be worth adjusting this number a bit larger (?).
>>
>>
> PEND2 to be killed
>
>
>> ==
>> Section 5, paragraph after the header
>>
>> "...when "pointing north" and path vector [RFC4271] protocol when
>> "pointing south". While this is an unusual combination,..."
>>
>> Elsewhere in the document "distance-vector" is used. It might be better
>> to be consistent?
>>
>>
> replace with distance vector
>
>
>> ==
>> Section 5.1.1
>>
>> In the first paragraph ("The most singular property..."), omitting
>> east-west control plane information flow is mentioned twice; could probably
>> just be once.
>>
>
> Consult with Deborah whether we should tighten the prose or make it
> "looser" ?
>
>
>> The first sentence in the second paragraph is a bit of a run-on; might be
>> best to omit some of this, or break it into two sentences.
>>
>> Second paragraph: "The application of those principle lead to RIFT..."
>> doesn't seem right. Might be better as: "The application of these
>> principles lead to RIFT..."
>>
>
> accepted
>
>
>>
>> ==
>> Section 5.1.2, paragraph beginning: "Given the difficulty of visualizing
>> multi plane design which are..." First, this is one long sentence. Second,
>> what's illustrated looks like a crossbar, but it's not a crossbar. I find
>> this a little confusing (?). The problem is -- I don't know of another term
>> to use here.
>>
>
> sentence needs splitting
>
> crossbar stands
>
>
>>
>> ==
>> Section 5.1.2, paragraph beginning: "The typical topology for which RIFT
>> is defined is built of a number P..." This seems to describe a five stage
>> butterfly or pod-and-spine fabric, but does not seem to describe a seven
>> stage (?).
>>
>
> by a number of "top of fabric" nodes ...
>
>
>>
>> Overall -- I'm not certain trying to describe these topologies at this
>> level in a protocol specification draft is all that useful. The terminology
>> is fluid (used by different people in different ways), and this depth of
>> explanation does not seem to be useful to describe the protocol operation.
>> This is one place where splitting the document into two pieces might be
>> helpful.
>>
>
> Word of caution sentence: please follow the glossary tightly, you get lost
> quickly otherwise ...
>
>
>>
>> ==
>> Paragraph beginning: "It is critical at this junction that the concept
>> and the picture of..." What is shown is not really a crossbar fabric... I'm
>> not certain calling this a crossbar is helpful here (?).
>>
>
> add crossbar to glossary:  physical arrangement of ports in a switching
> matrix without implying any further scheduling or buffering disciplines
>
>
>> ==
>> Section 5.1.4: "This happens naturally in single plane design but needs
>> additional considerations in multiplane fabrics." When aggregation is used,
>> as well -- but not when no aggregation of reachability information is
>> deployed in the fabric. It might be worth adding that qualification here.
>>
>
> when aggregation is used
>
>
>>
>> ==
>> Section 5.1.4
>>
>> Sentence beginning: "In more detail, by reserving two ports on each
>> Top-of-Fabric node it is possible to connect them together in an interplane
>> bi-directional ring as illustrated in Figure 13..."
>>
>> I might have missed it, but this document does not seem to address how
>> traffic will be prevented from flowing through these links. It's probably
>> important to note either that traffic will flow along these links, hence
>> they need to be sized appropriately, or how traffic is prevented from
>> flowing along these links.
>>
>>
> defect: spec the computation or "the computation has to prevent" . Enhace
> section 5.2.4 by explaining how ToFs MUST not use the ring @ the top to
> forward"
>
>
>> ==
>> Section 5.1.5
>>
>> Sentence beginning: "The effect of a positive disaggregation..." If
>> positive deaggregation takes place, it could draw all the traffic to the
>> deaggregated destinations along a small set of links, causing a hot spot in
>> the fabric. This might be taken care of "naturally" in the way RIFT does
>> positive deaggregation, but it might be good to explain this in the
>> document.
>>
>
> in the doc already
>
>
>>
>> A more general observation: positive deaggregation, as described in the
>> document, can potentially cause cascading failures where a group of links
>> or nodes fail, causing a lot of new information to be pushed into lower
>> levels rather quickly, which could potentially cause those nodes to overrun
>> their table or processing space and fail in turn, causing more information
>> to be dumped into the control plane, etc. It may be worth connsidering
>> having some form of "hysteresis," timer or other mechanism to prevent
>> cascading failures of this kind.
>>
>
> positive is NOT transitive hence comment needs no addressing.
>
> Clarify maybe once more the fact.
>
>
>>
>> ==
>> Section 5.2.2
>>
>> "All RIFT routers MUST support IPv4 forwarding and MAY support IPv6..."
>>
>> So an IPv6 only fabric is not possible?
>>
>
> based on multiple thread router requirements are removed
>
>
>>
>> ==
>> Section 5.2.6
>>
>> "After the SPF is run, it is necessary to attach according prefixes."
>>
>> I think this might want to mean --
>>
>> "After SPF is run, it is necessary to attach the resulting reachability
>> information in the form of prefixes."
>>
>>
> agreed
>
>
>> (?)
>>
>> ==
>> Section 5.2.6
>>
>> It seems like the first and third paragraphs describe the same procedure?
>> Are both descriptions necessary?
>>
>
> need to be merged
>
>
>> ==
>> Section 5.2.6
>>
>> "An exemplary implementation..." should probably be "an example
>> implementation..." ??
>>
>> ==
>> Section 5.2.6
>>
>> "After the positive prefixes are attached and tie-broken, negative
>> prefixes are attached and used in case of northbound computation, ideally
>> from the shortest length to the longest."
>>
>
>
>>
>> It seems like the prefixes should be "tie-broken" before being attached.
>> This sentence is generally confusing, perhaps:
>>
>
>> "After the positive prefixes have been attached, the negative prefixes
>> will be attached in their prefix-length order (from the shortest to the
>> longest). These negative prefixes will only be used when computing
>> northbound reachability."
>>
>
> "Attach all prefixes. Then build next-hop per type. Thne within types
> tie-break by sequence in the model. if only negative remains run the
> according algorithm"
>
> Special Consideration for securtity Host implementation: suggestion to
> throttle aggressively, implementation on the ToR to max-prefix/max-state
> knobbing
>
>
>>
>> ==
>> Section 5.2.6
>>
>> "Observe that despite seeming quite computationally expensive the
>> operations are only necessary if the only available advertisements for a
>> prefix are negative since tie-breaking always prefers positive information
>> for the prefix which stops any kind of recursion since positive information
>> never inherits next hops."
>>
>> Just because something is not done all that often does not mean it's not
>> computationally expensive. :-) Maybe something like this would be better:
>>
>> "Although these operations can be computationally expensive, the overall
>> load on devices in the network is low because these computations are not
>> run very often, as positive route advertisements are always preferred over
>> negative ones. This prevents recursion in most cases because positive
>> reachability information never inherits next hops."
>>
>> Or something like this?
>>
>
> ack
>
>
>>
>> ==
>> Section 5.2.6
>>
>> "Negative 2001:db8::/32 entry inherits from"
>>
>> Probably wants to be "The negative 2001:db8::/32 entry..."
>>
>
> nit
>
>
>> ==
>> Section 5.2.6
>>
>> "Finally, let us look at the case where S3 becomes available again as
>> default gateway, ..."
>>
>> "...the default gateway..." or "...a default gateway..."
>>
>> ack
>
>
>> ==
>> Section 5.2.7
>>
>> "Each RIFT node can optionally operate in zero touch provisioning..."
>>
>> "optionally" is not needed here
>>
>
> "RIFT nodes MAY use ZTP or be configured, both modes being mixed in the
> same instance if desired"
>
> remove optionally
>
>
>>
>> ==
>> Section 5.2.7.2
>>
>> "RIFT identifies each node via a SystemID which is a 64 bits wide
>> integer. It is relatively simple to derive a, for all practical purposes
>> collision free, value for each node on startup. For that purpose, a node
>> MUST use as system ID EUI-64 MA-L format [EUI64] where the organizationally
>> governed 24 bits can be used to generate system IDs for multiple RIFT
>> instances running on the system."
>>
>> This seems too wordy -- maybe:
>>
>> "RIFT nodes require a 64 bit SystemID in the EUI-64 MA-L format formed as
>> described in [EUI64]. The organizationally goverened portion of this ID
>> (24 bits) can be used to generate multiple IDs if required to indicate more
>> than one RIFT instance."
>>
>
> "As prescribed by EUI ... RIFT nodes MUST form a 64 bit SID
>
>
>>
>> ==
>> Section 5.3.1
>>
>> "Overload Bit MUST be respected..."
>>
>> should be
>>
>> "The overload bit MUST be respected..."
>>
>>
> ack
>
>
>> ==
>> Section 5.3.2
>>
>> "Since the leafs do see only "one hop away" they do not need to run a
>> full SPF but can simply gather prefix candidates from their neighbors and
>> build the according routing table."
>>
>> Seems like this should be
>>
>> "Since leaf nodes only have one hop of topology information, they do not
>> need to run SPF. Instead, they can gather the available prefix candidates
>> and build the routing table accordingly."
>>
>
> ack
>
>
>>
>> ==
>> Section 5.3.3
>>
>> "time stamp: With this method, the infrastructure memorizes the..."
>>
>> I think the correct word here is "records," rather than "memorizes" (?)
>>
>
> ack
>
>
>>
>> ==
>> Section 5.3.3
>>
>> "sequence counter: With this method, a mobile node notifies its point..."
>>
>> Another problem here is the node might not "know" it has been moved from
>> one location to another in the fabric, particularly if a layer 2 only
>> attached process is moved along with it's ARP cache, and something like the
>> EVPN default MAC is used to enable the move. I don't know if this is worth
>> mentioning, but it does seem like a case that needs to be thought about to
>> make certain it works with the process described.
>>
>>
> no change necessary
>
>
>> ==
>> Section 5.3.3
>>
>> "The leaf node MUST advertise a time stamp of the latest sighting..."
>>
>> Is this for all prefixes, or just for prefixes that are somehow marked or
>> configured as potentially mobile on the fabric?
>>
>>
> A leaf node MAY ...
>
>
>> ==
>> Section 5.3.3
>>
>> "RIFT also defines an abstract negative clock (ANSC) that compares as
>> less than any other clock."
>>
>> Does this mean lower priority, or always older than any other clock? It
>> seems like this means "older" based on 5.3.3.1, but the language could
>> probably be clearer.
>>
>>
> nothing to do
>
>
>> ==
>> Section 5.3.3
>>
>> One thing not covered here is what happens if some workload moves between
>> two RIFT fabrics interconnected via some other protocol, such as BGP. It
>> seems the most obvious solution is to have RIFT treat redistributed
>> destinations the same as directly attached destinations in terms of clocks,
>> etc., but this is not called out in the document. It might be worth
>> mentioning.
>>
>>
> no action
>
>
>> ==
>> Section 5.3.3.3
>>
>> This section seems a little ... muddled? You want to both have the
>> ability to use only the most recently available versions of any anycast
>> destination, while also not requiring all the anycast advertisements to
>> have the same time stamp (?). I'm not certain this section entirely solves
>> that problem. This sentence might be creating the confusion in my reading
>> of the section, as I don't really understand what it is saying:
>>
>> "An anycast prefix does not carry a clock or all clock attributes MUST be
>> the same under the rules of Section 5.3.3.1."
>>
>> MUST be treated the same as prefixes advertised under the rules of
>> 5.3.3.1? I'm not certain.
>>
>
> it's complex for a reason. left as that.
>
>
>>
>> ==
>> Section 5.3.3.4
>>
>> "RIFT is agnostic whether any overlay technology like [MIP, LISP, VxLAN,
>> NVO3] and the associated signaling is deployed over it. But it is expected
>> that leaf nodes, and possibly Top-of-Fabric nodes can perform the according
>> encapsulation."
>>
>> "according" should be "correct," I think.
>>
>
> fine
>
>
>>
>> ==
>> Section 5.3.6.1
>>
>> "All the multiplications and additions are saturating, i.e. when
>> exceeding range of the bandwidth type are set to highest possible value of
>> the type."
>>
>> Maybe better:
>>
>> "If a calculation produces a result exceeding the range of the type, the
>> result is set to the highest possible value for that type."
>>
>>
> fine
>
>
>>
>> _______________________________________________
>> RIFT mailing list
>> RIFT@ietf.org
>> https://www.ietf.org/mailman/listinfo/rift
>>
>