Re: [Rift] Routing directorate early review of draft-ietf-rift-rift

Tony Przygienda <tonysietf@gmail.com> Mon, 04 November 2019 15:25 UTC

From: Tony Przygienda <tonysietf@gmail.com>
Date: Mon, 04 Nov 2019 07:24:23 -0800
Message-ID: <CA+wi2hOzCbWh2U9G+AUUb8U+T+v+-7qpPVK7jn4NzxooB=dwgw@mail.gmail.com>
To: Jonathan Hardwick <Jonathan.Hardwick@metaswitch.com>
Cc: "rift-chairs@ietf.org" <rift-chairs@ietf.org>, "draft-ietf-rift-rift.all@ietf.org" <draft-ietf-rift-rift.all@ietf.org>, "rtg-dir@ietf.org" <rtg-dir@ietf.org>, Luc André Burdet <laburdet.ietf@gmail.com>, Min Ye <amy.yemin@huawei.com>, "rift@ietf.org" <rift@ietf.org>
Archived-At: <https://mailarchive.ietf.org/arch/msg/rift/1OqcnrCZBRjui3LFGqVdW0Bvo00>
Subject: Re: [Rift] Routing directorate early review of draft-ietf-rift-rift

On Mon, Nov 4, 2019 at 4:40 AM Jonathan Hardwick <
Jonathan.Hardwick@metaswitch.com> wrote:

> Tony, many thanks for your reply – please see [JEH] below.
>
> Jon
>
>
>
>
>
> We are in IETF here, where "rough consensus and running code" was the
> recipe for success vs. much heavier-handed organizations like OSI, and I
> think in this philosophy the spec, if anything, is possibly overspecified
> already ;-) The core pieces that allow no slips, like flooding and adjacency
> formation, are very precisely written, including FSMs.
>
>
>
> [JEH] Sure.  My comments were intended to help improve the use of
> normative language and the delineation between normative passages and
> informative ones.
>
>
>

We're in sync; I am tightening normative language where necessary. That will
take until the next rev: I'll try to put out an intermediate version today
with lots of review comments accommodated, but won't manage to get everything
in until the one after.


>
>
>
>    - The definition of the protocol and some of the normative behaviour
>    is deferred to the appendices, whereas I would expect to encounter it early
>    on in the text, with an in-line discussion of the purposes of the messages
>    and fields.
>
>
>
> OK, it seems the second directorate reviewer prefers the appendices to
> be pulled into the document. Let me do that then.
>
>
>
> [JEH] My apologies – it is unfortunate when two different reviewers give
> contradictory opinions!  You should of course weigh my opinion with
> everyone else’s.
>

No, actually you're the second IAB/DIR reviewer who said that. However,
earlier on I got gripes that the spec was "too dry" and that it was hard to
know "why" things are done, so I try to strike a balance between a
descriptive and a prescriptive piece. Since it's always a trade-off, not
everyone will ever be fully happy with the balance, and I can live with that ;-)


>
>
>
>
>
>
> The other issue is that, because the document is large and I found it
> rather hard going, I did not have time to do a thorough review beyond section
> 5.3.  I’d therefore have to recommend another directorate review once we
> have concluded on the issues I’m raising below.
>
>
>
> OK, obviously we wrote as much as we expected was necessary to "clearly"
> spec out the protocol. The document is more than a dry prescriptive
> normative text, though, since very early in the working group sessions many
> people said they would prefer that some more "narrative" explanation of
> "what" and "why" be inserted instead of purely the algorithms. We tried
> to find a balance, but obviously opinions will always vary between "this is
> too chatty and should be just dry normative text" and "this does not explain
> WHY that would work and WHY it has been designed that way". Based on
> Robert Sparks' review I will try to simplify the language and cut out some
> superfluous text he pointed out or that I find. We'll see where we end up.
>
>
>
> [JEH] Thanks.  As it happens I prefer documents to have informative
> passages to help me understand the normative ones, provided they give me
> enough context to understand them and they are sufficiently relevant.  My
> comments were targeted to help improve the context & relevance.  I suggest
> a subsequent RtgDir review only because I was not able to apply as much
> diligence to the later sections of the document as I would have liked.  I
> will leave it to the WG if they want to action this.
>
>
>
...

ack

> Jonathan, well, section 5.1.3 "fallen leaf" (4 now, given the requirements
> section is removed) _is_ the overview section. Southern reflection is
> defined in the glossary already, and "negative disaggregation" is a
> mechanism introduced later to address the "fallen leaf" problem, so
> obviously the problem itself has to be explained and introduced first.
> Negative disaggregation is arguably (besides flooding scopes) the most
> complex part of the spec, and we spent lots of time and effort (especially
> Pascal) with multiple rewrites on the narrative describing the problem
> inherent to CLOS. Moreover, we didn't want to mix it up with RIFT-specific
> mechanisms, since the "fallen leaf" problem exists in multi-plane CLOS
> independent of any protocol, and BTW, I never saw it explained as clearly
> as Pascal did in the multi-plane introduction section. Also, we clearly
> state in the section that if someone builds a single-plane CLOS, the
> section can be disregarded, which simplifies reading the spec for many people.
>
>
>
> [JEH] Thanks. Firstly, Pascal is to be congratulated on the text
> describing multi-plane topologies. I had no problem getting to grips with
> them with the help of his text and some Lego models that it inspired me to
> build :-)  I have re-read these sections just now and I do now find them
> easier to follow – having already read the relevant parts of the later
> spec.  On the first read-through I think I was troubled by too many
> questions: What do they mean by “positive” and “negative” in the context of
> disaggregation?  What do they mean by “transitive”?  I have been told what
> southern reflection is, but what relevant information does it provide and
> how is it useful?  In hindsight these were all guessable but I found these
> concepts a barrier to my understanding.  If you have the stomach for
> another iteration of these sections, I would request some additional
> explanation to be included.
>
>
>

Lego blocks are an ingenious idea here. You inspire me to go steal some from
my kids now ;-)

OK, yes, so I will put explanations of what negative, positive and
transitive mean into the running descriptive text if that simplifies
"sliding into the narrative". That's good input. Obviously, in the end all
the authors/contributors were completely blind to whether it's comprehensible
anymore, since we spent I think about 3 versions of text and endless
interims chewing on how to describe all that "fallen leaf" stuff,
especially with 2-D ASCII art. Most credit really goes to Pascal, I think,
without diminishing the others who helped with it.
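To make the positive/negative terminology concrete, here is a minimal Python sketch of how a node below the ToF might resolve next hops once disaggregated prefixes arrive. It is illustrative only and assumes a simplified model that is not the spec's data structures: `positive` maps prefixes to the northbound neighbors advertising them, `negative` maps prefixes to the neighbors that advertised them as negative (roughly "do not use me for this prefix"), and the function name `resolve_next_hops` is hypothetical. An empty result for a destination covered by a positive route is, roughly, the case where negative disaggregation has to be passed further south (the "transitive" part); the sketch only reports it and does not model the propagation.

```python
from ipaddress import IPv4Network, ip_address, ip_network
from typing import Dict, Set

# Hypothetical per-node view: prefix -> set of northbound neighbor names.
RouteTable = Dict[IPv4Network, Set[str]]


def resolve_next_hops(dest: str, positive: RouteTable, negative: RouteTable) -> Set[str]:
    """Return the next hops for dest under this simplified model.

    positive: default route or positively disaggregated prefixes -> advertisers.
    negative: negatively disaggregated prefixes -> neighbors to exclude.
    """
    addr = ip_address(dest)
    covering = [p for p in positive if addr in p]
    if not covering:
        return set()
    # Longest-prefix match over the positive routes.
    best = max(covering, key=lambda p: p.prefixlen)
    hops = set(positive[best])
    # A negative prefix more specific than the best positive route removes
    # the neighbors that advertised it from the inherited next-hop set.
    for prefix, excluded in negative.items():
        if addr in prefix and prefix.prefixlen > best.prefixlen:
            hops -= excluded
    return hops


default = ip_network("0.0.0.0/0")
leaf_net = ip_network("10.1.1.0/24")

# Positive disaggregation: only N1 also advertises the more-specific prefix.
print(resolve_next_hops("10.1.1.5", {default: {"N1", "N2"}, leaf_net: {"N1"}}, {}))  # {'N1'}
# Negative disaggregation: N2 advertises the same prefix as negative.
print(resolve_next_hops("10.1.1.5", {default: {"N1", "N2"}}, {leaf_net: {"N2"}}))  # {'N1'}
```

Both mechanisms steer traffic away from N2 for the fallen prefix; they differ in who advertises what, which is exactly the distinction the running text would need to spell out.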


>
>
>
>
> Section 5.2.2
>
>
>
>    A node configured with "undefined" PoD membership MUST, after
>    building first northbound three way adjacencies to a node being in a
>    defined PoD, advertise that PoD as part of its LIEs.  In case that
>    adjacency is lost, from all available northbound three way
>    adjacencies the node with the highest System ID and defined PoD is
>    chosen.
>
>
>
> It seems odd that the choice of advertised pod is at first
> non-deterministic (race to the first adjacency) and then, only if this
> initial adjacency is lost, the choice of pod becomes deterministic. Why not
> make it deterministic the whole time?
>
>
>
> The first adjacency is simply used to speed things up, since otherwise how
> long do you wait until you have all northbound adjacencies?  Observe that
> level ZTP will possibly drop adjacencies while it's converging, so the
> consequent set will refine the PoD as well, i.e. ZTP is guaranteed to
> get the node to the maximum available level, at which point the
> northbound available adjacencies will determine the PoD. Obviously the
> adjacencies can disagree about the PoD, and such a scenario can be used by
> an implementation to report miscablings. We briefly talk about miscabling
> detection in the spec since it's such a desirable property _of an
> implementation_, but it's not necessary for correct protocol operation, so
> we don't make anything normative except disallowing adjacency formation
> across PoDs if defined. Since configuring and converging PoDs is optional,
> we even allow disregarding this rule on adjacency formation.
>
>
>
> [JEH] Thanks – makes sense. I had missed that ZTP can drop adjacencies
> when I wrote this comment.
>

ack
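The PoD selection rule quoted above, plus the miscabling hint, can be sketched as follows. This is a hypothetical Python model, not spec text: the `Adjacency` fields, the `PodSelector` class and the use of 0 to mean "undefined PoD" are assumptions for illustration. It takes the first eligible northbound three-way adjacency immediately (no waiting for the full set) and falls back to the highest System ID only when that adjacency is lost.

```python
from dataclasses import dataclass
from typing import Dict, Optional

UNDEFINED_POD = 0  # assumption: 0 encodes "undefined PoD" in this sketch


@dataclass(frozen=True)
class Adjacency:
    system_id: int
    pod: int
    three_way: bool
    northbound: bool


class PodSelector:
    """Hypothetical model of the PoD a node with undefined PoD puts in its LIEs."""

    def __init__(self) -> None:
        self.adjacencies: Dict[int, Adjacency] = {}
        self.advertised_pod = UNDEFINED_POD
        self.source: Optional[int] = None  # System ID the PoD was taken from

    @staticmethod
    def _eligible(adj: Adjacency) -> bool:
        return adj.three_way and adj.northbound and adj.pod != UNDEFINED_POD

    def adjacency_up(self, adj: Adjacency) -> None:
        self.adjacencies[adj.system_id] = adj
        # The first eligible adjacency wins right away.
        if self.source is None and self._eligible(adj):
            self.advertised_pod, self.source = adj.pod, adj.system_id

    def adjacency_lost(self, system_id: int) -> None:
        self.adjacencies.pop(system_id, None)
        if system_id != self.source:
            return
        eligible = [a for a in self.adjacencies.values() if self._eligible(a)]
        if eligible:
            # Deterministic fallback: highest System ID with a defined PoD.
            best = max(eligible, key=lambda a: a.system_id)
            self.advertised_pod, self.source = best.pod, best.system_id
        else:
            self.advertised_pod, self.source = UNDEFINED_POD, None

    def pods_disagree(self) -> bool:
        # Northbound neighbors in different PoDs: a possible miscabling hint.
        pods = {a.pod for a in self.adjacencies.values() if self._eligible(a)}
        return len(pods) > 1


sel = PodSelector()
sel.adjacency_up(Adjacency(system_id=5, pod=1, three_way=True, northbound=True))
sel.adjacency_up(Adjacency(system_id=9, pod=2, three_way=True, northbound=True))
print(sel.advertised_pod, sel.pods_disagree())  # 1 True
sel.adjacency_lost(5)
print(sel.advertised_pod)  # 2
```

The first-come choice trades determinism for speed, as explained above; the disagreement check is the kind of implementation-level miscabling report the spec leaves non-normative.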


>
>
>
>
> Section 5.2.3.2
>
>
>
> In the example TIEs, "Spine21" should be "ToF 21" to agree with the
> nomenclature of figure 2.  Ditto in table 4 (section 5.2.3.4)
>
> In Spine 111's Node-S-TIE, I am not sure that the links(...) should be
> given for each neighbor.
>
>
>
> Corrected the ToF 21/22 everywhere.  Yes, on careful reading one wonders
> WHY the Node S-TIE should include _all_ links. This is necessary for both
> flood reduction and bandwidth balancing, since both happen from the south
> going up and the computing node needs the northbound neighbors of the level
> up. That's one of the reasons the example is given. I'll add a clarifying
> sentence.
>
>
>
> [JEH] Thanks. Does that mean the links(…) should be added to Spine121’s
> Node S-TIE in the same example?
>

Yes, at a certain point I think it starts to use ellipsis to avoid endless
repetition. I'll comb over it quickly again.
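Why the Node S-TIE needs _all_ links can be illustrated with a small, hypothetical Python sketch: a node one level down combines its parents' S-TIEs to learn which grandparents each parent reaches, which is the input flood reduction and bandwidth balancing need. The data layout and function names are assumptions, the node names are borrowed from the thread with levels assumed, and the coverage check is a simplification rather than RIFT's actual flood-repeater election.

```python
from typing import Dict, Set, Tuple

# Hypothetical view of received Node S-TIEs:
# parent -> (parent level, {neighbor: neighbor level}) as listed in its S-TIE.
STies = Dict[str, Tuple[int, Dict[str, int]]]


def grandparents(s_ties: STies) -> Dict[str, Set[str]]:
    """Map each node one level above our parents to the parents that reach it."""
    reach: Dict[str, Set[str]] = {}
    for parent, (level, neighbors) in s_ties.items():
        for nbr, nbr_level in neighbors.items():
            if nbr_level > level:  # a northbound neighbor of this parent
                reach.setdefault(nbr, set()).add(parent)
    return reach


def covers_all(repeaters: Set[str], s_ties: STies) -> bool:
    """True if the chosen parents together reach every known grandparent."""
    return all(parents & repeaters for parents in grandparents(s_ties).values())


full = {
    "Spine111": (1, {"ToF21": 2, "ToF22": 2, "Leaf111": 0, "Leaf112": 0}),
    "Spine112": (1, {"ToF21": 2, "ToF22": 2, "Leaf111": 0, "Leaf112": 0}),
}
# Same fabric after Spine111 lost its link to ToF22 and Spine112 its link to ToF21.
degraded = {
    "Spine111": (1, {"ToF21": 2, "Leaf111": 0, "Leaf112": 0}),
    "Spine112": (1, {"ToF22": 2, "Leaf111": 0, "Leaf112": 0}),
}
print(covers_all({"Spine111"}, full))      # True
print(covers_all({"Spine111"}, degraded))  # False
```

If the S-TIEs elided the links toward the ToFs, the computing leaf could not tell these two cases apart.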


> Section 5.2.4.1
>
> Please define the terms "south prefix" and "north prefix"
>
> "Supersuming" is not a word I recognise.  Use "or a non-default prefix
> which contains this south prefix"
>
> "the node does not..." -> "the computing node does not..."
>
>
>
> Section 5.2.4.2
>
> "S-SPF uses northbound adjacencies in node N-TIEs to verify backlink
> connectivity" - this statement needs to be recast into normative language
> using RFC 2119 terms.  "A node MUST verify backlink connectivity ... Else
> it MUST NOT include the link.... Etc."
>
> Same comment applies in many places throughout the document.
>
>
>
> Re-read and applied more normative language to the specific section as
> indicated above.  Re-read the document and normalized more language where
> necessary.
>
>
>
>
>
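A minimal sketch of what the suggested normative wording would require, assuming a plain Dijkstra run over southbound links: a southbound link from u to v is used only if v's Node N-TIE lists u as a northbound adjacency, else the link is not included. The dictionary layout and the `s_spf` name are hypothetical, and real implementations have further details this ignores.

```python
import heapq
from math import inf
from typing import Dict, Set


def s_spf(root: str,
          south_links: Dict[str, Dict[str, int]],
          north_adjs: Dict[str, Set[str]]) -> Dict[str, int]:
    """Southbound SPF with a backlink check.

    south_links: node -> {southbound neighbor: metric}, as claimed by the node.
    north_adjs:  node -> northbound adjacencies listed in that node's Node N-TIE.
    """
    dist: Dict[str, int] = {root: 0}
    heap = [(0, root)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, inf):
            continue  # stale heap entry
        for v, metric in south_links.get(u, {}).items():
            # Backlink check: v must confirm u as a northbound adjacency,
            # otherwise the link is not included in the computation.
            if u not in north_adjs.get(v, set()):
                continue
            nd = d + metric
            if nd < dist.get(v, inf):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist


south = {"Spine111": {"Leaf111": 1, "Leaf112": 1}}
north = {"Leaf111": {"Spine111"}, "Leaf112": set()}  # Leaf112 does not confirm
print(s_spf("Spine111", south, north))  # {'Spine111': 0, 'Leaf111': 1}
```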
> Section 5.2.4.3
>
> What is a `"ring protection" scheme`?
>
>
>
> Ring-based protection scheme, just like BLSR. I'll replace it with
> "ring-based protection", which is a fairly well-understood term in networking.
>
>
>
> Moved the ring-based protection of a level to the applicability draft, which
> multiple authors work on and where it seems to belong rather than in the
> spec. Left only a clarification:
>
>
>
>    Using south prefixes over horizontal links MAY occur
>    if the N-SPF is East-West adjacencies in computation.
>    It can protect against pathological fabric partitioning cases that
>    leave only paths to destinations that would necessitate multiple
>    changes of forwarding direction between north and south.
>
>
>
> [JEH] Suggest you change “if the N-SPF is East-West adjacencies” to “if
> the N-SPF includes East-West adjacencies”
>

ack, typo. thanks


>
>
> "An implementation could try ... but the details are outside this
> specification" - so why mention it?
>
>
>
> Because the question kept coming up in meetings/mails and so on. Instead
> of negative disaggregation, people were tempted to "forward through the
> horizontal links on top" when a fallen leaf starts forwarding in the wrong
> plane (i.e. the one where it has fallen). This section points out that this
> should not be attempted due to looping problems: a ToF node that has no
> reachability to an anycast address (because a fallen leaf forwarded toward
> an anycast destination that has also fallen) could try to use horizontal
> links to forward traffic, but it may have multiple planes that can reach
> the destination. Obviously, when it forwards, e.g., left on the ring and
> the traffic arrives at a ToF that seems to be able to reach that anycast,
> that ToF may choose to forward it back on the ring to "another ToF" that
> can reach the anycast. Observe that RIFT is loop-free, i.e. one can forward
> on any path as long as it reaches the destination, but since horizontal
> forwarding is considered equivalent to northbound forwarding and metric can
> be disregarded (RIFT is not bound by shortest path), the traffic may just
> end up looping in the ring. This is hard to describe and would need lots of
> figures, hence the spec simply says "don't do it", and anyone tempted to
> try will find out why it's a bad idea when implementing it. Said
> implementer will then probably try to fix it with a "shortest path"
> computation at the ToF level, which is the next layer of the onion the
> document mentions; it then explains that this may work, but that this spec
> does not go further down that path.
>
>
>
> The "ring" between planes that this requires is visualized in figure 13
> and described again in an example in section 4.2.5.2.1 (Cabling of Multiple
> Top-of-Fabric Planes). I don't think that needs further clarification.
>
>
>
> [JEH] Understood. I would suggest moving “An implementation could…” to a
> footnote – if only one could have footnotes in an RFC.
>

Yeah, I won't try footnote adventures ;-) I'm already trying to migrate to v3
to include SVG and feel like a serious guinea pig right now ;-)
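The looping argument about horizontal forwarding can be seen in a toy trace. In this hypothetical Python sketch each ToF's forwarding choice for the anycast destination is simply given as a table, standing in for "any equivalent next hop", since horizontal links count as northbound and metric is disregarded; "south" means the ToF delivers the packet down its own plane. The names and the table are illustrative assumptions, not RIFT behavior defined by the spec.

```python
from typing import Dict, List, Tuple

DELIVER = "south"  # hypothetical marker: forward down into the plane


def trace(start: str, choice: Dict[str, str]) -> Tuple[List[str], str]:
    """Follow each ToF's chosen next hop until delivery or a repeated node."""
    path = [start]
    node = start
    while True:
        nxt = choice[node]
        if nxt == DELIVER:
            return path, "delivered"
        if nxt in path:
            return path + [nxt], "loop"
        path.append(nxt)
        node = nxt


# ToF1 has no reachability and goes left on the ring. ToF2 and ToF3 both
# "can reach" the anycast, yet each picks the other as an equivalent next hop.
looping = {"ToF1": "ToF2", "ToF2": "ToF3", "ToF3": "ToF2"}
print(trace("ToF1", looping))
# (['ToF1', 'ToF2', 'ToF3', 'ToF2'], 'loop')

# Same ring, but ToF3 happens to prefer its southbound path.
print(trace("ToF1", {"ToF1": "ToF2", "ToF2": "ToF3", "ToF3": DELIVER}))
# (['ToF1', 'ToF2', 'ToF3'], 'delivered')
```

Nothing local tells ToF2 or ToF3 which choice is safe; adding a shortest-path computation at the ToF level is the "next layer of the onion" mentioned above.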


>
>
>
>
>
>
> How can we guarantee that a same-level node does not have a next hop to a
> given prefix that is unknown to the node doing the computation?  If X
> reaches P via N1 and N2, Y (at the same level as X) can reach P via N3 but
> X does not know this and assumes Y cannot reach P because Y is not adjacent
> to N1 and N2, then X unnecessarily disaggregates P positively.  For
> instance if X's link to N3 has failed and Y's links to N1 and N2 have
> failed.
>
>
>
> That cannot be guaranteed. If X can reach the prefix via N1, which Y
> doesn't have, and Y via N3, which X doesn't have, but they only see each
> other via a shared next hop N0 (through which the prefix cannot be
> reached), then both will disaggregate, since anything else would assume
> the necessity of "harmonica routing". RIFT doesn't do that, since harmonica
> routing is the opposite of the valley-free routing RIFT uses to guarantee
> loop-free behavior. That is actually a good example of why RIFT positive
> disaggregation guarantees sufficient disaggregation to prevent blackholes,
> loops and bow-ties, but possibly more than necessary (the document never
> claims otherwise).
>
>
>
> [JEH] Understood. So there may be redundant disaggregation but it keeps
> the forwarding plane valley free.  I think that’s OK.
>

Yes, ages ago I thought through the implications of making RIFT
"sufficient _and_ necessary only", and you end up with a global vs. local
optimality problem, which pretty much forces you to have all the info
everywhere to find a global optimum (well, topology info; you could skip
prefixes, I think). Or harmonica routing, which leads straight back to
traditional link-state with all its problems in the fabric. And all that
would lead to very complex flooding scopes (the ones we have are complex
enough, and that's why they are so precisely described in the table and
flooding rules) and so on. IMO it's the 80-20 rule here: "sufficient" must be
guaranteed (otherwise you loop or, if lucky, bow-tie), while "necessary only"
is a luxury we should not chase ;-)
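The "sufficient but not necessarily necessary" behavior can be illustrated with a deliberately simplified Python check that follows the reasoning in the X/Y example above rather than the spec's normative procedure: node X disaggregates prefix P if some same-level node, learned via southern reflection, has no northbound adjacency to any of X's next hops for P. The function name and inputs are hypothetical.

```python
from typing import Dict, Set


def disaggregation_triggers(my_next_hops: Set[str],
                            same_level_north: Dict[str, Set[str]]) -> Set[str]:
    """Return the same-level nodes that look unable to reach the prefix.

    my_next_hops:     northbound neighbors through which this node reaches P.
    same_level_north: same-level node -> its northbound adjacencies as learned
                      via southern reflection (its own reachability of P is unknown).
    """
    return {node for node, north in same_level_north.items()
            if not (north & my_next_hops)}


# The example above: X reaches P via N1; Y is adjacent to N0 and N3 and
# actually reaches P via N3, which X cannot know.
triggers = disaggregation_triggers({"N1"}, {"Y": {"N0", "N3"}})
print(triggers)  # {'Y'}: X disaggregates P, redundantly but safely
```

The redundant disaggregation here is the price of deciding locally; avoiding it would need the global view discussed above.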


>
>
> This constant is provided in appendix D.1
>
>
>
> I'm working on the other directorate reviews and will try to cut a new
> version with all those changes before the deadline
>
>
>
> [JEH] Thanks again for considering all my comments.
>
>
>

Well, thanks for your time. I know the draft takes time to comprehend, given
how far it sometimes stretches the IP routing envelope compared to traditional
stuff. Glad you saw it as a puzzle ;-)

And now, for those lego blocks ;-)

--- tony