Re: [Rift] Routing directorate early review of draft-ietf-rift-rift

Tony Przygienda <tonysietf@gmail.com> Mon, 04 November 2019 01:01 UTC

References: <BL0PR02MB48689FA2D6B7C255DF11045D84630@BL0PR02MB4868.namprd02.prod.outlook.com>
In-Reply-To: <BL0PR02MB48689FA2D6B7C255DF11045D84630@BL0PR02MB4868.namprd02.prod.outlook.com>
From: Tony Przygienda <tonysietf@gmail.com>
Date: Sun, 03 Nov 2019 17:00:43 -0800
Message-ID: <CA+wi2hO=rZ2mbX3ZJVgn9cSvfbot29W+MNnunysPhPv+3Mxykw@mail.gmail.com>
To: Jonathan Hardwick <Jonathan.Hardwick=40metaswitch.com@dmarc.ietf.org>
Cc: "rift-wg-chairs@ietf.org" <rift-wg-chairs@ietf.org>, "draft-ietf-rift-rift.all@ietf.org" <draft-ietf-rift-rift.all@ietf.org>, "rtg-dir@ietf.org" <rtg-dir@ietf.org>, Luc André Burdet <laburdet.ietf@gmail.com>, Min Ye <amy.yemin@huawei.com>, "rift@ietf.org" <rift@ietf.org>
Archived-At: <https://mailarchive.ietf.org/arch/msg/rift/ASB89Up8rCaBXgccUnr815NeJhw>

Jonathan, thanks for your review, responses inline

On Thu, Oct 31, 2019 at 11:01 AM Jonathan Hardwick <Jonathan.Hardwick=
40metaswitch.com@dmarc.ietf.org> wrote:

> Hello
>
>
>
> I have been selected to do a Routing Directorate “early review” of this
> draft:
>
> https://datatracker.ietf.org/doc/draft-ietf-rift-rift/
>
>
>
> The routing directorate will, on request from the working group chair,
> perform an “early” review of a draft before it is submitted for publication
> to the IESG. The early review can be performed at any time during the
> draft’s lifetime as a working group document. The purpose of the early
> review depends on the stage that the document has reached.  As this
> document has advanced to working group last call, my focus for the review
> was to determine whether the document is ready to be published. Please
> consider my comments along with the other working group last call comments.
>
>
>
> For more information about the Routing Directorate, please see
> http://trac.tools.ietf.org/area/rtg/trac/wiki/RtgDir
>
>
>
> Document: draft-ietf-rift-rift
>
> Reviewer: Jon Hardwick
>
> Review Date: 31 Oct 2019
>
> Intended Status: Standards Track
>
>
>
> *Summary*
>
> Thanks for writing this document.  It is a very interesting approach and I
> really enjoyed getting to grips with the ideas presented in the draft!
>

thanks, quite a lot of work

> Unfortunately, I have some concerns about the document and think it needs
> more work before being submitted to the IESG.  The problem is that I found
> the document hard to read, for several reasons.
>
>    - It is very light in its use of normative RFC-2119 style language.
>    An implementer would have to fill in quite a few gaps and/or make
>    assumptions about various passages.
>
I will address the specific sections you raised inline.

Otherwise, in meta terms, as to the question of "is this specification
being precise enough?" I can quote only what I wrote to Robert Sparks
already:

*"We have had two interoperable implementations for a while now, one of them
completely open source and produced from the spec. It was in fact the open
source work that helped refine the document so that an implementation can
be produced from the text without further "guessing things". As an example,
the LIE FSM was initially implemented in open source without consulting the
authors of the spec and interoperated without a single defect (but it
uncovered a protocol underspecification in case of misconfiguration, which
was subsequently addressed). Please refer to the IETF proceedings for the
corresponding presentations if necessary. A third implementation is
progressing now, and all questions the implementor has asked so far could
be answered by pointing directly at the specification as written. To me
this answers the "suspicion that the specification may not be good enough
to implement" with as objective a measuring stick as I can imagine."*

We are in the IETF here, where "rough consensus and running code" was the
recipe for success compared with much heavier-handed organizations like OSI,
and in that philosophy the spec, if anything, is possibly overspecified
already ;-) The core pieces that bear no slips, like flooding and adjacency
formation, are very precisely written, including FSMs.

>
>    - The definition of the protocol and some of the normative behaviour
>    is deferred to the appendices, whereas I would expect to encounter it early
>    on in the text, with an in-line discussion of the purposes of the messages
>    and fields.
>
>
OK, it seems that a second directorate reviewer prefers the appendices to
be pulled into the document. Let me do that then.

>
>    - It sometimes refers to concepts or terms that are either not defined
>    or have not yet been introduced to the reader, suggesting an ordering issue
>    within the text.
>
>
>
> I think that the document needs to be refactored somewhat to solve the
> ordering issues, use more normative language, eliminate any text that is
> not actually relevant to the implementation and deployment of the protocol,
> and pull together the normative definition of the protocol into a
> contiguous block early on in the document.
>

further inline

>
>
> The other issue is that, because the document is large and I found it
> rather hard going, I did not have time do a thorough review beyond section
> 5.3.  I’d therefore have to recommend another directorate review once we
> have concluded on the issues I’m raising below.
>

OK. Obviously, as much is written as we expect is necessary to "clearly"
spec out the protocol. The document is more than a dry, prescriptive,
normative text, though, since very early in the working group sessions the
input of many people was that they would prefer some more "narrative"
explanation of "what" and "why" to be inserted instead of purely the
algorithms. We tried to find a balance, but obviously opinions will always
vary between "this is too chatty and should be just dry and normative" and
"this does not explain WHY that would work and WHY it has been designed that
way". Based on Robert Sparks' review I will try to simplify the language and
cut out some superfluous text he pointed out or that I find. We'll see where
we end up.

>
>
> *Details*
>
> Here are comments on the sections that I was able to review in detail
> before I ran out of time.
>
>
>
> Abstract
>
> Is it possible to reformat this as a list of items on multiple lines? It
> would read more clearly.
>

yes

>
>
> Section 2
>
> "an optimal approach does not seem however": this appears to be a value
> judgment rather than consensus opinion, appearing as it does without
> citation, and may be perceived as treading on the toes of other
> standardization efforts currently in progress at the IETF. I suggest you
> simply state the facts: "RIFT approaches this problem using a mixture of..."
>

done, sure.

>
>
> Section 2.1
>
> The form of words in the Requirements Language boilerplate has changed
> recently - see RFC 8174.
>

thanks, corrected

>
>
> Section 3.1
>
> ZTP - expand acronym on first use.
>

yes, ZTP added. The glossary will be rearranged based on other reviewers'
input.

> There is potential for confusion between N-TIE and Node TIE! I'd prefer
> "North TIE" for the former.
>
> An example of confusion: is the "South Node TIE" referred to in the
> definition of "South Reflection" the same as the S-TIE referred to in the
> definition of "TIE"?
>
> "The document sometimes calls them flood leaders as well." But it would be
> better if you just used one term.
>

OK, I expand N- and S- to North- and South- everywhere in the document

>
>
> Section 4
>
> Personally I could live without this section
>
> Merge PEND1 with NONREQx (or explain the distinction)
>

thanks, there were multiple discussions of pros and cons at the mic and on
the list about this section, with suggestions either to split it out into a
different document (but standardizing requirement drafts went out of fashion
recently ;-) or to drop it. I'm dropping it based on your input and that of
others who want to shorten the document.

>
>
> Section 5.1.3 - 5.1.5
>
> This discussion is not possible to follow properly until you have been
> introduced to positive & negative disaggregation and southern reflection.
> As such I wonder if it really belongs in a section called "overview".
>

Jonathan, well, section 5.1.3 "fallen leaf" (numbered under 4 now that the
requirements section is removed) _is_ the overview section. Southern
reflection is defined in the glossary already, and "negative disaggregation"
is a mechanism introduced later to address the "fallen leaf problem", so
obviously the problem itself has to be explained and introduced first.
Negative disaggregation is arguably (besides flooding scopes) the most
complex part of the spec, and we spent lots of time and effort (especially
Pascal) on multiple rewrites of the narrative describing the problem
inherent to CLOS. Moreover, we didn't want to mix it up with RIFT-specific
mechanisms, since the "fallen leaf" problem exists in multi-plane CLOS
independent of any protocol and, BTW, I never saw it explained as clearly as
Pascal did in the multi-plane introduction section. Also, we clearly state
in the section that anyone building a single-plane CLOS can disregard it,
which simplifies reading the spec for many people.

>
>
> Section 5.2.2
>
>
>
>    A node configured with "undefined" PoD membership MUST, after
>
>    building first northbound three way adjacencies to a node being in a
>
>    defined PoD, advertise that PoD as part of its LIEs.  In case that
>
>    adjacency is lost, from all available northbound three way
>
>    adjacencies the node with the highest System ID and defined PoD is
>
>    chosen.
>
>
>
> It seems odd that the choice of advertised pod is at first
> non-deterministic (race to the first adjacency) and then, only if this
> initial adjacency is lost, the choice of pod becomes deterministic. Why not
> make it deterministic the whole time?
>

The first adjacency is simply used to speed things up, since otherwise how
long do you wait until you have all northbound adjacencies?  Observe that
level ZTP will possibly drop adjacencies while it is converging, so the
resulting set will refine the PoD as well, i.e. ZTP is guaranteed to get
the node to the maximum available level, at which point the available
northbound adjacencies will determine the PoD. Obviously the adjacencies
can disagree about the PoD, and an implementation can use such a scenario
to report miscablings. We talk briefly about miscabling detection in the
spec since it is such a desirable property _of an implementation_, but it is
not necessary for correct protocol operation, so we don't make anything
normative except disallowing adjacency formation across PoDs if defined.
Since configuring and converging PoDs is optional, we even allow this rule
on adjacency formation to be disregarded.
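
The deterministic part of the rule quoted above (after the first adjacency is
lost, take the PoD of the northbound three-way neighbor with the highest
System ID and a defined PoD) and the miscabling hint can be sketched roughly
as below. This is only an illustration, not text from the draft: the
`Adjacency` record, the use of 0 for an undefined PoD, and the function names
are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import List

UNDEFINED_POD = 0  # assumption for this sketch: 0 stands for an undefined PoD


@dataclass(frozen=True)
class Adjacency:
    system_id: int
    pod: int          # PoD advertised by the neighbor
    northbound: bool  # neighbor is at a higher level
    three_way: bool   # adjacency reached three-way state


def _usable(adjacencies: List[Adjacency]) -> List[Adjacency]:
    """Northbound three-way adjacencies whose neighbor has a defined PoD."""
    return [a for a in adjacencies
            if a.northbound and a.three_way and a.pod != UNDEFINED_POD]


def pod_after_loss(adjacencies: List[Adjacency]) -> int:
    """PoD of the highest System ID among the usable adjacencies."""
    usable = _usable(adjacencies)
    if not usable:
        return UNDEFINED_POD
    return max(usable, key=lambda a: a.system_id).pod


def pods_disagree(adjacencies: List[Adjacency]) -> bool:
    """True if usable neighbors advertise different PoDs (possible miscabling)."""
    return len({a.pod for a in _usable(adjacencies)}) > 1
```

An implementation could log a warning whenever `pods_disagree` returns true;
as said above, nothing about that reporting is normative.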

>
>
> Section 5.2.3.2
>
>
>
> In the example TIEs, "Spine21" should be "ToF 21" to agree with the
> nomenclature of figure 2.  Ditto in table 4 (section 5.2.3.4)
>
> In Spine 111's Node-S-TIE, I am not sure that the links(...) should be
> given for each neighbor.
>

corrected the ToF 21/22 everywhere.  Yes, on careful reading one wonders
WHY the node South TIE should include _all_ links. This is necessary for
both flood reduction and bandwidth balancing, since both happen from the
south going up and the computing node needs the northbound neighbors of the
level above. That's one of the reasons the example is given. I'll add a
clarifying sentence.

>
>
> Section 5.2.3.5
>
> "It should only set it in the southbound direction."  - SHOULD?
>

corrected

>
> Section 5.2.3.8
>
> Define N-SPF on first use
>

OK, N-SPF and S-SPF added to glossary.

>
>
> Section 5.2.4
>
> "A node has three sources" - I see only two listed.
>
> "We use simple, familiar SPF algorithms here..." - is the use of those
> algorithms supposed to be normative? Or are you just giving an example and
> leaving me to choose my own algorithm?  If SPF is normative then you need
> to specify it using normative language or include a normative reference to
> it.
>

I tried to clarify that better in the existing text by expanding to

<t>A node has three possible sources of relevant information for
reachability computation.
    A node knows
    the full topology south of it from the received North Node TIEs or
alternately
    north of it from the South Node TIEs.  A node has the
    set of prefixes with their associated distances and bandwidths from
    corresponding prefix TIEs.</t>

<t>To compute prefix reachability, a node runs conceptually a northbound
    and a southbound
    SPF.
    We call that N-SPF and S-SPF denoting the direction in which the computation
    front is progressing.
</t>

<t>Since neither computation can "loop", it is
    possible to compute non-equal-cost or even
    <xref target="EPPSTEIN">k-shortest paths</xref>
    and "saturate" the fabric
    to the extent desired but we use simple, familiar SPF algorithms and
    concepts here as example due to their prevalence in today's routing.
</t>

So the algorithms given are NOT normative but I improved what _is_
normative in the N-SPF and S-SPF section

<section anchor="nspf" title="Northbound SPF">

    <t> N-SPF *MUST use ONLY* northbound and East-West adjacencies in
the computing
        node's node North TIEs (since if the node is a leaf it may not have
        generated a node South TIE)
        when starting SPF. ...

<t>Once progressing, we are using the next higher level's node South TIEs to
    find according adjacencies to verify backlink connectivity.
    Just as in case of IS-IS or OSPF, two unidirectional links* MUST* be
    associated
    together to confirm bidirectional connectivity. ...

<section anchor="sspf" title="Southbound SPF">

    <t> *S-SPF MUST use ONLY* the
        southbound adjacencies in the node South TIEs,
        i.e. progresses towards nodes at lower levels. Observe that
        E-W adjacencies are NEVER used in the computation. This enforces the
        requirement that a packet traversing in a southbound direction must
        never change its direction.</t>
    <t>*S-SPF MUST* use northbound adjacencies in node North TIEs to
verify backlink
        connectivity by checking for presence of the link beside
correct SystemID and
        level. </t>

This is about all that needs to be said here in terms of normative language
beside the one already present.
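
The backlink verification required above (two unidirectional links associated
together, checking for presence of the link besides correct System ID and
level) can be sketched roughly as follows. This is a simplified illustration
under assumptions: the `NodeTie` and `Neighbor` records and the
`(local_id, remote_id)` link pairs are stand-ins invented for the example and
are not the schema of the draft.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Tuple

LinkPair = Tuple[int, int]  # (local link ID, remote link ID), hypothetical


@dataclass(frozen=True)
class Neighbor:
    level: int                  # level the advertising node believes the neighbor has
    links: FrozenSet[LinkPair]  # links towards that neighbor


@dataclass
class NodeTie:
    system_id: int
    level: int
    neighbors: Dict[int, Neighbor]  # neighbor System ID -> neighbor entry


def backlink_ok(near: NodeTie, far: NodeTie) -> bool:
    """True if both nodes advertise each other with matching level and link."""
    forward = near.neighbors.get(far.system_id)
    reverse = far.neighbors.get(near.system_id)
    if forward is None or reverse is None:
        return False  # one direction is missing entirely
    if forward.level != far.level or reverse.level != near.level:
        return False  # the two sides disagree about levels
    # A link seen as (local, remote) from `near` must show up as
    # (remote, local) in the entry advertised by `far`.
    return any((remote, local) in reverse.links
               for (local, remote) in forward.links)
```

An SPF run would call `backlink_ok` before accepting an adjacency into the
computation and skip the adjacency when it returns false.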

> Section 5.2.4.1
>
> Please define the terms "south prefix" and "north prefix"
>
> "Supersuming" is not a word I recognise.  Use "or a non-default prefix
> which contains this south prefix"
>
> "the node does not..." -> "the computing node does not..."
>
>
>
> Section 5.2.4.2
>
> "S-SPF uses northbound adjacencies in node N-TIEs to verify backlink
> connectivity" - this statement needs to be recast into normative language
> using RFC 2119 terms.  "A node MUST verify backlink connectivity ... Else
> it MUST NOT include the link.... Etc."
>
> Same comment applies in many places throughout the document.
>

Re-read and applied more normative language to the specific section as
indicated above.  Re-read the document and normalized more language where
necessary.

>
>
> Section 5.2.4.3
>
> What is a `"ring protection" scheme`?
>

A ring-based protection scheme, just like BLSR. I replaced it with
"ring-based protection", which is a fairly well understood term in networking.

Moved the ring-based protection of a level to the applicability draft, which
multiple authors are working on and where it seems to belong rather than in
the spec. Left only this clarification:

<t>Using south prefixes over horizontal links MAY occur
 if the N-SPF is using East-West adjacencies in computation.
    It can
    protect against pathological fabric partitioning cases that
    leave only paths to destinations that would necessitate multiple
    changes of forwarding direction between north and south.
    </t>

> Are E-W links permitted between planes?
>
> Not sure what this is telling me: "Using south prefixes over horizontal
> links is optional..." - is that OPTIONAL as in RFC 2119?  Do you mean that
> my implementation can ignore them? Or not advertise them? Or that the
> network operator does not have to cable them?
>

Clarified as per section above. If the N-SPF is using horizontal
adjacencies it will pick up those prefixes.

>
>
> Section 5.2.4.4
>
> "Even though a ToF node could
>
>    be tempted to use those links during southbound SPF this MUST NOT be
>
>    attempted since it may lead in, e.g. anycast cases to routing loops."
>
>
>
> This is too verbose and obtuse.  I cannot see how anycast cases lead to
> routing loops and I don't know if I need to understand why or not.  Suggest
>

>
> "A ToF node MUST NOT include east-west links in its south-SPF calculation."
>

This is already said in the S-SPF section very explicitly as

<t> S-SPF

*MUST use ONLY the    southbound adjacencies in the node South TIEs,
 i.e. progresses towards nodes at lower levels. Observe that    E-W
adjacencies are NEVER used in the computation. This enforces the
requirement that a packet traversing in a southbound direction must
never change its direction.*</t>

>
>
> This section gives the impression that E-W links at the ToF will never be
> used for forwarding data - is that true?  They are used for control plane
> only?
>

Yes, it is described in text but I clarified the section on horizontal
links in ToF further

<t>E-W ToF links behave in terms of flooding scopes defined in
    <xref target="tiescopes"/> like northbound links and
*MUST be used for control plane    information flooding ONLY*. Even
though a ToF node could be tempted
    to use those links during southbound SPF and carry traffic over them this
    MUST NOT be attempted since it may lead in, e.g. anycast cases to
routing loops.
    An implementation MAY try to resolve the looping problem by
following on the ring strictly
    tie-broken
    shortest-paths only but the details are outside this
specification. And even then,
    the problem of proper capacity provisioning of such links when
they become traffic-bearing in
    case of failures is vexing.</t>

>
>
> "An implementation could try ... but the details are outside this
> specification" - so why mention it?
>

Because the question kept coming up in meetings, mails and so on. Instead
of negative disaggregation, people were tempted to "forward through the
horizontal links on top" when a fallen leaf starts forwarding in the wrong
plane (i.e. the one where it has fallen). This section points out that this
should not be attempted due to looping problems: a ToF node that has no
reachability to an anycast address (since a fallen leaf forwarded to an
anycast destination that is also fallen) could try to use horizontal links
to forward traffic, but multiple planes may be able to reach the
destination. When it forwards, e.g., left on the ring and the traffic
arrives at a ToF that seems able to reach that anycast address, that ToF may
choose to forward it back on the ring to "another ToF" that can reach the
anycast address. Observe that RIFT is loop-free, i.e. one can forward on any
path as long as it reaches the destination, but since horizontal forwarding
is considered equivalent to northbound forwarding and the metric can be
disregarded (RIFT is not bound by shortest path), the traffic may just end
up looping in the ring. This is hard to describe and would need lots of
figures, hence the spec simply says "don't do it"; anyone tempted to will
find out why it's a bad idea once they have implemented it. The said
implementer will then probably try to fix it with the "shortest path"
computation at ToF level, which is the next layer of the onion the document
mentions, and the document then explains that this may work but that the
spec stops going there.

The "ring" necessary between planes is visualized in figure 13 and
described in section

4.2.5.2.1.  Cabling of Multiple Top-of-Fabric Planes

again in an example. I don't think that needs further clarification.

>
>
> Section 5.2.5.1
>
> "A DAG computation" - expand DAG.
>

already expanded at the start of the terminology section, but added a more
specific definition

>
>
> "Neither
>
>        is it necessary for the receiving node to reflect the
>
>        disaggregated prefixes back over its adjacencies to nodes at the
>
>        level from which it was received."
>
>
>
> Please restate this using RFC 2119 language.
>

done. It's actually not necessary for this language here to be normative
since the normative part is Table 3 and when it is implemented all the
algorithm behavior and resulting flooding follows straight out of that. I
emphasized that the flooding scopes table is normative.

>
>
> How can we guarantee that a same-level node does not have a next hop to a
> given prefix that is unknown to the node doing the computation?  If X
> reaches P via N1 and N2, Y (at the same level as X) can reach P via N3 but
> X does not know this and assumes Y cannot reach P because Y is not adjacent
> to N1 and N2, then X unnecessarily disaggregates P positively.  For
> instance if X's link to N3 has failed and Y's links to N1 and N2 have
> failed.
>

That cannot be guaranteed. If X can reach the prefix via N1, which Y doesn't
have, and Y via N3, which X doesn't have, but they only see each other via a
nexthop N0 (through which the prefix cannot be reached), then both will
disaggregate, since anything else would assume the necessity of "harmonica
routing", which RIFT doesn't do since harmonica is the opposite of the
valley-free routing that RIFT uses to guarantee loop-free behavior.  That is
actually a good example of why RIFT positive disaggregation guarantees
sufficient disaggregation to prevent blackholes, loops and bow-ties, but
possibly more than necessary (minimality is never claimed in the document).
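
To make the X/Y case concrete, here is a rough sketch of how the set of
partial neighbors could be built: for each south neighbor of the computing
node, list the same-level nodes that lack an adjacency to it. It is a
simplification for illustration only; the function name, the plain dictionary
of adjacencies (standing in for what is learned via south reflection) and the
node names are assumptions, not the algorithm text of the draft.

```python
from typing import Dict, List, Set


def partial_neighbors(x: str,
                      south_adjacencies: Dict[str, Set[str]]) -> Dict[str, List[str]]:
    """Map each south neighbor of `x` to the same-level nodes not adjacent to it.

    `south_adjacencies` maps every node at x's level (including x itself)
    to the set of its south neighbors.
    """
    result: Dict[str, List[str]] = {}
    for neighbor in sorted(south_adjacencies[x]):
        cannot_reach = sorted(peer for peer, nbrs in south_adjacencies.items()
                              if peer != x and neighbor not in nbrs)
        if cannot_reach:
            result[neighbor] = cannot_reach
    return result


# The scenario above: X and Y share only N0; N1 is seen by X only, N3 by Y only.
fabric = {"X": {"N0", "N1"}, "Y": {"N0", "N3"}}
x_partial = partial_neighbors("X", fabric)  # {"N1": ["Y"]}
y_partial = partial_neighbors("Y", fabric)  # {"N3": ["X"]}
```

Both sets are non-empty, so both X and Y disaggregate; an empty result would
mean the node does not disaggregate any prefixes.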

>
>
> "Each entry is a list of south neighbor of X and a list of nodes
>
>        of X.level that can't reach that neighbor"
>
>
>
> Think this should say
>
>
>
> "Each entry in the set is a south neighbor of X and a list of nodes
>
>        of X.level that can't reach that neighbor"
>

yes, thanks.

>
>
> "X does not to disaggregate any prefixes" -> ""X does not disaggregate any
> prefixes.""
>

yes

>
>
> "The PoD containing the prefix will prefer southbound anyway." - I didn't
> understand the point. Is it necessary for me to understand it? Please
> expand or delete the sentence if it's not necessary.
>

clarified:

<t>all the lower level nodes are flooded the same disaggregated
    prefixes since we don't want to build a South TIE per node and
    complicate things unnecessarily. The lower level node
    that can compute a southbound route to the prefix
    will prefer it to the disaggregated route anyway based on
    route preference rules.</t>

>
>
> Section 5.2.6
>
> "such as mobility per section 5.3.3 necessary" - delete "necessary".
>

yes

> "ties are broken based upon type first and then distance and further
> attributes" - I don't see mention of further attributes in the proposed
> algorithm.
>

corrected to

PrefixAttributes

which are contained in the schema. Mobility tie-breaking is described in
its own section.

The document does not standardize further tie-breaking since, e.g.,
tie-breaking on tags is possible but can be completely implementation
dependent given that RIFT is loop-free. Nor do I think any kind of
"standardizable agreement" would be possible here.

>
> "The nexthop
>
>    adjacencies for a negative prefix are inherited from the longest
>
>    prefix that aggregates it" - suggest changing to "longest positive
> prefix"
>

ok

>
>
> "all entries of the father" -> "all entries of the parent"
>

ok
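
The inheritance rule discussed above (a negative prefix takes its nexthop
adjacencies from the longest positive prefix that aggregates it) might be
sketched like this. The subtraction of the adjacencies that advertised the
negative prefix is my reading of how the inherited set is then used and
should be treated as an assumption of this sketch, as should the function
name and the dictionary-based route table.

```python
import ipaddress
from typing import Dict, Set


def resolve_negative(negative_prefix: str,
                     negative_nexthops: Set[str],
                     positive_routes: Dict[str, Set[str]]) -> Set[str]:
    """Nexthops for a negative prefix, inherited from its longest positive parent."""
    neg = ipaddress.ip_network(negative_prefix)
    # Normalize the positive routes into network objects.
    parents = {ipaddress.ip_network(p): hops for p, hops in positive_routes.items()}
    # Keep only positive prefixes of the same address family that cover `neg`.
    covering = [p for p in parents
                if p.version == neg.version and neg.subnet_of(p)]
    if not covering:
        return set()  # nothing to inherit from
    parent = max(covering, key=lambda p: p.prefixlen)  # longest aggregating prefix
    # Inherit all entries of the parent, then drop the negative nexthops
    # (assumed behaviour, see the paragraph above).
    return set(parents[parent]) - set(negative_nexthops)
```

For instance, with positive routes `0.0.0.0/0` via A, B, C and `10.0.0.0/8`
via A, B, a negative `10.1.0.0/16` received from B inherits from `10.0.0.0/8`
and keeps only A.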

> Section 5.2.7.3
>
> "we have to decide whether node Y is at the same level as I, J or at
>
>    the same level as Y and consequently, X is south of it."
>
>
>
> I could not parse this.  I think you might mean this:
>
>
>
> "we have to decide whether node Y is at the same level as I, J
>
>   (and consequently X is south of it) or at the same level as X."
>

yes, correct, somehow it got garbled, corrected to

<t>First, we must anchor the "top" of the cabling and that's what
    the TOP_OF_FABRIC flag at node A is for. Then things look smooth until
    we have to decide whether node Y is at the same level as I, J
    (and as consequence, X is south of it) or at
    the same level as X. This is
    unresolvable here until we
    "nail down the bottom" of the topology. To achieve that we choose to
    use in this
    example the leaf flags in X and Y. In case where Y would not have a leaf
    flag it will try to elect highest level offered and end up being
    in same level as I and J.
    </t>

>
>
> Section 5.2.7.4
>
> How does a ToF node know what value to advertise in its LEVEL_VALUE?
>

This constant is provided in appendix D.1

I'm working on the other directorate reviews and will try to cut a new
version with all those changes before the deadline
