[Rift] Fwd: Re: AD Review of draft-ietf-rift-rift-12 (Part 2a)

Alvaro Retana <aretana.ietf@gmail.com> Thu, 01 April 2021 20:08 UTC

From: Alvaro Retana <aretana.ietf@gmail.com>
MIME-Version: 1.0
Date: Thu, 01 Apr 2021 13:07:47 -0700
Message-ID: <CAMMESsxe1TGf0aT=DzLSo1W7LYfN97u4X08XvY9dY8KZ6wZkkQ@mail.gmail.com>
To: "rift@ietf.org" <rift@ietf.org>
Cc: "draft-ietf-rift-rift@ietf.org" <draft-ietf-rift-rift@ietf.org>, "rift-chairs@ietf.org" <rift-chairs@ietf.org>, "Jeffrey (Zhaohui) Zhang" <zzhang@juniper.net>
Content-Type: multipart/alternative; boundary="00000000000022c38e05beeec9aa"
Archived-At: <https://mailarchive.ietf.org/arch/msg/rift/3AKRngd9vKl0Bnia1cWHV_MPFrc>
Subject: [Rift] Fwd: Re: AD Review of draft-ietf-rift-rift-12 (Part 2a)
Precedence: list

[Forwarding for the archive.]

On March 7, 2021 at 3:56:09 PM, Tony Przygienda (tonysietf@gmail.com) wrote:

Alvaro, many thanks for the read

inline


On Fri, Mar 5, 2021 at 4:36 PM Alvaro Retana <aretana.ietf@gmail.com> wrote:

>
> Dear authors:
>
> This is Part 2a of my review of this document.  I wanted this second
> installment to be longer, but given the IETF meeting next week, I am
> sending this fragment out now.
>
> Part 2a starts with §4.2 (Specification) and goes just through §4.2.2 -- I
> just started reading the LIE FSM/§4.2.2.1.  I included a couple of comments
> on the schema itself.
>
>
> While reading this part of the document I've been learning thrift and
> applying my programming and modeling experience (*).  In general, the model
> seems straight forward (so far).  It is important to highlight somewhere at
> the start of §4.2 that Appendix B is Normative.  The existing references
> don't explicitly say so.
>

ack


>
> In this first part of the specification there are a number of assumptions
> made that should be explained explicitly and not by reference.  For
> example, it seems to be assumed that the reader knows what a three-way
> handshake is -- while most readers of this specification will be familiar
> with other routing protocols, this document is *the* RIFT spec and should
> not rely on other protocol specifications, much less when the specification
> is not (can't!) be the same -- this is what §4.2.2 says:
>
>    It uses a three-way handshake mechanism which is a cleaned up version of
>    [RFC5303].  Observe that for easier comprehension the terminology of
>    one/two and three-way states does NOT align with OSPF or ISIS FSMs
> albeit
>    they use roughly same mechanisms.
>
> IOW, "similar to IS-IS, but not the same."   There are more specific
> comments inline.
>

there was lots of re-factoring here already since no'one was happy, I will
cut out any comparision to ISIS/OSPF out.


>
>
> "Traditional" specifications usually present packet formats and describe
> the fields before explaining their use.  §4.2.2 jumped right into talking
> about a specific flag (v4_forwarding_capable) without introducing it or the
> place where it is carried (LIEPacket).  It is important to describe the
> different structures, and provide a way to make them simpler to find in
> Appendix B.
>

aha. ok, I will basically start adding references to the model & according
structure before refering to fields.


>
> In my comments I'm asking for a lot of pointers to things that have not
> been discussed.  I don't know (at this point) whether the overall
> organization of the document makes sense as is or if it might be worth
> rethinking (i.e. shuffle things around).  Food for thought.
>

well, throwing model upfront won't help, it's like putting packet formats
in front of a spec. Model cannot be chopped into pieces, that will be
confusing IMO so the forward references to appendix and good glossary in
front of every section is best I can imagine.


>
>
> These comments/questions are for the Chairs/Shepherd:
>
> - It looks like an early TSV review was requested, but I didn't see the
> review come in.  Did I miss it?
>

never seen it either


>
> - §8.1 includes a set of suggested UDP ports, but they are not reflected
> in the registry.  Early allocation has not been requested -- you should
> consider doing so.  See rfc7120.
>

ack


>
>
> Thanks!
>
> Alvaro.
>
> (*) I don't have any. ;-)
>
>
> [Line numbers from idnits.]
>
> ...
> 1341 4.2.  Specification
> ...
> 1350   "On Entry" actions on FSM state are performed every time and right
> 1351   before the according state is entered, i.e. after any transitions
> 1352   from previous state.
>
> [nit] Suggestion>
>    "On Entry" actions at an FSM state are performed right before the
>    corresponding state is entered, i.e. after any transitions from a
>    previous state.
>
>
> 1354   "On Exit" actions are performed every time and immediately when a
> 1355   state is exited, i.e. before any transitions towards target state
> are
> 1356   performed.
>
> [nit] Suggestion>
>    "On Exit" actions are performed immediately when a state is exited,
>    i.e. before a transition towards a target state is performed.
>
>
> 1358   Any attempt to transition from a state towards another on reception
> 1359   of an event where no action is specified MUST be considered an
> 1360   unrecoverable error.
>
> [major] Any type of action, or are you referring to an "on exit" one?
>
>
> [major] "MUST be considered an unrecoverable error"  What is the result of
> that?  Should the adjacencies be reset, the rift process restarted, the
> interfaces shut down, etc.??   There's no other place in the document that
> mentions "unrecoverable".
>

implementation dependent. It's pretty much the same as the protocol
throwing a "core" or memory corruption. From that point behavior is
unspecified but we want it to be considered something that should not
attempt a recovery. In most stark terms don't even start RIFT again but we
can't mandate any such thing. It's implementation guidelines.


>
>
> 1362   The FSMs and procedures are normative in the sense that an
> 1363   implementation MUST implement them either literally or an
> 1364   implementation MUST exhibit externally observable behavior that is
> 1365   identical to the execution of the specified FSMs.
>
> [major] How can you tell the difference?  rfc2119 keywords "MUST only be
> used where it is actually required for interoperation".  In this case there
> are options, and there is no way to verify one or the other as long as the
> implementations "exhibit externally observable behavior that is
> identical".  IOW, I am not comfortable with using MUST if it is not
> required.  Also, I believe the statement to generally be true for any
> specification and not needed.
>


>
>
> Suggestion (borrowing from rfc4271):
>    The data structures and FSMs described in this document are conceptual
>    and do not have to be implemented precisely as described here, as long
>    as the implementations support the described functionality and exhibit
>    the same externally visible behavior.
>

ack, thanks. that's good. great suggestion.


>
>
> 1367   Where a FSM representation is inconvenient, i.e. the amount of
> 1368   procedures and kept state exceeds the amount of transitions, we
> defer
> 1369   to a more procedural description on data structures.
>
> [] This paragraph can also be eliminated.
>

ack


>
> 1371 4.2.1.  Transport
>
> 1373   All packet formats are defined in Thrift [thrift] models in
> 1374   Appendix B.
>
> [] Can we get an index/TOC or some type of guide in the appendix to make
> it easier to find specific parts?  For example, finding where a LIE is
> defined is not straight forward -- I happened to stumble on it in B.2.
> Another option would be to point in §4.2.2 to look for LIEPacket in B.2.
>

yes, as we said there will be a more extensive intro and a reader's guide.
And I'll refer to model & according structure.


>
> [major] It must be stated somewhere that the contents of Appendix B are
> Normative.
>

ack


>
> 1376   The serialized model is carried in an envelope within a UDP frame
> 1377   that provides security and allows validation/modification of
> several
> 1378   important fields without de-serialization for performance and
> 1379   security reasons.
>
> [minor] This is a very short section -- please put references to other
> places where these topics are explained more, if any.
>

aha, ok. I probably expound the section a bit as in "this MD5 protects this
and then this MD5 on top protects this" so e'thing is completely convered.


>
>
> 1381 4.2.2.  Link (Neighbor) Discovery (LIE Exchange)
>
> 1383   RIFT LIE exchange auto-discovers neighbors, negotiates ZTP
> parameters
> 1384   and discovers miscablings.  It uses a three-way handshake mechanism
> 1385   which is a cleaned up version of [RFC5303].  Observe that for
> easier
> 1386   comprehension the terminology of one/two and three-way states does
> 1387   NOT align with OSPF or ISIS FSMs albeit they use roughly same
> 1388   mechanisms.  The formation progresses under normal conditions from
> 1389   one-way to two-way and then three-way state at which point it is
> 1390   ready to exchange TIEs per Section 4.2.3.
>
> [minor] "...which is a cleaned up version of [RFC5303]."  There's no need
> to hint at the fact that other implementations/protocols may not be as
> good/clean as rift.  Please take this part of the sentence out.
>

sure. it was not meant as derogatory, if anything, RIFT learned from all
the lessons ISIS and OSPF went through and had to bolt onto stuff while
preserving backward compatiblity. I will remove and as I said, basically
remove any reference to ISIS/OSPF hello FSMs.


>
> [minor] "Observe that for easier comprehension the terminology of one/two
> and three-way states does NOT align with OSPF or ISIS FSMs albeit they use
> roughly same mechanisms."   I don't know how changing everything I know
> makes comprehension easier. ;-)   Seriously: we don't need to tell the
> reader what is not used.
>

ack. When writing this I was mostly thinking about ISIS implenter looking @
this and thinking "hmm, how does that compare?". Per above.


>
> 1392   LIE exchange happens over well-known administratively locally
> scoped
> 1393   and configured or otherwise well-known IPv4 multicast address
> 1394   [RFC2365] and/or link-local multicast scope [RFC4291] for IPv6
> 1395   [RFC8200] using a configured or otherwise a well-known destination
> 1396   UDP port defined in Appendix C.1.  LIEs SHOULD be sent with an IPv4
> 1397   Time to Live (TTL) / IPv6 Hop Limit (HL) of 1 to prevent RIFT
> 1398   information reaching beyond a single L3 next-hop in the topology.
> 1399   LIEs SHOULD be sent with network control precedence.
>
> [] The first sentence is long and convoluted.  I think I understood that
> the assigned destination IP address and UDP port are used, but that they
> also can be configured.  Is that right?
>

yes, will simplify


> Suggestion>
>
>    LIEs are exchanged over the well-known multicast addresses and UDP
>    port, as summarized in Appendix C.1.  An implementation MAY allow
>    for the local configuration of these parameters.
>
>
> [] If configured, do both sides need to be configured beforehand or is
> there a discovery mechanism?
>

it's like OSPF, both have to  agree on multicast & port which is basically
a protocol constant and then rest just comes up.  An implementon could
indiscriminately enable RIFT on all interfaces but some may choose to let
user configure on which interfaces to run RIFT.

The multicast/port being variable is super handy for multi-instance things.
Comes for free baiscally.


>
>
> [major] "LIEs SHOULD be sent with..."  When is it ok to not use these
> values?  IOW, why is this action recommended and not required?  (The next
> comment is related.)
>


>
> [minor] "IPv4 Time to Live (TTL) / IPv6 Hop Limit (HL) of 1"  Was GTSM
> (rfc5082) considered?  Several documents (for example, rfc8085 and
> draft-ietf-opsec-v6) suggest its use.  You may be asked about it later.  It
> would be good to preempt those questions by adding some text in the
> Security Considerations about any risks...or using it.
>

There is a religious war about 255 vs. 1 since a long time. I personally in
deployment was hurt @ least twice by implementations ending up not checking
TTL/misprocessing and forwarding one-hop packets with 254 on
unicast/multicast. Also, having a buggy FIB looping tightly packets with
TTL 255 is seldom fun. I never saw a thing not decrementing TTL and
dropping on 1. I prefer the 1 with possibly a multi-hop spoof (never saw
that in my life frankly, such an attack is really, really hard to engineer)
than a rogue packet running away in the network or micro looping 255 times
on fast links.

What I saw were replay attacks and against those RIFT is extremely well
protected as far it's practically implementable. Plus, if one _really_
wants security one configures adjacency keys (and RIFT allows a full
plethora of that) rather than rely on the 255 hack to have "kind of a
little lick of security".

So if the security folks push on it we can always go to 255 or write that
the TLL received MUST be either 255 or 1 to satisfy both camps.


>
>
> [major] "LIEs SHOULD be sent with network control precedence."  When is it
> ok to not do so?  IOW, when is this behavior recommended and not required.
> BTW, please add a reference.
>

hard to implement often that's why it's not MUST. will add ref to
precedence.


>
>
> 1401   Originating port of the LIE has no further significance other than
> 1402   identifying the origination point.  LIEs are exchanged over all
> links
> 1403   running RIFT.
>
> [nit] s/Originating port/The originating port
>
>
> 1405   An implementation MAY listen and send LIEs on IPv4 and/or IPv6
> 1406   multicast addresses.  A node MUST NOT originate LIEs on an address
> 1407   family if it does not process received LIEs on that family.  LIEs
> on
> 1408   same link are considered part of the same negotiation independent
> of
> 1409   the address family they arrive on.  Observe further that the LIE
> 1410   source address may not identify the peer uniquely in unnumbered or
> 1411   link-local address cases so the response transmission MUST occur
> over
> 1412   the same interface the LIEs have been received on.  A node MAY use
> 1413   any of the adjacency's source addresses it saw in LIEs on the
> 1414   specific interface during adjacency formation to send TIEs.  That
> 1415   implies that an implementation MUST be ready to accept TIEs on all
> 1416   addresses it used as source of LIE frames.
>
> [major] "An implementation MAY listen and send LIEs on IPv4 and/or IPv6
> multicast addresses.  A node MUST NOT originate LIEs on an address family
> if it does not process received LIEs on that family."
>
> The first sentence says that it is optional to listen and send LIEs over
> either IPv4/IPv6 (IOW, a router doesn't have to do either).
>

yes


>   The second one gives the impression that originating a LIE on a specific
> AF means that the router is willing to only receive/process LIEs on it
> (e.g. if an IPv6 LIE is not generated then there's no positive indication
> that the router can receive/process IPv6 LIEs).  The combination results in
> a potential lock at startup:
>

yes, it does. we could probably say that if one listens on an AF one should
also send on it. But it's far trickier than that in real life. Someone can
configure the protocol to listen on the AF (and on some platforms it's one
socket for all interfaces). But then a specific interface may have an AF
configured but  e.g. have martians so you don't want to send that out. So
you end up listening on an AF but not sending on it on a certain
interface.  You also have the issue of passive interfaces you neither
listen nor send.

So, generally I prefer this flexiblity the spec has now since practically,
all kind of combinations arise (add v4ov6 to the whole thing) & that's also
why we have AFs carried on the link information as well to be able to
figure out during SPF/from topology what AFs are on and detect
discontinuity if so desired or even allow to fwd. v4 normal then jump a
v4ov6 and get there.


> For example, if a router chooses to only listen on IPv4 (MAY = optional)
> and it's neighbor on IPv6, then they will be sending LIEs and never talk to
> each other.
>
> I'm assuming that the intent is to allow for maximum flexibility: allow to
> send/receive on either (or both).  But, what if only one AF is supported?
> Should an implementation be capable to control which AF is used?  If I have
> an IPv6-only deployment, should the router be expected to receive LIEs over
> IPv4?
>

Per above, v4ov6 is absolutely desired and built in. So you can build v6
_and_ v4 and implementaiton can use either nexthop or you just send v6 but
indicate that you support v4ov6 on LIE in `

v4_forwarding_capable

` (how would you know otherwise?). Add to that different chipsets
struggling with v6ov4 in different forms and as you see, this flexibility
is necessary and yes, if you're in a config where you listen on AF and not
send or send but not listen the AF will not come up and if AFs are
mismatched on both sides nothing will come up. Realities of deployment and
that's e.g. why we carry "miscabled links". Such stuff could be flagged to
detect one-side broken links (which are a fact of life anyway due to some
very funky HW failures).


>
> [nit] s/LIEs on an address family/LIEs using an address family
>
>
> [nit] s/LIEs on same link/All LIEs on a link
>
>
> [minor] "part of the same negotiation"  Negotiation of what?  The text at
> the start of this section mentions that LIEs negotiate ZTP parameters -- is
> that it?  Other actions don't seem to be negotiations: discovery, for
> example.   I'm not sure if something like "session" would capture the
> intended meaning.
>

well, it's not really a session.  yes, I'll describe better what is
negotiated. "Negotiation" is also not a good thing, it's really a
distributed, unsynchronized state diffusion via a front (basically like
RIP) but that does not read well ;-)  Described in ZTP section.


>
> [major] "MAY use any of the adjacency's source addresses it saw in
> LIEs...to send TIEs."  This use of "MAY" makes using the source address
> from a LIE optional.  §4.2.3.3 says that "TIEs...using the destination
> address on which the LIE adjacency has been formed", which doesn't make the
> use optional.
>
> OLD>
>    A node MAY use any of the adjacency's source addresses it saw in
>    LIEs on the specific interface during adjacency formation to send
>    TIEs.
>
> NEW>
>    A node may use any of the adjacency's source addresses from the
>    LIEs received on the specific interface during adjacency formation
>    to send TIEs (Section 4.2.3.3).
>
> [See related question in 4.2.3.3.]
>

ack


>
> 1418   A three-way adjacency over any address family implies support for
> 1419   IPv4 forwarding if the `v4_forwarding_capable` flag is set to true
> 1420   and a node can use [RFC5549] type of forwarding in such a
> situation.
> 1421   It is expected that the whole fabric supports the same type of
> 1422   forwarding of address families on all the links.  Operation of a
> 1423   fabric where only some of the links are supporting forwarding on an
> 1424   address family and others do not is outside the scope of this
> 1425   specification.
>
> [minor] "three-way adjacency"   I know there's a definition in the
> terminology section, but it is not a specification of what a three-way
> adjacency is.  Please add a forward reference to where it is specified.
>

ack


>
>
> [major] "`v4_forwarding_capable` flag"   (This comment is not specific to
> this flag.)
>
> The specific LIE packet format has not been presented, introduced or even
> referenced up to now.  Pointing to a flag, which happens to be optional in
> an optional part of the LIEPacket, comes out of the blue and feels
> completely out of place without prior explanation.
>

I can try to describe all the different scenarios of v4/v6/v4ov6/mix but
fact is, it will read more and more confused so I try to keep it fairly dry
and short. It's obvious when one deploys/implements.

```

<t>A three-way adjacency over any address family implies support
    for IPv4 forwarding if the `v4_forwarding_capable` flag is set to
true and a node
    can use <xref target="RFC5549"/> type of forwarding in such a situation.
    It is expected that the whole fabric
    supports the same type of forwarding of address families on all the links.
    Operation of a fabric where only
    some of the links are supporting forwarding on an address family and
    others do not is outside the scope of this specification.
    </t>```

reads completely clear to me with ref to 5549 so what am I supposed to
add that will clarify it further without muddying things.



>
> I can see that some of the fields are described in Appendix B, but there
> should at least be a pointer to that.
>
> I also expect a description somewhere of expectations and error
> conditions. For example, see my comments in the appendix related to
> LinkCapabilities.
>
>
> [major] "a node can use [RFC5549] type of forwarding"
>
> First of all, rfc5549 is a BGP-specific RFC.  How do the BGP mechanisms
> defined there (and in rfc8950) map to RIFT?  IOW, please specify how RIFT
> works.
>
> Because you listed rfc5549 as a Normative reference, I assume you want
> this behavior to be normative.  s/can use/MUST/SHOULD/MAY ??
>

Sigh, Alvaro, is it really the job of RIFT to start describe forwarding
path all major chip vendors implemented? I do NOT think it's a wise idea.
It refers to RFC5549 because the forwarding behavior is ultimately
precisely the same but I don't want to write models of forwarding paths,
ARP tables etc.

The protocol spec describes all the control plane necessary to interop
properly, it does not explain LPM or v4ov6 forwarding or any such thing.

I would expect some more guidance here but again, this is not an
implementation primer for a DC switch including fwd path, it's a control
protocol spec.


>
> [major] "It is expected that the whole fabric supports the same type of
> forwarding of address families on all the links.  Operation of a fabric
> where only some of the links are supporting forwarding on an address family
> and others do not is outside the scope of this specification."
>
> Please remind me, is IPv4 optional or required?  I'm assuming that IPv6 is
> required and IPv4 optional, but I can't find anything definitive in the
> document.
>

neither is required. RIFT will work with any combination (v4 only, v6 only,
v4 and v6, v4 over v6 only). If neither is implemented then there is no
forwarding obviously. LIEs basically exchange enough info over both AF and
the 'v4 capable' flag to negotiate _any_ viable combination. So what do you
want me to say?  People deploy all of the above variants & are religious
about it. Why would I say "v6 is required" if protocol works just fine with
v4 only. And why have "v4 required" if it works fine with v6 only as well.


>
> Even if IPv4 is optional, I'm assuming that most deployments will support
> both.  But as time goes on that may (slowly) change to IPv6-only.  Is that
> a fair assumption?
>

nope. read above.


>
> Because the ability to support IPv4 is signaled per link
> (v4_forwarding_capable), then the expectation that "the whole fabric
> supports the same type of forwarding of address families" may be "easily"
> broken if one link doesn't support IPv4 (including the case where a rogue
> node may not advertise v4_forwarding_capable to mess things up).  What
> should a node that detects an inconsistency (e.g. some links use
> dual-stack, but others only one AF) do?  I'm assuming that it SHOULD/MUST
> (?) go as far as disabling the adjacencies as described below, right?
> Please be explicit and specific on the actions to be taken.
>

I will add "such a thing is outside the scope of this document". this is
not a deployment guideline, we have day one books for it, we have
applicability spec.

You cannot figure that out BTW, a fabric may partition along AFs
supported/capabilities and unless you run reachabiliuty per AF the fabric
will blackhole likely. Running computona per AF is a very bad idea,
partitioned (partially connected) networks are operationally a nightmare. I
learned that the hard way in MT years ago.

Applicability draft may want to loose a word or two about it.


>
> 1427   The protocol does NOT support selective disabling of address
> 1428   families, disabling v4 forwarding capability or any local address
> 1429   changes in three-way state, i.e. if a link has entered three-way
> IPv4
> 1430   and/or IPv6 with a neighbor on an adjacency and it wants to stop
> 1431   supporting one of the families or change any of its local addresses
> 1432   or stop v4 forwarding, it has to tear down and rebuild the
> adjacency.
> 1433   It also has to remove any information it stored about the adjacency
> 1434   such as LIE source addresses seen.
>
> 1436   Unless ZTP as described in Section 4.2.7 is used, each node is
> 1437   provisioned with the level at which it is operating.  It MAY be
> also
> 1438   provisioned with its PoD.  If any of those values is undefined,
> then
> 1439   accordingly a default level and/or an "undefined" PoD are assumed.
> 1440   This means that leaves do not need to be configured at all if
> initial
> 1441   configuration values are all left at "undefined" value.  Nodes
> above
> 1442   ToP MUST remain at "any" PoD value which has the same value as
> 1443   "undefined" PoD.  This information is propagated in the LIEs
> 1444   exchanged.
>
> [major] There are references made in this paragraph to things that are
> defined elsewhere (""any" PoD"..."propagated in the LIEs").  Please put
> pointers to where these things are defined.  As mentioned above, the LIE
> format hasn't been presented yet.
>

ack, yes LIE will be fwd referenced with according fields etc.


>
> The text above gives the impression that the type of the PoD is somehow
> related to the level, so I went in search of that and found this in the
> LIEPacket:
>
>    struct LIEPacket {
>   ...
>        /** Node's PoD. */
>      7: optional common.PodType            pod =
>              common.default_pod;
>
> ...but then I can't find anything that corresponds to PodType matching the
> values above: undefined, any...
>


>
> ...but I did find this:
>
>    /** Common RIFT packet header. */
>    struct PacketHeader {
> ...
>      /** Level of the node sending the packet, required on everything
>          except LIEs. Lack of presence on LIEs indicates UNDEFINED_LEVEL
>          and is used in ZTP procedures.
>       */
>      4: optional common.LevelType            level;
>    }
>
> ...and a couple of constants:
>
>   const LevelType   top_of_fabric_level         = 24
>   const LevelType   leaf_level                  = 0
>   const LevelType   default_level               = leaf_level
>
> If ZTP is not used, are there operational/deployment considerations for an
> operator to provision the levels?  Should they simply start counting from
> 0, is incrementing by one good enough or should it be by 2?  Or maybe there
> are considerations about the use of ZTP?  Please take a look at rfc5706.
>

Alvaro, those are applicability draft questions, not protocol spec. Let's
get a spec out then we can improve the applicability draft. On this note,
LIE rules will e.g. prevent you from numbering in 2 steps. Also leaf-level
is described in LIE rules as pretty special so if you read those it is
clear you cannot number ToF with 0 or things with -1 or other ideas that
may come up.


>
>
> [minor] "It MAY be also provisioned with its PoD."  What does this mean?
> Does it mean that all the nodes in the PoD have the same level?  ...
>

I don't understand how PoD and level can be confused. The glossary is super
explicit about those things. There is never anywhere anything that somehow
implies PoD numbering has anything to do with level.

PoD checking rules are in the LIE processing rules.


>
>
> [major] BTW, this reminds me, what are the manageability considerations
> related to RIFT?  Among other things, it would be nice to have a summary of
> parameters that are expected to be configured and their defaults.  Please
> take a look at rfc5706.
>

Alvaro, since when are manageability considerations part of the base
protocol spec? we have yang model spec, we have applicability spec, we can
have a manageablity spec, fine with me. Base spec is here to say enough to
build an interoperable implementation. That has been proven by a clean room
open source that inter'oped day one.


>
> 1446   Further definitions of leaf flags are found in Section 4.2.7 given
> 1447   they have implications in terms of level and adjacency forming
> here.
>
> [] "leaf flags"   What are those?  I think this is the first mention.
>

yeah, I'll refer forward to model.


>
> [] Suggestion>
>    Further definitions of leaf flags are found in Section 4.2.7.
>
>
> 1449   A node tries to form a three-way adjacency if and only if
>
> [nit] s/tries to form/starts to form
>
>
> [major] "if and only if"   Can this be translated into a Normative
> statement?  Are all these required (MUST) before the formation of a 3-way
> adjacency started?
>

I do NOT want to touch the LIE rules, those have been ironed over 2
implementations super carefully and held. mathematicians use IIF
religiously, good enough for me. Beside me @ least 2 or 3 people
implemented those rules correctly (based on interop) already so the
description clearly works.


>
>
> 1451   1.  the node is in the same PoD or either the node or the neighbor
> 1452       advertises "undefined/any" PoD membership (PoD# = 0) AND
>
> [minor] "the node is in the same PoD"  As what?   I guess you mean in the
> same PoD as what a neighbor indicated in its LIE, right?
>

as "its neighbor", reads clear in context to me but I'll add.


>
>
> 1454   2.  the neighboring node is running the same MAJOR schema version
> AND
>
> [] The formats have not been introduced yet.
>
> I peeked into Appendix B and found the PacketHeader.  As with #1, you're
> assuming a LIE packet has been received from the neighbor, but that is not
> mentioned.
>
>
ok, I'll fwd' reference.


>
> [] The versioning system has not been introduced.  Please put a reference
> to it.
>

Ok, I'll fwd reference.


>
>
> [nit] This is the only place where "MAJOR" (all caps) is used.  Please be
> consistent.
>

ack


>
> [minor] Are there any requirements related to the minor version?
>

the schema versioning does describe that in excruciating details @
beginning of appendix B. I'll fwd reference


>
>
> 1456   3.  the neighbor is not member of some PoD while the node has a
> 1457       northbound adjacency already joining another PoD AND
>
> [nit] s/not member of some PoD/not a member of the same PoD
>
>
> [major] "not [a] member of [the] some PoD"  #1 says that the "node is in
> the same PoD" -- it looks like there's some context missing -- I guess
> about the ability to join other PoDs.  Is this only a northbound
> characteristic?
>

no, there is no such thing. You can basically configure PoD ooptionally and
that's it if you choose. That gives protection against cross-PoD miswiring,
there is nothing more here.


>
>
> 1459   4.  the neighboring node uses a valid System ID AND
>
> [minor] "System ID" and "SystemID" are both used.  Please pick one.
>

ack

>
>
> [major] What is a "valid System ID"?  The only clue that I found is in
> §4.2.7.2, which makes we wonder whether it is specific to ZTP or not (??):
>
>    RIFT nodes require a 64 bit SystemID which SHOULD be derived as
>    EUI-64 MA-L derive according to [EUI64].  The organizationally
>    governed portion of this ID (24 bits) can be used to generate
>    multiple IDs if required to indicate more than one RIFT instance."
>
> If this is a recommendation (SHOULD), and assuming that it applies beyond
> ZTP, what is a "valid System ID"?
>



>
> I also only found one place that talks about an IllegalSystemID, but that
> is probably not the same as an invalid one:
>
>    /** 0 is illegal for SystemID */
>    const SystemIDType IllegalSystemID        = 0
>

yes, that's it. I'll fwd reference


>
>
> 1461   5.  the neighboring node uses a different System ID than the node
> 1462       itself
>
> [minor] Is there a way to resolve/address duplicate System IDs?  It seems
> to me that a rogue node may reflect the sender's System ID and prevent an
> adjacency from forming (along with a number of other values).
>

there is a million way a rogue node can prevent adjacency from forming. I
would describe it as "implementation bug". This is actually quite commonly
used to detect self-looped links BTW and a good implementaiton will flag
miscabling.

System IDs are 8 bytes and use Chassis MAC basically. If we are concern4ed
about duplicate System IDs we should first solve duplicate chassis MAC
problem.

Realistically, only time I saw duplicate system ID was on buggy
virtualization open-source that happily reused MACs. That caused enough
other problems RIFT wasn't a problem. What to do in such stiuation is not
protocol spec, it's again either applicability or deployment guide or
something else.


>
>
> [minor] Also, I hope that the selection/assignment of the System ID is
> discussed elsewhere.  A pointer might be nice.
>

section "

 Automatic SystemID Selection"

tells _precisely_ how system ID should be formed. But anything
implementation feels confident won't collide is fine by the protocol.


>
> 1464   6.  the advertised MTUs match on both sides AND
>
> [] Ahhh...this in in the LIEPacket too.
>

well, the rules are detailed, the schema is here. I don';t think
introducing e'thing in all the detail upfront in the LIE section would
beneift any clarity of the document. There is a flow where lots of stuff
may need a 2nd read or becomes clear on implementation. We tried for a spec
that reads bit superficial but well on first run but not for one that on a
single read makes someone a RIFT implementation guru. Very much like BIER
really. RIFT is quite a lot to digest to implement albeit for the user of
it it's a one-button-protocol.


>
> [major]
>    struct LIEPacket {
> ...
>      /** Layer 3 MTU, used to discover to mismatch. */
>      4: optional common.MTUSizeType        link_mtu_size =
>              common.default_mtu_size;
>
> If MTUSizeType is not included, where is it specified that the
> default_mtu_size must be considered?  Without that specification it is not
> possible to satisfy this condition.
>

all in schemas

/** @note MUST be interpreted in implementation as unsigned */
typedef i32      MTUSizeType

/** default MTU link size to use */
const MTUSizeType     default_mtu_size           = 1400

encoding model includes common model


>
> 1466   7.  both nodes advertise defined level values AND
>
> [major] The advertisement of LevelType in PacketHeader is optional.  That
> means that even if there is a default value a node may not "*advertise*
> defined level values".  Also, there are only 2 level values defined:
>
>    const LevelType   top_of_fabric_level         = 24
>    const LevelType   leaf_level                  = 0
>
> What am I missing?
>

its own section explains ZTP algorithm in detail, advertisement etc

It's impossible to describe in LIE processing section ZTP already, it's all
a big muddle then. Level is optional because a ZTP node coming up has no
level (except ToF) so it cannot show anything. Then ZTP starts to
propagate, people get level offers, tie-break them, start to pick up levels
& see whether that converges and so on.



>
>
> 1468   8.  [
>
> 1470          i) the node is at level 0 and has no three way adjacencies
> 1471          already to nodes at Highest Adjacency Three-Way level (HAT
> as
> 1472          defined later in Section 4.2.7.1) with level different than
> 1473          the adjacent node OR
>
> [nit] s/three way/three-way/g
>

ack

>
>
> ...
> 1478          iii) both nodes are at level 0 AND both indicate support for
> 1479          Section 4.3.8 OR
>
> [minor] "support for Section 4.3.8"  Is there a name for that?  How is it
> indicated?  §4.3.8 says that a leaf can "advertise the LEAF_2_LEAF flag in
> its node capabilities"; there is nothing called "LEAF_2_LEAF" in the
> schema, but "leaf_only_and_leaf_2_leaf_procedures" does show up.  Please be
> consistent in the terminology.  [Also, there are places where "leaf-2-leaf"
> is used.]
>

yes, leaf-2-leaf procedures but it's a mouthful and it's not precise either
since it's also leaf-only

leaf-2-leaf is basically a common term for two flags in `

HierarchyIndications

`
called

leaf_only_and_leaf_2_leaf_procedures

leaf_only

whereas first includes second (it's a complex relationship we modelled
like this)

hence section 4.3.8 is bascially " anything around leaf only or
leaf-2-leaf or both"

So, I think refering to the section is still least confusing thing.

Scheam uses leaf_2_leaf because of thrift id rules but people prefer
leaf-2-leaf in the document so far.



>
> ...
> 1486   The rules checking PoD numbering MAY be optionally disregarded by a
> 1487   node if PoD detection is undesirable or has to be ignored.  This
> will
> 1488   not affect the correctness of the protocol except preventing
> 1489   detection of certain miscabling cases.
>
> [nit] s/rules checking/rules for checking
>
>
> [nit] s/MAY be optionally/MAY be    (redundant)
>
>
> [minor] What are that the "rules [for] checking PoD numbering"?  Where are
> they specified?
>

in the LIE processing rules you mentioned before.


>
> [major] "MAY be optionally disregarded by a node if PoD detection is
> undesirable or has to be ignored."
>
> This is the only place where "PoD detection" is used.  What is it?  Where
> is it specified?  Why would be it undesirable?  When would it have to be
> ignored?  Are there operational considerations for making that decision?
>

in the LIE processing. you check whethe the other side matches your PoD or
either one is ANY_POD=0 and that's it. No further magic.


>
>
> [major] "...except preventing detection of certain miscabling cases."
>  Where is miscabling detection described?  Which cases may not be covered?
>  §4.2.7.6 says that "internal ruleset flags a possible miscabling", but
> that is the closest that I came to finding a description (and rulesets are
> not mentioned anywhere else).
>

miscabling computation is not part of the protocol spec. We allow to
indicate "miscabling" but what is declared miscabling is impossible to
specify. Peoiple do ALL kind of things in fabric, e.g. when your link
starts to loose 5% some folks declare link "faulty" or "miscabled". There
are no rules. An implementation can declare miscabling and do whatever it
wants with the adjacency (probably hold down) and then it can put it into

 /** If any local links are miscabled, the indication is flooded. */
10: optional set<common.LinkIDType>      miscabled_links;

so you can look @ the ToF LSDB and see your whole fabric with links
with problems. It's pretty well liked.



>
>
> 1491   A node configured with "undefined" PoD membership MUST, after
> 1492   building first northbound three way adjacencies to a node being in
> a
> 1493   defined PoD, advertise that PoD as part of its LIEs.  In case that
> 1494   adjacency is lost, from all available northbound three way
> 1495   adjacencies the node with the highest System ID and defined PoD is
> 1496   chosen.  That way the northmost defined PoD value (normally the ToP
> 1497   nodes) can diffuse southbound towards the leaves "forcing" the PoD
> 1498   value on any node with "undefined" PoD.
>
> [nit] s/building first...adjacencies/building its first...adjacency
>
>
> [nit] s/to a node being in a defined PoD/to a node in a defined PoD
>
>
> [minor] "...defined PoD is chosen."  ...chosen as the PoD value to
> advertise.  (or somehting along those lines)
>
>
> [nit] s/northmost/northernmost
>
>
> [] I'm missing how using the System ID to select will always result in the
> north direction...
>
>
> [minor] "...diffuse southbound towards the leaves "forcing" the PoD value
> on any node with "undefined" PoD."  An example of this would be nice.
>




>
>
> 1500   LIEs arriving with IPv4 Time to Live (TTL) / IPv6 Hop Limit (HL)
> 1501   larger than 1 MUST be ignored.
>
> [major] "MUST be ignored"  The specification of the sender (§4.2.2) says
> that "LIEs SHOULD be sent with an IPv4 Time to Live (TTL) / IPv6 Hop Limit
> (HL) of 1".  This is a Normative conflict because it is only recommended to
> the sender, but required by the receiver.
>

aha, yes, very good. I'm reluctant to say MUST since it's not always
possible on a platform to force TTL (this is UDP). So I make it SHOULD on
both sides.

your comment on the 255/1 stands of course.


>
>
> 1503   A node SHOULD NOT send out LIEs without defined level in the header
> 1504   but in certain scenarios it may be beneficial for trouble-shooting
> 1505   purposes.
>
> [] See the related comments in #7 above.
>
>
> [nit] s/trouble-shooting/troubleshooting
>
>
> [major] #7 above makes advertising defined values a requirement to forming
> a three-way adjacency, but it is only recommended here.  Unless there are
> specific troubleshooting cases described, we should retain the
> requirement.  I believe that a packet can be crafted in any way for
> troubleshooting/testing, but that doesn't have to be indicated in a
> specification which should deal with the correct operation of the
> protocol.  IOW, I think we can do without this paragraph.
>

ack


>
>
> 1507 4.2.2.1.  LIE FSM
>
> 1509   This section specifies the precise, normative LIE FSM and can be
> 1510   omitted unless the reader is pursuing an implementation of the
> 1511   protocol.
>
> [nit] s/and can be omitted..../and can be omitted.
>
>
> [] It would be very nice if there was an upfront description (at least a
> list) of the different elements: states, events, etc.. to make reading the
> diagrams easier.  Also, the terminology (PUSH, etc.) should also be upfront.
>

ok, I'll add a bit.


>
>
> 1513   Initial state is `OneWay`.
>
> 1515   Event `MultipleNeighbors` occurs normally when more than two nodes
> 1516   see each other on the same link or a remote node is quickly
> 1517   reconfigured or rebooted without regressing to `OneWay` first.
> Each
> 1518   occurrence of the event SHOULD generate a clear, according
> 1519   notification to help operational deployments.
>
> [] This event is the only one described.  Is there anything special about
> it?  Is this the only one for which generating a notification is
> recommended?  Is that why it is described here?
>

yes, rest is really vanilla.


>
>
> [minor] "occurs normally when..."   Are there other conditions (which are
> not considered "normal")?
>

yes, weird ones like e.g. a node reboots & changes system ID and all of a
sudden it looks like 3 nodes on a link (which is hinted). This is all too
far for a protocol spec, those are day one/applicability things if one
wants to go into detailed details of all the sceanrios.


>
>
> [major] "more than two nodes see each other on the same link"  The
> description below is different (and I think more accurate): "more than one
> neighbor seen on interface".  Also, it doesn't include the second part
> about a "reconfigured or rebooted" node.  What is the characteristic of
> that behavior?
>

ack.


>
>
> [major] "SHOULD generate a clear, according notification to help
> operational deployments."   Normatively, what is "clear"?  Is there
> information that you have in mind that should be included to make it
> "clear"?  I assume that a notification is a message logged on the node, is
> that true?
>
> Suggestion>
>    ...SHOULD log a message including the System IDs of each node.
>
>    (Anything else?)
>

logging is the old days. today people want telemetry, yang, open config,
splunk channel, dancing bears, hence the "notification"


>
>
> [] This seems to be the only occurrence of a log resulting from a message
> or other action.  I think I'm missing the significance of this event.
>

yeah, this is the only time people _really_ wanted a notification. And the
problem is that it's hard to declare as "miscabling" because the link flaps
up and down persistently, people back off, try again so the link is up for
a bit and not miscabled etc. Hence the "notification" is really kind of
best one can do AFAIS.


>
>
> [Ok.  This is where I am in my reading.  There are more comments below.]
>
>
> ...
> 2069 4.2.3.3.  Flooding
> ...
> 2082   As described before, TIEs themselves are transported over UDP with
> 2083   the ports indicated in the LIE exchanges and using the destination
> 2084   address on which the LIE adjacency has been formed.  For unnumbered
> 2085   IPv4 interfaces same considerations apply as in equivalent OSPF
> case.
>
> [major] (Related to a comment in 4.2.2.) "using the destination address on
> which the LIE adjacency has been formed"  If multiple addresses are used
> (from different AFs), which one is selected?  Is there a "primary" address
> that is considered when the adjacency is formed?
>

very good catch.

No, the spec does not give the rules. An implementation will pick what it
likes and the other side has to follow because it sees it as LIE source
address.

Trying to start to build "prefrences" is a futile exercise. everybody has a
different opinion what is "best" address on an interface and then the
"unnumbered" wars start ;-)


>
>
> ...
> 6165 10.2.  Informative References
> ...
> 6175   [DOT]      Ellson, J. and L. Koutsofios, "Graphviz: open source
> graph
> 6176              drawing tools", Springer-Verlag , 2001.
>
> [major] Is there an URI that can be used as a stable pointer?
>

I can try, not that I can think of one.


>
>
> ...
> 6362 B.1.  common.thrift
> ...
> 6730 /** Link capabilities. */
> 6731 struct LinkCapabilities {
> 6732     /** Indicates that the link is supporting BFD. */
> 6733     1: optional bool                           bfd =
> 6734             common.bfd_default;
> 6735     /** Indicates whether the interface will support v4 forwarding.
>
> 6737         @note: This MUST be set to true when LIEs from a v4 address
> are
> 6738                sent and MAY be set to true in LIEs on v6 address. If
> v4
> 6739                and v6 LIEs indicate contradicting information the
> 6740                behavior is unspecified. */
>
> 6742     2: optional bool                           v4_forwarding_capable
> =
> 6743             true;
> 6744 }
>
> [major] Not knowing thrift, and assuming that this structure is some sort
> of enum/list, what should a receiver do if the undefined/unknown value 3 is
> received?
>

the packet won't even be decoded. you'll get an encoder/decoder error, it's
all handled internally by thrift.

That's the beauty of modelling. 80%+ of problems/worries we bean count
manually in protocols just vanish magically.

Remember, we had _no_ bugs in first interop of LIEs with a blind 2nd
implementation.


>
>
> [major] "contradicting information the behavior is unspecified."  How can
> the behavior be unspecified if this *is* the specification?   More to the
> point: if the indication is required in IPv4 LIEs, but only optional in
> IPv6 LIEs, why wouldn't the information from the IPv4 LIEs be considered as
> the truth?
>

because _sooo_ many things can go wrong. LIEs on different AFs can be
buffered, arrive out of sequence and do a myriad crazy things. We were
ruminating for a long time whether we should have one LIE on a single AF
but then like I told you there is no "required AF" in RIFT based on
customer input, e'one runs what they want on their fabric. And customers
prefer AF fate that is not shared. So de facto v4 and v6 run as ships in
the night (except the v4 capable v4ov6) and we end up advertising the
bfd/v4 capable twice (because again, we don't know wehther we'll have
either AF or both on an adjacency). Works fine in practice, an
implementation seeing persistent mismatch should probably just declare link
mismatch and shut down adjacency but that's really implementation specific.


>
>
> [nit] s/interface will support v4 forwarding/interface supports IPv4
> forwarding/g
>
>
> [nit] s/v4/IPv4/g  s/v6/IPv6/g
>
>
> [major] "Indicates that the link is supporting BFD."  I was going to ask
> if setting bfd to TRUE means that the link can support BFD (is capable) or
> if it BFS is actively enabled (supporting), but I found this somewhere else:
>


>
>    /** default link being BFD capable */
>    const bool            bfd_default                = true
>
> ...so I'm assuming that it just means that the link is capable, which
> makes sense given that is it in LinkCapablities.
>

yes, it just shows it's capable. up to implementation do what they want
with it just like upstream label, not for RIFT base spec to define IMO.
There may be BFD configuration on the system they use, they may do some
autoconfig of BFD, again, opinions here are endless given so much BFD work
is already done.


>
> s/Indicates that the link is supporting BFD./Indicates that the link is
> BFD-capable./g
>
> Also, note that §4.3.5 says that "RIFT MAY incorporate BFD", which is not
> completely aligned with the default indication of the link being
> BFD-capable.  Using BFD is different than being BFD-capable, so it is ok
> for its use to be optional (§4.3.5).  However, there is no stated
> requirement for a link to be BFD-capable, as indicated in the default
> setting.
>
>
no, it's fully optional. And again, what an implementation does with the
information is outside the scope of the spec. I'll add a sentence or two.


uff, long one

-- tony

[Rift] AD Review of draft-ietf-rift-rift-12 (Part… Alvaro Retana
[Rift] Fwd: Re: AD Review of draft-ietf-rift-rift… Alvaro Retana
Re: [Rift] AD Review of draft-ietf-rift-rift-12 (… Alvaro Retana
Re: [Rift] AD Review of draft-ietf-rift-rift-12 (… Antoni Przygienda
Re: [Rift] AD Review of draft-ietf-rift-rift-12 (… Tony Przygienda
Re: [Rift] AD Review of draft-ietf-rift-rift-12 (… Alvaro Retana
Re: [Rift] AD Review of draft-ietf-rift-rift-12 (… Tony Przygienda
Re: [Rift] AD Review of draft-ietf-rift-rift-12 (… Pascal Thubert (pthubert)
Re: [Rift] AD Review of draft-ietf-rift-rift-12 (… Pascal Thubert (pthubert)
Re: [Rift] AD Review of draft-ietf-rift-rift-12 (… Tony Przygienda
Re: [Rift] AD Review of draft-ietf-rift-rift-12 (… Alvaro Retana
Re: [Rift] AD Review of draft-ietf-rift-rift-12 (… Jordan Head
Re: [Rift] AD Review of draft-ietf-rift-rift-12 (… Jordan Head
[Rift] RIFT UDP Ports/Multicast Address Allocatio… Alvaro Retana
Re: [Rift] RIFT UDP Ports/Multicast Address Alloc… Antoni Przygienda
Re: [Rift] AD Review of draft-ietf-rift-rift-12 (… Alvaro Retana
Re: [Rift] AD Review of draft-ietf-rift-rift-12 (… Antoni Przygienda
Re: [Rift] AD Review of draft-ietf-rift-rift-12 (… Alvaro Retana
Re: [Rift] AD Review of draft-ietf-rift-rift-12 (… Antoni Przygienda
Re: [Rift] AD Review of draft-ietf-rift-rift-12 (… Jordan Head