[Rift] Fwd: Re: AD Review of draft-ietf-rift-rift-12 (Part 2a)
Alvaro Retana <aretana.ietf@gmail.com> Thu, 01 April 2021 20:08 UTC
Return-Path: <aretana.ietf@gmail.com>
X-Original-To: rift@ietfa.amsl.com
Delivered-To: rift@ietfa.amsl.com
Received: from localhost (localhost [127.0.0.1]) by ietfa.amsl.com (Postfix) with ESMTP id 915A63A218C; Thu, 1 Apr 2021 13:08:02 -0700 (PDT)
X-Virus-Scanned: amavisd-new at amsl.com
X-Spam-Flag: NO
X-Spam-Score: -1.097
X-Spam-Level:
X-Spam-Status: No, score=-1.097 tagged_above=-999 required=5 tests=[BAYES_00=-1.9, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, FREEMAIL_FROM=0.001, FREEMAIL_REPLY=1, HTML_MESSAGE=0.001, RCVD_IN_DNSWL_NONE=-0.0001, SPF_HELO_NONE=0.001, SPF_PASS=-0.001, UNPARSEABLE_RELAY=0.001] autolearn=no autolearn_force=no
Authentication-Results: ietfa.amsl.com (amavisd-new); dkim=pass (2048-bit key) header.d=gmail.com
Received: from mail.ietf.org ([4.31.198.44]) by localhost (ietfa.amsl.com [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id mp9Ll5SSB42y; Thu, 1 Apr 2021 13:07:54 -0700 (PDT)
Received: from mail-ej1-x633.google.com (mail-ej1-x633.google.com [IPv6:2a00:1450:4864:20::633]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by ietfa.amsl.com (Postfix) with ESMTPS id AF13E3A2189; Thu, 1 Apr 2021 13:07:53 -0700 (PDT)
Received: by mail-ej1-x633.google.com with SMTP id u5so4628818ejn.8; Thu, 01 Apr 2021 13:07:53 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=from:mime-version:date:message-id:subject:to:cc; bh=fJrpYUHBYW4d7X8ValmJdLwd65qNIa8xjephXAZGcVc=; b=S2z6ha/D0iIFPMHrKI0ZMU/yjdGd18XWXsd4aqLsPJSs9F/E9P+RvRZUa2J32x+Gng wKGsDn+36eTB0MnRCTHGdHY7QsixPfeEpRhpcibPhQUGug82IngXKetxmJ3dxw22AnI1 VGgvKvSxQLqJA7yUcsWjEstVkoZqWV3vNFwerkiBjXzWZC3Hqa1tjh7YuRFECDj+oOrk P/Vusp6z9ffquloJ1sj4Qj3jM5GP4A+Gl3TLrKc/J5FljrQptu/qYhD5e8jsKnuriUqx NugCDz+kRyhGR/tmqoDHP1kF3tTvB0vDilrwqbexYa/Qu3RgI7hZwPk+QQs+FjyA2N4D iPdA==
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:mime-version:date:message-id:subject:to:cc; bh=fJrpYUHBYW4d7X8ValmJdLwd65qNIa8xjephXAZGcVc=; b=DGsYS3RsCZrGGijXKsiRmGX4lUH001LOOZLamIU8A5O5aQLtDVbsu599nXebiuSQRy Ya4QSVYvBKJuxqk2+vssOuppMIpTkobs9mK2ROU7e3/OlHALV463ndzBSHIV2nXyDp62 3hogtXG6RExuS2oLM0je8nu3JfgXE2bp4Matr5Fh46gjouXoz5XXoZnJHfQtErbdYE6C Xz40dLLnzK8d9tYD6NCWMQ0aeI9aZhQySXm6aeJNT0Yth+8ZgTukK5TxYT1g8zJ63E8a DfJJzcv3Jfql1kIcxEyi6dLCtCcDt5dVYtFOB79YmC8VplvmVXbAv884g7wVH15/pi1y PZTA==
X-Gm-Message-State: AOAM533dU058Clf2pkaqJAbeRJs886OUs+xahGeSwBcxn3axz2pq+uuR F8LA/f4B1EFszeAPCkarl6eYI3LpaM7E29uGa2c3rgin
X-Google-Smtp-Source: ABdhPJyEZ005S9gO4ZmerxjxUbbkf6bc0E7A/96hP/0lSG+s8DG41cRqxhjoOfKiTsdTUMcdTU51mJrl3c3fo+jkP5o=
X-Received: by 2002:a17:906:1c13:: with SMTP id k19mr10870500ejg.457.1617307669544; Thu, 01 Apr 2021 13:07:49 -0700 (PDT)
Received: from 1058052472880 named unknown by gmailapi.google.com with HTTPREST; Thu, 1 Apr 2021 13:07:47 -0700
From: Alvaro Retana <aretana.ietf@gmail.com>
MIME-Version: 1.0
Date: Thu, 01 Apr 2021 13:07:47 -0700
Message-ID: <CAMMESsxe1TGf0aT=DzLSo1W7LYfN97u4X08XvY9dY8KZ6wZkkQ@mail.gmail.com>
To: "rift@ietf.org" <rift@ietf.org>
Cc: "draft-ietf-rift-rift@ietf.org" <draft-ietf-rift-rift@ietf.org>, "rift-chairs@ietf.org" <rift-chairs@ietf.org>, "Jeffrey (Zhaohui) Zhang" <zzhang@juniper.net>
Content-Type: multipart/alternative; boundary="00000000000022c38e05beeec9aa"
Archived-At: <https://mailarchive.ietf.org/arch/msg/rift/3AKRngd9vKl0Bnia1cWHV_MPFrc>
Subject: [Rift] Fwd: Re: AD Review of draft-ietf-rift-rift-12 (Part 2a)
X-BeenThere: rift@ietf.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Discussion of Routing in Fat Trees <rift.ietf.org>
List-Unsubscribe: <https://www.ietf.org/mailman/options/rift>, <mailto:rift-request@ietf.org?subject=unsubscribe>
List-Archive: <https://mailarchive.ietf.org/arch/browse/rift/>
List-Post: <mailto:rift@ietf.org>
List-Help: <mailto:rift-request@ietf.org?subject=help>
List-Subscribe: <https://www.ietf.org/mailman/listinfo/rift>, <mailto:rift-request@ietf.org?subject=subscribe>
X-List-Received-Date: Thu, 01 Apr 2021 20:08:03 -0000
[Forwarding for the archive.] On March 7, 2021 at 3:56:09 PM, Tony Przygienda (tonysietf@gmail.com) wrote: Alvaro, many thanks for the read inline On Fri, Mar 5, 2021 at 4:36 PM Alvaro Retana <aretana.ietf@gmail.com> wrote: > > Dear authors: > > This is Part 2a of my review of this document. I wanted this second > installment to be longer, but given the IETF meeting next week, I am > sending this fragment out now. > > Part 2a starts with §4.2 (Specification) and goes just through §4.2.2 -- I > just started reading the LIE FSM/§4.2.2.1. I included a couple of comments > on the schema itself. > > > While reading this part of the document I've been learning thrift and > applying my programming and modeling experience (*). In general, the model > seems straight forward (so far). It is important to highlight somewhere at > the start of §4.2 that Appendix B is Normative. The existing references > don't explicitly say so. > ack > > In this first part of the specification there are a number of assumptions > made that should be explained explicitly and not by reference. For > example, it seems to be assumed that the reader knows what a three-way > handshake is -- while most readers of this specification will be familiar > with other routing protocols, this document is *the* RIFT spec and should > not rely on other protocol specifications, much less when the specification > is not (can't!) be the same -- this is what §4.2.2 says: > > It uses a three-way handshake mechanism which is a cleaned up version of > [RFC5303]. Observe that for easier comprehension the terminology of > one/two and three-way states does NOT align with OSPF or ISIS FSMs > albeit > they use roughly same mechanisms. > > IOW, "similar to IS-IS, but not the same." There are more specific > comments inline. > there was lots of re-factoring here already since no'one was happy, I will cut out any comparision to ISIS/OSPF out. > > > "Traditional" specifications usually present packet formats and describe > the fields before explaining their use. §4.2.2 jumped right into talking > about a specific flag (v4_forwarding_capable) without introducing it or the > place where it is carried (LIEPacket). It is important to describe the > different structures, and provide a way to make them simpler to find in > Appendix B. > aha. ok, I will basically start adding references to the model & according structure before refering to fields. > > In my comments I'm asking for a lot of pointers to things that have not > been discussed. I don't know (at this point) whether the overall > organization of the document makes sense as is or if it might be worth > rethinking (i.e. shuffle things around). Food for thought. > well, throwing model upfront won't help, it's like putting packet formats in front of a spec. Model cannot be chopped into pieces, that will be confusing IMO so the forward references to appendix and good glossary in front of every section is best I can imagine. > > > These comments/questions are for the Chairs/Shepherd: > > - It looks like an early TSV review was requested, but I didn't see the > review come in. Did I miss it? > never seen it either > > - §8.1 includes a set of suggested UDP ports, but they are not reflected > in the registry. Early allocation has not been requested -- you should > consider doing so. See rfc7120. > ack > > > Thanks! > > Alvaro. > > (*) I don't have any. ;-) > > > [Line numbers from idnits.] > > ... > 1341 4.2. Specification > ... > 1350 "On Entry" actions on FSM state are performed every time and right > 1351 before the according state is entered, i.e. after any transitions > 1352 from previous state. > > [nit] Suggestion> > "On Entry" actions at an FSM state are performed right before the > corresponding state is entered, i.e. after any transitions from a > previous state. > > > 1354 "On Exit" actions are performed every time and immediately when a > 1355 state is exited, i.e. before any transitions towards target state > are > 1356 performed. > > [nit] Suggestion> > "On Exit" actions are performed immediately when a state is exited, > i.e. before a transition towards a target state is performed. > > > 1358 Any attempt to transition from a state towards another on reception > 1359 of an event where no action is specified MUST be considered an > 1360 unrecoverable error. > > [major] Any type of action, or are you referring to an "on exit" one? > > > [major] "MUST be considered an unrecoverable error" What is the result of > that? Should the adjacencies be reset, the rift process restarted, the > interfaces shut down, etc.?? There's no other place in the document that > mentions "unrecoverable". > implementation dependent. It's pretty much the same as the protocol throwing a "core" or memory corruption. From that point behavior is unspecified but we want it to be considered something that should not attempt a recovery. In most stark terms don't even start RIFT again but we can't mandate any such thing. It's implementation guidelines. > > > 1362 The FSMs and procedures are normative in the sense that an > 1363 implementation MUST implement them either literally or an > 1364 implementation MUST exhibit externally observable behavior that is > 1365 identical to the execution of the specified FSMs. > > [major] How can you tell the difference? rfc2119 keywords "MUST only be > used where it is actually required for interoperation". In this case there > are options, and there is no way to verify one or the other as long as the > implementations "exhibit externally observable behavior that is > identical". IOW, I am not comfortable with using MUST if it is not > required. Also, I believe the statement to generally be true for any > specification and not needed. > > > > Suggestion (borrowing from rfc4271): > The data structures and FSMs described in this document are conceptual > and do not have to be implemented precisely as described here, as long > as the implementations support the described functionality and exhibit > the same externally visible behavior. > ack, thanks. that's good. great suggestion. > > > 1367 Where a FSM representation is inconvenient, i.e. the amount of > 1368 procedures and kept state exceeds the amount of transitions, we > defer > 1369 to a more procedural description on data structures. > > [] This paragraph can also be eliminated. > ack > > 1371 4.2.1. Transport > > 1373 All packet formats are defined in Thrift [thrift] models in > 1374 Appendix B. > > [] Can we get an index/TOC or some type of guide in the appendix to make > it easier to find specific parts? For example, finding where a LIE is > defined is not straight forward -- I happened to stumble on it in B.2. > Another option would be to point in §4.2.2 to look for LIEPacket in B.2. > yes, as we said there will be a more extensive intro and a reader's guide. And I'll refer to model & according structure. > > [major] It must be stated somewhere that the contents of Appendix B are > Normative. > ack > > 1376 The serialized model is carried in an envelope within a UDP frame > 1377 that provides security and allows validation/modification of > several > 1378 important fields without de-serialization for performance and > 1379 security reasons. > > [minor] This is a very short section -- please put references to other > places where these topics are explained more, if any. > aha, ok. I probably expound the section a bit as in "this MD5 protects this and then this MD5 on top protects this" so e'thing is completely convered. > > > 1381 4.2.2. Link (Neighbor) Discovery (LIE Exchange) > > 1383 RIFT LIE exchange auto-discovers neighbors, negotiates ZTP > parameters > 1384 and discovers miscablings. It uses a three-way handshake mechanism > 1385 which is a cleaned up version of [RFC5303]. Observe that for > easier > 1386 comprehension the terminology of one/two and three-way states does > 1387 NOT align with OSPF or ISIS FSMs albeit they use roughly same > 1388 mechanisms. The formation progresses under normal conditions from > 1389 one-way to two-way and then three-way state at which point it is > 1390 ready to exchange TIEs per Section 4.2.3. > > [minor] "...which is a cleaned up version of [RFC5303]." There's no need > to hint at the fact that other implementations/protocols may not be as > good/clean as rift. Please take this part of the sentence out. > sure. it was not meant as derogatory, if anything, RIFT learned from all the lessons ISIS and OSPF went through and had to bolt onto stuff while preserving backward compatiblity. I will remove and as I said, basically remove any reference to ISIS/OSPF hello FSMs. > > [minor] "Observe that for easier comprehension the terminology of one/two > and three-way states does NOT align with OSPF or ISIS FSMs albeit they use > roughly same mechanisms." I don't know how changing everything I know > makes comprehension easier. ;-) Seriously: we don't need to tell the > reader what is not used. > ack. When writing this I was mostly thinking about ISIS implenter looking @ this and thinking "hmm, how does that compare?". Per above. > > 1392 LIE exchange happens over well-known administratively locally > scoped > 1393 and configured or otherwise well-known IPv4 multicast address > 1394 [RFC2365] and/or link-local multicast scope [RFC4291] for IPv6 > 1395 [RFC8200] using a configured or otherwise a well-known destination > 1396 UDP port defined in Appendix C.1. LIEs SHOULD be sent with an IPv4 > 1397 Time to Live (TTL) / IPv6 Hop Limit (HL) of 1 to prevent RIFT > 1398 information reaching beyond a single L3 next-hop in the topology. > 1399 LIEs SHOULD be sent with network control precedence. > > [] The first sentence is long and convoluted. I think I understood that > the assigned destination IP address and UDP port are used, but that they > also can be configured. Is that right? > yes, will simplify > Suggestion> > > LIEs are exchanged over the well-known multicast addresses and UDP > port, as summarized in Appendix C.1. An implementation MAY allow > for the local configuration of these parameters. > > > [] If configured, do both sides need to be configured beforehand or is > there a discovery mechanism? > it's like OSPF, both have to agree on multicast & port which is basically a protocol constant and then rest just comes up. An implementon could indiscriminately enable RIFT on all interfaces but some may choose to let user configure on which interfaces to run RIFT. The multicast/port being variable is super handy for multi-instance things. Comes for free baiscally. > > > [major] "LIEs SHOULD be sent with..." When is it ok to not use these > values? IOW, why is this action recommended and not required? (The next > comment is related.) > > > [minor] "IPv4 Time to Live (TTL) / IPv6 Hop Limit (HL) of 1" Was GTSM > (rfc5082) considered? Several documents (for example, rfc8085 and > draft-ietf-opsec-v6) suggest its use. You may be asked about it later. It > would be good to preempt those questions by adding some text in the > Security Considerations about any risks...or using it. > There is a religious war about 255 vs. 1 since a long time. I personally in deployment was hurt @ least twice by implementations ending up not checking TTL/misprocessing and forwarding one-hop packets with 254 on unicast/multicast. Also, having a buggy FIB looping tightly packets with TTL 255 is seldom fun. I never saw a thing not decrementing TTL and dropping on 1. I prefer the 1 with possibly a multi-hop spoof (never saw that in my life frankly, such an attack is really, really hard to engineer) than a rogue packet running away in the network or micro looping 255 times on fast links. What I saw were replay attacks and against those RIFT is extremely well protected as far it's practically implementable. Plus, if one _really_ wants security one configures adjacency keys (and RIFT allows a full plethora of that) rather than rely on the 255 hack to have "kind of a little lick of security". So if the security folks push on it we can always go to 255 or write that the TLL received MUST be either 255 or 1 to satisfy both camps. > > > [major] "LIEs SHOULD be sent with network control precedence." When is it > ok to not do so? IOW, when is this behavior recommended and not required. > BTW, please add a reference. > hard to implement often that's why it's not MUST. will add ref to precedence. > > > 1401 Originating port of the LIE has no further significance other than > 1402 identifying the origination point. LIEs are exchanged over all > links > 1403 running RIFT. > > [nit] s/Originating port/The originating port > > > 1405 An implementation MAY listen and send LIEs on IPv4 and/or IPv6 > 1406 multicast addresses. A node MUST NOT originate LIEs on an address > 1407 family if it does not process received LIEs on that family. LIEs > on > 1408 same link are considered part of the same negotiation independent > of > 1409 the address family they arrive on. Observe further that the LIE > 1410 source address may not identify the peer uniquely in unnumbered or > 1411 link-local address cases so the response transmission MUST occur > over > 1412 the same interface the LIEs have been received on. A node MAY use > 1413 any of the adjacency's source addresses it saw in LIEs on the > 1414 specific interface during adjacency formation to send TIEs. That > 1415 implies that an implementation MUST be ready to accept TIEs on all > 1416 addresses it used as source of LIE frames. > > [major] "An implementation MAY listen and send LIEs on IPv4 and/or IPv6 > multicast addresses. A node MUST NOT originate LIEs on an address family > if it does not process received LIEs on that family." > > The first sentence says that it is optional to listen and send LIEs over > either IPv4/IPv6 (IOW, a router doesn't have to do either). > yes > The second one gives the impression that originating a LIE on a specific > AF means that the router is willing to only receive/process LIEs on it > (e.g. if an IPv6 LIE is not generated then there's no positive indication > that the router can receive/process IPv6 LIEs). The combination results in > a potential lock at startup: > yes, it does. we could probably say that if one listens on an AF one should also send on it. But it's far trickier than that in real life. Someone can configure the protocol to listen on the AF (and on some platforms it's one socket for all interfaces). But then a specific interface may have an AF configured but e.g. have martians so you don't want to send that out. So you end up listening on an AF but not sending on it on a certain interface. You also have the issue of passive interfaces you neither listen nor send. So, generally I prefer this flexiblity the spec has now since practically, all kind of combinations arise (add v4ov6 to the whole thing) & that's also why we have AFs carried on the link information as well to be able to figure out during SPF/from topology what AFs are on and detect discontinuity if so desired or even allow to fwd. v4 normal then jump a v4ov6 and get there. > For example, if a router chooses to only listen on IPv4 (MAY = optional) > and it's neighbor on IPv6, then they will be sending LIEs and never talk to > each other. > > I'm assuming that the intent is to allow for maximum flexibility: allow to > send/receive on either (or both). But, what if only one AF is supported? > Should an implementation be capable to control which AF is used? If I have > an IPv6-only deployment, should the router be expected to receive LIEs over > IPv4? > Per above, v4ov6 is absolutely desired and built in. So you can build v6 _and_ v4 and implementaiton can use either nexthop or you just send v6 but indicate that you support v4ov6 on LIE in ` v4_forwarding_capable ` (how would you know otherwise?). Add to that different chipsets struggling with v6ov4 in different forms and as you see, this flexibility is necessary and yes, if you're in a config where you listen on AF and not send or send but not listen the AF will not come up and if AFs are mismatched on both sides nothing will come up. Realities of deployment and that's e.g. why we carry "miscabled links". Such stuff could be flagged to detect one-side broken links (which are a fact of life anyway due to some very funky HW failures). > > [nit] s/LIEs on an address family/LIEs using an address family > > > [nit] s/LIEs on same link/All LIEs on a link > > > [minor] "part of the same negotiation" Negotiation of what? The text at > the start of this section mentions that LIEs negotiate ZTP parameters -- is > that it? Other actions don't seem to be negotiations: discovery, for > example. I'm not sure if something like "session" would capture the > intended meaning. > well, it's not really a session. yes, I'll describe better what is negotiated. "Negotiation" is also not a good thing, it's really a distributed, unsynchronized state diffusion via a front (basically like RIP) but that does not read well ;-) Described in ZTP section. > > [major] "MAY use any of the adjacency's source addresses it saw in > LIEs...to send TIEs." This use of "MAY" makes using the source address > from a LIE optional. §4.2.3.3 says that "TIEs...using the destination > address on which the LIE adjacency has been formed", which doesn't make the > use optional. > > OLD> > A node MAY use any of the adjacency's source addresses it saw in > LIEs on the specific interface during adjacency formation to send > TIEs. > > NEW> > A node may use any of the adjacency's source addresses from the > LIEs received on the specific interface during adjacency formation > to send TIEs (Section 4.2.3.3). > > [See related question in 4.2.3.3.] > ack > > 1418 A three-way adjacency over any address family implies support for > 1419 IPv4 forwarding if the `v4_forwarding_capable` flag is set to true > 1420 and a node can use [RFC5549] type of forwarding in such a > situation. > 1421 It is expected that the whole fabric supports the same type of > 1422 forwarding of address families on all the links. Operation of a > 1423 fabric where only some of the links are supporting forwarding on an > 1424 address family and others do not is outside the scope of this > 1425 specification. > > [minor] "three-way adjacency" I know there's a definition in the > terminology section, but it is not a specification of what a three-way > adjacency is. Please add a forward reference to where it is specified. > ack > > > [major] "`v4_forwarding_capable` flag" (This comment is not specific to > this flag.) > > The specific LIE packet format has not been presented, introduced or even > referenced up to now. Pointing to a flag, which happens to be optional in > an optional part of the LIEPacket, comes out of the blue and feels > completely out of place without prior explanation. > I can try to describe all the different scenarios of v4/v6/v4ov6/mix but fact is, it will read more and more confused so I try to keep it fairly dry and short. It's obvious when one deploys/implements. ``` <t>A three-way adjacency over any address family implies support for IPv4 forwarding if the `v4_forwarding_capable` flag is set to true and a node can use <xref target="RFC5549"/> type of forwarding in such a situation. It is expected that the whole fabric supports the same type of forwarding of address families on all the links. Operation of a fabric where only some of the links are supporting forwarding on an address family and others do not is outside the scope of this specification. </t>``` reads completely clear to me with ref to 5549 so what am I supposed to add that will clarify it further without muddying things. > > I can see that some of the fields are described in Appendix B, but there > should at least be a pointer to that. > > I also expect a description somewhere of expectations and error > conditions. For example, see my comments in the appendix related to > LinkCapabilities. > > > [major] "a node can use [RFC5549] type of forwarding" > > First of all, rfc5549 is a BGP-specific RFC. How do the BGP mechanisms > defined there (and in rfc8950) map to RIFT? IOW, please specify how RIFT > works. > > Because you listed rfc5549 as a Normative reference, I assume you want > this behavior to be normative. s/can use/MUST/SHOULD/MAY ?? > Sigh, Alvaro, is it really the job of RIFT to start describe forwarding path all major chip vendors implemented? I do NOT think it's a wise idea. It refers to RFC5549 because the forwarding behavior is ultimately precisely the same but I don't want to write models of forwarding paths, ARP tables etc. The protocol spec describes all the control plane necessary to interop properly, it does not explain LPM or v4ov6 forwarding or any such thing. I would expect some more guidance here but again, this is not an implementation primer for a DC switch including fwd path, it's a control protocol spec. > > [major] "It is expected that the whole fabric supports the same type of > forwarding of address families on all the links. Operation of a fabric > where only some of the links are supporting forwarding on an address family > and others do not is outside the scope of this specification." > > Please remind me, is IPv4 optional or required? I'm assuming that IPv6 is > required and IPv4 optional, but I can't find anything definitive in the > document. > neither is required. RIFT will work with any combination (v4 only, v6 only, v4 and v6, v4 over v6 only). If neither is implemented then there is no forwarding obviously. LIEs basically exchange enough info over both AF and the 'v4 capable' flag to negotiate _any_ viable combination. So what do you want me to say? People deploy all of the above variants & are religious about it. Why would I say "v6 is required" if protocol works just fine with v4 only. And why have "v4 required" if it works fine with v6 only as well. > > Even if IPv4 is optional, I'm assuming that most deployments will support > both. But as time goes on that may (slowly) change to IPv6-only. Is that > a fair assumption? > nope. read above. > > Because the ability to support IPv4 is signaled per link > (v4_forwarding_capable), then the expectation that "the whole fabric > supports the same type of forwarding of address families" may be "easily" > broken if one link doesn't support IPv4 (including the case where a rogue > node may not advertise v4_forwarding_capable to mess things up). What > should a node that detects an inconsistency (e.g. some links use > dual-stack, but others only one AF) do? I'm assuming that it SHOULD/MUST > (?) go as far as disabling the adjacencies as described below, right? > Please be explicit and specific on the actions to be taken. > I will add "such a thing is outside the scope of this document". this is not a deployment guideline, we have day one books for it, we have applicability spec. You cannot figure that out BTW, a fabric may partition along AFs supported/capabilities and unless you run reachabiliuty per AF the fabric will blackhole likely. Running computona per AF is a very bad idea, partitioned (partially connected) networks are operationally a nightmare. I learned that the hard way in MT years ago. Applicability draft may want to loose a word or two about it. > > 1427 The protocol does NOT support selective disabling of address > 1428 families, disabling v4 forwarding capability or any local address > 1429 changes in three-way state, i.e. if a link has entered three-way > IPv4 > 1430 and/or IPv6 with a neighbor on an adjacency and it wants to stop > 1431 supporting one of the families or change any of its local addresses > 1432 or stop v4 forwarding, it has to tear down and rebuild the > adjacency. > 1433 It also has to remove any information it stored about the adjacency > 1434 such as LIE source addresses seen. > > 1436 Unless ZTP as described in Section 4.2.7 is used, each node is > 1437 provisioned with the level at which it is operating. It MAY be > also > 1438 provisioned with its PoD. If any of those values is undefined, > then > 1439 accordingly a default level and/or an "undefined" PoD are assumed. > 1440 This means that leaves do not need to be configured at all if > initial > 1441 configuration values are all left at "undefined" value. Nodes > above > 1442 ToP MUST remain at "any" PoD value which has the same value as > 1443 "undefined" PoD. This information is propagated in the LIEs > 1444 exchanged. > > [major] There are references made in this paragraph to things that are > defined elsewhere (""any" PoD"..."propagated in the LIEs"). Please put > pointers to where these things are defined. As mentioned above, the LIE > format hasn't been presented yet. > ack, yes LIE will be fwd referenced with according fields etc. > > The text above gives the impression that the type of the PoD is somehow > related to the level, so I went in search of that and found this in the > LIEPacket: > > struct LIEPacket { > ... > /** Node's PoD. */ > 7: optional common.PodType pod = > common.default_pod; > > ...but then I can't find anything that corresponds to PodType matching the > values above: undefined, any... > > > ...but I did find this: > > /** Common RIFT packet header. */ > struct PacketHeader { > ... > /** Level of the node sending the packet, required on everything > except LIEs. Lack of presence on LIEs indicates UNDEFINED_LEVEL > and is used in ZTP procedures. > */ > 4: optional common.LevelType level; > } > > ...and a couple of constants: > > const LevelType top_of_fabric_level = 24 > const LevelType leaf_level = 0 > const LevelType default_level = leaf_level > > If ZTP is not used, are there operational/deployment considerations for an > operator to provision the levels? Should they simply start counting from > 0, is incrementing by one good enough or should it be by 2? Or maybe there > are considerations about the use of ZTP? Please take a look at rfc5706. > Alvaro, those are applicability draft questions, not protocol spec. Let's get a spec out then we can improve the applicability draft. On this note, LIE rules will e.g. prevent you from numbering in 2 steps. Also leaf-level is described in LIE rules as pretty special so if you read those it is clear you cannot number ToF with 0 or things with -1 or other ideas that may come up. > > > [minor] "It MAY be also provisioned with its PoD." What does this mean? > Does it mean that all the nodes in the PoD have the same level? ... > I don't understand how PoD and level can be confused. The glossary is super explicit about those things. There is never anywhere anything that somehow implies PoD numbering has anything to do with level. PoD checking rules are in the LIE processing rules. > > > [major] BTW, this reminds me, what are the manageability considerations > related to RIFT? Among other things, it would be nice to have a summary of > parameters that are expected to be configured and their defaults. Please > take a look at rfc5706. > Alvaro, since when are manageability considerations part of the base protocol spec? we have yang model spec, we have applicability spec, we can have a manageablity spec, fine with me. Base spec is here to say enough to build an interoperable implementation. That has been proven by a clean room open source that inter'oped day one. > > 1446 Further definitions of leaf flags are found in Section 4.2.7 given > 1447 they have implications in terms of level and adjacency forming > here. > > [] "leaf flags" What are those? I think this is the first mention. > yeah, I'll refer forward to model. > > [] Suggestion> > Further definitions of leaf flags are found in Section 4.2.7. > > > 1449 A node tries to form a three-way adjacency if and only if > > [nit] s/tries to form/starts to form > > > [major] "if and only if" Can this be translated into a Normative > statement? Are all these required (MUST) before the formation of a 3-way > adjacency started? > I do NOT want to touch the LIE rules, those have been ironed over 2 implementations super carefully and held. mathematicians use IIF religiously, good enough for me. Beside me @ least 2 or 3 people implemented those rules correctly (based on interop) already so the description clearly works. > > > 1451 1. the node is in the same PoD or either the node or the neighbor > 1452 advertises "undefined/any" PoD membership (PoD# = 0) AND > > [minor] "the node is in the same PoD" As what? I guess you mean in the > same PoD as what a neighbor indicated in its LIE, right? > as "its neighbor", reads clear in context to me but I'll add. > > > 1454 2. the neighboring node is running the same MAJOR schema version > AND > > [] The formats have not been introduced yet. > > I peeked into Appendix B and found the PacketHeader. As with #1, you're > assuming a LIE packet has been received from the neighbor, but that is not > mentioned. > > ok, I'll fwd' reference. > > [] The versioning system has not been introduced. Please put a reference > to it. > Ok, I'll fwd reference. > > > [nit] This is the only place where "MAJOR" (all caps) is used. Please be > consistent. > ack > > [minor] Are there any requirements related to the minor version? > the schema versioning does describe that in excruciating details @ beginning of appendix B. I'll fwd reference > > > 1456 3. the neighbor is not member of some PoD while the node has a > 1457 northbound adjacency already joining another PoD AND > > [nit] s/not member of some PoD/not a member of the same PoD > > > [major] "not [a] member of [the] some PoD" #1 says that the "node is in > the same PoD" -- it looks like there's some context missing -- I guess > about the ability to join other PoDs. Is this only a northbound > characteristic? > no, there is no such thing. You can basically configure PoD ooptionally and that's it if you choose. That gives protection against cross-PoD miswiring, there is nothing more here. > > > 1459 4. the neighboring node uses a valid System ID AND > > [minor] "System ID" and "SystemID" are both used. Please pick one. > ack > > > [major] What is a "valid System ID"? The only clue that I found is in > §4.2.7.2, which makes we wonder whether it is specific to ZTP or not (??): > > RIFT nodes require a 64 bit SystemID which SHOULD be derived as > EUI-64 MA-L derive according to [EUI64]. The organizationally > governed portion of this ID (24 bits) can be used to generate > multiple IDs if required to indicate more than one RIFT instance." > > If this is a recommendation (SHOULD), and assuming that it applies beyond > ZTP, what is a "valid System ID"? > > > I also only found one place that talks about an IllegalSystemID, but that > is probably not the same as an invalid one: > > /** 0 is illegal for SystemID */ > const SystemIDType IllegalSystemID = 0 > yes, that's it. I'll fwd reference > > > 1461 5. the neighboring node uses a different System ID than the node > 1462 itself > > [minor] Is there a way to resolve/address duplicate System IDs? It seems > to me that a rogue node may reflect the sender's System ID and prevent an > adjacency from forming (along with a number of other values). > there is a million way a rogue node can prevent adjacency from forming. I would describe it as "implementation bug". This is actually quite commonly used to detect self-looped links BTW and a good implementaiton will flag miscabling. System IDs are 8 bytes and use Chassis MAC basically. If we are concern4ed about duplicate System IDs we should first solve duplicate chassis MAC problem. Realistically, only time I saw duplicate system ID was on buggy virtualization open-source that happily reused MACs. That caused enough other problems RIFT wasn't a problem. What to do in such stiuation is not protocol spec, it's again either applicability or deployment guide or something else. > > > [minor] Also, I hope that the selection/assignment of the System ID is > discussed elsewhere. A pointer might be nice. > section " Automatic SystemID Selection" tells _precisely_ how system ID should be formed. But anything implementation feels confident won't collide is fine by the protocol. > > 1464 6. the advertised MTUs match on both sides AND > > [] Ahhh...this in in the LIEPacket too. > well, the rules are detailed, the schema is here. I don';t think introducing e'thing in all the detail upfront in the LIE section would beneift any clarity of the document. There is a flow where lots of stuff may need a 2nd read or becomes clear on implementation. We tried for a spec that reads bit superficial but well on first run but not for one that on a single read makes someone a RIFT implementation guru. Very much like BIER really. RIFT is quite a lot to digest to implement albeit for the user of it it's a one-button-protocol. > > [major] > struct LIEPacket { > ... > /** Layer 3 MTU, used to discover to mismatch. */ > 4: optional common.MTUSizeType link_mtu_size = > common.default_mtu_size; > > If MTUSizeType is not included, where is it specified that the > default_mtu_size must be considered? Without that specification it is not > possible to satisfy this condition. > all in schemas /** @note MUST be interpreted in implementation as unsigned */ typedef i32 MTUSizeType /** default MTU link size to use */ const MTUSizeType default_mtu_size = 1400 encoding model includes common model > > 1466 7. both nodes advertise defined level values AND > > [major] The advertisement of LevelType in PacketHeader is optional. That > means that even if there is a default value a node may not "*advertise* > defined level values". Also, there are only 2 level values defined: > > const LevelType top_of_fabric_level = 24 > const LevelType leaf_level = 0 > > What am I missing? > its own section explains ZTP algorithm in detail, advertisement etc It's impossible to describe in LIE processing section ZTP already, it's all a big muddle then. Level is optional because a ZTP node coming up has no level (except ToF) so it cannot show anything. Then ZTP starts to propagate, people get level offers, tie-break them, start to pick up levels & see whether that converges and so on. > > > 1468 8. [ > > 1470 i) the node is at level 0 and has no three way adjacencies > 1471 already to nodes at Highest Adjacency Three-Way level (HAT > as > 1472 defined later in Section 4.2.7.1) with level different than > 1473 the adjacent node OR > > [nit] s/three way/three-way/g > ack > > > ... > 1478 iii) both nodes are at level 0 AND both indicate support for > 1479 Section 4.3.8 OR > > [minor] "support for Section 4.3.8" Is there a name for that? How is it > indicated? §4.3.8 says that a leaf can "advertise the LEAF_2_LEAF flag in > its node capabilities"; there is nothing called "LEAF_2_LEAF" in the > schema, but "leaf_only_and_leaf_2_leaf_procedures" does show up. Please be > consistent in the terminology. [Also, there are places where "leaf-2-leaf" > is used.] > yes, leaf-2-leaf procedures but it's a mouthful and it's not precise either since it's also leaf-only leaf-2-leaf is basically a common term for two flags in ` HierarchyIndications ` called leaf_only_and_leaf_2_leaf_procedures leaf_only whereas first includes second (it's a complex relationship we modelled like this) hence section 4.3.8 is bascially " anything around leaf only or leaf-2-leaf or both" So, I think refering to the section is still least confusing thing. Scheam uses leaf_2_leaf because of thrift id rules but people prefer leaf-2-leaf in the document so far. > > ... > 1486 The rules checking PoD numbering MAY be optionally disregarded by a > 1487 node if PoD detection is undesirable or has to be ignored. This > will > 1488 not affect the correctness of the protocol except preventing > 1489 detection of certain miscabling cases. > > [nit] s/rules checking/rules for checking > > > [nit] s/MAY be optionally/MAY be (redundant) > > > [minor] What are that the "rules [for] checking PoD numbering"? Where are > they specified? > in the LIE processing rules you mentioned before. > > [major] "MAY be optionally disregarded by a node if PoD detection is > undesirable or has to be ignored." > > This is the only place where "PoD detection" is used. What is it? Where > is it specified? Why would be it undesirable? When would it have to be > ignored? Are there operational considerations for making that decision? > in the LIE processing. you check whethe the other side matches your PoD or either one is ANY_POD=0 and that's it. No further magic. > > > [major] "...except preventing detection of certain miscabling cases." > Where is miscabling detection described? Which cases may not be covered? > §4.2.7.6 says that "internal ruleset flags a possible miscabling", but > that is the closest that I came to finding a description (and rulesets are > not mentioned anywhere else). > miscabling computation is not part of the protocol spec. We allow to indicate "miscabling" but what is declared miscabling is impossible to specify. Peoiple do ALL kind of things in fabric, e.g. when your link starts to loose 5% some folks declare link "faulty" or "miscabled". There are no rules. An implementation can declare miscabling and do whatever it wants with the adjacency (probably hold down) and then it can put it into /** If any local links are miscabled, the indication is flooded. */ 10: optional set<common.LinkIDType> miscabled_links; so you can look @ the ToF LSDB and see your whole fabric with links with problems. It's pretty well liked. > > > 1491 A node configured with "undefined" PoD membership MUST, after > 1492 building first northbound three way adjacencies to a node being in > a > 1493 defined PoD, advertise that PoD as part of its LIEs. In case that > 1494 adjacency is lost, from all available northbound three way > 1495 adjacencies the node with the highest System ID and defined PoD is > 1496 chosen. That way the northmost defined PoD value (normally the ToP > 1497 nodes) can diffuse southbound towards the leaves "forcing" the PoD > 1498 value on any node with "undefined" PoD. > > [nit] s/building first...adjacencies/building its first...adjacency > > > [nit] s/to a node being in a defined PoD/to a node in a defined PoD > > > [minor] "...defined PoD is chosen." ...chosen as the PoD value to > advertise. (or somehting along those lines) > > > [nit] s/northmost/northernmost > > > [] I'm missing how using the System ID to select will always result in the > north direction... > > > [minor] "...diffuse southbound towards the leaves "forcing" the PoD value > on any node with "undefined" PoD." An example of this would be nice. > > > > 1500 LIEs arriving with IPv4 Time to Live (TTL) / IPv6 Hop Limit (HL) > 1501 larger than 1 MUST be ignored. > > [major] "MUST be ignored" The specification of the sender (§4.2.2) says > that "LIEs SHOULD be sent with an IPv4 Time to Live (TTL) / IPv6 Hop Limit > (HL) of 1". This is a Normative conflict because it is only recommended to > the sender, but required by the receiver. > aha, yes, very good. I'm reluctant to say MUST since it's not always possible on a platform to force TTL (this is UDP). So I make it SHOULD on both sides. your comment on the 255/1 stands of course. > > > 1503 A node SHOULD NOT send out LIEs without defined level in the header > 1504 but in certain scenarios it may be beneficial for trouble-shooting > 1505 purposes. > > [] See the related comments in #7 above. > > > [nit] s/trouble-shooting/troubleshooting > > > [major] #7 above makes advertising defined values a requirement to forming > a three-way adjacency, but it is only recommended here. Unless there are > specific troubleshooting cases described, we should retain the > requirement. I believe that a packet can be crafted in any way for > troubleshooting/testing, but that doesn't have to be indicated in a > specification which should deal with the correct operation of the > protocol. IOW, I think we can do without this paragraph. > ack > > > 1507 4.2.2.1. LIE FSM > > 1509 This section specifies the precise, normative LIE FSM and can be > 1510 omitted unless the reader is pursuing an implementation of the > 1511 protocol. > > [nit] s/and can be omitted..../and can be omitted. > > > [] It would be very nice if there was an upfront description (at least a > list) of the different elements: states, events, etc.. to make reading the > diagrams easier. Also, the terminology (PUSH, etc.) should also be upfront. > ok, I'll add a bit. > > > 1513 Initial state is `OneWay`. > > 1515 Event `MultipleNeighbors` occurs normally when more than two nodes > 1516 see each other on the same link or a remote node is quickly > 1517 reconfigured or rebooted without regressing to `OneWay` first. > Each > 1518 occurrence of the event SHOULD generate a clear, according > 1519 notification to help operational deployments. > > [] This event is the only one described. Is there anything special about > it? Is this the only one for which generating a notification is > recommended? Is that why it is described here? > yes, rest is really vanilla. > > > [minor] "occurs normally when..." Are there other conditions (which are > not considered "normal")? > yes, weird ones like e.g. a node reboots & changes system ID and all of a sudden it looks like 3 nodes on a link (which is hinted). This is all too far for a protocol spec, those are day one/applicability things if one wants to go into detailed details of all the sceanrios. > > > [major] "more than two nodes see each other on the same link" The > description below is different (and I think more accurate): "more than one > neighbor seen on interface". Also, it doesn't include the second part > about a "reconfigured or rebooted" node. What is the characteristic of > that behavior? > ack. > > > [major] "SHOULD generate a clear, according notification to help > operational deployments." Normatively, what is "clear"? Is there > information that you have in mind that should be included to make it > "clear"? I assume that a notification is a message logged on the node, is > that true? > > Suggestion> > ...SHOULD log a message including the System IDs of each node. > > (Anything else?) > logging is the old days. today people want telemetry, yang, open config, splunk channel, dancing bears, hence the "notification" > > > [] This seems to be the only occurrence of a log resulting from a message > or other action. I think I'm missing the significance of this event. > yeah, this is the only time people _really_ wanted a notification. And the problem is that it's hard to declare as "miscabling" because the link flaps up and down persistently, people back off, try again so the link is up for a bit and not miscabled etc. Hence the "notification" is really kind of best one can do AFAIS. > > > [Ok. This is where I am in my reading. There are more comments below.] > > > ... > 2069 4.2.3.3. Flooding > ... > 2082 As described before, TIEs themselves are transported over UDP with > 2083 the ports indicated in the LIE exchanges and using the destination > 2084 address on which the LIE adjacency has been formed. For unnumbered > 2085 IPv4 interfaces same considerations apply as in equivalent OSPF > case. > > [major] (Related to a comment in 4.2.2.) "using the destination address on > which the LIE adjacency has been formed" If multiple addresses are used > (from different AFs), which one is selected? Is there a "primary" address > that is considered when the adjacency is formed? > very good catch. No, the spec does not give the rules. An implementation will pick what it likes and the other side has to follow because it sees it as LIE source address. Trying to start to build "prefrences" is a futile exercise. everybody has a different opinion what is "best" address on an interface and then the "unnumbered" wars start ;-) > > > ... > 6165 10.2. Informative References > ... > 6175 [DOT] Ellson, J. and L. Koutsofios, "Graphviz: open source > graph > 6176 drawing tools", Springer-Verlag , 2001. > > [major] Is there an URI that can be used as a stable pointer? > I can try, not that I can think of one. > > > ... > 6362 B.1. common.thrift > ... > 6730 /** Link capabilities. */ > 6731 struct LinkCapabilities { > 6732 /** Indicates that the link is supporting BFD. */ > 6733 1: optional bool bfd = > 6734 common.bfd_default; > 6735 /** Indicates whether the interface will support v4 forwarding. > > 6737 @note: This MUST be set to true when LIEs from a v4 address > are > 6738 sent and MAY be set to true in LIEs on v6 address. If > v4 > 6739 and v6 LIEs indicate contradicting information the > 6740 behavior is unspecified. */ > > 6742 2: optional bool v4_forwarding_capable > = > 6743 true; > 6744 } > > [major] Not knowing thrift, and assuming that this structure is some sort > of enum/list, what should a receiver do if the undefined/unknown value 3 is > received? > the packet won't even be decoded. you'll get an encoder/decoder error, it's all handled internally by thrift. That's the beauty of modelling. 80%+ of problems/worries we bean count manually in protocols just vanish magically. Remember, we had _no_ bugs in first interop of LIEs with a blind 2nd implementation. > > > [major] "contradicting information the behavior is unspecified." How can > the behavior be unspecified if this *is* the specification? More to the > point: if the indication is required in IPv4 LIEs, but only optional in > IPv6 LIEs, why wouldn't the information from the IPv4 LIEs be considered as > the truth? > because _sooo_ many things can go wrong. LIEs on different AFs can be buffered, arrive out of sequence and do a myriad crazy things. We were ruminating for a long time whether we should have one LIE on a single AF but then like I told you there is no "required AF" in RIFT based on customer input, e'one runs what they want on their fabric. And customers prefer AF fate that is not shared. So de facto v4 and v6 run as ships in the night (except the v4 capable v4ov6) and we end up advertising the bfd/v4 capable twice (because again, we don't know wehther we'll have either AF or both on an adjacency). Works fine in practice, an implementation seeing persistent mismatch should probably just declare link mismatch and shut down adjacency but that's really implementation specific. > > > [nit] s/interface will support v4 forwarding/interface supports IPv4 > forwarding/g > > > [nit] s/v4/IPv4/g s/v6/IPv6/g > > > [major] "Indicates that the link is supporting BFD." I was going to ask > if setting bfd to TRUE means that the link can support BFD (is capable) or > if it BFS is actively enabled (supporting), but I found this somewhere else: > > > /** default link being BFD capable */ > const bool bfd_default = true > > ...so I'm assuming that it just means that the link is capable, which > makes sense given that is it in LinkCapablities. > yes, it just shows it's capable. up to implementation do what they want with it just like upstream label, not for RIFT base spec to define IMO. There may be BFD configuration on the system they use, they may do some autoconfig of BFD, again, opinions here are endless given so much BFD work is already done. > > s/Indicates that the link is supporting BFD./Indicates that the link is > BFD-capable./g > > Also, note that §4.3.5 says that "RIFT MAY incorporate BFD", which is not > completely aligned with the default indication of the link being > BFD-capable. Using BFD is different than being BFD-capable, so it is ok > for its use to be optional (§4.3.5). However, there is no stated > requirement for a link to be BFD-capable, as indicated in the default > setting. > > no, it's fully optional. And again, what an implementation does with the information is outside the scope of the spec. I'll add a sentence or two. uff, long one -- tony
- [Rift] AD Review of draft-ietf-rift-rift-12 (Part… Alvaro Retana
- [Rift] Fwd: Re: AD Review of draft-ietf-rift-rift… Alvaro Retana
- Re: [Rift] AD Review of draft-ietf-rift-rift-12 (… Alvaro Retana
- Re: [Rift] AD Review of draft-ietf-rift-rift-12 (… Antoni Przygienda
- Re: [Rift] AD Review of draft-ietf-rift-rift-12 (… Tony Przygienda
- Re: [Rift] AD Review of draft-ietf-rift-rift-12 (… Alvaro Retana
- Re: [Rift] AD Review of draft-ietf-rift-rift-12 (… Tony Przygienda
- Re: [Rift] AD Review of draft-ietf-rift-rift-12 (… Pascal Thubert (pthubert)
- Re: [Rift] AD Review of draft-ietf-rift-rift-12 (… Pascal Thubert (pthubert)
- Re: [Rift] AD Review of draft-ietf-rift-rift-12 (… Tony Przygienda
- Re: [Rift] AD Review of draft-ietf-rift-rift-12 (… Alvaro Retana
- Re: [Rift] AD Review of draft-ietf-rift-rift-12 (… Jordan Head
- Re: [Rift] AD Review of draft-ietf-rift-rift-12 (… Jordan Head
- [Rift] RIFT UDP Ports/Multicast Address Allocatio… Alvaro Retana
- Re: [Rift] RIFT UDP Ports/Multicast Address Alloc… Antoni Przygienda
- Re: [Rift] AD Review of draft-ietf-rift-rift-12 (… Alvaro Retana
- Re: [Rift] AD Review of draft-ietf-rift-rift-12 (… Antoni Przygienda
- Re: [Rift] AD Review of draft-ietf-rift-rift-12 (… Alvaro Retana
- Re: [Rift] AD Review of draft-ietf-rift-rift-12 (… Antoni Przygienda
- Re: [Rift] AD Review of draft-ietf-rift-rift-12 (… Jordan Head