[babel] Alvaro Retana's Discuss on draft-ietf-babel-rfc6126bis-12: (with DISCUSS and COMMENT)

Alvaro Retana via Datatracker <noreply@ietf.org> Wed, 07 August 2019 13:29 UTC

Return-Path: <noreply@ietf.org>
X-Original-To: babel@ietf.org
Delivered-To: babel@ietfa.amsl.com
Received: from ietfa.amsl.com (localhost [IPv6:::1]) by ietfa.amsl.com (Postfix) with ESMTP id 77B0F12015F; Wed, 7 Aug 2019 06:29:21 -0700 (PDT)
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: 8bit
From: Alvaro Retana via Datatracker <noreply@ietf.org>
To: The IESG <iesg@ietf.org>
Cc: draft-ietf-babel-rfc6126bis@ietf.org, Donald Eastlake <d3e3e3@gmail.com>, babel-chairs@ietf.org, babel@ietf.org
X-Test-IDTracker: no
X-IETF-IDTracker: 6.100.0
Auto-Submitted: auto-generated
Precedence: bulk
Reply-To: Alvaro Retana <aretana.ietf@gmail.com>
Message-ID: <156518456148.8400.6644665367614468260.idtracker@ietfa.amsl.com>
Date: Wed, 07 Aug 2019 06:29:21 -0700
Archived-At: <https://mailarchive.ietf.org/arch/msg/babel/o-yj5D9vXa2Th9orObWDh-heEU8>
Subject: [babel] Alvaro Retana's Discuss on draft-ietf-babel-rfc6126bis-12: (with DISCUSS and COMMENT)
X-BeenThere: babel@ietf.org
X-Mailman-Version: 2.1.29
List-Id: "A list for discussion of the Babel Routing Protocol." <babel.ietf.org>
List-Unsubscribe: <https://www.ietf.org/mailman/options/babel>, <mailto:babel-request@ietf.org?subject=unsubscribe>
List-Archive: <https://mailarchive.ietf.org/arch/browse/babel/>
List-Post: <mailto:babel@ietf.org>
List-Help: <mailto:babel-request@ietf.org?subject=help>
List-Subscribe: <https://www.ietf.org/mailman/listinfo/babel>, <mailto:babel-request@ietf.org?subject=subscribe>
X-List-Received-Date: Wed, 07 Aug 2019 13:29:22 -0000

Alvaro Retana has entered the following ballot position for
draft-ietf-babel-rfc6126bis-12: Discuss

When responding, please keep the subject line intact and reply to all
email addresses included in the To and CC lines. (Feel free to cut this
introductory paragraph, however.)


Please refer to https://www.ietf.org/iesg/statement/discuss-criteria.html
for more information about IESG DISCUSS and COMMENT positions.


The document, along with other ballot positions, can be found here:
https://datatracker.ietf.org/doc/draft-ietf-babel-rfc6126bis/



----------------------------------------------------------------------
DISCUSS:
----------------------------------------------------------------------

I really enjoyed reading this document!  Thank you for the work and time that
has gone into it.

However, I don't think that this specification is ready to be published as a
Proposed Standard.  In general, I don't think that the document is clear or
specific enough to be considered in the Standards Track -- that is the main
reason for this DISCUSS.

(A) Clear Defaults and Operational Guidance

While I appreciate Babel's flexibility in terms of the ability to use different
strategies, I believe that both defaults and clear guidance should be provided.
 Given that "not all...strategies will give good results" and that in most
cases these are listed as possible choices, I don't think that this document
"has resolved known design choices" [BCP9/rfc7127].  The cost/metric
computation and route selection specially concern me because I believe that a
robust/clear specification is at the heart of any routing protocol.

In general what I am looking for to resolve this part of the DISCUSS are two
items:

(A1) Clear defaults.  For example, Appendix B talks about constants/default
values.  I would assume that, given the existing experience, that the values
there are probably sensible defaults.  Is that not the case?

(A2) Operational Considerations.  Given that Babel can be (and is) used in
different environments, I would like to see guidance to operators as they
deploy the protocol in their networks.  An example of the type of discussion I
would like to see expanded is: "a mobile node that is low on battery may choose
to use larger time constants (hello and update intervals, etc.) than a node
that has access to wall power" (§1.1).  Consider §2 in rfc5706 (Operational
Considerations - How Will the New Protocol Fit into the Current Environment?).

I believe that both items are important, specially in a protocol as flexible as
Babel.  Some of this guidance could have been included in
draft-ietf-babel-applicability -- but this information is not there either.

(B) Error Handling

Many sections of the document describe functionality, or even Normatively
mandate it, but there is no discussion about Error Handling.

(B1) Router-Id Setting

§4.5:
   o  the current router-id; this is undefined at the start of the
      packet, and is updated by each Router-ID TLV (Section 4.6.7) and
      by each Update TLV with Router-Id flag set.

It took me some time to figure out the reason for being able to carry the
router-id in two different places inside the same packet, which is my
interpretation of the "and" above.  Let me see if I understood:  a packet can
carry multiple updates...updates contain routes that were either originated by
the local node, OR, learned from other routers...the router-id matches the
originator...  So...if a packet carries multiple updates, some locally
originated and some learned, then it is possible for the packet to first
include (for example) a Router-ID TLV (indicating router-id_A), followed by
some Update TLVs (without the R-bit set), than then some other Update TLVs
(with the R-bit set)...

Did I understand correctly?  If so, I think there are significant pieces of
this operation that are not clearly specified in the document.  There is
mention of the effect of the Router-ID TLV (or the Update TLV w/R=1) on
subsequent Update TLVs...there is an very subtle hint (for my taste) in §4.5
(Parser state) about the state learned for each packet from those TLVs...but
there is no explicit text that talks about the need for strict ordering when
sending and later when processing...it is all simply implied.

What should happen if no Router-Id has been defined?  For example, an Update (R
= 0) is received but no Router-ID TLV is present...  What if the Router-ID TLV
is present, but *after* the Update?  There are many possible combinations...

(B2) Default Prefix

Similar comments as above...  "P (Prefix) flag...establishes a new default
prefix for subsequent Update TLVs with a matching address encoding within the
same packet" (§4.6.9).   What if an update with an AE that allows compression
is received *before* the one that sets the new default prefix?

(B3) Next Hop

§4.6.9:

   The next-hop address for this update is taken from the last preceding
   Next Hop TLV with a matching address family (IPv4 or IPv6) in the
   same packet even if it was otherwise ignored due to an unknown
   mandatory sub-TLV; if no such TLV exists, it is taken from the
   network-layer source address of this packet.

What if the Next Hop TLV doesn't exist and the network-layer doesn't correspond
to the address family in the Update?  For example, let's say IPv6 is used as
the network-layer protocol and the Update contains IPv4 prefixes...

(B4) For the Normative behavior listed here (I may have missed other
instances), I have basically the same question: what should a receiver do if it
is not the case?

- §3.8.1.2: "A node MUST NOT increase its sequence number by more than 1 in
response to a seqno request."

- §4: "A Babel packet MUST be sent as the body of a UDP datagram, with
network-layer hop count set to 1..."

- §4.6.9: "If the metric is finite, AE MUST NOT be 0.  If the metric is
infinite and AE is 0, Plen and Omitted MUST both be 0."

- §4.6.10: "...if AE is 0 (in which case Plen MUST be 0 and Prefix is of length
0)."

- §4.6.10/§4.6.11: Is AE 3 a valid value in a request?  I assume it isn't. 
What should a receiver do if AE = 3.

(C) Mandatory Bit

§4.4: "The most-significant bit of the sub-TLV, called the mandatory bit..." 
The most significant bit of which part of the sub-TLV?  As written, that bit
would be the first one in the Type, which corresponds to the text in the IANA
section.  Please be specific.

In the IANA considerations section, please include the whole registry in the
table to avoid confusion.

Note that because of the mandatory bit, the 128-239 range should be
Reserved...but it is currently marked as Unassigned.  Even worse, value 128 is
assigned already [draft-ietf-babel-source-specific].  The impact may not be too
bad because I doubt that Pad1 would need to be mandatory, but it at least
causes confusion and inconsistency, and (as currently specified) there would be
no way to differentiate between Pad1 and the Source Prefix sub-TLV.


----------------------------------------------------------------------
COMMENT:
----------------------------------------------------------------------

(a) §3.1 introduces the term "urgent TLVs".

(1) It might be a good idea to explicitly mention/list which are these TLVs. 
There are some references in subsequent sentences, which may or may not be
enough for most readers.

(2) There are some Normative actions applied to them, for example "MUST be sent
in a timely manner".  While the intent may be ok, the Normative enforcement of
"in a timely manner" is not clear at all -- how do you comply with that? 
Appendix B says:

   The amount of jitter applied to a packet depends on whether it
   contains any urgent TLVs or not (Section 3.1).  Urgent triggered
   updates and urgent requests are delayed by no more than 200ms;
   acknowledgments, by no more than the associated deadline; and other
   TLVs by no more than one-half the Multicast Hello interval.

I think it would help if this text is moved to §3.1 to make it explicitly clear
what a timely delay is...and the text was changed to (something like) "MUST NOT
be delayed more than 200ms".

(b) §3.2.2: "SHOULD NOT increment its sequence number (seqno) spontaneously" 
When it is ok to increase the seqno spontaneously?  IOW, why not use MUST NOT? 
I think it would be better if there was a clear indication of when the seqno is
increased.  Scanning the rest of the document, it seems that those indications
are in place.

(c) There seems to be no specific explanation of how the timers are handled,
what happens when they expire, etc..  For example, §3.2.4 includes this text:

   There are three timers associated with each neighbour entry -- the
   multicast hello timer, which is initialised from the interval value
   carried by scheduled Multicast Hello TLVs, the unicast hello timer,
   which is initialised from the interval value carried by scheduled
   Unicast Hello TLVs, and the IHU timer, which is initialised to a
   small multiple of the interval carried in IHU TLVs.

But there is no explanation (that I could find) about how to manage those
timers.  The only place where hello timers are mentioned is in Appendix
A.1...but that is just an example.

(d) §3.7.2 includes two instances of "SHOULD make a reasonable attempt at
ensuring that all [reachable] neighbours receive this update/retraction".  What
does making "a reasonable attempt" mean?  How can that be Normatively enforced?

(e) §3.7.2

   Finally, a node MAY send a triggered update when the metric for a
   given prefix changes in a significant manner, due to a received
   update, because a link's cost has changed, or because a different
   next hop has been selected.  A node SHOULD NOT send triggered updates
   for other reasons, such as when there is a minor fluctuation in a
   route's metric, when the selected next hop changes, or to propagate a
   new sequence number (except to satisfy a request, as specified in
   Section 3.8).

How much is "a significant manner"?  What about "a minor fluctuation"?  Are the
modifiers (next hop change, for example) the only conditions to take into
account, or are they just examples of when these significant/minor changes may
occur?   How can these terms be Normatively enforced?

(f) §3.8.1.1: "When a node receives a wildcard route request, it SHOULD send a
full route table dump."  When is it ok to not send a full table dump?  IOW, why
is MUST not used?

(g) §3.8.2.1: "a node SHOULD repeat such a request a small number of times if
no route becomes feasible within a short time."  What does "a small number of
times" and "within a short time" mean?  How can that be Normatively enforced? 
Please be specific.

(h) §4.6.9: "Omitted...that should be taken from a preceding Update TLV in the
same address family with the Prefix flag set."  What if that Update TLV is not
in the packet?

(i) Security Considerations

The initial vulnerability listed ("attacker can misdirect data traffic by
advertising routes with a low metric or a high seqno") is only one of several
actions an attacker can take.  More importantly, if the attacker happens to be
in control of an authenticated node, then the mitigation proposed doesn't help.
 This type of rogue node can, for example, set the mandatory bit in an unknown
TLV (as in completely made up!) to cause whole TLVs to be ignored, resulting in
loss of routes, etc..

I am not sure what can be done to mitigate this type of vulnerability...but I
think it is important that it is at least called out.

(j) Are the appendices intended to be Normative or not?  I'm assuming the
answer is no...but I can base that only on the references in the text to
Appendix A.*, pointing to them as examples.  What about the others?  They are
not even referenced in the text.  Some comments:

- Appendix B talks about constants/default values.  See my DISCUSS comments
above.

- Appendix C "is intended to guide designers of protocol extensions in chosing
a particular encoding."  I think is is valuable information.  It would be very
nice if there was a reference (or perhaps several, from where the different
extensibility methods are presented) in the main body of the specification.  I
can see how this is an informative section.

- Appendix D defines a "stub implementation".  This is also valuable
information.  But...there's no reference from the text, and Normative language
is used...  Why is this type of implementation (which I would think might be
relatively common) not normative?

- Appendix E simply points to the sample implementation.  Personally, I would
prefer to see an rfc7942 section instead -- it would have been nice to also
mention other implementations.

(k) "The length of..." is used everywhere in the document, but no units are
mentioned.  Some seem to obviously be in octets, but others could easily be in
bits...

(l) §4: s/SHOULD attempt to maximise the size of the packets/SHOULD maximise
the size of the packets

(m) §4.1.3: The description of AE 1 and 2 says that "Compression is allowed."
-- but it looks like the only place where it can happen is in an Update.  It
might be nice to indicate that...and avoid indicating that compression is not
allowed where it can't be done anyway.

(n) rfc8126 should be a Normative reference.

(o) Please include Informative references to rfc6126 and rfc7557.

(p) s/Bellman-Ford protocol/Bellman-Ford algorithm

(q) §2.4: Include an Informative reference to AODV (rfc3561).

(r) §2.4: "if A has selected B as its successor"  This is the only place where
"successor" is used.  For clarity, perhaps use a different word/description.