[secdir] secdir review of draft-ietf-rtgwg-backoff-algo-07

Benjamin Kaduk <kaduk@mit.edu> Wed, 14 February 2018 21:10 UTC

Date: Wed, 14 Feb 2018 15:10:17 -0600
From: Benjamin Kaduk <kaduk@mit.edu>
To: draft-ietf-rtgwg-backoff-algo.all@ietf.org, iesg@ietf.org, secdir@ietf.org
Archived-At: <https://mailarchive.ietf.org/arch/msg/secdir/PwLbKQaZ70E5LtFi1lFUkYHA9GY>

Hi all,

I have reviewed this document as part of the security directorate's 
ongoing effort to review all IETF documents being processed by the 
IESG.  These comments were written primarily for the benefit of the 
security area directors.  Document editors and WG chairs should treat 
these comments just like any other last call comments.

From a security perspective, this document is Ready.  It specifies a
standard scheme that can be used to back off SPF calculations during
periods of frequent IGP events, avoiding excessive resource
consumption performing calculations that would be rendered redundant
(or just be useless) soon.  The security considerations correctly
note that an attacker that can generate IGP events would be able to
delay the IGP convergence time, which is true both for this scheme
and all schemes previously in use.  (I might use more words to say
the same thing if I were writing it, but that probably reflects more
on me than the document.)


I do have some questions about the actual proposed FSM, though -- I
suspect that I am just making some implicit assumptions that may not
be grounded in reality.  In particular, I am basically assuming that
INITIAL_SPF_DELAY < SHORT_SPF_DELAY < LONG_SPF_DELAY <
HOLDDOWN_INTERVAL.  (The draft itself only has it as RECOMMENDED for
SPF_INITIAL_DELAY <= SPF_SHORT_DELAY <= SPF_LONG_DELAY in Section 6,
yes, with the different spellings, but has a MUST for
HOLDDOWN_INTERVAL > TIME_TO_LEARN_INTERVAL.)
This would potentially affect the state machine for events 1, 6, and
7.
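
To make the ordering assumptions above concrete, here is a minimal
sketch of a parameter check.  The function name, argument names, and
message strings are hypothetical and are not taken from the draft; the
sketch only separates the draft's RECOMMENDED and MUST constraints, as
quoted above, from the stricter ordering I am assuming.

```python
def check_timers(initial, short, long, holddown, time_to_learn):
    """Return a list of violated constraints (all values in ms).

    Hypothetical helper: the names and messages are illustrative only.
    """
    problems = []
    # RECOMMENDED in Section 6 of the draft (non-strict ordering).
    if not (initial <= short <= long):
        problems.append("RECOMMENDED: SPF_INITIAL_DELAY <= SPF_SHORT_DELAY <= SPF_LONG_DELAY")
    # MUST in the draft.
    if not (holddown > time_to_learn):
        problems.append("MUST: HOLDDOWN_INTERVAL > TIME_TO_LEARN_INTERVAL")
    # The stricter ordering assumed in this review, not required by the draft.
    if not (initial < short < long < holddown):
        problems.append("ASSUMED: INITIAL < SHORT < LONG < HOLDDOWN_INTERVAL")
    return problems


# Example values 0/50/2000 ms and TIME_TO_LEARN of 1 s; the 1500 ms
# HOLDDOWN_INTERVAL is a made-up value chosen to break only my assumption.
print(check_timers(0, 50, 2000, 1500, 1000))
```

A configuration like the one printed here satisfies both constraints
from the draft but not my assumed ordering, which is the kind of case
where transitions 1, 6, and 7 might behave differently than I expect.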

In transition 1, we say to only start the SPF_TIMER if it is not already
running, but I do not see a way for it to already be running unless
the HOLDDOWN_TIMER value is less than one or more of the SPF_TIMER
values.

Similarly, I don't see how transition 7 could ever happen, since an IGP
event moves us out of the QUIET state, and I assume that the
SPF_TIMER would fire before the HOLDDOWN_TIMER, since the
HOLDDOWN_TIMER is reset on every IGP event and I assume it has the
larger value.

Transition 6 is a little less clear, but is also similar -- if
HOLDDOWN_TIMER is larger than LEARN_TIMER, then LEARN_TIMER must
fire before HOLDDOWN_TIMER, and we leave SHORT_WAIT to go to
LONG_WAIT before we could consider leaving SHORT_WAIT to go back to
QUIET.

So, am I making some flawed assumptions?  (Are there examples of
situations that clearly demonstrate the flaw?)


Also, to confirm my understanding, suppose a scenario happens where
an IGP participant sees an event, then a gap of 100ms, then three
IGP events at equal 10ms intervals, with the SPF delays at the
example values of 0/50/2000 ms and TIME_TO_LEARN of 1s.  The first
event triggers a SPF computation immediately, then we go to
SHORT_WAIT, the first of the three events kicks off a SPF_TIMER for
50ms, which is reset by the next two events, the SPF timer fires and
we recompute the SPF, then TIME_TO_LEARN fires and we go to
LONG_WAIT until the HOLDDOWN_TIMER fires.  Or maybe the SPF
calculation takes more than 200ms, so when the second IGP event
fires, we abort the currently in-progress calculation and don't
start another one until 50ms after the last event?  I bring this up
because of the text in the second paragraph of Section 4 that talks
of computing the post-failure routing table in "a single route
computation".  But if I understand correctly, the *single*
computation only happens in the second case here, when the
calculation takes some hundreds of milliseconds; otherwise we still
have *two* computations (one triggered while we're in QUIET and the
second triggered in SHORT_WAIT).  So I'm not sure I fully
understand the expected scenario.
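
The first case of the scenario above can be sketched as a small
timeline calculation.  This is a simplification under stated
assumptions, not the draft's FSM: every event falls inside
TIME_TO_LEARN, so each event after the first uses the short delay, and
each SPF run is treated as instantaneous.  The function name and the
restart_on_event flag are hypothetical; the flag switches between
"start the SPF_TIMER only if it is not already running" and my reading
above, in which later events reset the timer.

```python
def spf_run_times(events_ms, initial=0, short=50, restart_on_event=False):
    """Return the times (ms) at which SPF would run for the given IGP events."""
    runs = []
    pending = None  # expiry time of a running SPF_TIMER, if any
    first = True
    for t in sorted(events_ms):
        # Fire a pending timer that expired before this event arrived.
        if pending is not None and pending <= t:
            runs.append(pending)
            pending = None
        delay = initial if first else short
        first = False
        # Start the timer if idle; optionally restart it on every event.
        if pending is None or restart_on_event:
            pending = t + delay
    if pending is not None:
        runs.append(pending)
    return runs


events = [0, 100, 110, 120]  # one event, a 100 ms gap, then three at 10 ms spacing
print(spf_run_times(events))                         # [0, 150]
print(spf_run_times(events, restart_on_event=True))  # [0, 170]
```

Under either reading of the timer behavior, the sketch yields two SPF
computations rather than one, which is the source of my question.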


I also am probably having some problems with terminology, presumably
just my misunderstanding, which hopefully can be set straight
easily.

In the Introduction, we have a "desire to compute a new Shortest
Path First (SPF) as soon as a failure is detected", which is using
SPF as if it were a data structure (e.g., the result of an algorithm),
whereas my intuition has SPF referring to the algorithm [class] but
not its output.

In section 3, we talk of "computation of the routing table, by the
IGP", which gets me confused about whether "the IGP" represents a
network protocol for conveying (e.g.) link state information, an
algorithm for SPF computation, or a router that performs SPF
computations.

In section 6 we talk of "the number of protocols
reactions/computations triggered by IGP SPF".  Is this just in the sense
of "each SPF calculation triggers a bunch of other stuff"?  I think
this is another case of me being confused about whether "SPF" means an
algorithm, a specific computation using that algorithm, etc.



Some other editorial notes:

It's probably better to cite RFC 8174 instead of/in addition to RFC
2119, especially since there is at least a lowercase "may" present.

In the Introduction, it's unclear that "temporally close" really adds
any value in "multiple temporally close failures over a short time".

In section 2, last bullet point on page 3, "SPF_DELAY timers values"
probably doesn't need the plural "timers" (so, either "timer" or
the possessive "timers'"), though I am mindful of the recent
discussion on ietf@ about (non-)American English.  The second
sentence of the bullet is also a sentence fragment.

SRLG is used without expansion in multiple places, but does not
appear on https://www.rfc-editor.org/materials/abbrev.expansion.txt
as a "well-known" abbreviation.

In section 6, we find the awkward construction "play it safe and
start with safe, i.e., longer timers".  Probably we want to say
"safe values" as the noun, and maybe consider rewording to avoid the
duplicate "safe" and/or the colloquialism "play it safe".

Section 8 says:

   [...]. FIBs
   are installed after multiple steps such as flooding of the IGP event
   across the network, SPF wait time, SPF computation, FIB distribution
   across line cards, and FIB update.  This document only addresses the
   first contribution.

which makes me try to match up "the first contribution" with the
flooding, when I assume it's meant to match up with the SPF wait
time.

-Benjamin