[secdir] secdir review of draft-ietf-tsvwg-circuit-breaker-11

Benjamin Kaduk <kaduk@MIT.EDU> Fri, 12 February 2016 04:21 UTC

Date: Thu, 11 Feb 2016 23:21:02 -0500
From: Benjamin Kaduk <kaduk@MIT.EDU>
To: iesg@ietf.org, secdir@ietf.org, draft-ietf-tsvwg-circuit-breaker.all@ietf.org
Message-ID: <alpine.GSO.1.10.1602111737220.26829@multics.mit.edu>
Archived-At: <http://mailarchive.ietf.org/arch/msg/secdir/dMQvkJ5LaxusOnluulu7Tyl2TEQ>
Subject: [secdir] secdir review of draft-ietf-tsvwg-circuit-breaker-11

I have reviewed this document as part of the security directorate's
ongoing effort to review all IETF documents being processed by the
IESG.  These comments were written primarily for the benefit of the
security area directors.  Document editors and WG chairs should treat
these comments just like any other last call comments.

This document is ready with nits (the rest of this paragraph), modulo one
question I have (the following paragraph).  Since it's more of a
requirements doc than a full protocol specification, there are not too
many requirements for security considerations.  This document correctly
notes the risk of an attacker using the circuit breaker mechanism for
denial of service and the need for integrity and authenticity of control
messages.  It states that there is a trade-off between the cost of crypto
and the need to authenticate control messages when there is a risk of
on-path attack; I am a little uncomfortable with this statement (it is
perhaps "too weak"), especially since it does not give guidance on
determining the level of risk, but neither do I have a concrete objection
to it.  (Given the availability of physical network taps to at least
nation-state-level actors, there seems to always be a risk of on-path
attack.)  Likewise, I am somewhat uneasy with the claim that just
randomization of source port (or similar randomization in the packet
header) suffices to deter an off-path attacker -- for example, in crypto,
we usually talk of reducing the attacker's success probability to below
2^-32 or 2^-64 or something like that, but there are only ~2^16 port
numbers to randomize in, so the success probability from just port number
randomization would not meet the usual criteria.  So, perhaps that's not a
good example to use on its own in the requirements document; other fields
in the packet header could have a larger search space and be more
reasonable for this purpose.  The rest of the security considerations are
good, covering the issues related to capacity and robustness, and
mentioning the need for per-mechanism analysis.
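
To make the port-randomization arithmetic above concrete, here is a
minimal sketch.  It assumes the attacker must guess one value drawn
uniformly from the field and makes distinct guesses; the function names
are hypothetical and chosen only for illustration, not taken from the
draft.

```python
from fractions import Fraction


def guess_success_probability(search_space_bits: int, attempts: int = 1) -> Fraction:
    """Chance that at least one of `attempts` distinct guesses matches a
    value drawn uniformly from 2**search_space_bits possibilities."""
    space = 2 ** search_space_bits
    return Fraction(min(attempts, space), space)


def meets_target(search_space_bits: int, target_bits: int, attempts: int = 1) -> bool:
    """True if the success probability is at or below 2**-target_bits."""
    return guess_success_probability(search_space_bits, attempts) <= Fraction(1, 2 ** target_bits)


# Port randomization alone: roughly 16 bits of search space (assumption:
# the full port range is usable, which is optimistic).
port_only = guess_success_probability(16)
print(port_only, meets_target(16, 32))

# A hypothetical 48-bit random header field would meet the 2^-32 bar.
print(meets_target(48, 32))
```

Under these assumptions a single guess against 16 bits succeeds with
probability 2^-16, which is 2^16 times larger than a 2^-32 target.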

The one question I have relates to the possibility of circuit breakers
becoming ubiquitous.  It seems pretty clear that going from a network with
no circuit breakers to a network with one circuit breaker is worthwhile,
offering a local improvement for the flows in question.  But if all, or
nearly all routes through the network traversed one or more circuit
breakers -- is there a risk of cascading failure, either accidental or
purposefully triggered by an attacker?  In a network where circuit
breakers occupy what a topologist would call a dense or fully-connected
subset of the network, would one circuit breaker tripping cause subsequent
breaker trips and near-complete network shutdown?  There seems to even be
something of an analogue (though not a perfect one) in electrical circuit
breakers, where an extreme failure in a device can cause the breaker in a
power strip to fuse open, causing the breaker for the particular circuit
in the building that it's on to fuse open, tripping the main breaker for
the whole panel.  It is uncommon for the cascade to continue to the mains
for the building or the local substation, but it is a known risk.  So, has
anyone thought about the behavior of circuit breakers if/when they are
ubiquitous in the network?
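
One way to explore the cascade question is a toy model.  The sketch
below is not taken from the draft and rests on a strong assumption:
when a breaker trips, the traffic it was carrying is rerouted in equal
shares to neighbouring breakers that have not tripped.  Real circuit
breakers stop or reduce flows rather than reroute them, so this models
a routing reaction around the breaker, not the breaker itself.  All
names and numbers are hypothetical.

```python
def simulate_cascade(load, limit, neighbours, first_trip):
    """Return the set of breakers that end up tripped.

    load:       dict node -> current traffic load
    limit:      dict node -> load above which that breaker trips
    neighbours: dict node -> list of adjacent nodes
    first_trip: node whose breaker trips initially
    """
    load = dict(load)          # work on a copy, leave the caller's data alone
    tripped = set()
    pending = [first_trip]
    while pending:
        node = pending.pop()
        if node in tripped:
            continue
        tripped.add(node)
        alive = [n for n in neighbours[node] if n not in tripped]
        if not alive:
            continue           # nowhere to reroute; the load is simply dropped
        share = load[node] / len(alive)
        load[node] = 0.0
        for n in alive:
            load[n] += share   # assumed equal-share rerouting
            if load[n] > limit[n]:
                pending.append(n)
    return tripped


# A ring of four breakers, each carrying 8 units of traffic.
ring = {0: [1, 3], 1: [2, 0], 2: [3, 1], 3: [0, 2]}
loads = {n: 8.0 for n in ring}

tight = simulate_cascade(loads, {n: 10.0 for n in ring}, ring, first_trip=0)
roomy = simulate_cascade(loads, {n: 20.0 for n in ring}, ring, first_trip=0)
print(sorted(tight), sorted(roomy))
```

In this toy ring, limits with little headroom let a single trip take
down every breaker, while generous limits contain it to the first one.
Whether anything like this happens in practice depends on how traffic
reacts to a trip, which is exactly the open question.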

(Also, it's amusing to see "CB" used for "circuit breaker" in this
context, as I'm so used to seeing it expand to "channel binding".  It
seems that the RFC Editor's abbreviation list
https://www.rfc-editor.org/materials/abbrev.expansion.txt includes neither
form...)


Section 7.3 as written does not seem terribly connected to circuit
breakers or the rest of the document.  Should it be removed?


This secdir review also comes with a bonus copyediting pass; iesg@ and
secdir@ feel free to stop reading now.

The last sentence of the first paragraph of section 1 ("Just ...
appliance.") does not have an independent clause, and leaves the reader
hanging.

In Section 1, second paragraph, "countered by the requirement to use
congestion control by the transmission control protocol" would probably
flow better as "in the [TCP]" or "with the [TCP]", since although the TCP
specification is what requires the use of congestion control, the TCP
protocol itself is just using congestion control.

In Section 1, second paragraph, penultimate sentence, "applications of the
User Datagram Protocol" suffers from the dual meaning of "applications" as
"software programs" and "instances where it is used".  The first time I
read it, I flagged it to be changed to "applications using [UDP]", but of
course it is the latter meaning that was intended.  Not a big deal, but
perhaps this could be rearranged to avoid the potential confusion.  (I
don't think there's a good word to just replace the single word with,
though.)

Section 1, third paragraph, penultimate sentence has a comma splice.

Section 1, fourth paragraph, second sentence: a timescale is inherently an
order-of-magnitude thing, and different paths have a different RTT, so
there is not a single timescale on which congestion control operates.  I
suggest just saying "operates on the timescale of a packet RTT", but
"operates on a timescale on the order of a packet RTT" is probably fine,
too.

In the following sentence, the concept of "packet loss/marking" is calmly
used with no introduction.  I'm not personally familiar with packet
marking in this sense, and though the usage later in the document gave me
some rough sense of what it means, maybe a bit more introduction (e.g., an
informational reference) would be useful.  Or maybe it's a term of art in
transport and I'm just not a practitioner; that's possible, too.

In a similar vein, "5-tuple" at the end of that same paragraph may want an
informational reference to RFC 6146, or may be considered common knowledge
for the target audience.

In the next paragraph, there's a comma splice in the penultimate sentence.

The long paragraph in the middle of page 4 seems to introduce a new term
"control function" without much explanation; this phrase does not seem to
be used anyplace else in the document (though "control plane function"
has one occurrence), so it seems likely that a slight rewording here would
improve the document.  (I'm not actually entirely sure what it's trying to
say, so I don't have any concrete suggestions.)  In this and the following
sentence, it would be good to make more clear that the text is talking
about circuit breakers and not other forms of congestion control.

The second bullet point for examples of situations that could trigger
circuit breakers ("traffic generated by an application...utilised for
other purposes") confused me the first time I read it.  Perhaps shuffle
things around a bit to clarify that it is that "the network capacity
provisioned for that application is being utilised for other purposes",
though upon re-reading the existing text may suffice as-is.

The second sentence of the first (full, i.e., non-bullet-point) paragraph
on page 5 seems to suffer from a bit of pronoun/antecedent confusion.  In
particular, "will generate elastic traffic that may be expected to
regulate the load" reads as if it is the generated traffic itself that
will regulate the load, whereas a common way of thinking about it would be
that it is the application that is regulating the load produced by the
traffic that the application generates.  Also, in "the load it introduces"
there is ambiguity as to whether "it" refers to the application or the
traffic.  (Perhaps this ambiguity is irrelevant, but in general ambiguity
in a spec is to be avoided.)

In the following paragraph, the second sentence is a bit long, and heavily
broken up by qualifiers that are not really needed ("all but impossible",
"may further be the case", "may have some difficulty", "has in fact been
tripped").  As copyeditor, I would suggest splitting this into two
sentences and removing some of the unneeded words.

Should "Circuit Breaker" be uniformly capitalized throughout the document?
It is not capitalized in the first sentence of Section 1.1.  (Perhaps the
plural "Breakers" is also appropriate?)

On pages 8/9, it would be good to maintain parallel structure across the
enumerated items, most notably by including "that" in the first sentences
("An ingress meter that records the number of packets", "A measurement
function that combines", ...).  Item 3 does not currently fit into that
structure, and it may not be worth the drastic changes that would be
needed to stuff it into place, since it is describing an action as opposed
to the functions that are described in the other items.  But it's probably
worth making the easy changes.

In item 3 of that list, the capital "An" is not needed after a semicolon,
and there is another list within the second sentence that could gain a
more parallel structure if "be sending another in-band" were replaced with
"sending an in-band".

In Section 4, fourth bullet point, "adjust the traffic to experienced
congestion" might be better as "adjust the traffic when congestion is
experienced".

The fifth (i.e., next) bullet point seems to lack a subject for the first
sentence.  Presumably it refers to the circuit breaker in question, but
it's best to be explicit about it.

In the eighth bullet point (top of page 11), I'd put "it is" before the
"triggered" in the parenthetical.

In the sixteenth bullet point (second one on page 12), you refer to the
"source" of control messages, which I think would more conventionally be
written as the "authenticity" of those messages.  ("Source" is used in
this fashion in at least one other place in the document, so please change
all occurrences if changing any.)

I am a bit hazy on what exactly is going on in the example in Section 5.1
(the last three paragraphs), but I will chalk that up to my lack of
knowledge about multicast routing.  It's probably worth expanding and/or
putting an informational reference for PIM-SM, though, and offsetting the
"however" in the last sentence with commas.

In the first paragraph of Section 5.2, please use the plural "paths" in
"paths provisioned using the Resource reservation protocol".

Given the success of UDP-based protocols like QUIC, mosh, BitTorrent,
etc., it seems a little strong to have this claim in Section 6.1 that "all
applications ought to use a full-featured transport" when the meaning
seems to really just be that all applications ought to have congestion
control functionality for their traffic, whether obtained via a
full-featured transport or built directly into the application [protocol].

I would also consider removing the comma in the penultimate sentence of
the first paragraph of Section 6.1, though I do not think I can claim that
it is actually incorrect.

In the next paragraph, "tailored *to* the type of traffic", and probably
"when multiple congestion-controlled flows *combined* lead to short-term
overload", since otherwise one could read it as saying that (multiple)
(congestion-controlled flows lead to short-term overload), in which case
the "multiple" is seemingly irrelevant.

In the next paragraph (last one on page 15), there's a singular/plural
mismatch in "a RTP-aware network devices"; I'd go with the plural, but
it's your call.

In Section 6.1.1, item 3 doesn't seem quite right -- I don't think that
the breaker ought to trigger just by the act of using a TFRC-style check
with a hard upper limit; I'd expect that the observed traffic would need
to exceed that limit, too.  (Also, expand "TFRC".)
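
To illustrate the distinction, here is a minimal sketch in which the
breaker trips only when the observed rate exceeds the limit, not merely
because the check is in use.  It assumes the limit comes from the
TCP-Friendly Rate Control (TFRC) throughput equation of RFC 5348 with
the common simplification t_RTO = 4*RTT; the function names and the
`headroom` parameter are hypothetical and do not come from the draft.

```python
import math


def tfrc_rate_limit(segment_bytes: float, rtt_s: float, loss_rate: float, b: int = 1) -> float:
    """TFRC throughput equation (RFC 5348, Section 3.1), in bytes/second.

    Assumes t_RTO = 4 * RTT.  With no observed loss the equation gives
    no finite bound, so infinity is returned.
    """
    if loss_rate <= 0:
        return math.inf
    p = loss_rate
    t_rto = 4 * rtt_s
    denom = (rtt_s * math.sqrt(2 * b * p / 3)
             + t_rto * (3 * math.sqrt(3 * b * p / 8)) * p * (1 + 32 * p ** 2))
    return segment_bytes / denom


def should_trip(observed_bytes_per_s: float, segment_bytes: float, rtt_s: float,
                loss_rate: float, headroom: float = 1.0) -> bool:
    """Trip only if observed traffic exceeds the hard upper limit.

    `headroom` is a hypothetical multiplier applied to the TFRC rate.
    """
    limit = headroom * tfrc_rate_limit(segment_bytes, rtt_s, loss_rate)
    return observed_bytes_per_s > limit


limit = tfrc_rate_limit(1460, 0.1, 0.01)
print(round(limit))
print(should_trip(200_000, 1460, 0.1, 0.01), should_trip(100_000, 1460, 0.1, 0.01))
```

With these example inputs the limit is a little over 160 kB/s, so a
flow observed at 200 kB/s would trip the breaker and one at 100 kB/s
would not.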

From a document structure perspective, it's slightly jarring to not have a
subsection 6.2.1 with a dedicated example, but I can understand why the
document is currently the way it is.

The last sentence of Section 6.3.1 seems to come without much lead-in; it
would be nice to get a better transition into it, and maybe a mention of
"circuit breaker" and its relation thereto.

In Section 7.1, third paragraph, I don't think I understand what "other
sharing network traffic" is supposed to mean, or really, what the example
is saying in general.

In Section 7.2, second paragraph, "For sure" seems a rather informal way
of starting a sentence.

The third sentence in that paragraph contains a comma splice.

The last sentence of Section 7.2 could benefit from avoiding the pronoun
in "this protects other network traffic" to clarify what exactly is
providing the protection ("the network configuration", perhaps?).

In Section 8, first paragraph, it's probably worth covering the failure
mode when the interval is too short, just for completeness (even though
it's ~obvious and covered elsewhere in the document).

-Ben