Re: [secdir] secdir review of draft-ietf-tsvwg-circuit-breaker-11

gorry@erg.abdn.ac.uk Fri, 12 February 2016 14:43 UTC

From: gorry@erg.abdn.ac.uk
To: Benjamin Kaduk <kaduk@MIT.EDU>
Archived-At: <http://mailarchive.ietf.org/arch/msg/secdir/r_KL3CVj5vq-0cPnVAt5hRb4W2g>
Cc: draft-ietf-tsvwg-circuit-breaker.all@ietf.org, iesg@ietf.org, secdir@ietf.org

GF: Thanks for the great review. I've noted my response below (prefixed by
GF:) - nearly all of these will be directly addressed in rev -12.

Gorry

----

I have reviewed this document as part of the security directorate's
ongoing effort to review all IETF documents being processed by the
IESG.  These comments were written primarily for the benefit of the
security area directors.  Document editors and WG chairs should treat
these comments just like any other last call comments.

This document is ready with nits (the rest of this paragraph), modulo one
question I have (the following paragraph).  Since it's more of a
requirements doc than a full protocol specification, there are not too
many requirements for security considerations.  This document correctly
notes the risk of an attacker using the circuit breaker mechanism for
denial of service and the need for integrity and authenticity of control
messages.  It states that there is a trade-off between the cost of crypto
and the need to authenticate control messages when there is a risk of
on-path attack; I am a little uncomfortable with this statement (it is
perhaps "too weak"), especially since it does not give guidance on
determining the level of risk, but neither do I have a concrete objection
to it.  (Given the availability of physical network taps to at least
nation-state-level actors, there seems to always be a risk of on-path
attack.)  Likewise, I am somewhat uneasy with the claim that just
randomization of source port (or similar randomization in the packet
header) suffices to deter an off-path attacker -- for example, in crypto,
we usually talk of reducing the attacker's success probability to below
2^-32 or 2^-64 or something like that, but there are only ~2^16 port
numbers to randomize in, so the success probability from just port number
randomization would not meet the usual criteria.  So, perhaps that's not a
good example to use on its own in the requirements document; other fields
in the packet header could have a larger search space and be more
reasonable for this purpose.  The rest of the security considerations are
good, covering the issues related to capacity and robustness, and
mentioning the need for per-mechanism analysis.
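
To put rough numbers on the port-randomization point above, here is a
minimal sketch. It assumes an off-path attacker who must blindly guess a
value chosen uniformly at random; the function names are hypothetical and
the 16-bit figure is the review's own "~2^16" approximation.

```python
def blind_guess_success(search_space_bits: int, guesses: int = 1) -> float:
    """Chance that at least one of `guesses` distinct blind guesses matches a
    value drawn uniformly from 2**search_space_bits possibilities."""
    space = 2 ** search_space_bits
    return min(1.0, guesses / space)


def extra_bits_needed(search_space_bits: int, target_bits: int) -> int:
    """Additional unpredictable bits needed so that a single guess succeeds
    with probability at most 2**-target_bits."""
    return max(0, target_bits - search_space_bits)


if __name__ == "__main__":
    for bits in (16, 32, 64):
        p = blind_guess_success(bits)
        print(f"{bits} random bits: single-guess success {p:.3e}")
    # A 16-bit port alone falls 16 bits short of a 2^-32 target.
    print("extra bits to reach 2^-32:", extra_bits_needed(16, 32))
```

Under these assumptions an attacker who can send 2^16 forged packets is
certain to hit the right port, which is why a header field with a larger
search space is the more convincing example.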

GF: Further advice is needed. I note the off-path “protection” of the
source port is typically used per transport flow (could be appropriate to
Fast-Trip CBs). For large flow aggregates the impact of an attack can be
greater, and so perhaps also the recommendation should be stronger?

The one question I have relates to the possibility of circuit breakers
becoming ubiquitous.  It seems pretty clear that going from a network with
no circuit breakers to a network with one circuit breaker is worthwhile,
offering a local improvement for the flows in question.  But if all, or
nearly all routes through the network traversed one or more circuit
breakers -- is there a risk of cascading failure, either accidental or
purposefully triggered by an attacker?  In a network where circuit
breakers occupy what a topologist would call a dense or fully-connected
subset of the network, would one circuit breaker tripping cause subsequent
breaker trips and near-complete network shutdown?

GF: Not expected. The result of a circuit breaker tripping is to shed
traffic (load), not to reduce capacity. This should mean that once a CB is
triggered this results in less load to another cascaded circuit breaker.

There seems to even be
something of an analogue (though not a perfect one) in electrical circuit
breakers, where an extreme failure in a device can cause the breaker in a
power strip to fuse open, causing the breaker for the particular circuit
in the building that it's on to fuse open, tripping the main breaker for
the whole panel.  It is uncommon for the cascade to continue to the mains
for the building or the local substation, but is a known risk.  So, has
anyone thought about the behavior of circuit breakers if/when they are
ubiquitous in the network?

GF: For electricity supply, the circuit breaker stops the supply. So
you’re correct in that selectivity between circuit breakers connected in
series requires that the operating or tripping time of the lower rated
downstream circuit breaker is less than the operating or tripping time of
the upstream circuit breaker. For short circuits where current limitation
occurs and the energy let-through as the device clears the fault is
significant, you can indeed trip all the electrical circuit breakers.

GF: However, I think in the networking context, things are different
because the circuit breaker acts by turning off the source of traffic, not
by turning off the “capacity” (other traffic can use the network after a
source is disabled). In the network, the problem then becomes one of
selecting a sufficiently long reaction time/robust measure to allow other
functions to react before the source is turned off.
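
The load-shedding argument above can be illustrated with a toy model. This
is only a sketch under simplifying assumptions: each breaker sums the rates
of the flows it observes and, when its limit is exceeded, turns those
sources off. The class names, rates, and limits are all hypothetical and
are not taken from the draft.

```python
from dataclasses import dataclass


@dataclass
class Flow:
    name: str
    rate: float          # offered load, arbitrary units
    active: bool = True


@dataclass
class Breaker:
    name: str
    limit: float         # load above which the breaker trips
    flows: list          # flows this breaker observes and can turn off
    tripped: bool = False

    def load(self) -> float:
        """Aggregate rate of the flows that are still sending."""
        return sum(f.rate for f in self.flows if f.active)

    def check(self) -> bool:
        """Trip if the observed load exceeds the limit; tripping disables the
        traffic sources, it does not remove any network capacity."""
        if not self.tripped and self.load() > self.limit:
            self.tripped = True
            for f in self.flows:
                f.active = False
        return self.tripped


if __name__ == "__main__":
    a, b, c = Flow("a", 6.0), Flow("b", 5.0), Flow("c", 3.0)
    upstream = Breaker("upstream", limit=10.0, flows=[a, b])
    downstream = Breaker("downstream", limit=9.0, flows=[b, c])

    print("downstream load before:", downstream.load())
    upstream.check()     # load 11 > 10, so flows a and b are turned off
    print("downstream load after: ", downstream.load())
    print("downstream tripped:", downstream.check())
```

In this toy model the upstream trip lowers the load seen downstream, and
flow c keeps running. It does not capture the timing question raised above,
namely how long a breaker should wait so that other functions can react
first.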

---------------------------------------------------------------------------------

(Also, it's amusing to see "CB" used for "circuit breaker" in this
context, as I'm so used to seeing it expand to "channel binding".  It
seems that the RFC Editor's abbreviation list
https://www.rfc-editor.org/materials/abbrev.expansion.txt includes neither
form...)


Section 7.3 as written does not seem terribly connected to circuit
breakers or the rest of the document.  Should it be removed?
-GF: Reworked

This secdir review also comes with a bonus copyediting pass; iesg@ and
secdir@ feel free to stop reading now.
-GF: Sorry for making you proof-read, comments appreciated.

The last sentence of the first paragraph of section 1 ("Just ...
appliance.") does not have an independent clause, and leaves the reader
hanging.
-GF: Resolved.

In Section 1, second paragraph, "countered by the requirement to use
congestion control by the transmission control protocol" would probably
flow better as "in the [TCP]" or "with the [TCP]", since although the TCP
specification is what requires the use of congestion control, the TCP
protocol itself is just using congestion control.
-GF: Done.

In Section 1, second paragraph, penultimate sentence, "applications of the
User Datagram Protocol" suffers from the dual meaning of "applications" as
"software programs" and "instances where it is used".  The first time I
read it, I flagged it to be changed to "applications using [UDP]", but of
course it is the latter meaning that was intended.  Not a big deal, but
perhaps this could be rearranged to avoid the potential confusion.  (I
don't think there's a good word to just replace the single word with,
though.)
-GF: Done.

Section 1, third paragraph, penultimate sentence has a comma splice.
-GF: Done.

Section 1, fourth paragraph, second sentence: a timescale is inherently an
order-of-magnitude thing, and different paths have a different RTT, so
there is not a single timescale on which congestion control operates.  I
suggest just saying "operates on the timescale of a packet RTT", but
"operates on a timescale on the order of a packet RTT" is probably fine,
too.
-GF: Done.

In the following sentence, the concept of "packet loss/marking" is calmly
used with no introduction.  I'm not personally familiar with packet
marking in this sense, and though the usage later in the document gave me
some rough sense of what it means, maybe a bit more introduction (e.g., an
informational reference) would be useful.  Or maybe it's a term of art in
transport and I'm just not a practitioner; that's possible, too.
-GF: Added ref to ECN spec.

In a similar vein, "5-tuple" at the end of that same paragraph may want an
informational reference to RFC 6146, or may be considered common knowledge
for the target audience.
-GF: Added: (e.g., a 5-tuple that includes the IP addresses, protocol, and
ports).
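
For readers outside transport, a 5-tuple flow key of the kind described in
the added text can be sketched as follows. The field names and the
byte-counting helper are hypothetical; the addresses come from the
documentation ranges.

```python
from collections import Counter
from typing import Iterable, NamedTuple, Tuple


class FiveTuple(NamedTuple):
    src_addr: str
    dst_addr: str
    protocol: int  # IP protocol number: 6 = TCP, 17 = UDP
    src_port: int
    dst_port: int


def bytes_per_flow(packets: Iterable[Tuple[FiveTuple, int]]) -> Counter:
    """Sum packet sizes per flow, where a flow is identified by its 5-tuple."""
    totals: Counter = Counter()
    for flow, size in packets:
        totals[flow] += size
    return totals


if __name__ == "__main__":
    f1 = FiveTuple("192.0.2.1", "198.51.100.7", 17, 40000, 5004)
    f2 = FiveTuple("192.0.2.1", "198.51.100.7", 17, 40001, 5004)
    totals = bytes_per_flow([(f1, 1200), (f2, 500), (f1, 1200)])
    for flow, total in totals.items():
        print(flow, total)
```

Two packets belong to the same flow only if all five fields match, so f1
and f2 above are counted separately even though they differ only in source
port.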

In the next paragraph, there's a comma splice in the penultimate sentence.
-GF: Done.

The long paragraph in the middle of page 4 seems to introduce a new term
"control function" without much explanation; this phrase does not seem to
be used anyplace else in the document (though "control plane function"
has one occurrence), so it seems likely that a slight rewording here would
improve the document.  (I'm not actually entirely sure what it's trying to
say, so I don't have any concrete suggestions.)  In this and the following
sentence, it would be good to make more clear that the text is talking
about circuit breakers and not other forms of congestion control.
-GF: This seems to have collected inputs from various people into a less
readable para. Rewritten.

The second bullet point for examples of situations that could trigger
circuit breakers ("traffic generated by an application...utilised for
other purposes") confused me the first time I read it.  Perhaps shuffle
things around a bit to clarify that it is that "the network capacity
provisioned for that application is being utilised for other purposes",
though upon re-reading the existing text may suffice as-is.
-GF: Did not change - couldn’t work out how to do it better.

The second sentence of the first (full, i.e., non-bullet-point) paragraph
on page 5 seems to suffer from a bit of pronoun/antecedent confusion.  In
particular, "will generate elastic traffic that may be expected to
regulate the load" reads as if it is the generated traffic itself that
will regulate the load, whereas a common way of thinking about it would be
that it is the application that is regulating the load produced by the
traffic that the application generates.  Also, in "the load it introduces"
there is ambiguity as to whether "it" refers to the application or the
traffic.  (Perhaps this ambiguity is irrelevant, but in general ambiguity
in a spec is to be avoided.)
-GF: Reworked.

In the following paragraph, the second sentence is a bit long, and heavily
broken up by qualifiers that are not really needed ("all but impossible",
"may further be the case", "may have some difficulty", "has in fact been
tripped").  As copyeditor, I would suggest splitting this into two
sentences and removing some of the unneeded words.
-GF: Done

Should "Circuit Breaker" be uniformly capitalized throughout the document?
It is not capitalized in the first sentence of Section 1.1.  (Perhaps the
plural "Breakers" is also appropriate?)
-GF: Done

On pages 8/9, it would be good to maintain parallel structure across the
enumerated items, most notably by including "that" in the first sentences
("An ingress meter that records the number of packets", "A measurement
function that combines", ...).  Item 3 does not currently fit into that
structure, and it may not be worth the drastic changes that would be
needed to stuff it into place, since it is describing an action as opposed
to the functions that are described in the other items.  But it's probably
worth making the easy changes.
-GF: Done

In item 3 of that list, the capital "An" is not needed after a semicolon,
-GF: Done

and there is another list within the second sentence that could gain a
more parallel structure if "be sending another in-band" were replaced with
"sending an in-band".
-GF: Done

In Section 4, fourth bullet point, "adjust the traffic to experienced
congestion" might be better as "adjust the traffic when congestion is
experienced".
-GF: Done

The fifth (i.e., next) bullet point seems to lack a subject for the first
sentence.  Presumably it refers to the circuit breaker in question, but
it's best to be explicit about it.
-GF: Done

In the eighth bullet point (top of page 11), I'd put "it is" before the
"triggered" in the parenthetical.
-GF: Done

In the sixteenth bullet point (second one on page 12), you refer to the
"source" of control messages, which I think would more conventionally be
written as the "authenticity" of those messages.  ("Source" is used in
this fashion in at least one other place in the document, so please change
all occurrences if changing any.)
-GF: Done

I am a bit hazy on what exactly is going on in the example in Section 5.1
(the last three paragraphs), but I will chalk that up to my lack of
knowledge about multicast routing.  It's probably worth expanding and/or
putting an informational reference for PIM-SM, though, and offsetting the
"however" in the last sentence with commas.
-GF: Done

In the first paragraph of Section 5.2, please use the plural "paths" in
"paths provisioned using the Resource reservation protocol".
-GF: Done

Given the success of UDP-based protocols like QUIC, mosh, BitTorrent,
etc., it seems a little strong to have this claim in Section 6.1 that "all
applications ought to use a full-featured transport" when the meaning
seems to really just be that all applications ought to have congestion
control functionality for their traffic, whether obtained via a
full-featured transport or built directly into the application [protocol].
-GF: Yes. Of course, fixed.

I would also consider removing the comma in the penultimate sentence of
the first paragraph of Section 6.1, though I do not think I can claim that
it is actually incorrect.
-GF: Done

In the next paragraph, "tailored *to* the type of traffic", and probably
"when multiple congestion-controlled flows *combined* lead to short-term
overload", since otherwise one could read it as saying that (multiple)
(congestion-controlled flows lead to short-term overload), in which case
the "multiple" is seemingly irrelevant.
-GF: Done

In the next paragraph (last one on page 15), there's a singular/plural
mismatch in "a RTP-aware network devices"; I'd go with the plural, but
it's your call.
-GF: Done

In Section 6.1.1, item 3 doesn't seem quite right -- I don't think that
the breaker ought to trigger just by the act of using a TFRC-style check
with a hard upper limit; I'd expect that the observed traffic would need
to exceed that limit, too.  (Also, expand "TFRC".)
-GF: Done
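
The trigger condition discussed in this item can be sketched as below. It
assumes that "TFRC-style check" means comparing the observed rate against
the TCP-Friendly Rate Control (TFRC) throughput equation of RFC 5348,
Section 3.1, using that RFC's simplifications b = 1 and t_RTO = 4*R. The
function names and sample values are hypothetical, and this is not the
draft's normative text.

```python
import math


def tfrc_rate_limit(s: float, rtt: float, p: float) -> float:
    """Upper bound on sending rate in bytes/second from the TFRC throughput
    equation, for segment size s (bytes), round-trip time rtt (seconds) and
    loss event rate p. Assumes b = 1 and t_RTO = 4 * rtt."""
    if p <= 0:
        return math.inf  # no loss observed: the equation imposes no bound
    b = 1
    t_rto = 4 * rtt
    denom = (rtt * math.sqrt(2 * b * p / 3)
             + t_rto * (3 * math.sqrt(3 * b * p / 8)) * p * (1 + 32 * p ** 2))
    return s / denom


def should_trip(observed_rate: float, s: float, rtt: float, p: float) -> bool:
    """Trip only when the observed traffic exceeds the computed limit; merely
    running the check does not trigger the breaker."""
    return observed_rate > tfrc_rate_limit(s, rtt, p)


if __name__ == "__main__":
    limit = tfrc_rate_limit(s=1200, rtt=0.1, p=0.01)
    print(f"limit: {limit:.0f} bytes/s")
    print("100 kB/s trips:", should_trip(100_000, 1200, 0.1, 0.01))
    print("200 kB/s trips:", should_trip(200_000, 1200, 0.1, 0.01))
```

A deployed breaker would probably also require the excess to persist for
some interval before tripping; that choice is left out of this sketch.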

From a document structure perspective, it's slightly jarring to not have a
subsection 6.2.1 with a dedicated example, but I can understand why the
document is currently the way it is.
-GF: No action.

The last sentence of Section 6.3.1 seems to come without much lead-in; it
would be nice to get a better transition into it, and maybe a mention of
"circuit breaker" and its relation thereto.
-GF: Done

In Section 7.1, third paragraph, I don't think I understand what "other
sharing network traffic" is supposed to mean, or really, what the example
is saying in general.
-GF: Done

In Section 7.2, second paragraph, "For sure" seems a rather informal way
of starting a sentence.
-GF: Already fixed!

The third sentence in that paragraph contains a comma splice.
-GF: Done

The last sentence of Section 7.2 could benefit from avoiding the pronoun
in "this protects other network traffic" to clarify what exactly is
providing the protection ("the network configuration", perhaps?).
-GF: Done

In Section 8, first paragraph, it's probably worth covering the failure
mode when the interval is too short, just for completeness (even though
it's ~obvious and covered elsewhere in the document).
-GF: Done.

-Ben