Re: [secdir] secdir review of draft-ietf-tsvwg-circuit-breaker-11
gorry@erg.abdn.ac.uk Fri, 12 February 2016 14:43 UTC
Message-ID: <123b6fdd10b8f735e3a5e0f9e5e57228.squirrel@erg.abdn.ac.uk>
In-Reply-To: <alpine.GSO.1.10.1602111737220.26829@multics.mit.edu>
References: <alpine.GSO.1.10.1602111737220.26829@multics.mit.edu>
Date: Fri, 12 Feb 2016 14:43:06 -0000
From: gorry@erg.abdn.ac.uk
To: Benjamin Kaduk <kaduk@MIT.EDU>
Archived-At: <http://mailarchive.ietf.org/arch/msg/secdir/r_KL3CVj5vq-0cPnVAt5hRb4W2g>
Cc: draft-ietf-tsvwg-circuit-breaker.all@ietf.org, iesg@ietf.org, secdir@ietf.org
Subject: Re: [secdir] secdir review of draft-ietf-tsvwg-circuit-breaker-11

GF: Thanks for the great review. I've noted my responses below (prefixed by GF:) - nearly all of these will be directly addressed in rev -12.

Gorry

----

I have reviewed this document as part of the security directorate's ongoing effort to review all IETF documents being processed by the IESG. These comments were written primarily for the benefit of the security area directors. Document editors and WG chairs should treat these comments just like any other last call comments.

This document is ready with nits (the rest of this paragraph), modulo one question I have (the following paragraph). Since it's more of a requirements doc than a full protocol specification, there are not too many requirements for security considerations. This document correctly notes the risk of an attacker using the circuit breaker mechanism for denial of service and the need for integrity and authenticity of control messages. It states that there is a trade-off between the cost of crypto and the need to authenticate control messages when there is a risk of on-path attack; I am a little uncomfortable with this statement (it is perhaps "too weak"), especially since it does not give guidance on determining the level of risk, but neither do I have a concrete objection to it. (Given the availability of physical network taps to at least nation-state-level actors, there seems to always be a risk of on-path attack.) Likewise, I am somewhat uneasy with the claim that just randomization of source port (or similar randomization in the packet header) suffices to deter an off-path attacker -- for example, in crypto, we usually talk of reducing the attacker's success probability to below 2^-32 or 2^-64 or something like that, but there are only ~2^16 port numbers to randomize in, so the success probability from just port number randomization would not meet the usual criteria.
So, perhaps that's not a good example to use on its own in the requirements document; other fields in the packet header could have a larger search space and be more reasonable for this purpose. The rest of the security considerations are good, covering the issues related to capacity and robustness, and mentioning the need for per-mechanism analysis.

GF: Further advice is needed. I note the off-path protection of the source port is typically used per transport flow (could be appropriate to Fast-Trip CBs). For large flow aggregates the impact of an attack can be greater, and so perhaps also the recommendation should be stronger?

The one question I have relates to the possibility of circuit breakers becoming ubiquitous. It seems pretty clear that going from a network with no circuit breakers to a network with one circuit breaker is worthwhile, offering a local improvement for the flows in question. But if all, or nearly all routes through the network traversed one or more circuit breakers -- is there a risk of cascading failure, either accidental or purposefully triggered by an attacker? In a network where circuit breakers occupy what a topologist would call a dense or fully-connected subset of the network, would one circuit breaker tripping cause subsequent breaker trips and near-complete network shutdown?

GF: Not expected. The result of a circuit breaker tripping is to shed traffic (load), not to reduce capacity. This should mean that once a CB is triggered, less load reaches any other cascaded circuit breaker.

There seems to even be something of an analogue (though not a perfect one) in electrical circuit breakers, where an extreme failure in a device can cause the breaker in a power strip to fuse open, causing the breaker for the particular circuit in the building that it's on to fuse open, tripping the main breaker for the whole panel. It is uncommon for the cascade to continue to the mains for the building or the local substation, but is a known risk.
So, has anyone thought about the behavior of circuit breakers if/when they are ubiquitous in the network?

GF: For electricity supply, the circuit breaker stops the supply. So you're correct in that selectivity between circuit breakers connected in series requires that the operating or tripping time of the lower-rated downstream circuit breaker is less than the operating or tripping time of the upstream circuit breaker. For short circuits, where current limitation occurs and the energy let-through as the device clears the fault is significant, you can indeed trip all the electrical circuit breakers.

GF: However, I think in the networking context, things are different because the circuit breaker acts by turning off the source of traffic, not by turning off the capacity (other traffic can use the network after a source is disabled). In the network, the problem then becomes one of selecting a sufficiently long reaction time/robust measure to allow other functions to react before the source is turned off.

---------------------------------------------------------------------------------

(Also, it's amusing to see "CB" used for "circuit breaker" in this context, as I'm so used to seeing it expand to "channel binding". It seems that the RFC Editor's abbreviation list https://www.rfc-editor.org/materials/abbrev.expansion.txt includes neither form...)

Section 7.3 as written does not seem terribly connected to circuit breakers or the rest of the document. Should it be removed?

-GF: Reworked

This secdir review also comes with a bonus copyediting pass; iesg@ and secdir@ feel free to stop reading now.

-GF: Sorry for making you proof-read, comments appreciated.

The last sentence of the first paragraph of section 1 ("Just ... appliance.") does not have an independent clause, and leaves the reader hanging.

-GF: Resolved.

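As an aside on the cascade question above, GF's point that a network circuit breaker sheds load rather than removing capacity can be illustrated with a toy model. The sketch below is only an illustration under assumptions that are not in the draft: the flow names, rates, link capacities, and the rule that a tripped breaker disables the largest flow crossing its link are all hypothetical.

```python
# Toy model only: flow names, rates, capacities and the "disable the
# largest flow" policy are hypothetical and not taken from the draft.

def link_loads(flows, active):
    """Sum the rate of every active flow on each link it traverses."""
    loads = {}
    for name, (rate, path) in flows.items():
        if name in active:
            for link in path:
                loads[link] = loads.get(link, 0) + rate
    return loads


def run_breakers(flows, capacity):
    """Trip breakers until no link is overloaded.

    Returns the set of surviving flows and the list of (link, flow) trips.
    A trip turns off a traffic source; link capacity is never reduced.
    """
    active = set(flows)
    tripped = []
    while True:
        loads = link_loads(flows, active)
        overloaded = [l for l in sorted(capacity) if loads.get(l, 0) > capacity[l]]
        if not overloaded:
            return active, tripped
        link = overloaded[0]
        # Assumed policy: the breaker disables the largest flow on this link.
        victim = max(
            (f for f in active if link in flows[f][1]),
            key=lambda f: (flows[f][0], f),
        )
        active.remove(victim)
        tripped.append((link, victim))


# Flow "a" crosses both links; "b" and "c" each use one link.
flows = {
    "a": (8, ["L1", "L2"]),
    "b": (5, ["L1"]),
    "c": (4, ["L2"]),
}
capacity = {"L1": 10, "L2": 10}

active, tripped = run_breakers(flows, capacity)
print(sorted(active), tripped)
```

With these made-up numbers both links start out overloaded, yet the single trip on L1 also brings L2 back under its capacity, so the breaker on L2 never needs to trip.
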
In Section 1, second paragraph, "countered by the requirement to use congestion control by the transmission control protocol" would probably flow better as "in the [TCP]" or "with the [TCP]", since although the TCP specification is what requires the use of congestion control, the TCP protocol itself is just using congestion control.

-GF: Done.

In Section 1, second paragraph, penultimate sentence, "applications of the User Datagram Protocol" suffers from the dual meaning of "applications" as "software programs" and "instances where it is used". The first time I read it, I flagged it to be changed to "applications using [UDP]", but of course it is the latter meaning that was intended. Not a big deal, but perhaps this could be rearranged to avoid the potential confusion. (I don't think there's a good word to just replace the single word with, though.)

-GF: Done.

Section 1, third paragraph, penultimate sentence has a comma splice.

-GF: Done.

Section 1, fourth paragraph, second sentence: a timescale is inherently an order-of-magnitude thing, and different paths have a different RTT, so there is not a single timescale on which congestion control operates. I suggest just saying "operates on the timescale of a packet RTT", but "operates on a timescale on the order of a packet RTT" is probably fine, too.

-GF: Done.

In the following sentence, the concept of "packet loss/marking" is calmly used with no introduction. I'm not personally familiar with packet marking in this sense, and though the usage later in the document gave me some rough sense of what it means, maybe a bit more introduction (e.g., an informational reference) would be useful. Or maybe it's a term of art in transport and I'm just not a practitioner; that's possible, too.

-GF: Added ref to ECN spec.

In a similar vein, "5-tuple" at the end of that same paragraph may want an informational reference to RFC 6146, or may be considered common knowledge for the target audience.

-GF: Added: (e.g., a 5-tuple that includes the IP addresses, protocol, and ports).

In the next paragraph, there's a comma splice in the penultimate sentence.

-GF: Done.

The long paragraph in the middle of page 4 seems to introduce a new term "control function" without much explanation; this phrase does not seem to be used anyplace else in the document (though "control plane function" has one occurrence), so it seems likely that a slight rewording here would improve the document. (I'm not actually entirely sure what it's trying to say, so I don't have any concrete suggestions.) In this and the following sentence, it would be good to make more clear that the text is talking about circuit breakers and not other forms of congestion control.

-GF: This seems to have collected inputs from various people into a less readable para. Rewritten.

The second bullet point for examples of situations that could trigger circuit breakers ("traffic generated by an application...utilised for other purposes") confused me the first time I read it. Perhaps shuffle things around a bit to clarify that it is that "the network capacity provisioned for that application is being utilised for other purposes", though upon re-reading the existing text may suffice as-is.

-GF: Did not change - couldn't work out how to do it better.

The second sentence of the first (full, i.e., non-bullet-point) paragraph on page 5 seems to suffer from a bit of pronoun/antecedent confusion. In particular, "will generate elastic traffic that may be expected to regulate the load" reads as if it is the generated traffic itself that will regulate the load, whereas a common way of thinking about it would be that it is the application that is regulating the load produced by the traffic that the application generates. Also, in "the load it introduces" there is ambiguity as to whether "it" refers to the application or the traffic. (Perhaps this ambiguity is irrelevant, but in general ambiguity in a spec is to be avoided.)

-GF: Reworked.

In the following paragraph, the second sentence is a bit long, and heavily broken up by qualifiers that are not really needed ("all but impossible", "may further be the case", "may have some difficulty", "has in fact been tripped"). As copyeditor, I would suggest splitting this into two sentences and removing some of the unneeded words.

-GF: Done

Should "Circuit Breaker" be uniformly capitalized throughout the document? It is not capitalized in the first sentence of Section 1.1. (Perhaps the plural "Breakers" is also appropriate?)

-GF: Done

On pages 8/9, it would be good to maintain parallel structure across the enumerated items, most notably by including "that" in the first sentences ("An ingress meter that records the number of packets", "A measurement function that combines", ...). Item 3 does not currently fit into that structure, and it may not be worth the drastic changes that would be needed to stuff it into place, since it is describing an action as opposed to the functions that are described in the other items. But it's probably worth making the easy changes.

-GF: Done

In item 3 of that list, the capital "An" is not needed after a semicolon,

-GF: Done

and there is another list within the second sentence that could gain a more parallel structure if "be sending another in-band" were replaced with "sending an in-band".

-GF: Done

In Section 4, fourth bullet point, "adjust the traffic to experienced congestion" might be better as "adjust the traffic when congestion is experienced".

-GF: Done

The fifth (i.e., next) bullet point seems to lack a subject for the first sentence. Presumably it refers to the circuit breaker in question, but it's best to be explicit about it.

-GF: Done

In the eighth bullet point (top of page 11), I'd put "it is" before the "triggered" in the parenthetical.

-GF: Done

In the sixteenth bullet point (second one on page 12), you refer to the "source" of control messages, which I think would more conventionally be written as the "authenticity" of those messages. ("Source" is used in this fashion in at least one other place in the document, so please change all occurrences if changing any.)

-GF: Done

I am a bit hazy on what exactly is going on in the example in Section 5.1 (the last three paragraphs), but I will chalk that up to my lack of knowledge about multicast routing. It's probably worth expanding and/or putting an informational reference for PIM-SM, though, and offsetting the "however" in the last sentence with commas.

-GF: Done

In the first paragraph of Section 5.2, please use the plural "paths" in "paths provisioned using the Resource reservation protocol".

-GF: Done

Given the success of UDP-based protocols like QUIC, mosh, BitTorrent, etc., it seems a little strong to have this claim in Section 6.1 that "all applications ought to use a full-featured transport" when the meaning seems to really just be that all applications ought to have congestion control functionality for their traffic, whether obtained via a full-featured transport or built directly into the application [protocol].

-GF: Yes. Of course, fixed.

I would also consider removing the comma in the penultimate sentence of the first paragraph of Section 6.1, though I do not think I can claim that it is actually incorrect.

-GF: Done

In the next paragraph, "tailored *to* the type of traffic", and probably "when multiple congestion-controlled flows *combined* lead to short-term overload", since otherwise one could read it as saying that (multiple) (congestion-controlled flows lead to short-term overload), in which case the "multiple" is seemingly irrelevant.

-GF: Done

In the next paragraph (last one on page 15), there's a singular/plural mismatch in "a RTP-aware network devices"; I'd go with the plural, but it's your call.

-GF: Done

In Section 6.1.1, item 3 doesn't seem quite right -- I don't think that the breaker ought to trigger just by the act of using a TFRC-style check with a hard upper limit; I'd expect that the observed traffic would need to exceed that limit, too. (Also, expand "TFRC".)

-GF: Done

From a document structure perspective, it's slightly jarring to not have a subsection 6.2.1 with a dedicated example, but I can understand why the document is currently the way it is.

-GF: No action.

The last sentence of Section 6.3.1 seems to come without much lead-in; it would be nice to get a better transition into it, and maybe a mention of "circuit breaker" and its relation thereto.

-GF: Done

In Section 7.1, third paragraph, I don't think I understand what "other sharing network traffic" is supposed to mean, or really, what the example is saying in general.

-GF: Done

In Section 7.2, second paragraph, "For sure" seems a rather informal way of starting a sentence.

-GF: Already fixed!

The third sentence in that paragraph contains a comma splice.

-GF: Done

The last sentence of Section 7.2 could benefit from avoiding the pronoun in "this protects other network traffic" to clarify what exactly is providing the protection ("the network configuration", perhaps?).

-GF: Done

In Section 8, first paragraph, it's probably worth covering the failure mode when the interval is too short, just for completeness (even though it's ~obvious and covered elsewhere in the document).

-GF: Done.

-Ben