[tsvwg] Extensive review of draft-ietf-tsvwg-circuit-breaker-05
Bob Briscoe <ietf@bobbriscoe.net> Fri, 09 October 2015 02:07 UTC
To: gorry@erg.abdn.ac.uk
References: <5616376D.4010505@bobbriscoe.net> <561657D9.5040908@erg.abdn.ac.uk>
From: Bob Briscoe <ietf@bobbriscoe.net>
Message-ID: <56172149.1050307@bobbriscoe.net>
Date: Fri, 09 Oct 2015 03:07:05 +0100
In-Reply-To: <561657D9.5040908@erg.abdn.ac.uk>
Archived-At: <http://mailarchive.ietf.org/arch/msg/tsvwg/DbQ_k7WA2CLh5eOqqgfeZ7i96l4>
Cc: tsvwg IETF list <tsvwg@ietf.org>

Gorry,

Despite being past the WG stage, here's my review anyway. Consider this an early response to IETF last call.

In general I support the intent of this draft, but I am concerned at the severity of the problems I have found with it, given it is meant to be about to go to the IESG. I am particularly concerned that I have found numerous significant problems with the normative requirements section.

Have you had a substantial review from anyone before this? The level of review comments on the tsvwg list seemed quite light - picking on issues of particular concern, but not seeming to review the draft as a whole.

*1. Intro:*

Congestion Collapse is a very specific case - CB is much more general.

It is clear from the draft that a CB is intended to mitigate circumstances wider than solely the extreme case of congestion collapse. For instance: a large unresponsive aggregate contributing to a high level of congestion alongside congestion-responsive traffic. This is nowhere near congestion collapse, but it would be an applicable case for a circuit-breaker.

Congestion collapse is a specific well-defined process that involves a cascade of congestion as a sequence of queues fill in turn, moving in the upstream direction. It is due to continual retries or additional load arriving faster than existing flows are departing. {Note 1}

The introduction mentions that TCP-style cc is only an appropriate remedy when long flows dominate. The implication that CB could be used to deal with congestion induced by many short flows is a step too far, IMO. This problem has not even been discussed in the IETF or IRTF to my knowledge, let alone in the context of this draft. In 6.2 this draft all-but says that a CB is a solution to this problem. I strongly object to a BCP making that assertion.
CB would be a very drastic and clumsy solution to that problem. {Note 2}

It says that the timescale at which a circuit-breaker operates must be seconds or tens of seconds - much longer than the RTT timescale on which TCP, SCTP and DCCP react. This disregards an important type of application response to congestion; it must say that the timescale also has to be longer than the timescale on which certain real-time applications operate their own circuit-breakers, i.e. adapt down their codec rates, and eventually close the connection as a form of self-admission control. Applications operate per-flow circuit-breakers typically over the order of seconds or tens of seconds, so network CBs MUST take longer than that - I would say "no less than a minute". We MUST NOT discourage voluntary self-regulation by overriding it (end-to-end principle).

I pick up this point later (comments on section 5.1), arguing that the fast-trip CB for RTP should be considered as an application CB, and a network CB should always take longer to trigger than these app CBs.

*1.1 Types of CB*

I saw criticism on the list of the use of the term "protect" in this section. Why hasn't it been changed? As the posting said, a CB does not protect the aggregate that it monitors; rather it /regulates/ the aggregate to protect the rest of the traffic that it is /not/ monitoring.

*3.1 Functional Components*

There is no mention of the problem of synchronising the ingress and egress measurements to allow for transit time. Given you are trying to measure loss, which is a relatively small difference between the traffic entering and leaving, you can get very bad errors if you don't take path delay into account.

draft-ietf-tsvwg-tunnel-congestion-feedback describes a nice (and commonly used) stateless way of doing that, by sending the ingress measurement in-band to the egress, which triggers the egress measurement so they are synchronized, allowing for transit time.
Then the egress can send them both back to the ingress to be compared and acted on.

*4. Reqs*

    There MUST be a control path from the ingress meter and the egress meter to the point of measurement. The Circuit Breaker MUST trigger if this control path fails.

Either this is unclear terminology, or I strongly disagree. What do you mean by a control path? We should only recommend that the CB triggers due to lack of measurement signals if the measurement signals are carried in-band with the data being monitored. That is only one way of arranging the mechanism. The term "control path" sounds like it is out of band. If the measurement signals are out of band, the CB MUST NOT trigger due to lack of measurement signals.

I would recommend the in-band method, but there are plenty of network designers who will want to do this in centralised out-of-band ways, so we have to cater for that way of thinking (even though it's misguided).

    The measurement period MUST be longer than the time that current Congestion Control algorithms need to reduce their rate following detection of congestion.

This needs to be rewritten. Or just removed. It seems like ideas changed after it was written, and the end was changed but not the normative statement at the beginning.

IMO, the measurement period can be arbitrarily short, as long as multiple measurements are combined before triggering the CB. It talks about unnecessarily penalizing long-RTT flows, but the measurement period has nothing to do with the period before there is any penalization (defined later as the triggering interval). There is no problem with short measurement periods as long as any high congestion measured in these periods is averaged over all the measurement periods in the triggering interval.

In fact, there should be many measurement intervals per trigger interval, so that there are many opportunities for measurement messages to get through.
Otherwise, if there are only one or two measurement periods per trigger interval, the possibility of a false trigger due to lost control signals becomes too great.

    o A Circuit Breaker is REQUIRED to define a threshold to determine whether the measured congestion is considered excessive.
    o A Circuit Breaker is REQUIRED to define the triggering interval,

A perfectly good CB could vary the trigger interval and threshold depending on how rapidly congestion is rising, or how high its absolute level is. Indeed one could say it is actually wrong to define a single threshold or a single interval, so these normative statements are overly restrictive and preclude designs that are smarter than just a simple fixed threshold.

Also, see comment above about allowing time for application CBs, and suggesting a one-minute minimum.

    o A Circuit Breaker SHOULD be constructed so that it does not trigger under light or intermittent congestion, with a default response to a trigger that disables all traffic that contributed to congestion.

The second half after the comma seems misplaced. If it does not trigger, why does the sentence go on to talk about disabling all traffic that contributed to congestion (which is what an /enabled/ trigger would do)?

    A reaction that results in a reduction SHOULD result in reducing the traffic by at least a factor of ten,

What evidence have you got for this 10% number? It seems utterly inappropriate to write a number here. The number depends on what proportion of the traffic on the path between ingress and egress is regulated by the CB. If the proportion is low, it needs to reduce by a lot to make sufficient space for other traffic. If the proportion is high relative to other traffic, it might be sufficient to reduce by 5% to 95% of the previous load.
If the tunnel traffic represented say 80% of the load on the path, and it reduced by a factor of 10 (to 8%), that would leave 92% of the path for other traffic, which might be unnecessarily much greater than the normal proportion used by other traffic.

    Manual operator intervention will usually be required to restore a flow.

This sentence should be toned down to "possibly", not "usually". A human is no more capable than a machine is of bringing together all the necessary measurements to decide what other courses of action might be possible, and when to release the brakes. I suggest the last para of 5.3.1 starting: "An operator-based response provides opportunity..." is more appropriate here, and doesn't really fit where it is.

Section 4.1 contains no requirements text, only examples. It ought to be moved from the normative requirements section to section 5 (Examples).

*5. Examples:*

*5.1.1 Fast-Trip CB for RTP*

The draft needs to make the distinction between an application doing its own circuit breaking vs. functions on the path between the application endpoints (even if in the hosts) doing CB. The extremely important distinction is:
1a) an app knows when congestion is too high for it to work properly
1b) functions under the app can only infer congestion is possibly too high for most apps to work properly
2a) an app may be able to reduce the rate at which it sends data
2b) a function under an app can only discard data, not remove it at source.

I believe that the requirements in section 4 do not apply to application-controlled circuit-breakers. So, I would not include the "Fast-Trip CB for RTP" as an example of a /network/ transport CB. As the requirements say, a network CB should never fast trip.

By misclassifying RTP CBs as network CBs, you've allowed the timescale for network CBs to trigger after tens of seconds, when a network CB should allow app CBs this long to trigger themselves (as I said earlier).
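
Purely as an illustration of the section 4 and 5.1.1 comments above (this is not taken from the draft): a minimal sketch of a network CB that pools many short measurement periods over one trigger interval before deciding. The class name, the 5 s period, the 60 s interval and the 10% loss threshold are all assumptions chosen for the example.

```python
from collections import deque


class TriggerWindow:
    """Pools per-period byte counters over a sliding trigger interval.

    Hypothetical helper for illustration; not an API from the draft.
    """

    def __init__(self, periods_per_interval, loss_threshold):
        # The deque keeps only the most recent trigger interval's worth of periods.
        self.periods = deque(maxlen=periods_per_interval)
        self.loss_threshold = loss_threshold

    def add_period(self, ingress_bytes, egress_bytes):
        """Record the counters for one short measurement period."""
        self.periods.append((ingress_bytes, egress_bytes))

    def pooled_loss(self):
        """Loss over the whole window: sum the counters first, then divide."""
        total_in = sum(i for i, _ in self.periods)
        total_out = sum(e for _, e in self.periods)
        if total_in == 0:
            return 0.0
        return max(0.0, (total_in - total_out) / total_in)

    def should_trigger(self):
        """Trigger only after a full interval of persistently excessive loss."""
        full = len(self.periods) == self.periods.maxlen
        return full and self.pooled_loss() > self.loss_threshold


# Assumed numbers: 12 periods of 5 s make a 60 s trigger interval.
window = TriggerWindow(periods_per_interval=12, loss_threshold=0.10)
for _ in range(11):
    window.add_period(ingress_bytes=1000, egress_bytes=1000)
window.add_period(ingress_bytes=1000, egress_bytes=500)  # one bad period
print(window.should_trigger())
```

With these assumed numbers, a single period with 50% loss among eleven clean ones pools to roughly 4% loss over the interval, so the breaker does not trip; sustained loss above the threshold across the whole interval would.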

*Missing examples:*

* You might want to point to the flow termination function (as opposed to admission control) in the PCN architecture [RFC5559], which is precisely a network CB. It was developed precisely for cases where failures caused traffic to reroute onto a previously well-provisioned path (see 6.1).
* Andrew McGregor gave the examples of Google's BwE (bandwidth enforcer) and B4, but you haven't referred to them. Given they are documented existence proofs of this beast, that seems remiss.

*7. Security Consid's*

    The circuit breaker MUST be designed to be robust to packet loss that can also be experienced during congestion/overload.

This implies reliable transmission - i.e. retransmit for ever until acknowledged. This is NOT a good idea. In draft-ietf-tsvwg-tunnel-congestion-feedback we propose using SCTP partially reliable transport. Then if congestion causes messages to be lost, they don't have to be retransmitted if there are insufficient resources (thus not risking contributing to congestion collapse - and here I use the phrase correctly). Because they transmit counters, the missing counter values do not matter. This is the tried-and-tested message delivery approach used for IPFIX. The messages can still be given priority, but should not be retransmitted.

    Simple protection can be provided by using a randomized source port, or equivalent field in the packet header (such as the RTP SSRC value and the RTP sequence number) expected not to be known to an off-path attacker.

I think the draft should recommend that for most scenarios, randomized ports will be insufficient protection for CB control messages, which should be properly cryptographically authenticated. Otherwise, a CB-controlled aggregate is too vulnerable to these off-path attacks.
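
To illustrate why missing counter values do not matter when the messages carry counters (the partial-reliability point above), here is a minimal sketch. It assumes each report carries a pair of cumulative byte counters (ingress, egress); that report format, the function name and the numbers are assumptions for the example, not taken from either draft.

```python
def loss_fraction(first_report, last_report):
    """Loss between two reports of cumulative (ingress_bytes, egress_bytes).

    Only the endpoints are needed, so reports lost in between need not be
    retransmitted. Hypothetical helper for illustration.
    """
    delta_in = last_report[0] - first_report[0]
    delta_out = last_report[1] - first_report[1]
    if delta_in <= 0:
        return 0.0
    return max(0.0, (delta_in - delta_out) / delta_in)


# Five reports are sent; two are dropped by congestion and never resent.
sent = [(0, 0), (1000, 990), (2000, 1900), (3000, 2700), (4000, 3600)]
received = [sent[0], sent[2], sent[4]]

full_view = loss_fraction(sent[0], sent[-1])
partial_view = loss_fraction(received[0], received[-1])
print(full_view == partial_view)
```

Because the counters are cumulative, the receiver reaches the same loss figure from the reports that did arrive as it would from the full sequence; what is lost is only time resolution, not accuracy.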

*Gap #1:*

The draft seems to think it is so obvious what a CB should measure that it only says it vaguely as "the level of congestion", and only suggests the difference between ingress and egress counters as an example. Some readers might well think like this: Does congestion level mean the percentage extra bit-rate relative to the aggregate's expected or maximum bit-rate? That might actually be a correct measure of congestion in some scenarios, but...

The draft does not say that the congestion level is defined as dropped bytes divided by ingress bytes. The draft should spell out that a CB should measure the volume of bytes dropped and the volume of ECN-capable bytes marked with CE, and express these as a fraction of, respectively, total ingress non-ECT bytes and total ingress ECT bytes (assuming buffers within the scope of the CB are ECN-enabled). Even this is problematic, because the assumption in parentheses never holds, particularly during excessive congestion.

It could also discuss the relative merit of measuring the percentage of packets dropped/marked instead of bytes.

Also it should mention that care should be taken over how to combine the measurements. For instance, avoid the common mistake of averaging fractions, because
ave(c1/t1, c2/t2, c3/t3, ...) != (c1 + c2 + c3 + ...)/(t1 + t2 + t3 + ...).

*Gap #2:*

All the diags show multiple routers, but the text says congestion can be measured by comparing ingress and egress traffic. Nowhere does it say that the only traffic that should be measured is traffic whose addressing ensures it will certainly have passed through both ends.

{Note 1}: A few years ago I dug deep into the history surrounding the early congestion collapses on the Internet and found that those involved were adamant that the term congestion collapse should not be waved around for dramatic effect, because it has a very specific definition, as paraphrased above.

{Note 2}: The credit feature of ConEx was intended to address short-flow overload if it becomes a problem.
Don't get me wrong; I'm not objecting to the use of CBs for the short-flow problem because I want you to use my solution. I'm just using this as an example of a fine-grained way to solve the problem, rather than the sledge-hammer CB way.

Here's the intuition briefly: With ConEx, you have to attach 'congestion credit' to the first packets of a flow to cover the risk of congestion before you have feedback (and if you don't and there is congestion, your packets are dropped by an audit function). Then congestion policers at the network ingress can limit the amount of congestion credit consumed without needing feedback, and thin out traffic if it consists of large numbers of short flows.

If short flows come to predominate, ConEx credit was also designed to incentivize a new form of proxy that could regulate short flows with a push-back style of congestion control, without a full feedback loop. That would be far preferable to such a drastic measure as a circuit-breaker. This aspect of ConEx was not written into the IETF docs, but it is mentioned in the re-ECN drafts that were the ancestors of ConEx.

*Nits*

3.
s/last resort protection to the network paths that these are used./ /last resort protection to the traffic sharing their network path./
s/tunnels encapsulations/ /tunnel encapsulations/

3. What makes a good CB?
    Circuit Breakers are RECOMMENDED for IETF protocols and tunnels that carry non-congestion-controlled Internet flows and for traffic aggregates, e.g., traffic sent using a network tunnel.
Delete " e.g., traffic sent using a network tunnel "
Reason: this implies all network tunnels are problematic, whereas the rest of the sentence adequately says that only tunnels carrying non-congestion-controlled flows are of concern.

4.
s/monitor the level congestion/ /monitor the level of congestion/

4.1.1 (e.g. to implement a Section 5.1) ?
4.1.2
s/pre-prosvisioned/ /pre-provisioned/

6.1
    One common question is whether a Circuit Breaker is needed when a tunnel is deployed in a private network with pre-provisioned capacity?
Remove '?' from the end.

6.2
s/in the event that persistent congestion occur./ /in the event that persistent congestion occurs./

Regards

Bob

On 08/10/15 12:47, Gorry Fairhurst wrote:
>> [Gorry, I also have to deliver on my promise on a paragraph for
>> circuit-breaker. Do you have a deadline for that?]
>>
> The circuit-breaker ID is pending start of IETF last call, the
> deadline for doing an author rev passed, sorry.
>
> --
> ________________________________________________________________
> Bob Briscoe http://bobbriscoe.net/