Re: [tsvwg] Extensive review of draft-ietf-tsvwg-circuit-breaker-05
Bob Briscoe <ietf@bobbriscoe.net> Fri, 09 October 2015 11:01 UTC
To: gorry@erg.abdn.ac.uk, tsvwg@ietf.org
References: <5616376D.4010505@bobbriscoe.net> <561657D9.5040908@erg.abdn.ac.uk> <56172149.1050307@bobbriscoe.net> <e6b05e949788b5f9cf8cf00c81aff0c8.squirrel@erg.abdn.ac.uk>
From: Bob Briscoe <ietf@bobbriscoe.net>
Message-ID: <56179E96.5010508@bobbriscoe.net>
Date: Fri, 09 Oct 2015 12:01:42 +0100
In-Reply-To: <e6b05e949788b5f9cf8cf00c81aff0c8.squirrel@erg.abdn.ac.uk>
Archived-At: <http://mailarchive.ietf.org/arch/msg/tsvwg/Z9NLCQEwRErasfk1Wp0Cmx-FpzY>
Subject: Re: [tsvwg] Extensive review of draft-ietf-tsvwg-circuit-breaker-05
Gorry,

Understood. Sorry for the shot out of the blue. But I missed the WG last call in Aug.

Bob

On 09/10/15 03:55, gorry@erg.abdn.ac.uk wrote:
> Thanks for the review.
>
> I can't respond to this at this time, I wasn't expecting this review
> and will be away. Some parts I agree with, some parts I don't. I'll get to
> it in a couple of weeks - I may not have answers to a few of the things
> we avoided saying.
>
> Gorry
>
>> Gorry,
>>
>> Despite being past the WG stage, here's my review anyway. Consider this
>> as an early response to IETF last-call.
>>
>> In general I support the intent of this draft, but I am concerned at the
>> severity of the problems I have found with it given it is meant to be
>> about to go to the IESG. I am particularly concerned that I have found
>> numerous significant problems with the normative requirements section.
>>
>> Have you had a substantial review from anyone before this? The level of
>> review comments on the tsvwg list seemed quite light - picking on issues
>> of particular concern, but not seeming to review the draft as a whole.
>>
>> *1. Intro:*
>>
>> Congestion Collapse is a very specific case - CB is much more general.
>> It is clear from the draft that a CB is intended to mitigate
>> circumstances wider than solely the extreme case of congestion collapse.
>> For instance: a large unresponsive aggregate contributing to a high
>> level of congestion alongside congestion responsive traffic. This is
>> nowhere near congestion collapse, but it would be an applicable case for
>> a circuit-breaker. Congestion collapse is a specific well-defined
>> process that involves a cascade of congestion as a sequence of queues
>> fill in turn moving in the upstream direction. It is due to continual
>> retries or additional load arriving faster than existing flows are
>> departing. {Note 1}
>>
>> The introduction mentions that TCP-style cc is only an appropriate
>> remedy when long flows dominate.
>> The implication that CB could be used
>> to deal with congestion induced by many short flows is a step too far,
>> IMO. This problem has not even been discussed in the IETF or IRTF to my
>> knowledge, let alone in the context of this draft. In 6.2 this draft
>> all-but says that a CB is a solution to this problem. I strongly object
>> to a BCP making that assertion. CB would be a very drastic and clumsy
>> solution to that problem. {Note 2}
>>
>> It says that the timescale at which a circuit-breaker operates must be
>> seconds or tens of seconds - much longer than the RTT timescale on which
>> TCP, SCTP and DCCP react. This disregards an important type of
>> application response to congestion; it must say that the timescale also
>> has to be longer than the timescale on which certain real-time
>> applications operate their own circuit-breakers, i.e. adapt down their
>> codec rates, and eventually close the connection as a form of
>> self-admission control. Applications operate per-flow circuit-breakers
>> typically over the order of seconds or tens of seconds, so network CBs
>> MUST take longer than that - I would say "no less than a minute".
>>
>> We MUST NOT discourage voluntary self-regulation by overriding it
>> (end-to-end principle). I pick up this point later (comments on section
>> 5.1), arguing that the fast-trip CB for RTP should be considered as an
>> application CB, and a network CB should always take longer to trigger
>> than these app CBs.
>>
>>
>> *1.1 Types of CB*
>>
>> I saw criticism on the list of the use of the term "protect" in this
>> section. Why hasn't it been changed? As the posting said, a CB does not
>> protect the aggregate that it monitors; rather it /regulates/ the
>> aggregate to protect the rest of the traffic that it is /not/ monitoring.
>>
>> *3.1 Functional Components.*
>>
>> There is no mention of the problem of synchronising the ingress and
>> egress measurements to allow for transit time.
>> Given you are trying to
>> measure loss, which is a relatively small difference between the traffic
>> entering and leaving, you can get very bad errors if you don't take path
>> delay into account. draft-ietf-tsvwg-tunnel-congestion-feedback
>> describes a nice (and commonly used) stateless way of doing that, by
>> sending the ingress measurement in-band to the egress, which triggers
>> the egress measurement so they are synchronized; allowing for transit
>> time. Then the egress can send them both back to the ingress to be
>> compared and acted on.
>>
>> *4. Reqs*
>>
>>
>>    There MUST be a control path from the ingress meter and the egress
>>    meter to the point of measurement. The Circuit Breaker MUST
>>    trigger if this control path fails.
>>
>> Either this is unclear terminology, or I strongly disagree. What do you
>> mean by a control path? We should only recommend that the CB triggers
>> due to lack of measurement signals if the measurement signals are
>> carried in-band with the data being monitored. That is only one way of
>> arranging the mechanism. The term control path, sounds like it is out of
>> band. If the measurement signals are out of band, the CB MUST NOT
>> trigger due to lack of measurement signals. I would recommend the
>> in-band method, but there are plenty of network designers who will want
>> to do this in centralised out of band ways, so we have to cater for that
>> way of thinking (even tho it's misguided).
>>
>>    The measurement period MUST be longer than the time that current
>>    Congestion Control algorithms need to reduce their rate following
>>    detection of congestion.
>>
>> This needs to be rewritten. Or just removed. It seems like ideas changed
>> after it was written, and the end was changed but not the normative
>> statement at the beginning. IMO, the measurement period can be
>> arbitrarily short, as long as multiple measurements are combined before
>> triggering the CB.
>> It talks about unnecessarily penalizing long RTT
>> flows, but the measurement period is nothing to do with the period
>> before there is any penalization (defined later as the triggering
>> interval). There is no problem with short measurement periods as long as
>> any high congestion measured in these periods is averaged over all the
>> measurement periods in the triggering interval.
>>
>> In fact, there should be many measurement intervals per trigger
>> interval, so that there are many opportunities for measurement messages
>> to get through. Otherwise if there are only one or two measurement
>> periods per trigger interval, the possibility of a false trigger due to
>> lost control signals becomes too great.
>>
>>    o A Circuit Breaker is REQUIRED to define a threshold to determine
>>      whether the measured congestion is considered excessive.
>>
>>    o A Circuit Breaker is REQUIRED to define the triggering interval,
>>
>> A perfectly good CB could vary the trigger interval and threshold
>> depending on how rapidly congestion is rising, or how high its absolute
>> level is. Indeed one could say it is actually wrong to define a single
>> threshold or a single interval, so these normative statements are overly
>> restrictive and preclude designs that are smarter than just a simple fixed
>> threshold.
>>
>> Also, see comment above about allowing time for application CBs, and
>> suggesting one minute minimum.
>>
>>    o A Circuit Breaker SHOULD be constructed so that it does not
>>      trigger under light or intermittent congestion, with a default
>>      response to a trigger that disables all traffic that contributed
>>      to congestion.
>>
>> The second half after the comma seems misplaced. If it does not trigger,
>> why does the sentence go on to talk about disabling all traffic that
>> contributed to congestion (which is what an /enabled/ trigger would do)?
>>
>>    A reaction that results in a reduction SHOULD result in
>>    reducing the traffic by at least a factor of ten,
>>
>> What evidence have you got for this 10% number? It seems utterly
>> inappropriate to write a number here. The number depends on what
>> proportion of the traffic on the path between ingress and egress is
>> regulated by the CB. If the proportion is low, it needs to reduce by a
>> lot to make sufficient space for other traffic. If the proportion is
>> high relative to other traffic, it might be sufficient to reduce by 5%
>> to 95% of the previous load. If the tunnel traffic represented say 80%
>> of the load on the path, and it reduced by a factor of 10, that would
>> leave 92% of the path for other traffic, which might be unnecessarily
>> much greater than the normal proportion used by other traffic.
>>
>>    Manual operator
>>    intervention will usually be required to restore a flow.
>>
>> This sentence should be toned down to possibly, not usually. A human is
>> no more capable than a machine is of bringing together all the necessary
>> measurements to decide what other courses of action might be possible,
>> and when to release the brakes. I suggest the last para of 5.3.1 starting:
>>
>>    "An operator-based response provides opportunity..."
>>
>> is more appropriate here, and doesn't really fit where it is.
>>
>> Section 4.1 contains no requirements text, only examples. It ought to be
>> moved from the normative requirements section to section 5 (Examples).
>>
>>
>> *5. Examples:*
>>
>> *5.1.1 Fast-Trip CB for RTP*
>>
>> The draft needs to make the distinction between an application doing its
>> own circuit breaking vs. functions on the path between the application
>> endpoints (even if in the hosts) doing CB.
>> The extremely important
>> distinction is:
>> 1a) an app knows when congestion is too high for it to work properly
>> 1b) functions under the app can only infer congestion is possibly too
>>     high for most apps to work properly
>> 2a) an app may be able to reduce the rate at which it sends data
>> 2b) a function under an app can only discard data, not remove it at
>>     source.
>>
>> I believe that the requirements in section 4 do not apply to
>> application-controlled circuit-breakers. So, I would not include the
>> "Fast-Trip CB for RTP" as an example of a /network/ transport CB.
>>
>> As the requirements say, a network CB should never fast trip.
>> By misclassifying RTP CBs as network CBs, you've allowed the timescale
>> for network CBs to trigger after tens of seconds, when a network CB
>> should allow app CBs this long to trigger themselves (as I said earlier).
>>
>>
>> *Missing examples:*
>>
>> * You might want to point to the flow termination function (as opposed
>>   to admission control) in the PCN architecture [RFC5559], which is
>>   precisely a network CB. It was precisely developed for cases where
>>   failures caused traffic to reroute onto a previously well-provisioned
>>   path (see 6.1).
>> * Andrew McGregor gave the examples of Google's BwE (bandwidth enforcer)
>>   and B4, but you haven't referred to them. Given they are documented
>>   existence proofs of this beast, that seems remiss.
>>
>> *7. Security Consid's*
>>
>>
>>    The circuit breaker MUST be designed to be robust to packet loss that
>>    can also be experienced during congestion/overload.
>>
>>
>> This implies reliable transmission - i.e. retransmit for ever until
>> acknowledged. This is NOT a good idea. In
>> ietf-tsvwg-tunnel-congestion-feedback we propose using SCTP partially
>> reliable transport.
>> Then if congestion causes messages to be lost, they
>> don't have to be retransmitted if there are insufficient resources (thus
>> not risking contributing to congestion collapse - and here I use the
>> phrase correctly). Because they transmit counters, the missing counter
>> values do not matter. This is the tried-and-tested message delivery
>> approach used for IPFIX. The messages can still be given priority, but
>> should not be retransmitted.
>>
>>    Simple protection can be provided by using a
>>    randomized source port, or equivalent field in the packet header
>>    (such as the RTP SSRC value and the RTP sequence number) expected not
>>    to be known to an off-path attacker.
>>
>> I think the draft should recommend that for most scenarios, randomized
>> ports will be insufficient protection for CB control messages, which
>> should be properly cryptographically authenticated. Otherwise, a
>> CB-controlled aggregate is too vulnerable to these off-path attacks.
>>
>> *Gap #1:*
>>
>> The draft seems to think it is so obvious what a CB should measure
>> that it only says it vaguely as "the level of congestion", and only
>> suggests the difference between ingress and egress counters as an
>> example. Some readers might well think like this: Does congestion level
>> mean the percentage extra bit-rate relative to the aggregate's expected
>> or maximum bit-rate? That might actually be a correct measure of
>> congestion in some scenarios, but...
>>
>> The draft does not say that the congestion level is defined as dropped
>> bytes divided by ingress bytes. The draft should spell out that a CB
>> should measure the volume of bytes dropped and the volume of ECN-capable
>> bytes marked with CE, and express these as a fraction of resp. total
>> ingress non-ECT bytes and total ingress ECT bytes (assuming buffers
>> within the scope of the CB are ECN-enabled).
>> Even this is problematic,
>> because the assumption in parentheses never holds, particularly during
>> excessive congestion. It could also discuss the relative merit of
>> measuring the percentage of packets dropped/marked instead of bytes.
>>
>> Also it should mention that care should be taken over how to combine the
>> measurements. For instance avoid the common mistake of averaging
>> fractions, because
>> ave(c1/t1, c2/t2, c3/t3 ...) != (c1 + c2 + c3)/(t1 + t2 + t3).
>>
>> *Gap #2:*
>>
>> All the diags show multiple routers, but the text says congestion can
>> be measured by comparing ingress and egress traffic. Nowhere does it say
>> that only traffic with addressing that will have for-certain only passed
>> through both ends should be measured.
>>
>>
>> {Note 1}: A few years ago I dug deep into the history surrounding the
>> early congestion collapses on the Internet and found that those involved
>> were adamant that the term congestion collapse should not be waved
>> around for dramatic effect, because it has a very specific definition,
>> as paraphrased above.
>>
>> {Note 2}: The credit feature of ConEx was intended to address short-flow
>> overload if it becomes a problem. Don't get me wrong; I'm not objecting
>> to the use of CBs for the short-flow problem because I want you to use
>> my solution. I'm just using this as an example of a fine-grained way to
>> solve the problem, rather than the sledge-hammer CB way.
>> Here's the intuition briefly: With ConEx, you have to attach 'congestion
>> credit' to the first packets of a flow to cover the risk of congestion
>> before you have feedback (and if you don't and there is congestion, your
>> packets are dropped by an audit function). Then congestion policers at
>> the network ingress can limit the amount of congestion credit consumed
>> without needing feedback, and thin out traffic if it consists of large
>> numbers of short flows.
>> If short flows come to predominate, ConEx credit
>> was also designed to incentivize a new form of proxy that could regulate
>> short-flows with a push-back style of congestion control, without a full
>> feedback loop. That would be far preferable to such a drastic measure as
>> a circuit-breaker. This aspect of ConEx was not written into the IETF
>> docs, but it is mentioned in the re-ECN drafts that were the ancestors
>> of ConEx.
>>
>>
>>
>> *Nits*
>>
>> 3.
>> s/last resort protection to the network paths that these are used./
>>  /last resort protection to the traffic sharing their network path./
>>
>> s/tunnels encapsulations/
>>  /tunnel encapsulations/
>>
>> 3. What makes a good CB?
>>
>>    Circuit Breakers are RECOMMENDED for IETF protocols and tunnels that
>>    carry non-congestion-controlled Internet flows and for traffic
>>    aggregates, e.g., traffic sent using a network tunnel.
>>
>> Delete "e.g., traffic sent using a network tunnel"
>> Reason: this implies all network tunnels are problematic, whereas the
>> rest of the sentence adequately says that only tunnels carrying
>> non-congestion controlled flows are of concern.
>>
>> 4.
>>
>> s/monitor the level congestion/
>>  /monitor the level of congestion/
>>
>> 4.1.1
>> (e.g. to implement a Section 5.1)
>> ?
>>
>> 4.1.2
>> s/pre-prosvisioned/
>>  /pre-provisioned/
>>
>> 6.1
>>
>>    One common question is whether a Circuit Breaker is needed when a
>>    tunnel is deployed in a private network with pre-provisioned
>>    capacity?
>>
>> Remove '?' from the end.
>>
>> 6.2
>>
>> s/in the event that persistent congestion occur./
>>  /in the event that persistent congestion occurs./
>>
>>
>>
>> Regards
>>
>>
>>
>> Bob
>>
>>
>> On 08/10/15 12:47, Gorry Fairhurst wrote:
>>>> [Gorry, I also have to deliver on my promise on a paragraph for
>>>> circuit-breaker. Do you have a deadline for that?]
>>>>
>>> The circuit-breaker ID is pending start of IETF last call, the
>>> deadline for doing an author rev passed, sorry.
>>>
>>> --
>>> ________________________________________________________________
>>> Bob Briscoe http://bobbriscoe.net/

--
________________________________________________________________
Bob Briscoe http://bobbriscoe.net/
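
The point under Gap #1 about combining measurements, together with the earlier suggestion that many short measurement periods be combined over one triggering interval, can be illustrated with a small sketch. Everything named here is an assumption made for illustration only: the `Period` record, the byte counts, and the 10% threshold do not come from the draft or from the review. The sketch compares the mean of per-period loss fractions with the pooled fraction (total lost bytes over total ingress bytes).

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Period:
    """Counters for one measurement period (hypothetical structure)."""
    ingress_bytes: int  # bytes counted entering at the ingress meter
    egress_bytes: int   # bytes counted leaving at the egress meter

    @property
    def lost_bytes(self) -> int:
        # Loss is inferred as ingress minus egress, floored at zero.
        return max(self.ingress_bytes - self.egress_bytes, 0)


def mean_of_fractions(periods: Sequence[Period]) -> float:
    """ave(c1/t1, c2/t2, ...): every period gets equal weight."""
    fractions = [p.lost_bytes / p.ingress_bytes
                 for p in periods if p.ingress_bytes > 0]
    return sum(fractions) / len(fractions) if fractions else 0.0


def pooled_fraction(periods: Sequence[Period]) -> float:
    """(c1 + c2 + ...)/(t1 + t2 + ...): periods weighted by their traffic."""
    total_in = sum(p.ingress_bytes for p in periods)
    total_lost = sum(p.lost_bytes for p in periods)
    return total_lost / total_in if total_in > 0 else 0.0


def should_trigger(periods: Sequence[Period], threshold: float) -> bool:
    """Trip only if the pooled loss over the whole interval exceeds threshold."""
    return pooled_fraction(periods) > threshold


# One triggering interval made of three measurement periods (made-up numbers):
# a nearly idle period with heavy relative loss, then two busy periods.
periods = [
    Period(ingress_bytes=1_000, egress_bytes=500),
    Period(ingress_bytes=1_000_000, egress_bytes=990_000),
    Period(ingress_bytes=1_000_000, egress_bytes=990_000),
]

print(f"mean of fractions: {mean_of_fractions(periods):.4f}")
print(f"pooled fraction:   {pooled_fraction(periods):.4f}")
print(f"trigger at 10%:    {should_trigger(periods, 0.10)}")
```

With these made-up counters the mean of fractions is roughly 17%, while the pooled fraction is roughly 1%, so a breaker comparing the former against a 10% threshold would trip on an interval in which only about 1% of the bytes were lost.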