Re: [tsvwg] Extensive review of draft-ietf-tsvwg-circuit-breaker-05

Bob Briscoe <ietf@bobbriscoe.net> Fri, 09 October 2015 11:01 UTC

To: gorry@erg.abdn.ac.uk, tsvwg@ietf.org
References: <5616376D.4010505@bobbriscoe.net> <561657D9.5040908@erg.abdn.ac.uk> <56172149.1050307@bobbriscoe.net> <e6b05e949788b5f9cf8cf00c81aff0c8.squirrel@erg.abdn.ac.uk>
From: Bob Briscoe <ietf@bobbriscoe.net>
Message-ID: <56179E96.5010508@bobbriscoe.net>
Date: Fri, 09 Oct 2015 12:01:42 +0100
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:38.0) Gecko/20100101 Thunderbird/38.2.0
MIME-Version: 1.0
In-Reply-To: <e6b05e949788b5f9cf8cf00c81aff0c8.squirrel@erg.abdn.ac.uk>
Content-Type: text/plain; charset="windows-1252"; format="flowed"
Content-Transfer-Encoding: 7bit
Archived-At: <http://mailarchive.ietf.org/arch/msg/tsvwg/Z9NLCQEwRErasfk1Wp0Cmx-FpzY>
Subject: Re: [tsvwg] Extensive review of draft-ietf-tsvwg-circuit-breaker-05
Precedence: list

Gorry,

Understood.
Sorry for the shot out of the blue. But I missed the WG last call in Aug.


Bob

On 09/10/15 03:55, gorry@erg.abdn.ac.uk wrote:
> Thanks for the review.
>
> I can't respond to this at this time; I wasn't expecting this review
> and will be away. Some parts I agree with, some parts I don't. I'll get to
> it in a couple of weeks. I may not have answers to a few of the things
> we avoided saying.
>
> Gorry
>
>> Gorry,
>>
>> Despite being past the WG stage, here's my review anyway. Consider this
>> as an early response to IETF last-call.
>>
>> In general I support the intent of this draft, but I am concerned at the
>> severity of the problems I have found with it given it is meant to be
>> about to go to the IESG. I am particularly concerned that I have found
>> numerous significant problems with the normative requirements section.
>>
>> Have you had a substantial review from anyone before this? The level of
>> review comments on the tsvwg list seemed quite light - picking on issues
>> of particular concern, but not seeming to review the draft as a whole.
>>
>> *1. Intro:*
>> Congestion Collapse is a very specific case - CB is much more general.
>> It is clear from the draft that a CB is intended to mitigate
>> circumstances wider than solely the extreme case of congestion collapse.
>> For instance: a large unresponsive aggregate contributing to a high
>> level of congestion alongside congestion responsive traffic. This is
>> nowhere near congestion collapse, but it would be an applicable case for
>> a circuit-breaker. Congestion collapse is a specific well-defined
>> process that involves a cascade of congestion as a sequence of queues
>> fill in turn moving in the upstream direction. It is due to continual
>> retries or additional load arriving faster than existing flows are
>> departing. {Note 1}
>>
>> The introduction mentions that TCP-style cc is only an appropriate
>> remedy when long flows dominate. The implication that CB could be used
>> to deal with congestion induced by many short flows is a step too far,
>> IMO. This problem has not even been discussed in the IETF or IRTF to my
>> knowledge, let alone in the context of this draft. In 6.2 this draft
>> all-but says that a CB is a solution to this problem. I strongly object
>> to a BCP making that assertion. CB would be a very drastic and clumsy
>> solution to that problem.{Note 2}
>>
>> It says that the timescale at which a circuit-breaker operates must be
>> seconds or tens of seconds - much longer than the RTT timescale on which
>> TCP, SCTP and DCCP react. This disregards an important type of
>> application response to congestion; it must say that the timescale also
>> has to be longer than the timescale on which certain real-time
>> applications operate their own circuit-breakers i.e. adapt down their
>> codec rates, and eventually close the connection as a form of
>> self-admission control. Applications operate per-flow circuit-breakers
>> typically over the order of seconds or tens of seconds, so network CBs
>> MUST take longer than that - I would say "no less than a minute".
>>
>> We MUST NOT discourage voluntary self-regulation by overriding it
>> (end-to-end principle). I pick up this point later (comments on section
>> 5.1.1), arguing that the fast-trip CB for RTP should be considered as an
>> application CB, and a network CB should always take longer to trigger
>> than these app CBs.
>>
>>
>> *1.1 Types of CB*
>>
>> I saw criticism on the list of the use of the term "protect" in this
>> section. Why hasn't it been changed? As the posting said, a CB does not
>> protect the aggregate that it monitors; rather it /regulates/ the
>> aggregate to protect the rest of the traffic that it is /not/ monitoring.
>>
>> *3.1 Functional Components.*
>>
>> There is no mention of the problem of synchronising the ingress and
>> egress measurements to allow for transit time. Given you are trying to
>> measure loss, which is a relatively small difference between the traffic
>> entering and leaving, you can get very bad errors if you don't take path
>> delay into account. draft-ietf-tsvwg-tunnel-congestion-feedback
>> describes a nice (and commonly used) stateless way of doing that, by
>> sending the ingress measurement in-band to the egress, which triggers
>> the egress measurement so they are synchronized; allowing for transit
>> time. Then the egress can send them both back to the ingress to be
>> compared and acted on.
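>>
To make the synchronisation point concrete, here is a minimal Python sketch of a toy path. It is an illustration under stated assumptions, not code from the draft or from draft-ietf-tsvwg-tunnel-congestion-feedback: the path is modelled as a FIFO holding `in_flight` packets, every `drop_every`-th packet is lost, and the function and parameter names are hypothetical. It contrasts reading both counters at the same instant with letting an in-band marker trigger the egress snapshot.

```python
from collections import deque


def measure(n_packets=1000, in_flight=50, drop_every=100):
    """Return (naive_loss, synced_loss) in packets for a toy FIFO path."""
    path = deque()              # packets currently in transit
    sent = received = 0

    def egress_step():
        """Deliver one item; return the ingress count if it was a marker."""
        nonlocal received
        kind, value = path.popleft()
        if kind == "data":
            received += 1
            return None
        return value            # marker carrying the ingress counter in-band

    for i in range(1, n_packets + 1):
        sent += 1
        if i % drop_every:      # every drop_every-th packet is lost on the path
            path.append(("data", None))
        if len(path) > in_flight:
            egress_step()

    # Naive: both counters read at the same instant, so packets still in
    # transit are miscounted as lost.
    naive_loss = sent - received

    # Synchronised: the marker follows the data, and the egress takes its
    # snapshot only when the marker arrives.
    path.append(("marker", sent))
    ingress_count = None
    while ingress_count is None:
        ingress_count = egress_step()
    synced_loss = ingress_count - received
    return naive_loss, synced_loss


print(measure())
```

With the defaults, 10 packets are really lost, but the naive reading also counts the 50 packets in transit, so the error dwarfs the quantity being measured.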
>>
>> *4. Reqs*
>>
>>         There MUST be a control path from the ingress meter and the egress
>>         meter to the point of measurement.  The Circuit Breaker MUST
>>         trigger if this control path fails.
>>
>> Either this is unclear terminology, or I strongly disagree. What do you
>> mean by a control path? We should only recommend that the CB triggers
>> due to lack of measurement signals if the measurement signals are
>> carried in-band with the data being monitored. That is only one way of
>> arranging the mechanism. The term control path, sounds like it is out of
>> band. If the measurement signals are out of band, the CB MUST NOT
>> trigger due to lack of measurement signals. I would recommend the
>> in-band method, but there are plenty of network designers who will want
>> to do this in centralised out of band ways, so we have to cater for that
>> way of thinking (even tho it's misguided).
>>
>>         The measurement period MUST be longer than the time that current
>>         Congestion Control algorithms need to reduce their rate following
>>         detection of congestion.
>>
>> This needs to be rewritten. Or just removed. It seems like ideas changed
>> after it was written, and the end was changed but not the normative
>> statement at the beginning. IMO, the measurement period can be
>> arbitrarily short, as long as multiple measurements are combined before
>> triggering the CB. It talks about unnecessarily penalizing long RTT
>> flows, but the measurement period is nothing to do with the period
>> before there is any penalization (defined later as the triggering
>> interval). There is no problem with short measurement periods as long as
>> any high congestion measured in these periods is averaged over all the
>> measurement periods in the triggering interval.
>>
>> In fact, there should be many measurement intervals per trigger
>> interval, so that there are many opportunities for measurement messages
>> to get through. Otherwise if there are only one or two measurement
>> periods per trigger interval, the possibility of a false trigger due to
>> lost control signals becomes too great.
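>>
A minimal sketch of that idea in Python, with assumptions flagged: the names `should_trigger`, `threshold` and `min_reports` are hypothetical, each report is a `(bytes_lost, bytes_sent)` pair for one short measurement period, and `None` stands for a measurement message that never arrived. What to do when too few reports arrive is a separate policy question (see the control-path comment above); here the function simply declines to trigger.

```python
from typing import Optional, Sequence, Tuple

# (bytes_lost, bytes_sent) for one measurement period, or None if the
# measurement message for that period was lost.
Report = Optional[Tuple[int, int]]


def should_trigger(reports: Sequence[Report],
                   threshold: float,
                   min_reports: int) -> bool:
    """Decide once per trigger interval, pooling all periods that reported."""
    received = [r for r in reports if r is not None]
    if len(received) < min_reports:
        return False            # assumption: too little evidence to act on
    lost = sum(l for l, _ in received)
    sent = sum(s for _, s in received)
    return sent > 0 and lost / sent > threshold


# Twelve 5-second periods in a one-minute trigger interval.
spike = [(0, 1000)] * 11 + [(500, 1000)]        # one bad period
sustained = [(200, 1000)] * 10 + [None, None]   # two reports lost
print(should_trigger(spike, 0.10, 6), should_trigger(sustained, 0.10, 6))
```

A single bad period is diluted by the rest of the interval, while sustained congestion still trips the breaker even though two reports went missing.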
>>
>>      o  A Circuit Breaker is REQUIRED to define a threshold to determine
>>         whether the measured congestion is considered excessive.
>>
>>      o  A Circuit Breaker is REQUIRED to define the triggering interval,
>>
>> A perfectly good CB could vary the trigger interval and threshold
>> depending on how rapidly congestion is rising, or how high its absolute
>> level is. Indeed one could say it is actually wrong to define a single
>> threshold or a single interval, so these normative statements are overly
>> restrictive and preclude designs that are smarter than just a simple fixed
>> threshold.
>>
>> Also, see comment above about allowing time for application CBs, and
>> suggesting one minute minimum.
>>
>> o  A Circuit Breaker SHOULD be constructed so that it does not
>>         trigger under light or intermittent congestion, with a default
>>         response to a trigger that disables all traffic that contributed
>>         to congestion.
>>
>> The second half after the comma seems misplaced. If it does not trigger,
>> why does the sentence go on to talk about disabling all traffic that
>> contributed to congestion (which is what an /enabled/ trigger would do)?
>>
>> A reaction that results in a reduction SHOULD result in
>>         reducing the traffic by at least a factor of ten,
>>
>> What evidence have you got for this 10% number? It seems utterly
>> inappropriate to write a number here. The number depends on what
>> proportion of the traffic on the path between ingress and egress is
>> regulated by the CB. If the proportion is low, it needs to reduce by a
>> lot to make sufficient space for other traffic. If the proportion is
>> high relative to other traffic, it might be sufficient to reduce by 5%
>> to 95% of the previous load. If the tunnel traffic represented say 80%
>> of the load on the path, and it reduced by a factor of 10, that would
>> leave 92% of the path for other traffic, which might be unnecessarily
>> much greater than the normal proportion used by other traffic.
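>>
The arithmetic behind the 80% example can be checked with a few lines of Python. This is only a worked illustration: the function name is hypothetical, and shares are expressed as fractions of the pre-trigger load on the path.

```python
def share_left_for_others(cb_share: float, reduction_factor: float) -> float:
    """Fraction of the pre-trigger load not occupied by the CB aggregate
    after it has been cut by `reduction_factor`."""
    return 1.0 - cb_share / reduction_factor


# Aggregate is 80% of the load and is cut by a factor of ten.
print(round(share_left_for_others(0.80, 10), 2))        # 0.92

# Same aggregate cut only to 95% of its previous load.
print(round(share_left_for_others(0.80, 1 / 0.95), 2))  # 0.24

# Aggregate is only 10% of the load: a factor of ten frees just 9 points.
print(round(share_left_for_others(0.10, 10), 2))        # 0.99
```

The same factor gives very different outcomes depending on the aggregate's share, which is why a single fixed number looks arbitrary.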
>>
>>         Manual operator
>>         intervention will usually be required to restore a flow.
>>
>> This sentence should be toned down to possibly, not usually. A human is
>> no more capable than a machine is of bringing together all the necessary
>> measurements to decide what other courses of action might be possible,
>> and when to release the brakes. I suggest the last para of 5.3.1 starting:
>>
>> "An operator-based response provides opportunity..."
>>
>> is more appropriate here, and doesn't really fit where it is.
>>
>> Section 4.1 contains no requirements text, only examples. It ought to be
>> moved from the normative requirements section to section 5 (Examples).
>>
>>
>> *5. Examples:*
>>
>> *5.1.1 Fast-Trip CB for RTP*
>>
>> The draft needs to make the distinction between an application doing its
>> own circuit breaking vs. functions on the path between the application
>> endpoints (even if in the hosts) doing CB. The extremely important
>> distinction is:
>> 1a) an app knows when congestion is too high for it to work properly
>> 1b) functions under the app can only infer congestion is possibly too
>> high for most apps to work properly
>> 2a) an app may be able to reduce the rate at which it sends data
>> 2b) a function under an app can only discard data, not remove it at
>> source.
>>
>> I believe that the requirements in section 4 do not apply to
>> application-controlled circuit-breakers. So, I would not include the
>> "Fast-Trip CB for RTP" as an example of a /network/ transport CB.
>>
>> As the requirements say, a network CB should never fast trip.
>> By misclassifying RTP CBs as network CBs, you've allowed the timescale
>> for network CBs to trigger after tens of seconds, when a network CB
>> should allow app CBs this long to trigger themselves (as I said earlier).
>>
>>
>> *Missing examples:*
>>
>> * You might want to point to the flow termination function (as opposed
>> to admission control) in the PCN architecture [RFC5559], which is
>> precisely a network CB. It was precisely developed for cases where
>> failures caused traffic to reroute onto a previously well-provisioned
>> path (see 6.1).
>> * Andrew McGregor gave the examples of Google's BwE (bandwidth enforcer)
>> and B4, but you haven't referred to them. Given they are a documented
>> existence proof of this beast, that seems remiss.
>>
>> *7. Security Consid's*
>>
>>      The circuit breaker MUST be designed to be robust to packet loss that
>>      can also be experienced during congestion/overload.
>>
>>
>> This implies reliable transmission - i.e. retransmit for ever until
>> acknowledged. This is NOT a good idea. In
>> draft-ietf-tsvwg-tunnel-congestion-feedback we propose using SCTP partially
>> reliable transport. Then if congestion causes messages to be lost, they
>> don't have to be retransmitted if there are insufficient resources (thus
>> not risking contributing to congestion collapse - and here I use the
>> phrase correctly). Because they transmit counters, the missing counter
>> values do not matter. This is the tried-and-tested message delivery
>> approach used for IPFIX. The messages can still be given priority, but
>> should not be retransmitted.
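>>
Why missing counter values do not matter can be shown with a small Python sketch. It assumes (hypothetically) that each message carries a pair of cumulative counters `(ingress_bytes, egress_bytes)`; the function name and the sample numbers are illustrative only.

```python
def loss_fraction(prev, curr):
    """Loss between two cumulative (ingress_bytes, egress_bytes) snapshots."""
    sent = curr[0] - prev[0]
    arrived = curr[1] - prev[1]
    return (sent - arrived) / sent if sent > 0 else 0.0


# Four snapshots were sent, but the two in the middle were lost in transit.
snapshots = [(0, 0), (1000, 990), (2000, 1960), (3000, 2940)]
delivered = [snapshots[0], snapshots[3]]

# The surviving pair still yields the loss over the whole span, so nothing
# needs to be retransmitted.
print(loss_fraction(delivered[0], delivered[1]))
```

Losing a message only widens the interval covered by the next one that arrives; no information about the total is lost.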
>>
>>      Simple protection can be provided by using a
>>      randomized source port, or equivalent field in the packet header
>>      (such as the RTP SSRC value and the RTP sequence number) expected not
>>      to be known to an off-path attacker.
>>
>> I think the draft should recommend that for most scenarios, randomized
>> ports will be insufficient protection for CB control messages, which
>> should be properly cryptographically authenticated. Otherwise, a
>> CB-controlled aggregate is too vulnerable to these off-path attacks.
>>
>> *Gap #1:*
>> The draft seems to think it is so obvious what a CB should measure
>> that it only says it vaguely as "the level of congestion", and only
>> suggests the difference between ingress and egress counters as an
>> example. Some readers might well think like this: Does congestion level
>> mean the percentage extra bit-rate relative to the aggregate's expected
>> or maximum bit-rate? That might actually be a correct measure of
>> congestion in some scenarios, but...
>>
>> The draft does not say that the congestion level is defined as dropped
>> bytes divided by ingress bytes. The draft should spell out that a CB
>> should measure the volume of bytes dropped and the volume of ECN-capable
>> bytes marked with CE, and express these as a fraction of resp. total
>> ingress non-ECT bytes and total ingress ECT bytes (assuming buffers
>> within the scope of the CB are ECN-enabled). Even this is problematic,
>> because the assumption in parentheses never holds, particularly during
>> excessive congestion. It could also discuss the relative merit of
>> measuring the percentage of packets dropped/marked instead of bytes.
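>>
As a sketch of what such a definition might look like, the Python below computes the two fractions separately. All field and function names are hypothetical, and it leans on exactly the simplifying assumptions questioned above: ECT traffic is only ever CE-marked, never dropped, and no bytes arrive at the ingress already marked CE.

```python
from dataclasses import dataclass


@dataclass
class Counters:
    """Byte counts over one measurement period (hypothetical field names)."""
    ingress_not_ect_bytes: int
    egress_not_ect_bytes: int
    ingress_ect_bytes: int
    egress_ce_bytes: int


def congestion_levels(c: Counters):
    """Return (drop_fraction, mark_fraction).

    drop_fraction: dropped non-ECT bytes / ingress non-ECT bytes
    mark_fraction: CE-marked bytes / ingress ECT bytes
    """
    dropped = c.ingress_not_ect_bytes - c.egress_not_ect_bytes
    drop = dropped / c.ingress_not_ect_bytes if c.ingress_not_ect_bytes else 0.0
    mark = c.egress_ce_bytes / c.ingress_ect_bytes if c.ingress_ect_bytes else 0.0
    return drop, mark


print(congestion_levels(Counters(10_000, 9_700, 40_000, 2_000)))
```

Keeping the two fractions apart avoids mixing a drop signal and a marking signal that may not mean the same level of congestion.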
>>
>> Also it should mention that care should be taken over how to combine the
>> measurements. For instance avoid the common mistake of averaging
>> fractions, because ave(c1/t1, c2/t2, c3/t3 ...) != (c1 + c2 + c3)/(t1 +
>> t2 + t3).
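>>
A small numeric illustration in Python, using made-up counts where `c` is congested (dropped or marked) bytes and `t` is total ingress bytes per measurement period:

```python
# (c, t) per measurement period: one nearly idle period with a high loss
# fraction, followed by two busy periods with a low one.
periods = [(1, 10), (10, 1000), (10, 1000)]

mean_of_fractions = sum(c / t for c, t in periods) / len(periods)
pooled_fraction = sum(c for c, _ in periods) / sum(t for _, t in periods)

print(mean_of_fractions, pooled_fraction)
```

The mean of the per-period fractions comes out several times larger than the pooled figure, because the nearly idle period gets the same weight as the busy ones.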
>>
>> *Gap #2:*
>> All the diags show multiple routers, but the text says congestion can
>> be measured by comparing ingress and egress traffic. Nowhere does it say
>> that only traffic whose addressing guarantees it has passed through both
>> ends should be measured.
>>
>>
>> {Note 1}: A few years ago I dug deep into the history surrounding the
>> early congestion collapses on the Internet and found that those involved
>> were adamant that the term congestion collapse should not be waved
>> around for dramatic effect, because it has a very specific definition,
>> as paraphrased above.
>>
>> {Note 2}: The credit feature of ConEx was intended to address short-flow
>> overload if it becomes a problem. Don't get me wrong; I'm not objecting
>> to the use of CBs for the short-flow problem because I want you to use
>> my solution. I'm just using this as an example of a fine-grained way to
>> solve the problem, rather than the sledge-hammer CB way.
>> Here's the intuition briefly: With ConEx, you have to attach 'congestion
>> credit' to the first packets of a flow to cover the risk of congestion
>> before you have feedback (and if you don't and there is congestion, your
>> packets are dropped by an audit function). Then congestion policers at
>> the network ingress can limit the amount of congestion credit consumed
>> without needing feedback, and thin out traffic if it consists of large
>> numbers of short flows. If short flows come to predominate, ConEx credit
>> was also designed to incentivize a new form of proxy that could regulate
>> short-flows with a push-back style of congestion control, without a full
>> feedback loop. That would be far preferable to such a drastic measure as
>> a circuit-breaker. This aspect of ConEx was not written into the IETF
>> docs, but it is mentioned in the re-ECN drafts that were the ancestors
>> of ConEx.
>>
>>
>>
>> *Nits*
>>
>> 3.
>> s/last resort protection to the network paths that these are used./
>>    /last resort protection to the traffic sharing their network path./
>>
>> s/tunnels encapsulations/
>>    /tunnel encapsulations/
>>
>> 3. What makes a good CB?
>>
>>      Circuit Breakers are RECOMMENDED for IETF protocols and tunnels that
>>      carry non-congestion-controlled Internet flows and for traffic
>>      aggregates, e.g., traffic sent using a network tunnel.
>>
>> Delete "
>>
>> e.g., traffic sent using a network tunnel
>>
>> "
>> Reason: this implies all network tunnels are problematic, whereas the
>> rest of the sentence adequately says that only tunnels carrying
>> non-congestion controlled flows are of concern.
>>
>> 4.
>>
>> s/monitor the level congestion/
>>    /monitor the level of congestion/
>>
>> 4.1.1
>> (e.g. to implement a Section 5.1)
>> ?
>>
>> 4.1.2
>> s/pre-prosvisioned/
>>    /pre-provisioned/
>>
>> 6.1
>>
>>      One common question is whether a Circuit Breaker is needed when a
>>      tunnel is deployed in a private network with pre-provisioned
>>      capacity?
>>
>> Remove '?' from the end.
>>
>> 6.2
>>
>> s/in the event that persistent congestion occur./
>>    /in the event that persistent congestion occurs./
>>
>>
>>
>> Regards
>>
>>
>>
>> Bob
>>
>>
>> On 08/10/15 12:47, Gorry Fairhurst wrote:
>>>> [Gorry, I also have to deliver on my promise on a paragraph for
>>>> circuit-breaker. Do you have a deadline for that?]
>>>>
>>> The circuit-breaker ID is pending start of IETF last call, the
>>> deadline for doing an author rev passed, sorry.
>>>
>>> --
>>> ________________________________________________________________
>>> Bob Briscoe                               http://bobbriscoe.net/

-- 
________________________________________________________________
Bob Briscoe                               http://bobbriscoe.net/
