[tsvwg] Extensive review of draft-ietf-tsvwg-circuit-breaker-05

Bob Briscoe <ietf@bobbriscoe.net> Fri, 09 October 2015 02:07 UTC

To: gorry@erg.abdn.ac.uk
References: <5616376D.4010505@bobbriscoe.net> <561657D9.5040908@erg.abdn.ac.uk>
From: Bob Briscoe <ietf@bobbriscoe.net>
Message-ID: <56172149.1050307@bobbriscoe.net>
Date: Fri, 09 Oct 2015 03:07:05 +0100
In-Reply-To: <561657D9.5040908@erg.abdn.ac.uk>
Archived-At: <http://mailarchive.ietf.org/arch/msg/tsvwg/DbQ_k7WA2CLh5eOqqgfeZ7i96l4>
Cc: tsvwg IETF list <tsvwg@ietf.org>
Subject: [tsvwg] Extensive review of draft-ietf-tsvwg-circuit-breaker-05
Precedence: list

Gorry,

Despite being past the WG stage, here's my review anyway. Consider this 
as an early response to IETF last call.

In general I support the intent of this draft, but I am concerned at the 
severity of the problems I have found with it given it is meant to be 
about to go to the IESG. I am particularly concerned that I have found 
numerous significant problems with the normative requirements section.

Have you had a substantial review from anyone before this? The level of 
review comments on the tsvwg list seemed quite light - picking on issues 
of particular concern, but not seeming to review the draft as a whole.

*1. Intro:*

Congestion Collapse is a very specific case - CB is much more general.
It is clear from the draft that a CB is intended to mitigate 
circumstances wider than solely the extreme case of congestion collapse. 
For instance: a large unresponsive aggregate contributing to a high 
level of congestion alongside congestion responsive traffic. This is 
nowhere near congestion collapse, but it would be an applicable case for 
a circuit-breaker. Congestion collapse is a specific well-defined 
process that involves a cascade of congestion as a sequence of queues 
fill in turn moving in the upstream direction. It is due to continual 
retries or additional load arriving faster than existing flows are 
departing. {Note 1}

The introduction mentions that TCP-style cc is only an appropriate 
remedy when long flows dominate. The implication that CB could be used 
to deal with congestion induced by many short flows is a step too far, 
IMO. This problem has not even been discussed in the IETF or IRTF to my 
knowledge, let alone in the context of this draft. In 6.2 this draft 
all-but says that a CB is a solution to this problem. I strongly object 
to a BCP making that assertion. CB would be a very drastic and clumsy 
solution to that problem.{Note 2}

It says that the timescale at which a circuit-breaker operates must be 
seconds or tens of seconds - much longer than the RTT timescale on which 
TCP, SCTP and DCCP react. This disregards an important type of 
application response to congestion; it must say that the timescale also 
has to be longer than the timescale on which certain real-time 
applications operate their own circuit-breakers i.e. adapt down their 
codec rates, and eventually close the connection as a form of 
self-admission control. Applications operate per-flow circuit-breakers 
typically over the order of seconds or tens of seconds, so network CBs 
MUST take longer than that - I would say "no less than a minute".

We MUST NOT discourage voluntary self-regulation by overriding it 
(end-to-end principle). I pick up this point later (comments on section 
5.1.1), arguing that the fast-trip CB for RTP should be considered as an 
application CB, and a network CB should always take longer to trigger 
than these app CBs.

*1.1 Types of CB*

I saw criticism on the list of the use of the term "protect" in this 
section. Why hasn't it been changed? As the posting said, a CB does not 
protect the aggregate that it monitors; rather it /regulates/ the 
aggregate to protect the rest of the traffic that it is /not/ monitoring.

*3.1 Functional Components*

There is no mention of the problem of synchronising the ingress and 
egress measurements to allow for transit time. Given you are trying to 
measure loss, which is a relatively small difference between the traffic 
entering and leaving, you can get very bad errors if you don't take path 
delay into account. draft-ietf-tsvwg-tunnel-congestion-feedback 
describes a nice (and commonly used) stateless way of doing that, by 
sending the ingress measurement in-band to the egress, which triggers 
the egress measurement so they are synchronized; allowing for transit 
time. Then the egress can send them both back to the ingress to be 
compared and acted on.
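
To illustrate the point about transit time, here is a minimal sketch. It assumes a hypothetical lossless FIFO path with a fixed delay and a constant send rate, and all names and parameters are made up for illustration, not taken from either draft. It compares cumulative ingress and egress byte counters sampled at the same instant against counters paired by an in-band marker that carries the ingress count to the egress.

```python
from collections import deque

def simulate(delay_ticks=5, ticks=100, rate=1000, marker_every=10):
    """Hypothetical lossless FIFO path: everything sent arrives delay_ticks later."""
    in_flight = deque()      # (arrival_tick, kind, payload), in send order
    ingress_bytes = 0        # cumulative bytes counted at the ingress
    egress_bytes = 0         # cumulative bytes counted at the egress
    wallclock_diffs = []     # ingress minus egress, both read at the same instant
    inband_diffs = []        # ingress count in the marker minus egress count on its arrival
    for t in range(ticks):
        # Ingress sends data, and periodically a marker carrying its counter.
        ingress_bytes += rate
        in_flight.append((t + delay_ticks, "data", rate))
        if t % marker_every == 0:
            in_flight.append((t + delay_ticks, "marker", ingress_bytes))
        # Egress receives everything due by this tick, in FIFO order.
        while in_flight and in_flight[0][0] <= t:
            _, kind, payload = in_flight.popleft()
            if kind == "data":
                egress_bytes += payload
            else:
                # Marker arrival triggers the egress reading.
                inband_diffs.append(payload - egress_bytes)
        # Naive approach: read both counters at the same wall-clock instant.
        if t % marker_every == 0 and t >= delay_ticks:
            wallclock_diffs.append(ingress_bytes - egress_bytes)
    return wallclock_diffs, inband_diffs

wallclock, inband = simulate()
print(wallclock)  # non-zero on a lossless path: bytes in flight look like loss
print(inband)     # zero on a lossless path
```

On this idealised path nothing is lost, so any non-zero difference is pure measurement error. With a varying send rate the wall-clock error would also vary, which is what makes it hard to subtract out.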

*4. Reqs*

       There MUST be a control path from the ingress meter and the egress
       meter to the point of measurement.  The Circuit Breaker MUST
       trigger if this control path fails.

Either this is unclear terminology, or I strongly disagree. What do you 
mean by a control path? We should only recommend that the CB triggers 
due to lack of measurement signals if the measurement signals are 
carried in-band with the data being monitored. That is only one way of 
arranging the mechanism. The term "control path" sounds like it is out of 
band. If the measurement signals are out of band, the CB MUST NOT 
trigger due to lack of measurement signals. I would recommend the 
in-band method, but there are plenty of network designers who will want 
to do this in centralised out of band ways, so we have to cater for that 
way of thinking (even tho it's misguided).

       The measurement period MUST be longer than the time that current
       Congestion Control algorithms need to reduce their rate following
       detection of congestion.

This needs to be rewritten. Or just removed. It seems like ideas changed 
after it was written, and the end was changed but not the normative 
statement at the beginning. IMO, the measurement period can be 
arbitrarily short, as long as multiple measurements are combined before 
triggering the CB. It talks about unnecessarily penalizing long RTT 
flows, but the measurement period is nothing to do with the period 
before there is any penalization (defined later as the triggering 
interval). There is no problem with short measurement periods as long as 
any high congestion measured in these periods is averaged over all the 
measurement periods in the triggering interval.

In fact, there should be many measurement intervals per trigger 
interval, so that there are many opportunities for measurement messages 
to get through. Otherwise if there are only one or two measurement 
periods per trigger interval, the possibility of a false trigger due to 
lost control signals becomes too great.
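
A rough sketch of what I mean by combining many short measurement periods over one triggering interval. The function name, the report format and the min_reports parameter are my own assumptions for illustration, and what to do when too few reports arrive is a separate policy question (see the in-band vs. out-of-band comment above); here the function simply declines to trigger.

```python
def should_trigger(reports, threshold, min_reports):
    """Decide whether to trigger at the end of one triggering interval.

    reports: one entry per measurement period, either a tuple
             (ingress_bytes, egress_bytes) or None if the report was lost.
    threshold: loss fraction over the whole interval considered excessive.
    min_reports: assumed minimum number of reports needed to decide at all.
    """
    received = [r for r in reports if r is not None]
    if len(received) < min_reports:
        return False  # too little evidence (assumed policy, see lead-in)
    sent = sum(ingress for ingress, _ in received)
    lost = sum(ingress - egress for ingress, egress in received)
    # Pool the bytes over the interval rather than averaging per-period ratios.
    return sent > 0 and lost / sent > threshold

# One bad second in a minute does not trip the breaker...
brief_spike = [(1000, 1000)] * 59 + [(1000, 500)]
print(should_trigger(brief_spike, threshold=0.10, min_reports=30))  # False

# ...but sustained loss does, even if a third of the reports went missing.
sustained = [(1000, 800)] * 40 + [None] * 20
print(should_trigger(sustained, threshold=0.10, min_reports=30))    # True
```

With sixty periods per interval, losing a handful of reports changes little. With only one or two periods per interval, a single lost report removes half or all of the evidence.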

    o  A Circuit Breaker is REQUIRED to define a threshold to determine
       whether the measured congestion is considered excessive.

    o  A Circuit Breaker is REQUIRED to define the triggering interval,

A perfectly good CB could vary the trigger interval and threshold 
depending on how rapidly congestion is rising, or how high its absolute 
level is. Indeed one could say it is actually wrong to define a single 
threshold or a single interval, so these normative statements are overly 
restrictive and preclude designs that are smarter than just a simple 
fixed threshold.

Also, see comment above about allowing time for application CBs, and 
suggesting one minute minimum.

o  A Circuit Breaker SHOULD be constructed so that it does not
       trigger under light or intermittent congestion, with a default
       response to a trigger that disables all traffic that contributed
       to congestion.

The second half after the comma seems misplaced. If it does not trigger, 
why does the sentence go on to talk about disabling all traffic that 
contributed to congestion (which is what an /enabled/ trigger would do)?

A reaction that results in a reduction SHOULD result in
       reducing the traffic by at least a factor of ten,

What evidence have you got for this factor of ten? It seems utterly 
inappropriate to write a number here. The number depends on what 
proportion of the traffic on the path between ingress and egress is 
regulated by the CB. If the proportion is low, it needs to reduce by a 
lot to make sufficient space for other traffic. If the proportion is 
high relative to other traffic, it might be sufficient to reduce by 5% 
to 95% of the previous load. If the tunnel traffic represented say 80% 
of the load on the path, and it reduced by a factor of 10, that would 
leave 92% of the path for other traffic, which might be unnecessarily 
much greater than the normal proportion used by other traffic.
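
The arithmetic behind that example, as a small sketch. The function name is made up, and shares are expressed as fractions of the previous total load on the path, which is an assumption I am making to keep the sums simple.

```python
def share_left_for_others(cb_share, reduction_factor):
    """Fraction of the previous path load not used by the CB-regulated
    aggregate after that aggregate is cut by reduction_factor."""
    remaining_cb_load = cb_share / reduction_factor
    return 1.0 - remaining_cb_load

# Tunnel was 80% of the load and is cut by a factor of ten.
print(share_left_for_others(0.80, 10))        # about 0.92, as in the text

# Same tunnel reduced by only 5%, i.e. to 95% of its previous load.
print(share_left_for_others(0.80, 1 / 0.95))  # about 0.24

# Tunnel was only 5% of the load: even a tenfold cut frees very little.
print(share_left_for_others(0.05, 10))        # about 0.995
```

The same reduction factor gives very different outcomes depending on the aggregate's share of the path, which is why a single fixed number looks arbitrary.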

       Manual operator
       intervention will usually be required to restore a flow.

This sentence should be toned down to "possibly", not "usually". A human is 
no more capable than a machine is of bringing together all the necessary 
measurements to decide what other courses of action might be possible, 
and when to release the brakes. I suggest the last para of 5.3.1 starting:

"An operator-based response provides opportunity..."

is more appropriate here, and doesn't really fit where it is.

Section 4.1 contains no requirements text, only examples. It ought to be 
moved from the normative requirements section to section 5 (Examples).

*5. Examples:*

*5.1.1 Fast-Trip CB for RTP*

The draft needs to make the distinction between an application doing its 
own circuit breaking vs. functions on the path between the application 
endpoints (even if in the hosts) doing CB. The extremely important 
distinction is:
1a) an app knows when congestion is too high for it to work properly
1b) functions under the app can only infer congestion is possibly too 
high for most apps to work properly
2a) an app may be able to reduce the rate at which it sends data
2b) a function under an app can only discard data, not remove it at source.

I believe that the requirements in section 4 do not apply to 
application-controlled circuit-breakers. So, I would not include the 
"Fast-Trip CB for RTP" as an example of a /network/ transport CB.

As the requirements say, a network CB should never fast trip.
By misclassifying RTP CBs as network CBs, you've allowed network CBs 
to trigger after only tens of seconds, when a network CB should allow 
app CBs this long to trigger themselves (as I said earlier).

*Missing examples:*

* You might want to point to the flow termination function (as opposed 
to admission control) in the PCN architecture [RFC5559], which is 
precisely a network CB. It was precisely developed for cases where 
failures caused traffic to reroute onto a previously well-provisioned 
path (see 6.1).
* Andrew McGregor gave the examples of Google's BwE (bandwidth enforcer) 
and B4, but you haven't referred to them. Given they are a documented 
existence proof of this beast, that seems remiss.

*7. Security Consid's*

    The circuit breaker MUST be designed to be robust to packet loss that
    can also be experienced during congestion/overload.

This implies reliable transmission - i.e. retransmit for ever until 
acknowledged. This is NOT a good idea. In 
draft-ietf-tsvwg-tunnel-congestion-feedback we propose using SCTP partially 
reliable transport. Then if congestion causes messages to be lost, they 
don't have to be retransmitted if there are insufficient resources (thus 
not risking contributing to congestion collapse - and here I use the 
phrase correctly). Because they transmit counters, missing counter 
values do not matter. This is the tried-and-tested message delivery 
approach used for IPFIX. The messages can still be given priority, but 
should not be retransmitted.
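
A minimal sketch of why lost reports need not be retransmitted, assuming each report carries a synchronized pair of cumulative (running-total) byte counters. The function name and the report format are hypothetical, chosen only to show the idea.

```python
def loss_between(prev, curr):
    """Loss fraction between any two reports that did get through.

    prev, curr: (ingress_total, egress_total) cumulative byte counters.
    Reports lost in between are simply skipped: the totals in the later
    report already include everything they would have said.
    """
    sent = curr[0] - prev[0]
    received = curr[1] - prev[1]
    return (sent - received) / sent if sent else 0.0

# Four reports, one per second; suppose the middle two are lost in transit.
reports = [(1000, 990), (2000, 1970), (3000, 2940), (4000, 3900)]
print(loss_between(reports[0], reports[3]))  # about 0.03, from first and last only
```

The receiver gets the same aggregate answer it would have got from all four reports, just with coarser time resolution, so there is nothing for a retransmission to recover.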

    Simple protection can be provided by using a
    randomized source port, or equivalent field in the packet header
    (such as the RTP SSRC value and the RTP sequence number) expected not
    to be known to an off-path attacker.

I think the draft should recommend that for most scenarios, randomized 
ports will be insufficient protection for CB control messages, which 
should be properly cryptographically authenticated. Otherwise, a 
CB-controlled aggregate is too vulnerable to these off-path attacks.

*Gap #1:*

The draft seems to think it is so obvious what a CB should measure 
that it only says it vaguely as "the level of congestion", and only 
suggests the difference between ingress and egress counters as an 
example. Some readers might well think like this: Does congestion level 
mean the percentage extra bit-rate relative to the aggregate's expected 
or maximum bit-rate? That might actually be a correct measure of 
congestion in some scenarios, but...

The draft does not say that the congestion level is defined as dropped 
bytes divided by ingress bytes. The draft should spell out that a CB 
should measure the volume of bytes dropped and the volume of ECN-capable 
bytes marked with CE, and express these as a fraction of resp. total 
ingress non-ECT bytes and total ingress ECT bytes (assuming buffers 
within the scope of the CB are ECN-enabled). Even this is problematic, 
because the assumption in parentheses never holds, particularly during 
excessive congestion. It could also discuss the relative merit of 
measuring the percentage of packets dropped/marked instead of bytes.

Also it should mention that care should be taken over how to combine the 
measurements. For instance avoid the common mistake of averaging 
fractions, because ave(c1/t1, c2/t2, c3/t3, ...) != (c1 + c2 + c3 + ...)/
(t1 + t2 + t3 + ...).
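
A small numeric sketch of the averaging mistake. The sample values are invented purely for illustration: one short period with heavy loss and one long period with almost none.

```python
def mean_of_fractions(samples):
    """The common mistake: average the per-period ratios c/t."""
    return sum(c / t for c, t in samples) / len(samples)

def pooled_fraction(samples):
    """Sum the counts first, then divide once."""
    return sum(c for c, _ in samples) / sum(t for _, t in samples)

# (bytes lost, bytes sent) per measurement period; made-up numbers.
samples = [(50, 100), (10, 10_000)]

print(mean_of_fractions(samples))  # about 0.25: dominated by the tiny period
print(pooled_fraction(samples))    # about 0.006: the fraction of bytes actually lost
```

Averaging the ratios gives a 100-byte period the same weight as a 10,000-byte one, so a quiet period with a couple of drops can swamp the result.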

*Gap #2:*

All the diags show multiple routers, but the text says congestion can 
be measured by comparing ingress and egress traffic. Nowhere does it say 
that the only traffic that should be measured is traffic whose addressing 
guarantees it will have passed through both ends.

{Note 1}: A few years ago I dug deep into the history surrounding the 
early congestion collapses on the Internet and found that those involved 
were adamant that the term congestion collapse should not be waved 
around for dramatic effect, because it has a very specific definition, 
as paraphrased above.

{Note 2}: The credit feature of ConEx was intended to address short-flow 
overload if it becomes a problem. Don't get me wrong; I'm not objecting 
to the use of CBs for the short-flow problem because I want you to use 
my solution. I'm just using this as an example of a fine-grained way to 
solve the problem, rather than the sledge-hammer CB way.
Here's the intuition briefly: With ConEx, you have to attach 'congestion 
credit' to the first packets of a flow to cover the risk of congestion 
before you have feedback (and if you don't and there is congestion, your 
packets are dropped by an audit function). Then congestion policers at 
the network ingress can limit the amount of congestion credit consumed 
without needing feedback, and thin out traffic if it consists of large 
numbers of short flows. If short flows come to predominate, ConEx credit 
was also designed to incentivize a new form of proxy that could regulate 
short-flows with a push-back style of congestion control, without a full 
feedback loop. That would be far preferable to such a drastic measure as 
a circuit-breaker. This aspect of ConEx was not written into the IETF 
docs, but it is mentioned in the re-ECN drafts that were the ancestors 
of ConEx.

*Nits*

3.
s/last resort protection to the network paths that these are used./
  /last resort protection to the traffic sharing their network path./

s/tunnels encapsulations/
  /tunnel encapsulations/

3. What makes a good CB?

    Circuit Breakers are RECOMMENDED for IETF protocols and tunnels that
    carry non-congestion-controlled Internet flows and for traffic
    aggregates, e.g., traffic sent using a network tunnel.

Delete "e.g., traffic sent using a network tunnel"
Reason: this implies all network tunnels are problematic, whereas the 
rest of the sentence adequately says that only tunnels carrying 
non-congestion controlled flows are of concern.

4.

s/monitor the level congestion/
  /monitor the level of congestion/

4.1.1
(e.g. to implement a Section 5.1)
?

4.1.2
s/pre-prosvisioned/
  /pre-provisioned/

6.1

    One common question is whether a Circuit Breaker is needed when a
    tunnel is deployed in a private network with pre-provisioned
    capacity?

Remove '?' from the end.

6.2

s/in the event that persistent congestion occur./
  /in the event that persistent congestion occurs./

Regards

Bob

On 08/10/15 12:47, Gorry Fairhurst wrote:
>> [Gorry, I also have to deliver on my promise on a paragraph for
>> circuit-breaker. Do you have a deadline for that?]
>>
> The circuit-breaker ID is pending start of IETF last call, the 
> deadline for doing an author rev passed, sorry.
>
> -- 
> ________________________________________________________________
> Bob Briscoe                               http://bobbriscoe.net/
