[re-ECN] IETF BoF Proposal: Congestion Exposure (CEX)

Bob Briscoe <rbriscoe@jungle.bt.co.uk> Mon, 07 September 2009 23:57 UTC

Date: Tue, 08 Sep 2009 00:57:43 +0100
To: "EGGERT, Lars" <lars.eggert@nokia.com>, "WESTERLUND, Magnus" <magnus.westerlund@ericsson.com>
From: Bob Briscoe <rbriscoe@jungle.bt.co.uk>
Cc: agenda@ietf.org, re-ECN unIETF list <re-ecn@ietf.org>, Jari ARKKO <jari.arkko@ericsson.com>, Ralph Droms <rdroms@cisco.com>
Subject: [re-ECN] IETF BoF Proposal: Congestion Exposure (CEX)

Lars, Magnus (also Jari, Ralph),

Here's the BoF proposal we've been working on.
Too many people to acknowledge, I'm afraid.

FYI, there's been a big upsurge in activity on the re-ecn@ietf.org
list, where we've been bashing this out - about 40 people have now
volunteered to help with implementation, co-authoring, etc. There's
already an implementation (plus two more in ns2, and another two in the
works). All quite encouraging. I appreciate we'll need an even bigger
community to change IP - but a BoF, and hopefully a w-g following it,
can only help in that direction.
<https://www.ietf.org/mailman/listinfo/re-ecn>

Anyway, here's the proposal,
also at:
<http://bobbriscoe.net/projects/refb/cex-bof-proposal-00.txt>


Bob

=====================================================================
IETF BoF Proposal: Congestion Exposure

The Internet is all about sharing capacity between multiple users -
that's what packet multiplexing is for. But the IETF (and the
wider industry) is realising that we don't really understand how best to
share out capacity. Many ISPs now override the way TCP shares out
capacity. The resulting arms race of blocks and throttles is causing
bizarre feature interactions and random black holes.

In Nov 2008, the Transport Area of the IETF asked the IRTF Congestion
Control Research Group (ICCRG) to address this challenge. This BoF
proposal brings together those who want a working group to experiment with
one of the more promising approaches: congestion exposure. The aim is
not to pre-empt the ICCRG, but to investigate practical issues through
experimental implementation and deployment.

Congestion Exposure?

The premise: capacity sharing is hard because the information needed to
share capacity properly isn't visible at the internetwork layer.
Specifically, the idea is for the sender to mark the outermost IP header
of each packet to reveal congestion expected over the rest of the path.
A protocol called re-ECN (re-inserted explicit congestion notification)
has been proposed to do this. Re-ECN is the strongest candidate for
adoption, but the proposed w-g might find it needs redesign, or it may
adopt an alternative if one surfaces.
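To make "congestion expected over the rest of the path" concrete, here is a minimal Python sketch of the principle in the spirit of re-ECN, assuming a hypothetical meter at some point on the path that tallies three byte counts. The names (`MarkCounts`, `rest_of_path_congestion`, `declared_bytes`) are illustrative, and the sketch ignores how re-ECN actually encodes its marks in the header.

```python
from dataclasses import dataclass


@dataclass
class MarkCounts:
    """Byte counts seen at one point on the path over an interval (hypothetical meter)."""
    total_bytes: int
    declared_bytes: int  # bytes the sender marked to declare expected whole-path congestion
    ce_bytes: int        # bytes already marked Congestion Experienced by upstream routers


def rest_of_path_congestion(c: MarkCounts) -> float:
    """Simplified estimate of the congestion still to come downstream of this point.

    Whole-path congestion declared by the sender, minus congestion already
    experienced upstream, leaves the congestion expected on the rest of the path.
    A negative result means the sender has declared less congestion than the
    path has already shown.
    """
    if c.total_bytes == 0:
        return 0.0
    return (c.declared_bytes - c.ce_bytes) / c.total_bytes


# Sender declares 3% whole-path congestion; 1% has already been marked upstream.
print(rest_of_path_congestion(MarkCounts(total_bytes=100_000, declared_bytes=3_000, ce_bytes=1_000)))
# prints 0.02
```

The point of the subtraction is that any node can read the downstream figure from the packets alone, without knowing the topology beyond it.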

Whatever the precise protocol, the aim is for an ISP to be able to count
the volume of congestion about to be caused by an aggregate of traffic
as easily as it can count the volume of bytes - except that instead of
counting all bytes, it just counts the volume of marked packets. An ISP
could do this for each attached user, or for whole attached networks.
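The counting itself can be as simple as byte counting. A minimal Python sketch, assuming each packet can be attributed to an attached user and carries a single boolean "marked" flag (a simplification; the `Packet` record and `meter` function are hypothetical):

```python
from collections import defaultdict
from typing import Dict, Iterable, NamedTuple, Tuple


class Packet(NamedTuple):
    user: str     # attached user or network the packet is attributed to
    size: int     # bytes
    marked: bool  # True if the packet carries a congestion mark


def meter(packets: Iterable[Packet]) -> Dict[str, Tuple[int, int]]:
    """Per user, return (total bytes, congestion volume = bytes in marked packets)."""
    volume: Dict[str, int] = defaultdict(int)
    congestion: Dict[str, int] = defaultdict(int)
    for p in packets:
        volume[p.user] += p.size
        if p.marked:
            congestion[p.user] += p.size
    return {u: (volume[u], congestion[u]) for u in volume}


# Two users send the same volume, but one causes five times the congestion.
traffic = (
    [Packet("alice", 1500, i == 0) for i in range(10)]
    + [Packet("bob", 1500, i < 5) for i in range(10)]
)
print(meter(traffic))  # {'alice': (15000, 1500), 'bob': (15000, 7500)}
```

A volume cap would treat alice and bob identically; only the congestion count tells them apart.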

There is no intent to change the well-established approach where
congestion is detected and responded to by transports on endpoints (e.g.
congestion control in TCP or RTP/RTCP). But network operators should be
able to see the congestion too.

Once ISPs can see congestion, they can discourage users from causing
large volumes of congestion. And they can discourage other networks from
allowing their users to cause congestion.
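One way an ISP might discourage users from causing large volumes of congestion is to police congestion volume rather than bytes. The Python sketch below is a hypothetical token bucket, assuming a per-user congestion allowance (`fill_rate`, marked bytes per second) and burst size (`depth`); the class, its parameters and the refill rule are illustrative assumptions, not part of the proposal.

```python
class CongestionPolicer:
    """Hypothetical per-user policer that budgets congestion volume, not bytes."""

    def __init__(self, fill_rate: float, depth: float) -> None:
        self.fill_rate = fill_rate  # congestion allowance: marked bytes per second
        self.depth = depth          # largest burst of marked bytes allowed
        self.tokens = depth         # start with a full bucket
        self.last = 0.0             # time of the previous update, in seconds

    def admit(self, now: float, size: int, marked: bool) -> bool:
        """Return True if the packet conforms to the user's congestion allowance."""
        # Refill for the time elapsed since the last packet, capped at the depth.
        self.tokens = min(self.depth, self.tokens + (now - self.last) * self.fill_rate)
        self.last = now
        if not marked:
            return True  # unmarked bytes cost nothing, whatever their volume
        if self.tokens >= size:
            self.tokens -= size
            return True
        return False  # allowance exhausted: drop, delay or flag, as policy dictates


policer = CongestionPolicer(fill_rate=100, depth=3000)
print([policer.admit(0.0, 1500, True) for _ in range(3)])  # [True, True, False]
print(policer.admit(0.0, 1500, False))                      # True
print(policer.admit(15.0, 1500, True))                      # True: 15 s x 100 B/s refilled
```

The design choice is that heavy but uncongested use is left alone, while the same allowance applies whichever application or transport causes the congestion.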

In a nutshell, this is about "accountability for causing congestion" -
in both directions - holding users accountable for sending too much
traffic and holding networks accountable for providing too little
capacity.

Because congestion isn't currently visible to ISPs, they have to resort
to their own piecemeal ways to limit congestion, e.g. volume capping,
fair queuing and deep packet inspection (DPI). The proposed working
group will explain clearly why these techniques are poor imitations of
what's really needed - congestion limiting. These piecemeal approaches
unnecessarily limit what users do want (volume) while only weakly
limiting what they don't want (congestion). Congestion is the precise
factor that causes grief to users, so we should reveal it. Then it can
be dealt with.

We shouldn't complain that ISPs are violating the Internet architecture
with deep packet inspection etc. if we (the IETF) don't provide a better
alternative. The history of firewalls and NATs shows that we need to
provide timely protocol support for good 'semi-permeable membranes'
between users. Otherwise, the industry has no choice but to build bad
impermeable walls. If we don't multiplex capacity properly, we shouldn't
be surprised if the Internet becomes increasingly carved up into
circuits.

For the avoidance of doubt, when we talk of congestion, we don't imply
any impairment. The proposed approach builds on explicit notification of
congestion (ECN [RFC3168]). This is purely a warning of approaching
congestion. Congestion exposure can add incentives to keep congestion
low - this keeps queues short with minimal actual congestion delay or
loss.
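For readers unfamiliar with ECN, a minimal Python sketch of the RFC 3168 mechanism this proposal builds on. The codepoints and bit positions are those of RFC 3168; the helper names are hypothetical, and the router behaviour is simplified to "mark instead of drop when congested".

```python
from enum import IntEnum
from typing import Optional


class ECN(IntEnum):
    """ECN field codepoints defined in RFC 3168."""
    NOT_ECT = 0b00  # sender's transport is not ECN-capable
    ECT_1 = 0b01    # ECN-Capable Transport (1)
    ECT_0 = 0b10    # ECN-Capable Transport (0)
    CE = 0b11       # Congestion Experienced


def ecn_field(tos: int) -> ECN:
    """ECN occupies the two least-significant bits of the IPv4 TOS / IPv6 Traffic Class octet."""
    return ECN(tos & 0b11)


def congested_router(tos: int) -> Optional[int]:
    """Sketch of a congested ECN router: mark ECN-capable packets, drop the rest (None)."""
    if ecn_field(tos) == ECN.NOT_ECT:
        return None  # no ECN support: fall back to dropping the packet
    return tos | ECN.CE  # set CE, leaving the upper six (DSCP) bits untouched


print(congested_router(0xB8 | ECN.ECT_0) == 0xBB)  # True: marked, not dropped
print(congested_router(0xB8))                      # None: dropped
```

The ECN-capable packet is still delivered; the CE mark is only the warning that the receiver feeds back to the sender.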



Proposed Work Items

   a) (INF) Clearly explain motivation for work

   b) (EXP) Define congestion exposure protocol

   c) (INF) Report(s) on experimental uses of congestion exposure

This work is far-reaching. Therefore the agenda is cut to one
self-contained outcome: the congestion exposure protocol. The protocol
will be defined for both IPv6 and IPv4. And it will start with just TCP as
an example transport.

A pre-requisite will be clearly documented motivations. The group will
encourage all sorts of different experiments in using exposed congestion
information (congestion policing, new congestion controls, simpler QoS,
inter-domain SLA metering, traffic engineering, DDoS mitigation, etc.).
But the purpose of the proposed group is not to specify these uses in
detail.

Informational reports on experiments conducted with the protocol will
complete the initial work of the group. Then the wider community can
assess whether it should move from experimental to standards track.

Although the plan is to produce an experimental track protocol, the
proposed w-g will need to be chartered to allow for a standards track RFC
in case it needs to reassign reserved header space for experimental use.

The main focus of experiments will be deployment issues. The re-ECN
protocol proposal is designed for permanent partial deployment, i.e.,
networks can ignore congestion markings in packets and sending hosts can
choose to expose congestion or not. It is also designed to work without
changing forwarding on routers. This should ease deployment, but you
only really find deployment issues by trying it for real (e.g.
interactions with LEDBAT? how prevalent are middleboxes or tunnels that
break the protocol? interactions with dynamic bandwidth allocation?
mobility?).



Further Background

To many, congestion exposure will seem a rather unexpected direction to
take. But a growing community is realising just how powerful this
approach can be - it certainly sheds light on why we have misunderstood
capacity sharing in the past.

The IETF's LEDBAT working group (low extra delay background transport)
is a good example that helps explain the issues and highlights why
congestion exposure is needed. LEDBAT is a congestion control intended
for background traffic. It yields to competing traffic far more than TCP
does. It senses congestion earlier than other congestion controls (by
noticing an increase in queuing delay), so it uses a smaller share of
capacity when other traffic is present. This allows the other traffic to
finish earlier, allowing the LEDBAT traffic to pick up the freed
capacity and finish hardly any later itself. As such, LEDBAT illustrates
two points:

1. A certain volume of LEDBAT bytes will cause significantly less harm
to other uses of the Internet than if TCP transferred the same volume of
bytes (an unresponsive UDP flow would be even worse, and a bursty source
worse still). ISPs that limit raw volume fail to reward responsive usage
like LEDBAT - this could kill it. Therefore ISPs need a way to measure the
impact of responsive traffic like LEDBAT on other traffic; they cannot
just count all volume as equally costly to others.

2. LEDBAT also shows that it is highly sub-optimal to require all flows
through a bottleneck to run at the same rate as TCP would (TCP-friendliness).
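LEDBAT's early yielding can be sketched as a window update driven by queuing delay. This Python sketch is loosely modelled on the controller the LEDBAT working group later published as RFC 6817 (which measures one-way delay); the constants are illustrative assumptions, and details such as delay filtering and the cap on window growth are omitted.

```python
TARGET = 0.100      # target queuing delay, seconds (illustrative)
GAIN = 1.0          # how quickly cwnd responds to being off target (illustrative)
MSS = 1500          # bytes per segment
MIN_CWND = 2 * MSS  # floor so the flow never stalls completely


def ledbat_cwnd(cwnd: float, base_delay: float, current_delay: float, bytes_acked: int) -> float:
    """One window update of a LEDBAT-style delay-based controller.

    base_delay approximates the path's delay with empty queues (e.g. the minimum
    delay seen recently); anything above it is taken to be queuing delay.
    """
    queuing_delay = current_delay - base_delay
    off_target = (TARGET - queuing_delay) / TARGET  # +1 with empty queues, <0 above target
    cwnd += GAIN * off_target * bytes_acked * MSS / cwnd
    return max(cwnd, MIN_CWND)


# Empty queue: the window grows. Queue well above target: the window shrinks.
print(ledbat_cwnd(15_000, 0.050, 0.050, 1500))           # 15150.0
print(ledbat_cwnd(15_000, 0.050, 0.250, 1500) < 15_000)  # True
```

Because the window starts shrinking as soon as a queue builds, well before loss, a LEDBAT flow backs off while competing TCP flows are still growing.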

In a Transport Area plenary straw poll at the March '09 IETF, there was
zero support for TCP-friendliness as a way forward for the IETF, so this
proposal is an attempt to test out an alternative way forward. We also
need to recognise that LEDBAT is a good first step, but not a panacea.
For instance, a 100kB flow will want 10MB flows to yield into the
background, but similarly, the 10MB flow will want a 10GB flow to yield
further into the background. And operators need a metric to ensure
transports have the incentive to push back or yield as appropriate.

And, of course, the problem is not just about transferring files of
different sizes. Exposing congestion allows the impact on others to be
measured and discouraged for all sorts of traffic. Once congestion is
exposed in the IP header, it can be dealt with irrespective
of flows, transports, or applications - just packets. The approach
should be applicable whether using unresponsive real-time streaming,
gaming, high-speed transfers for scientific computing, or whatever.

Ultimately, congestion exposure should allow the IETF to step back from
making judgements on whether flow A should go as fast as flow B (or
not). It should then eventually be possible to lift the TCP-friendliness
constraint which is largely holding back development of higher speed
transport protocols.



________________________________________________________________
Bob Briscoe,               Networks Research Centre, BT Research