[re-ECN] IETF BoF Proposal: Congestion Exposure (CEX)
Bob Briscoe <rbriscoe@jungle.bt.co.uk> Mon, 07 September 2009 23:57 UTC
Date: Tue, 08 Sep 2009 00:57:43 +0100
To: "EGGERT, Lars" <lars.eggert@nokia.com>, "WESTERLUND,
Magnus" <magnus.westerlund@ericsson.com>
From: Bob Briscoe <rbriscoe@jungle.bt.co.uk>
Cc: agenda@ietf.org, re-ECN unIETF list <re-ecn@ietf.org>,
Jari ARKKO <jari.arkko@ericsson.com>, Ralph Droms <rdroms@cisco.com>

Lars, Magnus, (also Jari, Ralph)

Here's the BoF proposal we've been working on. Too many people to acknowledge, I'm afraid.

FYI, there's been a big upsurge in activity on the re-ecn@ietf.org list, where we've been bashing this - about 40 people have now volunteered to help with implementation, co-authoring etc. There's already an implementation (and 2 more in ns2, plus another two in the works). All quite encouraging. I appreciate we'll have to get even bigger to change IP - but a BoF and hopefully a w-g following it can't but help in that direction.
<https://www.ietf.org/mailman/listinfo/re-ecn>

Anyway, here's the proposal, also at:
<http://bobbriscoe.net/projects/refb/cex-bof-proposal-00.txt>

Bob

=====================================================================
IETF BoF Proposal: Congestion Exposure

The Internet is all about sharing capacity between multiple users - that's what packet multiplexing is for. But the IETF (and the wider industry) is realising we really don't understand how best to share out capacity. Many ISPs now override the way TCP shares out capacity. The blocks and throttles resulting from this arms race are causing bizarre feature interactions and random black holes.

In Nov 2008, the Transport Area of the IETF asked the IRTF Congestion Control Research Group (ICCRG) to address this challenge. This BoF proposal brings together those who want a working group to experiment on one of the more promising approaches: congestion exposure. The aim is not to pre-empt the ICCRG, but to investigate practical issues through experimental implementation and deployment.

Congestion Exposure?

The premise: capacity sharing is hard because the information needed to share capacity properly isn't visible at the internetwork layer. Specifically, the idea is for the sender to mark the outermost IP header of each packet to reveal congestion expected over the rest of the path.
A protocol called re-ECN (re-inserted explicit congestion notification) has been proposed to do this. Re-ECN is the strongest candidate for adoption, but the proposed w-g might find it needs redesign, or it may adopt an alternative if one surfaces.

Whatever the precise protocol, the aim is for an ISP to be able to count the volume of congestion about to be caused by an aggregate of traffic, as easily as it can count the volume of bytes - but instead it just counts the volume of marked packets. An ISP could do this for each attached user, or for whole attached networks.

There is no intent to change the well-established approach where congestion is detected and responded to by transports on endpoints (e.g. congestion control in TCP or RTP/RTCP). But network operators should be able to see the congestion too.

Once ISPs can see congestion, they can discourage users from causing large volumes of congestion. And they can discourage other networks from allowing their users to cause congestion. In a nutshell, this is about "accountability for causing congestion" - in both directions - holding users accountable for sending too much traffic and holding networks accountable for providing too little capacity.

Because congestion isn't currently visible to ISPs, they have to resort to their own piecemeal ways to limit congestion, e.g. volume capping, fair queuing and deep packet inspection (DPI). The proposed working group will explain clearly why these techniques are poor imitations of what's really needed - congestion limiting. These piecemeal approaches unnecessarily limit what users do want (volume) while only weakly limiting what they don't want (congestion).

Congestion is the precise factor that causes grief to users, so we should reveal it. Then it can be dealt with. We shouldn't complain that ISPs are violating the Internet architecture with deep packet inspection etc. if we (the IETF) don't provide a better alternative.
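
To make the counting idea concrete (an ISP counting the volume of marked packets per attached user as easily as it counts bytes), here is a minimal Python sketch. It is only an illustration under stated assumptions: the `Packet` record, the `user` label and the boolean `marked` field are hypothetical stand-ins, it does not model the actual re-ECN header encoding, and measuring congestion volume as the bytes of marked packets is an assumption of this sketch rather than something the proposal fixes.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Packet:
    user: str        # hypothetical label for the attached user or network
    size_bytes: int  # packet size in bytes
    marked: bool     # True if the packet carries a congestion marking


def account(packets):
    """Return {user: (byte_volume, congestion_volume)}.

    byte_volume counts every byte sent; congestion_volume counts only
    the bytes of marked packets (an assumption of this sketch).
    """
    totals = defaultdict(lambda: [0, 0])
    for p in packets:
        totals[p.user][0] += p.size_bytes
        if p.marked:
            totals[p.user][1] += p.size_bytes
    return {user: (t[0], t[1]) for user, t in totals.items()}


# Made-up traffic: "bob" sends more bytes but none of them are marked.
traffic = [
    Packet("alice", 1500, False),
    Packet("alice", 1500, True),
    Packet("bob", 1500, False),
    Packet("bob", 1500, False),
    Packet("bob", 1500, False),
]
usage = account(traffic)
```

In this made-up trace a volume cap would penalise "bob" (more bytes), while a congestion-volume limit would look at "alice" (more marked bytes); that is the distinction between limiting volume and limiting congestion drawn above.
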
The history of firewalls and NATs shows that we need to provide timely protocol support for good 'semi-permeable membranes' between users. Otherwise, the industry has no choice but to build bad impermeable walls. If we don't multiplex capacity properly, we shouldn't be surprised if the Internet becomes increasingly carved up into circuits.

For the avoidance of doubt, when we talk of congestion, it doesn't imply any impairment. The proposed approach builds on explicit notification of congestion (ECN [RFC3168]). This is purely a warning of approaching congestion. Congestion exposure can add incentives to keep congestion low - this keeps queues short with minimal actual congestion delay or loss.

Proposed Work Items

a) (INF) Clearly explain motivation for work
b) (EXP) Define congestion exposure protocol
c) (INF) Report(s) on experimental uses of congestion exposure

This work is far-reaching. Therefore the agenda is cut to one self-contained outcome: the congestion exposure protocol. The protocol will be defined for both IPv6 and v4. And it will start with just TCP as an example transport. A pre-requisite will be clearly documented motivations.

The group will encourage all sorts of different experiments in using exposed congestion information (congestion policing, new congestion controls, simpler QoS, inter-domain SLA metering, traffic engineering, DDoS mitigation, etc.). But the purpose of the proposed group is not to specify these uses in detail.

Informational reports on experiments conducted with the protocol will complete the initial work of the group. Then the wider community can assess whether it should move from experimental to standards track.

Although the plan is to produce an experimental track protocol, the proposed w-g will need to be chartered to allow for a standards track RFC in case it needs to reassign reserved header space for experimental use.

The main focus of experiments will be deployment issues.
The re-ECN protocol proposal is designed for permanent partial deployment, i.e. networks can ignore congestion markings in packets and sending hosts can choose to expose congestion or not. It is also designed to work without changing forwarding on routers. This should ease deployment, but you only really find deployment issues by trying it for real (e.g. interactions with LEDBAT? how prevalent are middleboxes or tunnels that break the protocol? interactions with dynamic bandwidth allocation? mobility?).

Further Background

To many, congestion exposure will seem a rather unexpected direction to take. But a growing community is realising just how powerful this approach can be - it certainly sheds light on why we have misunderstood capacity sharing in the past.

The IETF's LEDBAT working-group (low extra delay background transport) is a good example that helps explain the issues and highlights why congestion exposure is needed. LEDBAT is a congestion control intended for background traffic. It yields to competing traffic far more than TCP does. It senses congestion earlier than other congestion controls (by noticing an increase in round trip time), so it uses a smaller share of capacity when other traffic is present. This allows the other traffic to finish earlier, allowing the LEDBAT traffic to pick up the freed capacity and finish hardly any later itself. As such, LEDBAT illustrates two points:

1. A certain volume of LEDBAT bytes will cause significantly less harm to other uses of the Internet than if TCP transferred the same volume of bytes (an unresponsive UDP would be even worse and a bursty source worse still). ISPs that limit simple volume fail to reward responsive usage like LEDBAT - this could kill it. Therefore ISPs need a way to count the impact of responsive traffic like LEDBAT on other traffic; they cannot just count all volume as equally costly to others.

2. LEDBAT also shows it is highly sub-optimal to require all flows to have the same rate as TCP would through a bottleneck (TCP-friendliness). In a Transport Area plenary straw poll at the March '09 IETF, there was zero support for TCP-friendliness as a way forward for the IETF, so this proposal is an attempt to test out an alternative way forward.

We also need to recognise that LEDBAT is a good first step, but not a panacea. For instance, a 100kB flow will want 10MB flows to yield into the background, but similarly, the 10MB flow will want a 10GB flow to yield further into the background. And operators need a metric to ensure transports have the incentive to push back or yield as appropriate.

And, of course, the problem is not just about transferring different size files. Exposing congestion allows the impact on others to be measured and discouraged for all sorts of different types of traffic. By exposing congestion in the IP header, it can be dealt with irrespective of flows, transports, or applications - just packets. The approach should be applicable whether using unresponsive real-time streaming, gaming, high-speed transfers for scientific computing, or whatever.

Ultimately, congestion exposure should allow the IETF to step back from making judgements on whether flow A should go as fast as flow B (or not). It should then eventually be possible to lift the TCP-friendliness constraint which is largely holding back development of higher speed transport protocols.

________________________________________________________________
Bob Briscoe, Networks Research Centre, BT Research