Re: [aqm] think once to mark, think twice to drop: draft-ietf-aqm-ecn-benefits-02

David Lang <david@lang.hm> Fri, 27 March 2015 20:17 UTC

Date: Fri, 27 Mar 2015 13:17:36 -0700 (PDT)
From: David Lang <david@lang.hm>
To: KK <kk@cs.ucr.edu>
Cc: "Scheffenegger, Richard" <rs@netapp.com>, aqm@ietf.org, Vishal Misra <misra@cs.columbia.edu>, John Leslie <john@jlc.net>

On Fri, 27 Mar 2015, KK wrote:

> The discussion about adding buffers and the impact of buffers should be
> considered relative to the time scales when congestion occurs and when it
> is relieved by the dynamics of the end-system protocols. The reason we
> have buffering is to handle transients at the points where there is a
> mismatch in available bandwidth. We don't look to just throw buffers in
> front of a bottleneck for 'long run' overload.

In theory you are correct. However, in practice, you are wrong.

Throughput benchmarks don't care how long the data sits in buffers, so larger 
buffers improve the benchmark numbers (up until the point where they cause 
timeouts).

But even if the product folks aren't just trying to maximize throughput, they 
size the buffers for the worst-case bandwidth/latency combination. So you have 
products with buffers that can handle 1Gb links with 200ms of speed-of-light 
induced latency being used on 1.5Mb/768K 20ms DSL lines without any changes.
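To put rough numbers on that (mine, for illustration, not from this thread): a buffer sized for the bandwidth-delay product of a 1Gb, 200ms worst case is about 25MB, and draining a full 25MB buffer through a 1.5Mb DSL line takes over two minutes:

```python
# Illustrative calculation (assumed numbers): a buffer sized for the
# "worst case" bandwidth-delay product of a 1 Gb/s, 200 ms path, then
# placed in front of a 1.5 Mb/s DSL link.

def bdp_bytes(rate_bps, rtt_s):
    """Bandwidth-delay product in bytes."""
    return rate_bps * rtt_s / 8

buffer_bytes = bdp_bytes(1e9, 0.200)   # ~25 MB of buffer
dsl_rate_bps = 1.5e6                   # 1.5 Mb/s DSL line

# Time to drain a full buffer through the slow link = worst-case queuing delay.
drain_seconds = buffer_bytes * 8 / dsl_rate_bps

print(f"buffer: {buffer_bytes/1e6:.0f} MB, "
      f"worst-case queue delay: {drain_seconds:.0f} s")
```

That worst-case queue delay works out to roughly 133 seconds, which is why static worst-case sizing is so destructive on slow links.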

I'm not saying that ECN doesn't provide value, but the statement that without 
ECN you have a choice of low latency OR good throughput is only true if you 
ignore what's deployed today.

It also does a disservice because it implies that if you use something other 
than ECN, it's going to hurt your performance. This discourages people from 
enabling pie or fq_codel, because they have read about how bad those are and how 
they will increase latency because they drop packets. This isn't just a 
theoretical "someone may think this"; I've seen this exact argument trotted out 
a couple of times recently.

> While active queue management undoubtedly seeks to keep the backlog
> build-up at a manageable level so as to not allow latency to grow and
> still keep the links busy to the extent possible, the complement that ECN
> provides is to mitigate the impact of the drop that AQM uses to signal
> end-points to react to the transient congestion. ECN has the benefit when
> you have flows that have small windows, where the impact of loss is more
> significant.
>
> As you say, "when a packet is lost it causes a 'large' amount of latency
> as the sender times out and retransmits, but if this is only happening
> every few thousand packets, it's a minor effect." But this is the case
> for flows that are long-lived. If the flows are short-lived (and I believe
> empirical evidence suggests that they are a significant portion of the
> flows), then it is not a minor effect any more.

Even an occasional lost packet in a short flow is a minor effect compared to the 
current status quo of high latency on all packets.

Yes, many web pages are made up of many different items, fetched from many 
different locations, so avoiding packet losses on these flows is desirable.

But it's even more important to keep latency low while the link is under load, 
otherwise your connections end up being serialized, which kills performance even 
more.

As an example (just to be sure we are all talking about the same thing):

user clicks a link
DNS lookup
small page fetch
N resources to fetch, add to queue
for each resource in the queue (up to M in parallel)
   DNS lookup (may be cached)
   page fetch (some small, some large, some massive)
   may trigger more resources to fetch that get added to queue

It's common for there to be a few massive resources in a page that get queued 
early (UI JavaScript libraries or background images).

If a packet gets lost from one of the large fetches, it doesn't have that big of 
an effect. If it gets lost from one of the small fetches, it has more of an 
effect.

But if the first resource to be fetched causes latency to go to 500ms (actually 
a fairly 'clean' network by today's standards), then all of the DNS lookups, TCP 
handshakes, etc. that are needed for all the other resources end up taking far 
longer than the time that would be lost to a dropped packet.
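A back-of-the-envelope sketch of that serialization cost (all numbers assumed for illustration: roughly 1 RTT each for DNS, TCP handshake, and GET; 30 resources; 6 parallel connections; a ~1s initial retransmission timeout):

```python
# Illustrative comparison (assumed numbers, not measured): total fetch time
# when every round trip is inflated to 500 ms by a full buffer, versus a
# clean 20 ms path that additionally suffers one retransmission timeout.

RTTS_PER_RESOURCE = 3   # ~1 RTT each for DNS lookup, TCP handshake, GET (assumed)
N_RESOURCES = 30        # resources on the page (assumed)
PARALLEL = 6            # typical per-host parallel connection limit

def page_time(rtt_s, extra_s=0.0):
    """Rough serialized fetch time: rounds of PARALLEL fetches, plus extras."""
    rounds = -(-N_RESOURCES // PARALLEL)   # ceiling division
    return rounds * RTTS_PER_RESOURCE * rtt_s + extra_s

clean   = page_time(0.020, extra_s=1.0)   # 20 ms RTT plus one ~1 s RTO
bloated = page_time(0.500)                # 500 ms RTT, no losses at all

print(f"clean path + 1 timeout: {clean:.2f} s, "
      f"bufferbloated path: {bloated:.2f} s")
```

Under those assumptions, the lossy-but-clean path finishes in about 1.3 seconds while the loss-free bufferbloated path takes 7.5 seconds; the inflated RTT costs far more than the single drop.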

This is a better-vs-best argument. Nobody disputes that something like 
fq_codel/pie/cake/whatever + ECN would be better than just 
fq_codel/pie/cake/whatever, but the way this is being worded makes it sound as 
though static buffer sizes + tail-drop + ECN is better than 
fq_codel/pie/cake/whatever, because those other queueing algorithms will cause 
packet loss.

David Lang