Re: [aqm] think once to mark, think twice to drop: draft-ietf-aqm-ecn-benefits-02

"Scheffenegger, Richard" <rs@netapp.com> Sat, 28 March 2015 14:51 UTC

From: "Scheffenegger, Richard" <rs@netapp.com>
To: David Lang <david@lang.hm>
Date: Sat, 28 Mar 2015 14:50:42 +0000
Archived-At: <http://mailarchive.ietf.org/arch/msg/aqm/P9u5VTJI2HrxhLoLdzfVd9zTGBc>
Cc: John Leslie <john@jlc.net>, KK <kk@cs.ucr.edu>, "aqm@ietf.org" <aqm@ietf.org>, Vishal Misra <misra@cs.columbia.edu>

David,

Perhaps you would care to provide some text to address the misconception that you pointed out? (Namely, that waiting for a 100% fix seems preferable because a 90% fix appears much less appealing, while the current state of the art is at 0%.)

If you think that aqm-recommendations is not worded strongly enough, I think this particular discussion (to AQM or not) really belongs there. The other document (ecn-benefits) has a different target: arguing for going those last 10%...

Best regards,
   Richard Scheffenegger

> On 27.03.2015 at 16:17, "David Lang" <david@lang.hm> wrote:
> 
>> On Fri, 27 Mar 2015, KK wrote:
>> 
>> The discussion about adding buffers and the impact of buffers should be
>> considered relative to the time scales when congestion occurs and when it
>> is relieved by the dynamics of the end-system protocols. The reason we
>> have buffering is to handle transients at the points where there is a
>> mismatch in available bandwidth. We don't look to just throw buffers in
>> front of a bottleneck for 'long run' overload.
> 
> In theory you are correct. However in practice, you are wrong.
> 
> Throughput benchmarks don't care how long the data sits in buffers, so larger buffers improve the benchmark numbers (up until the point that they cause timeouts).
> 
> But even if the product folks aren't just trying to maximize throughput, they size the buffers based on the worst-case bandwidth/latency. So you have products with buffers that can handle 1Gb links with 200ms of speed-of-light-induced latency being used for 1.5Mb/768K 20ms DSL lines without any changes.
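
To put rough numbers on the buffer-sizing point above, here is a back-of-the-envelope sketch. It assumes the buffer is sized to one bandwidth-delay product of the worst-case link, and reads "1Gb" as 1 Gbit/s and "1.5Mb/768K" as 1.5 Mbit/s down and 768 kbit/s up; the function names are made up for illustration.

```python
def bdp_bytes(rate_bps, rtt_s):
    """Bandwidth-delay product in bytes for a link rate (bit/s) and RTT (s)."""
    return rate_bps * rtt_s / 8


def drain_time_s(buffer_bytes, rate_bps):
    """Seconds needed to empty a full buffer at the given link rate (bit/s)."""
    return buffer_bytes * 8 / rate_bps


# Assumption: buffer sized for the worst case, a 1 Gbit/s link with 200 ms RTT.
buffer_bytes = bdp_bytes(1e9, 0.200)

# The same buffer sitting in front of a DSL line (rates are assumptions).
dsl_down_bps = 1.5e6
dsl_up_bps = 768e3

print(f"buffer size:      {buffer_bytes / 1e6:.1f} MB")
print(f"drain time, down: {drain_time_s(buffer_bytes, dsl_down_bps):.0f} s")
print(f"drain time, up:   {drain_time_s(buffer_bytes, dsl_up_bps):.0f} s")
```

Under these assumptions a full buffer would take minutes to drain on the DSL line, so in practice timeouts hit long before the buffer fills. The order of magnitude is what matters here, not the exact figures.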
> 
> I'm not saying that ECN doesn't provide value, but the statement that without ECN you have the choice of low latency OR good throughput is only true if you ignore what's in place today.
> 
> It also does a disservice because it implies that if you use something other than ECN, it's going to hurt your performance. This discourages people from enabling pie or fq_codel because they have read about how bad they are and how they will increase latency because they drop packets. This isn't just a theoretical "someone may think this"; I've seen this exact argument trotted out a couple of times recently.
> 
>> While active queue management undoubtedly seeks to keep the backlog
>> build-up at a manageable level so as to not allow latency to grow and
>> still keep the links busy to the extent possible, the complement that ECN
>> provides is to mitigate the impact of the drop that AQM uses to signal
>> end-points to react to the transient congestion. ECN has the benefit when
>> you have flows that have small windows, where the impact of loss is more
>> significant.
>> 
>> As you say, "when a packet is lost it causes a 'large' amount of latency
>> as the sender times out and retransmits, but if this is only happening
>> every few thousand packets, it's a minor effect." But this is the case
>> for flows that are long-lived. If the flows are short-lived (and I believe
>> empirical evidence suggests that they are a significant portion of the
>> flows), then it is not a minor effect any more.
> 
> Even an occasional lost packet in a short flow is a minor effect compared to the current status quo of high latency on all packets.
> 
> Yes, many web pages are made up of many different items, fetched from many different locations, so avoiding packet losses on these flows is desirable.
> 
> But it's even more important to keep latency low while the link is under load, otherwise your connections end up being serialized, which kills performance even more.
> 
> As an example (just to be sure we are all talking about the same thing)
> 
> user clicks a link
> DNS lookup
> small page fetch
> N resources to fetch, add to queue
> for each resource in the queue (up to M in parallel)
>  DNS lookup (may be cached)
>  page fetch (some small, some large, some massive)
>  may trigger more resources to fetch that get added to queue
> 
> It's common for there to be a few massive resources to fetch in a page that get queued early (UI javascript libraries or background images).
> 
> If a packet gets lost from one of the large fetches, it doesn't have that big of an effect. If it gets lost from one of the small fetches, it has more of an effect.
> 
> But if the first resource to be fetched causes latency to go to 500ms (actually a fairly 'clean' network by today's standards), then all of the DNS lookups, TCP handshakes, etc. that are needed for all the other resources end up taking far longer than the time that would be lost due to a dropped packet.
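
The comparison above can be illustrated with a toy model. This is only a sketch under loud assumptions: every resource costs three round trips (DNS lookup, TCP handshake, request/response), resources are fetched in waves of M in parallel, transfer time is ignored, and a single dropped packet costs a fixed 1 s retransmission delay. The function name, N = 30, and M = 6 are all hypothetical.

```python
import math


def page_load_s(n_resources, parallel, rtt_s,
                rtts_per_fetch=3, loss_penalty_s=0.0):
    """Toy page-load time: waves of parallel fetches, each costing a fixed
    number of round trips, plus an optional one-off loss penalty."""
    waves = math.ceil(n_resources / parallel)
    return waves * rtts_per_fetch * rtt_s + loss_penalty_s


# Bloated queue: no packet loss, but every round trip takes 500 ms.
bloated = page_load_s(30, 6, rtt_s=0.500)

# AQM that drops: 20 ms round trips, one drop costing an assumed 1 s.
aqm_with_drop = page_load_s(30, 6, rtt_s=0.020, loss_penalty_s=1.0)

print(f"bloated, no loss:  {bloated:.1f} s")
print(f"low latency + drop: {aqm_with_drop:.1f} s")
```

With these made-up parameters the bloated case comes out several times slower than the low-latency case, even after charging the latter a full second for the lost packet. Real browsers and networks are far messier; the model only shows why per-round-trip latency dominates when many small exchanges are serialized.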
> 
> This is a better vs best argument. Nobody disputes that something like fq_codel/pie/cake/whatever + ECN would be better than just fq_codel/pie/cake/whatever, but the way this is being worded makes it sound like static buffer sizes + tail-drop + ECN is better than fq_codel/pie/cake/whatever because these other queueing algorithms will cause packet loss.
> 
> David Lang