Re: [re-ECN] FW: ConEx BoF announcement text

Fred Baker <fred@cisco.com> Thu, 22 October 2009 20:38 UTC

Message-Id: <BA5FCA1F-0F30-42D2-8C3A-006B28B7D0E1@cisco.com>
From: Fred Baker <fred@cisco.com>
To: =?ISO-8859-1?Q?=22Ilpo_J=E4rvinen=22?= <ilpo.jarvinen@helsinki.fi>
In-Reply-To: <Pine.LNX.4.64.0910222140130.20686@melkinkari.cs.Helsinki.FI>
Date: Thu, 22 Oct 2009 13:38:37 -0700
References: <4A916DBC72536E419A0BD955EDECEDEC0636399B@E03MVB1-UKBR.domain1.systemhost.net> <4ADD187E.6000200@thinkingcat.com> <200910221807.n9MI7P2a002071@bagheera.jungle.bt.co.uk> <F437BD07-581B-4542-ABDB-ABABEDC3B8DD@cisco.com> <Pine.LNX.4.64.0910222140130.20686@melkinkari.cs.Helsinki.FI>
Cc: re-ECN unIETF list <re-ecn@ietf.org>
Subject: Re: [re-ECN] FW: ConEx BoF announcement text

well, we disagree on some points. I don't know that perfect agreement  
is necessary, and certainly not on this thread. That said...

Regarding behavior beyond the cliff, it might be instructive to take  
the most extreme possible case - a dumbbell network and a really  
obnoxious file transfer algorithm.

          +-+ 10*N   +-+ 10*N  +-+   N  +-+ 10*N  +-+
          |A|--------|B|-------|C|------|D|-------|E|
          +-+        +-+       +-+      +-+       +-+
                 10*N |         |10*N
                    +--+      +--+
                    | F|      | G|
                    +--+      +--+

Presume that the capacity from C to D is N, and all other capacities  
are 10*N. Presume that some application at A wants to send something  
really big to a peer at E, and some other application at F wants to  
send something else to a peer at G.

Presume also that the way file transfers work is that the sender puts  
appropriate headers on all of the indicated packets, queues them up to  
its link layer chip, and spews them as fast as it can. If it's a 100  
gigabyte file, it queues up 100 gigabytes and watches it heat the  
"wires". The peer receives what it receives and tells the sender what  
it didn't. The sender now repeats the exercise with anything his peer  
didn't receive.

BTW, that is in fact how a file transfer proposed for a certain
research environment with very noisy lines is meant to work. I didn't
just dream that up.
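
The blast-and-retransmit scheme above can be sketched as a toy loop.
This is a hypothetical sketch of my own; the block-level model and the
`loss_rate` parameter are assumptions for illustration, not part of any
real protocol:

```python
import random

def naive_transfer(total_blocks: int, loss_rate: float, seed: int = 1):
    """Send every outstanding block each round, as fast as the wire
    allows; the receiver reports what is missing and the whole
    remainder is re-queued. Returns (rounds, total_blocks_sent)."""
    rng = random.Random(seed)
    outstanding = set(range(total_blocks))
    rounds = sent = 0
    while outstanding:
        rounds += 1
        sent += len(outstanding)
        # each block stays outstanding (is lost) with probability loss_rate
        outstanding = {b for b in outstanding if rng.random() < loss_rate}
    return rounds, sent
```

With 90% loss (A's path in the example below), the sender ends up
pushing roughly ten times the file size through its first hop before
the transfer completes.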

Obviously, links A-B, B-C, F-B, and C-D will be fully utilized. Since  
B-C has less (half) capacity than A-B + F-B, each of those flows will  
drop about half of its traffic. The transfer rate F-G should be around
5*N. C-D will then drop 80% of A's remaining traffic, resulting in an
arrival rate at E of N. A will go through at least ten iterations of
its transfer, and F at least two.
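
As a back-of-the-envelope check, here are those rates in units of N,
under the simplifying assumption (mine) that the B-C queue drops the
two flows in proportion to their offered load:

```python
# Hypothetical rates for the dumbbell example, in units of N.
N = 1.0
offered_A = offered_F = 10 * N        # both senders blast at line rate
cap_BC, cap_CD = 10 * N, 1 * N

# B-C is offered 20*N against 10*N of capacity: each flow keeps half.
share = cap_BC / (offered_A + offered_F)
rate_FG = offered_F * share            # 5*N delivered to G
a_after_BC = offered_A * share         # 5*N of A's flow offered to C-D
rate_E = min(a_after_BC, cap_CD)       # C-D drops 80% of that, N reaches E

iterations_A = offered_A / rate_E      # ~10 passes over the file
iterations_F = offered_F / rate_FG     # ~2 passes
print(rate_FG, rate_E, iterations_A, iterations_F)   # 5.0 1.0 10.0 2.0
```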

The first downside is that while A-B and F-B are fully utilized, most
of that capacity is wasted. This impacts F's ability to deliver
traffic to G; it should have achieved a 90% throughput rate (the B-C
capacity less the N that A's transfer can usefully deliver over C-D),
and in fact achieved 50% or less. With the right mechanism (ack
clocking comes to mind, and there are other approaches), the file
transfer A-E would use only 10% of A-B's capacity and would have at
most a 10% impact on the file transfer F-G.

The second downside is that the traffic arriving at G and E will be
scattered all through the respective files, meaning that they have to  
store the bits and pieces and do a fair bit of searching and sorting  
to determine what they have not yet received and to reconstitute the  
file. This is a waste both of memory and CPU resources.

Voila, we come back pretty quickly to wanting the transmission rate of  
A to approximate the reception rate at E and the transmission rate at  
F to approximate the reception rate at G, and wanting the window to  
not be excessively large. An excessively large window wastes bandwidth  
before the bottleneck and potentially creates unnecessary bottlenecks,  
impacts other file transfers, imposes a memory/algorithm burden on the  
receiver, and for all of those costs accomplishes nothing that a  
smaller window would not have accomplished better.

A window that is too large is a bad thing, and the congestion it  
causes is a bad thing. Not that the presence of congestion is bad - it  
is, as Bob says, a natural side effect of what happens. But congestion  
that is poorly managed by the endpoint and/or the network is a very  
bad thing. re-ecn does in fact try to give the sender an incentive to
approximate the knee (the smallest window that achieves the potential
throughput rate); increasing the window to or beyond the cliff by
definition doesn't increase the rate materially beyond that point.
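
The knee/cliff curve can be written as a toy model. The idealization
is mine, and the linear degradation past the cliff is exactly the part
Ilpo disputes below; take it as a sketch of the shape being argued
about, not a measurement:

```python
def throughput(window: float, bdp: float, buffer: float) -> float:
    """Idealized goodput vs. window, in units of bottleneck capacity.

    Below the knee (window < BDP) throughput is window-limited; from
    the knee to the cliff (BDP <= window <= BDP + buffer) throughput
    is flat while RTT grows; past the cliff the queue overflows and,
    in this toy model, retransmissions claim a growing share of the
    link."""
    if window <= bdp:
        return window / bdp                 # ramping up to the knee
    if window <= bdp + buffer:
        return 1.0                          # knee..cliff: full rate, rising RTT
    return (bdp + buffer) / window          # beyond the cliff: wasted capacity

# e.g. with bdp=10, buffer=5:
#   throughput(5, 10, 5)  -> 0.5   (below the knee)
#   throughput(12, 10, 5) -> 1.0   (between knee and cliff)
#   throughput(30, 10, 5) -> 0.5   (past the cliff)
```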

On Oct 22, 2009, at 12:36 PM, Ilpo Järvinen wrote:

> On Thu, 22 Oct 2009, Fred Baker wrote:
>
>> I'll argue something slightly different. It comes to the same
>> endpoint, but through a different observation. Some of us, you Bob
>> especially, are very familiar with this; others are perhaps less so.
>> Bear with me.
>>
>> Jain's 1994 patent on congestion control defines two terms: the
>> "cliff" and the "knee" of a throughput curve.
>>
>>          |
>>      T   |      available capacity
>>      h   | ---------------------------------------
>>      r   |       +---+
>>      u   |  knee/     \cliff
>>      p   |     /       \
>>      u   |    /         \
>>      t   |   /           \
>>          |  /             \
>>          | /               \
>>          |/                 ----------------
>>          -----------------------------------------
>>                    Window  --->
>>
>> As the TCP/SCTP window increases from zero to some value, throughput
>> also increases. That stops when the available capacity has been
>> consumed; at that point, even if the window grows, throughput does
>> not. What increases instead is RTT, because a queue grows.
>
> Up to this point I agree with this paragraph...
>
>> If window continues to increase, at some point the queue starts
>> dropping traffic, and throughput degrades.
>
> ...However, this is not a true claim. As long as no unnecessary work
> is done at any bottleneck (and we don't have a shared resource such
> as a wireless channel), throughput does _not_ degrade (at all!), even
> though many keep asserting that without much thought nowadays. Of
> course non-infinitely-long flows would introduce some anomalies into
> the calculations, through which one could "show" that this claim is
> true, but I think that's hardly what people usually mean when they
> make the claim that dropping implies throughput degradation.
>
> Somewhat related, for some reason there's a very common misconception
> that drops imply poor performance. That is often stated as a fact
> without looking deeper into what really caused the sub-optimal
> performance; often that turns out to be something else when looked at
> closely enough. I'd say that very often the losses were, in fact,
> innocent. Quite often the blame would fall on the constant-factor MD
> (if one is honest enough to admit it).
>
> Btw, your graph is not accurate (wrt. Jain); he draws a much sharper
> drop than you do at his "cliff" (so with ASCII you'd have to use | at
> first to match him). I have some doubts about the validity of that
> sharp angle too, so I find your graph much more sensible (that is, as
> long as the resource is shared).
>
>> By definition, the window at the "knee" is smaller than the window at
>> the "cliff", but throughput is the same.
>>
>> Common TCP congestion control algorithms such as Reno and Cubic tune
>> to the cliff. That has the upside of maximizing throughput; it has
>> the downside that it is abusive to other applications such as
>> voice/video (increased and variable RTT, and induced loss), and to
>> other TCP sessions. #include <discussion of bittorrent and why other
>> users are negatively impacted by it>
>
> I think Bob doesn't agree with you here; he isn't saying that
> congestion is something bad but nearly the opposite. Bob's own words:
>
> "It would contradict this to say congestion is a problem - it's not -
> it's healthy and natural in a data network."
>
> ...It's just that if you keep doing that too much (beyond the
> congestion volume allocated for you), you would "suffer" (if I've
> understood him right). ...So it has very little to do with operating
> at the "cliff" or "knee" on short timescales (unless of course you've
> already run out of your quota).
>
>> Other TCP congestion control algorithms based on ECN, CalTech FAST,
>> etc, try to tune to the knee. This gets them exactly the same
>> throughput including support of very high rate applications, but
>> with a smaller window value and therefore lower queue depth and
>> lower probability of loss - both for themselves and their
>> competitors. It does so without negatively, or at least AS
>> negatively, impacting applications they compete with beyond seeking
>> to share the available capacity with them.
>
> I certainly agree with you here that low queuing delay is a desirable
> property too. ...However, I don't see how e.g. ECN alone could
> achieve low queue delay and high throughput at the same time; you'd
> need something more than that (mainly to remove the 0.5-factor MD).
> And after removing that, we'd get something close to what Matt Mathis
> is suggesting, and that then no longer depends on ECN to work (though
> I think one could certainly avoid lots of loss recovery by signalling
> early with ECN).
>
> -- 
> i.