Re: [re-ECN] FW: ConEx BoF announcement text

Fred Baker <fred@cisco.com> Thu, 22 October 2009 22:51 UTC

From: Fred Baker <fred@cisco.com>
To: "Ilpo Järvinen" <ilpo.jarvinen@helsinki.fi>
Date: Thu, 22 Oct 2009 15:51:31 -0700
Cc: re-ECN unIETF list <re-ecn@ietf.org>
Subject: Re: [re-ECN] FW: ConEx BoF announcement text

:-)

You started out by telling me that arbitrarily increasing the window
doesn't hurt anything - you quoted Bob saying that congestion isn't a
problem. I demonstrated that congestion out of control is a problem.
Then you complained that I didn't demonstrate that Cubic/Reno is a
problem. OK, let's talk about Cubic/Reno, and why BitTorrent prompted
an IETF working group (ledbat) on how to reduce the impact on ISPs and
their customers when it runs seven Cubic/Reno TCPs in parallel.

Actually, could I get you to read the ledbat charter? http://www.ietf.org/dyn/wg/charter/ledbat-charter.html
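As an editorial aside, the seven-parallel-TCPs point can be made concrete with a toy model. This is a minimal sketch, not a real TCP simulation: synchronized AIMD flows share one bottleneck, and per-flow fairness hands a user running seven flows roughly seven times the capacity of a user running one. All numbers are illustrative assumptions.

```python
# Toy model of synchronized AIMD flows sharing one bottleneck (a sketch,
# not a real TCP simulation): additive increase of 1 packet per RTT, and
# every flow halves its window whenever the link overflows.
def aimd_share(n_bulk, n_single=1, capacity=100.0, rounds=1000):
    cwnd = [1.0] * (n_bulk + n_single)
    sent = [0.0, 0.0]  # cumulative packets for the two "users"
    for _ in range(rounds):
        if sum(cwnd) > capacity:
            cwnd = [w / 2 for w in cwnd]   # multiplicative decrease
        else:
            cwnd = [w + 1 for w in cwnd]   # additive increase
        sent[0] += sum(cwnd[:n_bulk])
        sent[1] += sum(cwnd[n_bulk:])
    return sent[0] / sent[1]

# A user opening seven flows against a single-flow user gets ~7x the
# capacity, because all flows evolve identically in this model.
print(aimd_share(7))  # -> 7.0
```

The model is deliberately crude (synchronized losses, identical RTTs), but it captures why per-flow fairness rewards opening parallel connections.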

On Oct 22, 2009, at 3:30 PM, Ilpo Järvinen wrote:

> On Thu, 22 Oct 2009, Fred Baker wrote:
>
>> well, we disagree on some points.
>
> I guess we agree on quite a lot, yes. However, what I'm still missing
> is the connection you made between Cubic/Reno and throughput
> reduction because of "the cliff". I oppose the universal statement
> you made earlier about the cliff being there if we keep pushing the
> network. E.g., you earlier put Cubic among the "bad ones" who push
> the network, but now your scenario below doesn't have anything that
> even remotely resembles Cubic/Reno, only something totally
> unresponsive (I'm assuming SACK-based recovery is there with this
> definition of Reno, and that router buffers are large enough to
> accommodate MD-madness "recovery"). ...That's the connection I want
> to fully tear down. Don't first put Cubic or Reno among the bad ones
> and then show an example with something completely different; it
> isn't very fair to blame something for another's faults like that and
> justify its "badness" that way. This is very important because this
> mis-connection exists in many minds: that when there is enough
> congestion, throughput automatically drops dramatically (like in
> Jain's graph). As a result, we hear these "congestion is the problem"
> and "more aggressive means more congestion" types of attitude. Just
> to make sure, I didn't mean to say here that I think Cubic/Reno is
> perfect either.
>
> ...The rest is just some comments on your rather intimidating
> scenario :-).
>
>> I don't know that perfect agreement is necessary, and certainly not  
>> on
>> this thread. That said...
>>
>> Regarding behavior beyond the cliff, it might be instructive to  
>> take the most
>> extreme possible case - a dumbbell network and a really obnoxious  
>> file
>> transfer algorithm.
>>
>>        +-+ 10*N   +-+ 10*N  +-+   N  +-+ 10*N  +-+
>>        |A|--------|B|-------|C|------|D|-------|E|
>>        +-+        +-+       +-+      +-+       +-+
>>               10*N |         |10*N
>>                  +--+      +--+
>>                  | F|      | G|
>>                  +--+      +--+
>>
>> Presume that the capacity from C to D is N, and all other  
>> capacities are 10*N.
>> Presume that some application at A wants to send something really  
>> big to a
>> peer at E, and some other application at F wants to send something  
>> else to a
>> peer at G.
>>
>> Presume also that the way file transfers work is that the sender puts
>> appropriate headers on all of the indicated packets, queues them up  
>> to its
>> link layer chip, and spews them as fast as it can. If it's a 100  
>> gigabyte
>> file, it queues up 100 gigabytes and watches it heat the "wires".  
>> The peer
>> receives what it receives and tells the sender what it didn't. The  
>> sender now
>> repeats the exercise with anything his peer didn't receive.
>>
>> BTW, that is in fact how a file transfer done in a certain research
>> environment on very noisy lines is proposed to work. I didn't just  
>> dream that
>> up.
>>
>> Obviously, links A-B, B-C, F-B, and C-D will be fully utilized.  
>> Since B-C has
>> less (half) capacity than A-B + F-B, each of those flows will drop  
>> about half
>> of its traffic. The transfer rate F-G should be around 5*N. C-D  
>> will also drop
>> 80% of the remaining traffic, resulting in an arrival rate at E of  
>> N. A will
>> go through at least ten iterations of its transfer, and F at least  
>> two.
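The arithmetic of the scenario above can be checked with a short back-of-envelope script. This is a sketch of the blind-flooding model with N = 1, making explicit the assumption that an overloaded link splits its capacity equally between competing flows:

```python
# Back-of-envelope check of the dumbbell example, with N = 1.
# Assumption: an overloaded link gives each competing flow an equal share.
N = 1.0

a_offered = 10 * N                   # A floods its 10*N access link
f_offered = 10 * N                   # F does the same
bc_capacity = 10 * N
# B-C is offered 20*N; with equal shares each flow gets half the link.
a_after_bc = min(a_offered, bc_capacity / 2)   # 5*N survives for A's flow
f_after_bc = min(f_offered, bc_capacity / 2)   # 5*N survives for F's flow

cd_capacity = 1 * N
e_rate = min(a_after_bc, cd_capacity)          # N arrives at E
g_rate = min(f_after_bc, 10 * N)               # 5*N arrives at G

cd_drop_fraction = 1 - e_rate / a_after_bc     # C-D drops 80% of what remains
a_iterations = a_offered / e_rate              # ~10 rounds for A's transfer
f_iterations = f_offered / g_rate              # ~2 rounds for F's transfer
print(e_rate, g_rate, cd_drop_fraction, a_iterations, f_iterations)
```

Under these assumptions the numbers match the text: E receives N, F-G runs at about 5*N, C-D drops 80% of the remaining traffic, and A needs roughly ten iterations to F's two.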
>>
>> The first down side is that while A-B and F-B are fully utilized,  
>> most
>> of that capacity is wasted.
>
> Yes, that's the nature of the unresponsiveness you added yourself,
> and now you say that it is a down side?!? ...Nobody (including the
> network) can prevent an application from doing stupid things like
> wasting the resources of the first link :-). I understand, though,
> that you could construct a more complex topology where it wouldn't be
> the first link.
>
> Instead of tackling that possibility, I'll take another extreme
> example to stress my point: an application could cause the very same
> symptoms as the classical congestion collapse (as per Nagle) just by
> resending the very same segment over and over even though it made it
> through (and an ACK was received); multiple copies are then in the
> network and the throughput of that particular flow will be
> dramatically affected. But it's just plain stupid behavior which no
> network can prevent you from doing. Of course nobody else has to be
> part of that experiment, and they can operate with good throughputs.
> In a thought experiment we can put a TCP implementation with SACK and
> lots of receiver window into the age of Nagle and prove that it keeps
> operating with very good throughput regardless of what the stupid
> senders do, as long as the transfer isn't unlucky enough to run out
> of data to send while recovering.
>
> ...Secondly, I don't find this non-optimal use of excess capacity to
> be the reason why one would need/want re-ECN.
>
> Another stupidity of the application/protocol in this example is the
> way of doing feedback, which exaggerates anomalies near the end of a
> transfer.
>
>> This impacts F's ability to deliver traffic to G; it should have
>> achieved a 90% throughput rate (the rate C-G less the rate C-D), and
>> in fact achieved 50% or less. With the right mechanism (ack clocking
>> comes to mind, and there are other approaches), the file transfer
>> uses only 10% of the capacity on A-B and there is at most a 10%
>> impact on the file transfer F-G.
>
> ...Right, if there were responsiveness, everything would change. And
> I guess we then pretty much agree.
>
>> The second down-side is that the traffic arriving at G and E will be
>> scattered all through the respective files, meaning that they have to
>> store the bits and pieces and do a fair bit of searching and  
>> sorting to
>> determine what they have not yet received and to reconstitute the  
>> file.
>> This is a waste both of memory and CPU resources.
>
> This has nothing to do with the throughput reduction anymore, don't
> you think? Also, to me these "extras" again seem to come from stupid
> behavior introduced by you rather than from anything very close to
> what I was arguing against.
>
> But in general I agree with you that loss recovery, even without any
> throughput loss, introduces these space constraints (though in a much
> less severe way than in your extreme example) and one would want to
> avoid that if at all possible. But IMHO that again has rather little
> to do with "the cliff" and the throughput reduction (that supposedly
> happens because of it).
>
>> Voila, we come back pretty quickly to wanting the transmission rate  
>> of A to
>> approximate the reception rate at E and the transmission rate at F to
>> approximate the reception rate at G, and wanting the window to not be
>> excessively large. An excessively large window wastes bandwidth  
>> before the
>> bottleneck and potentially creates unnecessary bottlenecks, impacts
>> other file transfers, imposes a memory/algorithm burden on the  
>> receiver,
>> and for all of those costs accomplishes nothing that a smaller window
>> would not have accomplished better.
>
> Agreed. ...Note, though, that for some reason you have cleverly
> softened "reduces throughput" into "_potentially_ creates unnecessary
> bottlenecks" (emphasis mine) here, which I can certainly agree with
> much more ;-).
>
> ...In fact, I wouldn't even choke on your example here, if given in a
> less extreme form, as we have CBR-type non-responsiveness present too
> for real, and thus dead packets wasting capacity (without ECN) and so
> on.
>
>> A window that is too large is a bad thing, and the congestion it
>> causes is a bad thing. Not that the presence of congestion is bad -
>> it is, as Bob says, a natural side effect of what happens. But
>> congestion that is poorly managed by the endpoint and/or the network
>> is a very bad thing. re-ecn does in fact try to give the sender an
>> incentive to approximate the knee (the smallest window that achieves
>> the potential throughput rate), and increasing the window to or
>> beyond the cliff by definition doesn't increase its rate materially
>> beyond that point.
>
> Agreed. Like you pointed out earlier, there's no increase in
> throughput beyond that point no matter what trick one tries to play,
> and I agree. My point was just that with responsive transfers there
> isn't (always) such a magic cliff; the throughput would just keep to
> that constant line even if drops start to occur. And especially
> mentioning Cubic/Reno, which are responsive, in this context of the
> "cliff" seemed very misleading.
>
>
> --
> i.
>
>
>> On Oct 22, 2009, at 12:36 PM, Ilpo Järvinen wrote:
>>
>>> On Thu, 22 Oct 2009, Fred Baker wrote:
>>>
>>>> I'll argue something slightly different. It comes to the same  
>>>> endpoint, but
>>>> through a different observation. Some of us, you Bob especially,  
>>>> are very
>>>> familiar with this; others are perhaps less so. Bear with me.
>>>>
>>>> Jain's 1994 patent on congestion control defines two terms: the  
>>>> "cliff" and
>>>> the "knee" of a throughput curve.
>>>>
>>>>        |
>>>>    T   |      available capacity
>>>>    h   | ---------------------------------------
>>>>    r   |       +---+
>>>>    u   |  knee/     \cliff
>>>>    p   |     /       \
>>>>    u   |    /         \
>>>>    t   |   /           \
>>>>        |  /             \
>>>>        | /               \
>>>>        |/                 ----------------
>>>>        -----------------------------------------
>>>>                 Window  --->
>>>>
>>>> As the TCP/SCTP window increases from zero to some value,  
>>>> throughput also
>>>> increases. That stops when the available capacity has been  
>>>> consumed; at
>>>> that
>>>> point, even if the window grows, throughput does not. What  
>>>> increases
>>>> instead
>>>> is RTT, because a queue grows.
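The relationship described above can be sketched as a fixed-window model of a single bottleneck. This is an illustrative sketch only: C, R, and B are assumed numbers, and whether goodput actually degrades past the buffer limit is exactly what the two correspondents dispute, so the sketch simply keeps it flat there.

```python
# Sketch of Jain's knee/cliff picture for one fixed-window flow through a
# single bottleneck: capacity C (pkts/s), base RTT R (s), drop-tail
# buffer B (pkts). All numbers are illustrative assumptions.
def throughput_and_rtt(W, C=100.0, R=0.1, B=20.0):
    bdp = C * R                    # bandwidth-delay product: the "knee"
    if W <= bdp:
        return W / R, R            # link not yet full: RTT stays at R
    elif W <= bdp + B:
        return C, W / C            # link full: throughput flat, queue grows
    else:
        # buffer full: drops begin (the "cliff"); whether goodput really
        # falls here for a responsive sender is the disputed question,
        # so this sketch leaves it flat
        return C, (bdp + B) / C

for W in (5, 10, 20, 50):
    print(W, throughput_and_rtt(W))
```

With these assumed defaults the knee sits at W = C*R = 10 packets: below it throughput grows with the window, above it only the RTT grows.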
>>>
>>> Up to this point I agree with this paragraph...
>>>
>>>> If window continues to increase, at some point the queue starts  
>>>> dropping
>>>> traffic, and throughput degrades.
>>>
>>> ...However, this is not a true claim. As long as no unnecessary
>>> work is done at any bottleneck (and we don't have a shared resource
>>> such as a wireless channel), throughput does _not_ degrade (at
>>> all!), even though many keep asserting that without much thought
>>> nowadays. Of course non-infinitely-long flows would introduce some
>>> anomalies into the calculations, through which one could show that
>>> this claim is "true", but I think that's hardly what people usually
>>> mean when they claim that dropping implies throughput degradation.
>>>
>>> Somewhat related, for some reason there's a very common
>>> misconception that drops imply poor performance. That is often
>>> stated as a fact without looking deeper into what really caused the
>>> sub-optimal performance; often it turns out to be something else
>>> when looked at closely enough. I'd say that very often the losses
>>> were, in fact, innocent. Quite often the blame would fall on the
>>> constant-factor MD (if one is honest enough to admit it).
>>>
>>> Btw, your graph is not accurate (wrt. Jain); he draws a much
>>> sharper drop than you do at his "cliff" (so with ASCII you'd have
>>> to use | at first to match him). I have some doubts about the
>>> validity of that sharp angle too, so I find your graph much more
>>> sensible (that is, as long as the resource is shared).
>>>
>>>> By definition, the window at the "knee" is smaller than the  
>>>> window at
>>>> the "cliff", but throughput is the same.
>>>>
>>>> Common TCP congestion control algorithms such as Reno and Cubic
>>>> tune to the cliff. That has the upside of maximizing throughput;
>>>> it has the downside that it is abusive to other applications such
>>>> as voice/video (increased and variable RTT, and induced loss), and
>>>> to other TCP sessions. #include <discussion of BitTorrent and why
>>>> other users are negatively impacted by it>
>>>
>>> I think Bob doesn't agree with you here; he isn't saying that
>>> congestion is something bad but nearly the opposite. Bob's own
>>> words:
>>>
>>> "It would contradict this to say congestion is a problem - it's
>>> not - it's healthy and natural in a data network."
>>>
>>> ...It's just that if you keep doing that too much (beyond the
>>> congestion volume allocated for you), you would "suffer" (if I've
>>> understood him right). ...So it has very little to do with
>>> operating at the "cliff" or "knee" on short timescales (unless of
>>> course you've already run out of your quota).
>>>
>>>> Other TCP congestion control algorithms based on ECN, CalTech  
>>>> FAST, etc,
>>>> try
>>>> to tune to the knee. This gets them exactly the same throughput  
>>>> including
>>>> support of very high rate applications but with a smaller window  
>>>> value and
>>>> therefore lower queue depth and lower probability of loss - both  
>>>> for
>>>> themselves and their competitors. It does so without negatively,  
>>>> or at
>>>> least
>>>> AS negatively, impacting applications they compete with beyond  
>>>> seeking to
>>>> share the available capacity with them.
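The knee-seeking idea can be illustrated with a FAST-style delay-based window update. This is a simplified sketch in the spirit of the published FAST TCP rule, not CalTech's implementation; the constants and the toy bottleneck are assumptions.

```python
# Simplified FAST-style update: move the window toward the point where
# about `alpha` of this flow's packets sit queued at the bottleneck
# (the knee), instead of probing until loss (the cliff).
def fast_like_update(w, base_rtt, rtt, alpha=10.0, gamma=0.5):
    target = (base_rtt / rtt) * w + alpha
    return min(2 * w, (1 - gamma) * w + gamma * target)

# Toy bottleneck (assumed numbers): 100 pkts/s, 0.1 s base RTT, so the
# knee window (bandwidth-delay product) is 10 packets.
C, R = 100.0, 0.1
w = 1.0
for _ in range(100):
    rtt = max(R, w / C)   # queueing delay appears once the link is full
    w = fast_like_update(w, R, rtt)
print(round(w, 3))  # -> 20.0: full link plus alpha packets of standing queue
```

The fixed point is w = C*R + alpha: the link runs at full rate with only a small, bounded standing queue, which is the low-delay, same-throughput operating point the paragraph above describes.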
>>>
>>> I certainly agree with you here that low queuing delay is a
>>> desirable property too. ...However, I don't see how e.g. ECN alone
>>> could achieve low queuing delay and high throughput at the same
>>> time; you'd need something more than that (mainly to remove the
>>> 0.5-factor MD). And after removing that, we'd get something close
>>> to what Matt Mathis is suggesting, and that then no longer depends
>>> on ECN to work (though I think one could certainly avoid a lot of
>>> loss recovery by signalling early with ECN).
>>>
>>> -- 
>>> i.
>>
>>
>
> -- 
> i.