Re: [tcpm] Problem with Low SSThresh (was I-D Action: draft-ietf-tcpm-newcwv-03.txt)

Gorry Fairhurst <gorry@erg.abdn.ac.uk> Fri, 16 May 2014 13:49 UTC

Yes - this seems like an oversight in the way ssthresh was conceived.
The value is intended to capture congestion history, and hence it's nice
to cache this for new flows to prevent them overshooting the bottleneck
capacity. I can think of three cases where current ssthresh methods give
unwarranted poor performance:

  - path changes, presumably uncommon.
  - historic information (an event hours ago has little bearing on a
    long-lived connection and is a real impediment when the RTT is large).
  - non-congestion loss (especially when FlightSize was small).
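
For reference, RFC 5681 equation (4) sets ssthresh to
max(FlightSize / 2, 2*SMSS) when loss is detected, which is why a loss
at small FlightSize pins ssthresh at the 2*SMSS floor. A minimal Python
sketch of that arithmetic follows; the SMSS of 1460 bytes and the
function name are assumptions chosen only for illustration.

```python
SMSS = 1460  # assumed sender maximum segment size in bytes (illustrative)


def ssthresh_after_loss(flight_size: int, smss: int = SMSS) -> int:
    """RFC 5681 equation (4): ssthresh = max(FlightSize / 2, 2 * SMSS)."""
    return max(flight_size // 2, 2 * smss)


# Loss with a large amount of data in flight: ssthresh is half the flight.
large = ssthresh_after_loss(40 * SMSS)

# Loss with only 3 segments in flight: the 2 * SMSS floor takes over.
small = ssthresh_after_loss(3 * SMSS)

print(large // SMSS, small // SMSS)
```

With 40 segments in flight the result is 20 segments; with 3 segments
in flight it is the 2-segment floor, regardless of what the path could
actually sustain.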

Gorry

On 15/05/2014 18:21, McAlpine, Gary wrote:
> Hi Ingemar,
>
> I'm not sure what the rationale was for dropping ssthresh to 2*MSS, but it seems to me that there are too many non-congestion-related events that can cause this extreme setting of ssthresh. Once ssthresh gets set so low, the real problem we have seen is recovery on long-lived connections or where an ssthresh-host-cache is in use. In these cases, the current congestion RFCs don't provide a specific mechanism for ssthresh to recover to a level that represents the actual congestion level. So what we have seen are connections between a particular client and server go to very low throughput and not recover for a very long time. In fact, the traffic between the client and server may be such that it can never recover until the connection is dropped (in the case of a long-lived connection) or the cached ssthresh is reset.
>
> To allow our customers to avoid this problem, we have provided two mechanisms in our software:
>
>
> 1. They can disable ssthresh-host-cache so that new connections will always restart ssthresh.
>
> 2. Since RFC 5681 Section 4.1 is silent on what to do with ssthresh when restarting idle connections, we assume ssthresh is no longer valid after a sufficiently long idle period. Given that the next transfer is going to perform a slow-start that will (essentially) search for the appropriate cwnd level and restart the ack clock, we also restart ssthresh so that the appropriate cwnd level can be found.
>
> These mechanisms seem to work quite well and we haven't seen any cases where they have caused other problems, but I would be much happier if the congestion RFCs were not so silent on what to do with ssthresh to recover from cases where ssthresh gets set to an inappropriately low level.
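
One possible reading of the second mechanism above, sketched in Python.
This is not Gary's implementation: the class, the function, and the
ssthresh_idle_limit parameter are hypothetical, and the message gives no
concrete value for a "sufficiently long" idle period. The cwnd step
follows RFC 5681 Section 4.1 (restart window RW = min(IW, cwnd) after
idling longer than the RTO); the ssthresh reset is the extra step Gary
describes.

```python
from dataclasses import dataclass

# RFC 5681 says the initial ssthresh SHOULD be set arbitrarily high;
# infinity is used here as a stand-in for that.
INITIAL_SSTHRESH = float("inf")


@dataclass
class ConnState:
    cwnd: int              # congestion window in bytes
    ssthresh: float        # slow-start threshold in bytes
    last_send_time: float  # time of the last data transmission, in seconds


def on_send_after_idle(state: ConnState, now: float, rto: float,
                       initial_window: int,
                       ssthresh_idle_limit: float) -> ConnState:
    """Adjust cwnd and ssthresh before sending after an idle period."""
    idle = now - state.last_send_time
    if idle > rto:
        # RFC 5681 Section 4.1: restart from RW = min(IW, cwnd).
        state.cwnd = min(initial_window, state.cwnd)
    if idle > ssthresh_idle_limit:
        # Hypothetical policy: treat ssthresh as stale and let slow start
        # search for the appropriate cwnd again.
        state.ssthresh = INITIAL_SSTHRESH
    state.last_send_time = now
    return state


conn = ConnState(cwnd=20 * 1460, ssthresh=2 * 1460, last_send_time=0.0)
on_send_after_idle(conn, now=120.0, rto=1.0,
                   initial_window=10 * 1460, ssthresh_idle_limit=60.0)
print(conn.cwnd, conn.ssthresh)
```

A short idle (longer than the RTO but shorter than the limit) only
shrinks cwnd; ssthresh is reset only once the idle period exceeds the
assumed limit.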
>
> Thanks,
> Gary
>
>
>
> From: Ingemar Johansson S [mailto:ingemar.s.johansson@ericsson.com]
> Sent: Tuesday, May 13, 2014 1:15 AM
> To: tcpm@ietf.org
> Cc: Karen E. E. Nielsen; gorry@erg.abdn.ac.uk; McAlpine, Gary
> Subject: Problem with Low SSThresh (was I-D Action: draft-ietf-tcpm-newcwv-03.txt)
>
> Hi
>
> Karen pointed out this thread to me
> http://www.ietf.org/mail-archive/web/tcpm/current/msg08315.html
> I started to look closer at this issue quite recently. The problem I have is that SSThresh can drop to very low values after only a few lost packets. I have seen this odd effect earlier but have not bothered with it.
>
>
> The experiment is to run a large FTP transfer over a 1 Mbps bottleneck with min RTT = 40 ms. No AQM or tail drop queue is enabled, i.e., a buffer-bloated scenario.
>
> The TCP variant is NewReno. After 10 s I "pull the plug" for 100 ms (100% packet drop), which leads to 5 lost segments. At T = 10.1 s packets are forwarded as usual again.
>
> What I have seen is that a retransmission timeout is immediately followed by a loss event, the effect of which is that SSThresh goes down to 2 MSS. It seems to me that the RTO timer value is too low, though I have not understood the effect completely. Could the RTO timer be the culprit, or is there some other effect?
>
> I am running these experiments in a proprietary LTE system simulator. We try to keep it up to date so that it matches the Linux TCP stack reasonably well, but it cannot be ruled out that our implementation misses some important feature.
>
> /Ingemar
>
> =================================
> Ingemar Johansson  M.Sc.
> Senior Researcher
>
> Ericsson AB
> Wireless Access Networks
> Labratoriegränd 11
> 971 28, Luleå, Sweden
> Phone +46-1071 43042
> SMS/MMS +46-73 078 3289
> ingemar.s.johansson@ericsson.com<mailto:ingemar.s.johansson@ericsson.com>
> www.ericsson.com
>
> "Those are my principles, and if you don't like them...
> well, I have others."  Groucho Marx<http://www.brainyquote.com/quotes/authors/g/groucho_marx.html>
> =================================
>
>
>