Re: [tcpm] Problem with Low SSThresh (was I-D Action: draft-ietf-tcpm-newcwv-03.txt)

"McAlpine, Gary" <gary.mcalpine@bluecoat.com> Thu, 15 May 2014 17:22 UTC

To: Ingemar Johansson S <ingemar.s.johansson@ericsson.com>, "tcpm@ietf.org" <tcpm@ietf.org>
Archived-At: http://mailarchive.ietf.org/arch/msg/tcpm/YjL-mDlGMF_idqIY1SaxuHwCRSQ

Hi Ingemar,

I'm not sure what the rationale was for dropping ssthresh to 2*MSS, but it seems to me that too many non-congestion-related events can cause this extreme setting of ssthresh. Once ssthresh gets set this low, the real problem we have seen is recovering from it on long-lived connections, or where an ssthresh-host-cache is in use. In these cases, the current congestion-control RFCs don't provide a specific mechanism for ssthresh to recover to a level that reflects the actual level of congestion. So what we have seen is connections between a particular client and server drop to very low throughput and not recover for a very long time. In fact, the traffic between the client and server may be such that it can never recover until the connection is dropped (in the case of long-lived connections) or the cached ssthresh is reset.
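For reference, the 2*MSS floor comes from the ssthresh reduction rule in RFC 5681, equation (4): ssthresh = max(FlightSize / 2, 2*SMSS). The sketch below is a minimal illustration of that rule, assuming a 1460-byte SMSS and a hypothetical 40-segment flight before the losses; the names are illustrative, not from any real stack. After an RTO, cwnd is cut to the loss window of 1 SMSS, so if another, distinct loss event is detected right afterwards, FlightSize / 2 is tiny and the 2*SMSS floor takes over. (RFC 5681 holds ssthresh constant when the same segment times out again, so the collapse shown here needs a separate loss event.)

```python
SMSS = 1460  # assumed sender maximum segment size in bytes


def ssthresh_after_loss(flight_size: int, smss: int = SMSS) -> int:
    """RFC 5681, equation (4): ssthresh = max(FlightSize / 2, 2*SMSS)."""
    return max(flight_size // 2, 2 * smss)


# Hypothetical state before the losses: 40 full-size segments in flight.
cwnd = 40 * SMSS

# Loss event 1, a retransmission timeout: ssthresh from FlightSize / 2,
# then cwnd collapses to the loss window of 1 SMSS (RFC 5681, Section 3.1).
ssthresh = ssthresh_after_loss(flight_size=cwnd)
cwnd = SMSS
print("after RTO:", ssthresh // SMSS)        # prints "after RTO: 20"

# Loss event 2, detected right after the RTO while only 1 SMSS is in
# flight: FlightSize / 2 is below the floor, so the floor wins.
ssthresh = ssthresh_after_loss(flight_size=cwnd)
print("after next loss:", ssthresh // SMSS)  # prints "after next loss: 2"
```

The floor is independent of the path's real capacity, which is why a loss pattern like this can leave ssthresh far below the actual congestion point.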

To allow our customers to avoid this problem, we have provided two mechanisms in our software:


1. They can disable the ssthresh-host-cache so that new connections always start with a fresh ssthresh.

2. Since RFC 5681 Section 4.1 is silent on what to do with ssthresh when restarting idle connections, we assume ssthresh is no longer valid after a sufficiently long idle period. Given that the next transfer is going to perform a slow start that will (essentially) search for the appropriate cwnd level and restart the ACK clock, we also reset ssthresh so that the appropriate cwnd level can be found.
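A minimal sketch of the two mechanisms, under stated assumptions: `Conn`, `open_connection`, `before_send`, and `idle_reset_threshold` are hypothetical names, the "sufficiently long" idle threshold is not specified above and is left as a parameter, and INITIAL_SSTHRESH is just one choice of "arbitrarily high" initial value. The cwnd restart follows RFC 5681 Section 4.1 (RW = min(IW, cwnd) after idling longer than the RTO); the ssthresh reset is the extra step described in item 2 and is not part of RFC 5681.

```python
from dataclasses import dataclass
from typing import Dict

# Assumed "arbitrarily high" initial ssthresh (RFC 5681, Section 3.1).
INITIAL_SSTHRESH = 2**31 - 1


def initial_window(smss: int) -> int:
    """Initial window per RFC 5681, equation (1)."""
    if smss > 2190:
        return 2 * smss
    if smss > 1095:
        return 3 * smss
    return 4 * smss


@dataclass
class Conn:  # hypothetical per-connection state
    smss: int
    cwnd: int
    ssthresh: int
    last_send_time: float  # seconds


def open_connection(peer: str, smss: int, now: float,
                    host_cache: Dict[str, int], use_host_cache: bool) -> Conn:
    """Mechanism 1: with the host cache disabled, ssthresh always starts fresh."""
    ssthresh = INITIAL_SSTHRESH
    if use_host_cache and peer in host_cache:
        # A cached value may carry over a collapsed ssthresh such as 2*SMSS.
        ssthresh = host_cache[peer]
    return Conn(smss, initial_window(smss), ssthresh, now)


def before_send(conn: Conn, now: float, rto: float,
                idle_reset_threshold: float) -> None:
    """Hypothetical hook called just before sending new data."""
    idle = now - conn.last_send_time
    if idle > rto:
        # RFC 5681, Section 4.1: restart window RW = min(IW, cwnd).
        conn.cwnd = min(initial_window(conn.smss), conn.cwnd)
    if idle > idle_reset_threshold:
        # Mechanism 2 (beyond RFC 5681): treat ssthresh as stale so the
        # following slow start can search for the right cwnd again.
        conn.ssthresh = INITIAL_SSTHRESH
    conn.last_send_time = now


# Usage: a connection whose ssthresh collapsed to 2*SMSS, idle for 30 s,
# with an assumed 10 s reset threshold.
conn = Conn(smss=1460, cwnd=2 * 1460, ssthresh=2 * 1460, last_send_time=0.0)
before_send(conn, now=30.0, rto=1.0, idle_reset_threshold=10.0)
```

After the call, ssthresh is back at its initial value, while cwnd stays at min(IW, cwnd) = 2*1460 bytes, so the next transfer slow-starts without being capped by the stale ssthresh.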

These mechanisms seem to work quite well, and we haven't seen any cases where they have caused other problems, but I would be much happier if the congestion-control RFCs were not so silent on how ssthresh should recover when it gets set to an inappropriately low level.

Thanks,
Gary



From: Ingemar Johansson S [mailto:ingemar.s.johansson@ericsson.com]
Sent: Tuesday, May 13, 2014 1:15 AM
To: tcpm@ietf.org
Cc: Karen E. E. Nielsen; gorry@erg.abdn.ac.uk; McAlpine, Gary
Subject: Problem with Low SSThresh (was I-D Action: draft-ietf-tcpm-newcwv-03.txt)

Hi

Karen pointed out this thread to me
http://www.ietf.org/mail-archive/web/tcpm/current/msg08315.html
I started looking more closely at this issue quite recently. The problem I have is that ssthresh can drop to very low values after only a few lost packets. I have seen this odd effect before but have not looked into it.


The experiment is to run a large FTP transfer over a 1 Mbps bottleneck with a minimum RTT of 40 ms. Neither AQM nor a tail-drop queue limit is enabled, i.e., a bufferbloated scenario.

TCP NewReno is used. After 10 s, I "pull the plug" for 100 ms (100% packet drop), which leads to 5 lost segments. At T = 10.1 s, packets are forwarded as usual again.
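To put the setup in perspective, a quick calculation of the path's bandwidth-delay product and per-packet queueing cost. The 1500-byte packet size is an assumption (it is not stated above), and the 50-packet standing queue is a hypothetical value chosen only for illustration.

```python
BOTTLENECK_BPS = 1_000_000  # 1 Mbps bottleneck (from the setup above)
MIN_RTT_MS = 40             # minimum RTT (from the setup above)
PACKET_BYTES = 1500         # assumption: full-size 1500-byte packets

# Bandwidth-delay product at the minimum RTT.
bdp_bytes = BOTTLENECK_BPS * MIN_RTT_MS // 1000 // 8
bdp_packets = bdp_bytes / PACKET_BYTES

# Time to serialize one packet onto the bottleneck; with no AQM and no
# queue limit, every queued packet adds this much to the RTT.
serialization_ms = PACKET_BYTES * 8 * 1000 // BOTTLENECK_BPS


def rtt_with_queue_ms(queued_packets: int) -> int:
    """RTT seen by a packet that arrives behind `queued_packets` packets."""
    return MIN_RTT_MS + queued_packets * serialization_ms


print(bdp_bytes, round(bdp_packets, 2), serialization_ms)  # 5000 3.33 12
print(rtt_with_queue_ms(50))                               # 640
```

Under these assumptions the base pipe holds only about three packets, so almost all of a NewReno flow's cwnd ends up as standing queue, and the measured RTT is dominated by queueing delay rather than by the 40 ms path delay.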

What I have seen is that a retransmission timeout is immediately followed by a loss event, with the effect that ssthresh goes down to 2 MSS. It seems to me that the RTO timer value is too low, although I have not understood the effect completely. Could the RTO timer be the culprit, or is there some other effect?
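One way to probe the RTO question is to replay the standard RFC 6298 estimator on the RTT samples the sender sees. The sketch below implements only the RFC 6298 rules (the simulator's or Linux's estimator may differ in details), and the 1.0 s constant RTT is a hypothetical stand-in for a stable, queue-inflated RTT. It shows that when samples are steady, RTTVAR decays toward zero and the RTO converges to SRTT plus the clock granularity, leaving almost no margin for a sudden RTT increase. This does not establish that the RTO is the culprit in the experiment; it only illustrates how little headroom the timer can have.

```python
ALPHA = 1 / 8  # RFC 6298 gain for SRTT
BETA = 1 / 4   # RFC 6298 gain for RTTVAR
K = 4


def rto_trace(samples_s, granularity_s=0.001, min_rto_s=1.0):
    """Return the RTO (seconds) after each RTT sample, per RFC 6298, Section 2."""
    srtt = rttvar = None
    rtos = []
    for r in samples_s:
        if srtt is None:
            # First measurement (RFC 6298, 2.2).
            srtt, rttvar = r, r / 2
        else:
            # Subsequent measurements (RFC 6298, 2.3); RTTVAR uses the old SRTT.
            rttvar = (1 - BETA) * rttvar + BETA * abs(srtt - r)
            srtt = (1 - ALPHA) * srtt + ALPHA * r
        # RFC 6298, 2.4: values below min_rto_s (1 s in the RFC) are rounded up.
        rtos.append(max(min_rto_s, srtt + max(granularity_s, K * rttvar)))
    return rtos


# Hypothetical: a standing queue keeps every RTT sample at 1.0 s.
rtos = rto_trace([1.0] * 50)
print(round(rtos[0], 3), round(rtos[-1], 3))  # 3.0 1.001
```

After 50 identical samples the RTO sits about 1 ms above SRTT, so any packet (for example, a retransmission) that sees even slightly more queueing delay than the running average would be expected to trigger a timeout.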

I am running these experiments in a proprietary LTE system simulator. We try to keep it up to date so that it matches the Linux TCP stack reasonably well; it cannot be ruled out, however, that our implementation misses some important feature.

/Ingemar

=================================
Ingemar Johansson  M.Sc.
Senior Researcher

Ericsson AB
Wireless Access Networks
Labratoriegränd 11
971 28, Luleå, Sweden
Phone +46-1071 43042
SMS/MMS +46-73 078 3289
ingemar.s.johansson@ericsson.com
www.ericsson.com

"Those are my principles, and if you don't like them...
well, I have others." Groucho Marx
=================================