Re: [tcpm] Reply: A question about Delayed ACK and RTO

"Jakob Heitz (jheitz)" <jheitz@cisco.com> Wed, 14 December 2016 17:21 UTC

From: "Jakob Heitz (jheitz)" <jheitz@cisco.com>
To: Neal Cardwell <ncardwell@google.com>
Date: Wed, 14 Dec 2016 17:21:32 +0000
Message-ID: <3C746F85-73B9-428F-AD18-82E6BB99BE52@cisco.com>
References: <A747A0713F56294D8FBE33E5C6B8F5815F545A98@DGGEMA502-MBS.china.huawei.com> <CADVnQynhEt-bKeFz3bN=MHd4rx8csg-no9ifTp__rbmT_JDggw@mail.gmail.com> <A747A0713F56294D8FBE33E5C6B8F5815F545E28@DGGEMA502-MBS.china.huawei.com> <CADVnQykfYUOyrSsQLrz2EFL0_q5whjwdD-cEehowj+nL4zUMtQ@mail.gmail.com> <F9D5899E-435E-417A-B5D0-561183AFFC92@cisco.com>, <CADVnQy=2vuxwybswBgzEEjinp92DRKFZgphaVdBGvguJBirbJw@mail.gmail.com>
In-Reply-To: <CADVnQy=2vuxwybswBgzEEjinp92DRKFZgphaVdBGvguJBirbJw@mail.gmail.com>
Archived-At: <https://mailarchive.ietf.org/arch/msg/tcpm/5LoFmNURRJd3yTM4nxA9yuUx9Fo>
Cc: "tcpm@ietf.org" <tcpm@ietf.org>
Subject: Re: [tcpm] Reply: A question about Delayed ACK and RTO
List-Id: TCP Maintenance and Minor Extensions Working Group <tcpm.ietf.org>
List-Archive: <https://mailarchive.ietf.org/arch/browse/tcpm/>

The problem in a virtualized environment is that RTT is highly variable. It is mostly sub-millisecond, with occasional spikes of several hundred milliseconds. And that is not just because of delayed ACKs, but also because of VM scheduling and load.

Thanks,
Jakob.


On Dec 14, 2016, at 9:06 AM, Neal Cardwell <ncardwell@google.com> wrote:

On Wed, Dec 14, 2016 at 11:56 AM, Jakob Heitz (jheitz) <jheitz@cisco.com> wrote:
Historically, the minimum RTO is 1 second and actual RTT is very rarely more than 1 second, so all this RTT calculation hardly ever matters anyway.

That may be true historically, but for many years major TCP implementations (including Linux and FreeBSD) have used a minimum RTO closer to 200ms. And in datacenter environments even 200ms can be infeasibly high.

neal


Thanks,
Jakob.


On Dec 14, 2016, at 5:24 AM, Neal Cardwell <ncardwell@google.com> wrote:

On Wed, Dec 14, 2016 at 4:48 AM, zhangyali (D) <zhangyali369@huawei.com> wrote:
Hi Neal,

Thanks for providing the requested information.

About the delay of the delayed ACK, I found another clue in a SIGCOMM paper from 1988. On page 14, one sentence reads: “The 4.5KBps senders were talking to 4.3BSD receivers which would delay an ack until 35% of the window was filled or 200 ms had passed (i.e., an ack was delayed for 5-7 packets on average).” No reference is given for the 200 ms, so I guess this is where this particular delay first appears.
I tried to calculate the value from the number of delayed packets, the packet size, and the sending rate. In this case, the number of delayed packets is 7, the packet size is 576 bytes (see RFC 879, 1983), and the sending rate is 4.5 KBps. The value is 896 ms! That is longer than 200 ms.
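
The arithmetic behind the 896 ms figure can be reproduced with a short sketch. It assumes that "4.5 KBps" means 4500 bytes per second and that every packet is 576 bytes; the constant and function names are made up for illustration.

```python
# Time needed to send a run of packets at a fixed rate, in milliseconds.
# Assumptions: 576-byte packets, 4.5 KBps taken as 4500 bytes per second.
PACKET_SIZE_BYTES = 576
SEND_RATE_BYTES_PER_S = 4500

def time_to_send_ms(num_packets):
    """Return how long it takes to transmit num_packets back to back."""
    return num_packets * PACKET_SIZE_BYTES * 1000 / SEND_RATE_BYTES_PER_S

for n in (5, 7):
    print(f"{n} packets: {time_to_send_ms(n):.0f} ms")
```

Under these assumptions, both ends of the quoted "5-7 packets" range take longer than 200 ms to send.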

> However, it's quite easy for a bulk transfer to never have any delayed ACKs for most of its lifetime, during which the RTO gradually converges toward the raw RTT value. Then when there is suddenly a delayed ACK, there can be a spurious RTO.
I am not sure I see your point. I think the key point is that the RTT variation will be dampened when packets are sent in a burst (a consequence of delayed ACK). And the spurious RTO could only happen for the last few packets, when fewer packets remain than the delayed-ACK packet count, right?

Yes, there should only be a delayed ACK at the end of an application chunk if there is an odd number of packets. But we would expect roughly half of application chunks to have an odd number of packets. Though the proportion is probably higher than that, since many application chunks are just one packet (e.g. an HTTP or RPC request or response).
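
The odd/even effect can be sketched with a toy receiver model. It assumes the receiver ACKs every second full-sized segment immediately and otherwise waits for the delayed-ACK timer, with no reverse-direction data to piggyback on. The function name is hypothetical, and real stacks have further heuristics that are not modeled here.

```python
def ack_schedule(num_packets):
    """Return (packet_index, trigger) pairs for one application chunk.

    trigger is "immediate" when the ACK is sent because a second
    segment arrived, and "timer" when the delayed-ACK timer must fire.
    """
    events = []
    unacked = 0
    for i in range(1, num_packets + 1):
        unacked += 1
        if unacked == 2:
            events.append((i, "immediate"))
            unacked = 0
    if unacked:
        # An odd trailing packet has no partner, so its ACK waits for the timer.
        events.append((num_packets, "timer"))
    return events

for n in (1, 4, 5):
    print(n, ack_schedule(n))
```

In this model a single-packet request or response always ends on the timer, which is why the proportion of affected chunks is likely above one half.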


About your proposal to negotiate the delayed ACK, a potential problem is that the RTO will be longer than before because you add extra delay. As you have said, for most large flows most packets will not hit the RTO even when the host enables delayed ACK, but once a packet is lost, its retransmission will also be delayed (i.e., by 5 ms).

The basic idea of the proposal is to tweak the RTO calculation and turn a previously existing, historically motivated 200ms fixed "slop factor" into a dynamically negotiated 5ms "slop factor". In our experience that is almost always a win.
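
To get a rough sense of the difference, here is a minimal sketch that treats the slop factor as a floor on the variation term of the RTO. The formula RTO = SRTT + max(slop, 4 * RTTVAR) is an assumption for illustration, not the formula from the proposal, and the SRTT and RTTVAR values are invented datacenter-like numbers.

```python
def rto_ms(srtt_ms, rttvar_ms, slop_ms):
    """RTO = SRTT + max(slop, 4 * RTTVAR), all in milliseconds (assumed form)."""
    return srtt_ms + max(slop_ms, 4 * rttvar_ms)

SRTT_MS = 0.1      # assumed smoothed RTT inside a datacenter
RTTVAR_MS = 0.02   # assumed RTT variation

rto_fixed = rto_ms(SRTT_MS, RTTVAR_MS, slop_ms=200)
rto_negotiated = rto_ms(SRTT_MS, RTTVAR_MS, slop_ms=5)

print(f"fixed 200 ms slop:    RTO = {rto_fixed:.1f} ms")
print(f"negotiated 5 ms slop: RTO = {rto_negotiated:.1f} ms")
```

With numbers like these, the slop factor dominates the RTO, so shrinking it shortens timeout-based loss recovery by roughly the same factor.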

neal


Best,
Yali
From: Neal Cardwell [mailto:ncardwell@google.com]
Sent: 13 December 2016 23:18
To: zhangyali (D) <zhangyali369@huawei.com>
Cc: tcpm@ietf.org
Subject: Re: [tcpm] A question about Delayed ACK and RTO

On Tue, Dec 13, 2016 at 3:58 AM, zhangyali (D) <zhangyali369@huawei.com> wrote:
Hi all,

Recently I have been running some simulations of TCP performance in NS3. I found that the default setting of TCP’s delayed ACK is two packets or 200 ms. I am wondering whether this setting accords with any IETF RFCs. But after consulting some RFCs, I found only some restrictions, for example that the delay must be less than 0.5 seconds (RFC 1122).

I believe the 200ms figure is due to historical reasons. AFAIK it's due to the BSD delayed ACK behavior (see Stevens "TCP/IP Illustrated Volume 2", section 25.4 and figure 25.7, which describes the 200ms delayed ACK timer). Then this value was inherited by other widely-deployed OSes.


I think the delayed ACK is closely related to the RTO. Take an extreme scenario: if the delay is longer than the RTO, many packets will be retransmitted, which wastes network resources.

Yes, exactly. In theory, the RTO tries to be adaptive enough to measure any RTT variations caused by delayed ACKs, and increase the RTO in response to this. However, it's quite easy for a bulk transfer to never have any delayed ACKs for most of its lifetime, during which the RTO gradually converges toward the raw RTT value. Then when there is suddenly a delayed ACK, there can be a spurious RTO.

To avoid that, many TCP stacks (at least the major open source OSes) have a hard-coded 200ms "slop factor" or "fudge factor", to try to never let their estimate of RTT variation fall below that 200ms value, to avoid this effect.
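
A small simulation can make this concrete. The sketch below follows the SRTT/RTTVAR update rules of RFC 6298 (gains 1/8 and 1/4, K = 4) but ignores that RFC's 1-second minimum RTO and models the slop factor as a floor on the K * RTTVAR term. The 10 ms base RTT, the 100 ms ACK delay, and all names are assumptions chosen for illustration.

```python
ALPHA = 1 / 8   # SRTT gain (RFC 6298)
BETA = 1 / 4    # RTTVAR gain (RFC 6298)
K = 4

def rto_after(samples_ms, var_floor_ms):
    """Feed RTT samples (ms) to the estimator and return the resulting RTO.

    var_floor_ms is the minimum allowed value of the K * RTTVAR term,
    i.e. the clock granularity or the larger "slop factor".
    """
    srtt = rttvar = None
    for r in samples_ms:
        if srtt is None:
            srtt, rttvar = r, r / 2          # first measurement
        else:
            rttvar = (1 - BETA) * rttvar + BETA * abs(srtt - r)
            srtt = (1 - ALPHA) * srtt + ALPHA * r
    return srtt + max(var_floor_ms, K * rttvar)

BASE_RTT_MS = 10                 # assumed raw path RTT
ACK_DELAY_MS = 100               # assumed extra delay from one delayed ACK
steady = [BASE_RTT_MS] * 50      # bulk transfer with no delayed ACKs

rto_no_floor = rto_after(steady, var_floor_ms=1)     # 1 ms granularity only
rto_floor = rto_after(steady, var_floor_ms=200)      # 200 ms slop factor
ack_arrival = BASE_RTT_MS + ACK_DELAY_MS

print(f"no floor:     RTO = {rto_no_floor:.1f} ms, spurious = {rto_no_floor < ack_arrival}")
print(f"200 ms floor: RTO = {rto_floor:.1f} ms, spurious = {rto_floor < ack_arrival}")
```

Without the floor, the RTO has converged to just above the raw RTT by the time the delayed ACK shows up, so the timer fires before the ACK arrives. With the floor, the same ACK arrives in time.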

So I want to know: do any RFCs give exact values for both? And if any TCP stack is permitted to set them freely, what mechanism balances the mismatch between the RTO and the delayed ACK?

I'm not aware of RFC specifications for exact values of both (delayed ACK and RTO). However, the historical precedent is very strong, and the 200ms delayed ACK value was very pronounced in Internet traces at least as recently as 2011, when I last looked at the effect. (Probably others have more recent data points for the prevalence of 200ms delayed ACKs.)

At IETF 97 our team at Google presented some features we use for internal TCP traffic at Google, where the endpoints can negotiate the specific constant to use for the maximum delayed ACK from the receiver and the corresponding minimum RTT delay variation for budgeting in the RTO at the sender:

  https://www.ietf.org/proceedings/97/slides/slides-97-tcpm-tcp-options-for-low-latency-00.pdf

As this slide deck notes, within Google we negotiate 5ms for delayed ACKs.

cheers,
neal

