Re: [tcpm] 答复: A question about Delayed ACK and RTO
Neal Cardwell <ncardwell@google.com> Wed, 14 December 2016 19:13 UTC
From: Neal Cardwell <ncardwell@google.com>
Date: Wed, 14 Dec 2016 14:13:15 -0500
Message-ID: <CADVnQynkjjBXcQf5MNTFpXRrkw3E1v2y4i9PJoPKvGt8nvnvoA@mail.gmail.com>
To: "Jakob Heitz (jheitz)" <jheitz@cisco.com>
Archived-At: <https://mailarchive.ietf.org/arch/msg/tcpm/7BYgsEdiIDFZGDuSuifRpO3lCbA>
Cc: "tcpm@ietf.org" <tcpm@ietf.org>
Subject: Re: [tcpm] 答复: A question about Delayed ACK and RTO

On Wed, Dec 14, 2016 at 12:21 PM, Jakob Heitz (jheitz) <jheitz@cisco.com>
wrote:

> The problem in a virtualized environment is that RTT is highly variable.
> Mostly sub-millisecond, with occasional spikes of several hundred ms. And
> not just because of delayed ack, but due to effects of VM scheduling and
> load.

True enough. But the RTO mechanism need not try to drive the spurious RTO
rate down to zero. There is some trade-off between faster loss recovery and
an increase in spurious retransmits, and some associated cost-benefit
analysis. These days IMHO the cost of a spurious timer-based loss repair
attempt is relatively low, now that we have TLP and a number of nice undo
mechanisms (FRTO, Eifel, DSACKs). By contrast, in datacenter apps w/
sub-millisecond RTTs, there are prohibitive costs for delaying every
timer-driven loss repair for 200ms.

cheers,
neal

> Thanks,
> Jakob.
>
> On Dec 14, 2016, at 9:06 AM, Neal Cardwell <ncardwell@google.com> wrote:
>
> On Wed, Dec 14, 2016 at 11:56 AM, Jakob Heitz (jheitz) <jheitz@cisco.com>
> wrote:
>
>> Historically, the minimum RTO is 1 second and actual RTT is very rarely
>> more than 1 second, so all this RTT calculation hardly ever matters
>> anyway.
>
> That may be true historically, but for many years major TCP
> implementations (including Linux and FreeBSD) have used a minimum RTO
> closer to 200ms. And in datacenter environments even 200ms can be
> infeasibly high.
>
> neal
>
>> Thanks,
>> Jakob.
>>
>> On Dec 14, 2016, at 5:24 AM, Neal Cardwell <ncardwell@google.com> wrote:
>>
>> On Wed, Dec 14, 2016 at 4:48 AM, zhangyali (D) <zhangyali369@huawei.com>
>> wrote:
>>
>>> Hi Neal,
>>>
>>> Thanks for providing the requested information.
>>>
>>> About the delay of delayed ACK, I found another clue in a SIGCOMM paper
>>> in 1988.
>>> On page 14, one sentence reads: “The 4.5KBps senders were talking to
>>> 4.3BSD receivers which would delay an ack until 35% of the window was
>>> filled or 200 ms had passed (i.e., an ack was delayed for 5-7 packets on
>>> average).” No reference is given for the 200 ms, so I guess this is
>>> where this particular delay appears for the first time.
>>>
>>> I tried to calculate the value from the number of delayed packets, the
>>> packet size and the sending rate. In this case, the number of delayed
>>> packets is 7, the packet size is 576 bytes (see RFC 879, from 1983), and
>>> the sending rate is 4.5 KBps. The value is 7 * 576 / 4500 s = 896 ms!
>>> Longer than 200 ms.
>>>
>>> > However, it's quite easy for a bulk transfer to never have any
>>> > delayed ACKs for most of its lifetime, during which the RTO gradually
>>> > converges toward the raw RTT value. Then when there is suddenly a
>>> > delayed ACK, there can be a spurious RTO.
>>>
>>> I am not sure I see your point. I think the key point is that the RTT
>>> variation will be weakened when packets are sent in a burst (the
>>> consequence of delayed ACK). And a spurious RTO could only happen for
>>> the last few packets, whose number is smaller than the number of delayed
>>> packets, right?
>>
>> Yes, there should only be a delayed ACK at the end of an application
>> chunk if there is an odd number of packets. But we would expect roughly
>> half of application chunks to have an odd number of packets. Though the
>> proportion is probably higher than that, since many application chunks
>> are just one packet (e.g. an HTTP or RPC request or response).
>>
>>> About your proposal for negotiating the delayed ACK, a potential
>>> problem is that the RTO will be longer than before because you add extra
>>> delay. As you have said, most packets will not exceed the RTO even if
>>> hosts enable delayed ACK for most large flows, but the retransmission
>>> will also be delayed (i.e., by 5ms) once a packet is lost.
>>
>> The basic idea of the proposal is to tweak the RTO calculation and turn a
>> previously existing, historically motivated 200ms fixed "slop factor"
>> into a dynamically negotiated 5ms "slop factor". In our experience that
>> is almost always a win.
>>
>> neal
>>
>>> Best,
>>>
>>> Yali
>>>
>>> *From:* Neal Cardwell [mailto:ncardwell@google.com]
>>> *Sent:* December 13, 2016 23:18
>>> *To:* zhangyali (D) <zhangyali369@huawei.com>
>>> *Cc:* tcpm@ietf.org
>>> *Subject:* Re: [tcpm] A question about Delayed ACK and RTO
>>>
>>> On Tue, Dec 13, 2016 at 3:58 AM, zhangyali (D) <zhangyali369@huawei.com>
>>> wrote:
>>>
>>> Hi all,
>>>
>>> Recently I have been doing some simulations of TCP performance in NS3.
>>> I found that the default setting of TCP's delayed ACK is two packets or
>>> 200ms. I am wondering whether this setting is in accordance with some
>>> IETF RFCs. But after referring to some RFCs, I only found some
>>> restrictions, such as that the delay must be less than 0.5 seconds
>>> (RFC 1122).
>>>
>>> I believe the 200ms figure is due to historical reasons. AFAIK it's due
>>> to the BSD delayed ACK behavior (see Stevens "TCP/IP Illustrated Volume
>>> 2", section 25.4 and figure 25.7, which describes the 200ms delayed ACK
>>> timer). Then this value was inherited by other widely-deployed OSes.
>>>
>>> I think the delayed ACK has a close relationship with the RTO. Take an
>>> extreme scenario: if the delay is longer than the RTO, many packets will
>>> be retransmitted, which will waste a lot of network resources.
>>>
>>> Yes, exactly. In theory, the RTO tries to be adaptive enough to measure
>>> any RTT variations caused by delayed ACKs, and increase the RTO in
>>> response to this. However, it's quite easy for a bulk transfer to never
>>> have any delayed ACKs for most of its lifetime, during which the RTO
>>> gradually converges toward the raw RTT value. Then when there is
>>> suddenly a delayed ACK, there can be a spurious RTO.
>>>
>>> To avoid that, many TCP stacks (at least the major open source OSes)
>>> have a hard-coded 200ms "slop factor" or "fudge factor", to try to never
>>> let their estimate of RTT variation fall below that 200ms value, to
>>> avoid this effect.
>>>
>>> So I want to know: do we have any RFCs that give the exact values of
>>> both? And if we permit any TCP stack to set them freely, what is the
>>> mechanism to balance the mismatch between RTO and delayed ACK?
>>>
>>> I'm not aware of RFC specifications for exact values of both (delayed
>>> ACK and RTO). However, the historical precedent is very strong, and the
>>> 200ms delayed ACK value was very pronounced in Internet traces at least
>>> as recently as 2011, when I last looked at the effect. (Probably others
>>> have more recent data points for the prevalence of 200ms delayed ACKs.)
>>>
>>> At IETF 97 our team at Google presented some features we use for
>>> internal TCP traffic at Google, where the endpoints can negotiate the
>>> specific constant to use for the maximum delayed ACK from the receiver
>>> and the corresponding minimum RTT delay variation for budgeting in the
>>> RTO at the sender:
>>>
>>> https://www.ietf.org/proceedings/97/slides/slides-97-tcpm-tcp-options-for-low-latency-00.pdf
>>>
>>> As this slide deck notes, within Google we negotiate 5ms for delayed
>>> ACKs.
>>>
>>> cheers,
>>>
>>> neal
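
To make the "slop factor" discussed in this thread concrete, here is a minimal Python sketch of how a floor on the RTT-variation term changes whether a single delayed ACK outlasts the RTO. It is only an illustration under stated assumptions: the smoothing constants (alpha = 1/8, beta = 1/4, K = 4) are the ones from RFC 6298, but the names `RtoEstimator`, `min_var_ms` and `delayed_ack_fires_rto` are hypothetical, the floor is modeled simply as max(floor, K * RTTVAR), and the 0.5 ms base RTT and 4 ms ACK delay are made-up sample values. It is not the code of any particular TCP stack.

```python
from dataclasses import dataclass
from typing import Optional

# Smoothing constants from RFC 6298.
ALPHA = 1 / 8   # gain for the smoothed RTT
BETA = 1 / 4    # gain for the RTT variation
K = 4           # multiplier applied to the RTT variation


@dataclass
class RtoEstimator:
    """RFC 6298-style estimator with a floor on the variation term.

    min_var_ms is the hypothetical "slop factor": the variation term added
    to SRTT is never allowed to fall below this value.
    """
    min_var_ms: float
    srtt_ms: Optional[float] = None
    rttvar_ms: float = 0.0

    def on_rtt_sample(self, rtt_ms: float) -> None:
        if self.srtt_ms is None:
            # First measurement: SRTT = R, RTTVAR = R / 2.
            self.srtt_ms = rtt_ms
            self.rttvar_ms = rtt_ms / 2
        else:
            # RTTVAR is updated first, using the old SRTT.
            self.rttvar_ms = ((1 - BETA) * self.rttvar_ms
                              + BETA * abs(self.srtt_ms - rtt_ms))
            self.srtt_ms = (1 - ALPHA) * self.srtt_ms + ALPHA * rtt_ms

    def rto_ms(self) -> float:
        if self.srtt_ms is None:
            raise ValueError("no RTT sample yet")
        return self.srtt_ms + max(self.min_var_ms, K * self.rttvar_ms)


def delayed_ack_fires_rto(min_var_ms: float,
                          base_rtt_ms: float = 0.5,
                          ack_delay_ms: float = 4.0,
                          warmup_samples: int = 200) -> bool:
    """Return True if one delayed ACK would arrive after the RTO expires.

    The estimator first sees warmup_samples identical RTTs (a bulk transfer
    with no delayed ACKs), so the RTO converges toward the raw RTT. Then one
    ACK is held back by ack_delay_ms.
    """
    est = RtoEstimator(min_var_ms=min_var_ms)
    for _ in range(warmup_samples):
        est.on_rtt_sample(base_rtt_ms)
    return base_rtt_ms + ack_delay_ms > est.rto_ms()


if __name__ == "__main__":
    for floor_ms in (0.0, 5.0, 200.0):
        est = RtoEstimator(min_var_ms=floor_ms)
        for _ in range(200):
            est.on_rtt_sample(0.5)
        print(f"floor={floor_ms} ms  rto={est.rto_ms():.3f} ms  "
              f"spurious={delayed_ack_fires_rto(floor_ms)}")
```

With these sample values, the estimator with no floor reports a spurious timeout, while both the 5 ms and the 200 ms floors avoid it. The difference is that the 200 ms floor also holds every genuine timer-driven loss repair back by about 200 ms, whereas the 5 ms floor holds it back by about 5 ms.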