Re: [tcpm] Reply: A question about Delayed ACK and RTO

Neal Cardwell <ncardwell@google.com> Wed, 14 December 2016 19:13 UTC

In-Reply-To: <3C746F85-73B9-428F-AD18-82E6BB99BE52@cisco.com>
References: <A747A0713F56294D8FBE33E5C6B8F5815F545A98@DGGEMA502-MBS.china.huawei.com> <CADVnQynhEt-bKeFz3bN=MHd4rx8csg-no9ifTp__rbmT_JDggw@mail.gmail.com> <A747A0713F56294D8FBE33E5C6B8F5815F545E28@DGGEMA502-MBS.china.huawei.com> <CADVnQykfYUOyrSsQLrz2EFL0_q5whjwdD-cEehowj+nL4zUMtQ@mail.gmail.com> <F9D5899E-435E-417A-B5D0-561183AFFC92@cisco.com> <CADVnQy=2vuxwybswBgzEEjinp92DRKFZgphaVdBGvguJBirbJw@mail.gmail.com> <3C746F85-73B9-428F-AD18-82E6BB99BE52@cisco.com>
From: Neal Cardwell <ncardwell@google.com>
Date: Wed, 14 Dec 2016 14:13:15 -0500
Message-ID: <CADVnQynkjjBXcQf5MNTFpXRrkw3E1v2y4i9PJoPKvGt8nvnvoA@mail.gmail.com>
To: "Jakob Heitz (jheitz)" <jheitz@cisco.com>
Content-Type: multipart/alternative; boundary="001a1142d920c627cc0543a3220c"
Archived-At: <https://mailarchive.ietf.org/arch/msg/tcpm/7BYgsEdiIDFZGDuSuifRpO3lCbA>
Cc: "tcpm@ietf.org" <tcpm@ietf.org>
Subject: Re: [tcpm] Reply: A question about Delayed ACK and RTO
List-Id: TCP Maintenance and Minor Extensions Working Group <tcpm.ietf.org>

On Wed, Dec 14, 2016 at 12:21 PM, Jakob Heitz (jheitz) <jheitz@cisco.com>
wrote:

> The problem in a virtualized environment is that RTT is highly variable.
> Mostly sub millisecond with occasional spikes at several 100 ms. And not
> just because of delayed ack, but due to effects of VM scheduling and load.
>

True enough. But the RTO mechanism need not try to drive the spurious RTO
rate down to zero. There is some trade-off between faster loss recovery and
an increase in spurious retransmits, and some associated cost-benefit
analysis. These days IMHO the cost of a spurious timer-based loss repair
attempt is relatively low, now that we have TLP and a number of nice undo
mechanisms (FRTO, Eifel, DSACKs). By contrast, in datacenter apps w/
sub-millisecond RTTs, there are prohibitive costs for delaying every
timer-driven loss repair for 200ms.
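
To make the trade-off concrete, here is a minimal sketch, assuming an
RFC 6298-style estimator in which the variance term is never allowed to
fall below a fixed floor. The `slop` parameter, the class name, and the
helper functions are hypothetical names chosen for illustration; real
stacks differ in the details, and the 0.5 ms RTT is just an example value.

```python
from dataclasses import dataclass

ALPHA = 1 / 8   # SRTT smoothing gain (RFC 6298)
BETA = 1 / 4    # RTTVAR smoothing gain (RFC 6298)
K = 4           # variance multiplier (RFC 6298)


@dataclass
class RtoEstimator:
    """RFC 6298-style RTO estimator with a floor on the variance term.

    `slop` is a hypothetical parameter (seconds) standing in for the
    hard-coded or negotiated "slop factor" discussed in this thread.
    """
    slop: float
    srtt: float = 0.0
    rttvar: float = 0.0
    seeded: bool = False

    def on_rtt_sample(self, r: float) -> None:
        if not self.seeded:
            # First measurement: SRTT = R, RTTVAR = R/2.
            self.srtt, self.rttvar, self.seeded = r, r / 2, True
            return
        # RTTVAR is updated before SRTT, using the old SRTT.
        self.rttvar = (1 - BETA) * self.rttvar + BETA * abs(self.srtt - r)
        self.srtt = (1 - ALPHA) * self.srtt + ALPHA * r

    @property
    def rto(self) -> float:
        return self.srtt + max(self.slop, K * self.rttvar)


def rto_after_bulk_transfer(slop: float, rtt: float = 0.0005,
                            samples: int = 200) -> float:
    """RTO after many identical RTT samples with no delayed ACKs."""
    est = RtoEstimator(slop=slop)
    for _ in range(samples):
        est.on_rtt_sample(rtt)
    return est.rto


def is_spurious(slop: float, ack_delay: float, rtt: float = 0.0005) -> bool:
    """True if an ACK held back by ack_delay arrives after the RTO fires."""
    return rtt + ack_delay > rto_after_bulk_transfer(slop, rtt)


if __name__ == "__main__":
    for slop in (0.200, 0.005, 0.0):
        rto_ms = rto_after_bulk_transfer(slop) * 1000
        print(f"slop={slop * 1000:6.1f} ms -> RTO ~ {rto_ms:.2f} ms")
```

With a steady 0.5 ms RTT, the 200 ms floor leaves the RTO at about
200.5 ms, while a 5 ms floor leaves it at about 5.5 ms. With no floor at
all the RTO converges toward the raw RTT, so even a 5 ms delayed ACK
would trigger a spurious timeout.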

cheers,
neal



>
> Thanks,
> Jakob.
>
>
> On Dec 14, 2016, at 9:06 AM, Neal Cardwell <ncardwell@google.com> wrote:
>
> On Wed, Dec 14, 2016 at 11:56 AM, Jakob Heitz (jheitz) <jheitz@cisco.com>
> wrote:
>
>> Historically, the minimum RTO is 1 second and actual RTT is very rarely
>> more than 1 second, so all this RTT calculation hardly ever matters anyway.
>>
>
> That may be true historically, but for many years major TCP
> implementations (including Linux and FreeBSD) have used a minimum RTO
> closer to 200ms. And in datacenter environments even 200ms can be
> infeasibly high.
>
> neal
>
>
>>
>> Thanks,
>> Jakob.
>>
>>
>> On Dec 14, 2016, at 5:24 AM, Neal Cardwell <ncardwell@google.com> wrote:
>>
>> On Wed, Dec 14, 2016 at 4:48 AM, zhangyali (D) <zhangyali369@huawei.com>
>> wrote:
>>
>>> Hi Neal,
>>>
>>>
>>>
>>> Thanks for providing the requested information.
>>>
>>>
>>>
>>> Regarding the delayed ACK timeout, I found another clue in a 1988 SIGCOMM
>>> paper. On page 14 it says: “The 4.5KBps senders were talking to
>>> 4.3BSD receivers which would delay an ack until 35% of the window was
>>> filled or 200 ms had passed (i.e., an ack was delayed for 5-7 packets on
>>> average).” The paper gives no reference for the 200 ms figure, so I guess
>>> this is where that particular delay first appears.
>>>
>>> I tried to calculate the delay from the number of delayed packets, the
>>> packet size, and the sending rate. In this case, the number of delayed
>>> packets is 7, the packet size is 576 bytes (see RFC 879, from 1983), and
>>> the sending rate is 4.5 KBps. That gives 7 x 576 bytes / 4500 bytes/s =
>>> 896 ms, which is longer than 200 ms!
>>>
>>>
>>>
>>> > However, it's quite easy for a bulk transfer to never have any
>>> > delayed ACKs for most of its lifetime, during which the RTO gradually
>>> > converges toward the raw RTT value. Then when there is suddenly a delayed
>>> > ACK, there can be a spurious RTO.
>>>
>>> I am not sure I see your point. I think the key point is that the RTT
>>> variation is dampened when packets are sent in a burst (a consequence
>>> of delayed ACKs). And a spurious RTO could only happen for the last few
>>> packets, when fewer packets remain than the delayed ACK threshold, right?
>>>
>>
>> Yes, there should only be a delayed ACK at the end of an application
>> chunk if there is an odd number of packets. But we would expect roughly
>> half of application chunks to have an odd number of packets. Though the
>> proportion is probably higher than that, since many application chunks are
>> just one packet (e.g. an HTTP or RPC request or response).
>>
>>
>>
>>> Regarding your proposal to negotiate the delayed ACK value, a potential
>>> problem is that the RTO will be longer than before because you add extra
>>> delay. As you said, for most large flows most packets will not exceed the
>>> RTO even when the host enables delayed ACKs, but whenever a packet is lost
>>> its retransmission will also be delayed (e.g., by 5ms).
>>>
>>
>> The basic idea of the proposal is to tweak the RTO calculation and turn a
>> previously existing, historically motivated 200ms fixed "slop factor" into
>> a dynamically negotiated 5ms "slop factor". In our experience that is
>> almost always a win.
>>
>> neal
>>
>>
>>>
>>>
>>> Best,
>>>
>>> Yali
>>>
>>> *From:* Neal Cardwell [mailto:ncardwell@google.com]
>>> *Sent:* 13 December 2016 23:18
>>> *To:* zhangyali (D) <zhangyali369@huawei.com>
>>> *Cc:* tcpm@ietf.org
>>> *Subject:* Re: [tcpm] A question about Delayed ACK and RTO
>>>
>>>
>>>
>>> On Tue, Dec 13, 2016 at 3:58 AM, zhangyali (D) <zhangyali369@huawei.com>
>>> wrote:
>>>
>>> Hi all,
>>>
>>>
>>>
>>> Recently I have been running some simulations of TCP performance in NS3. I
>>> noticed that the default setting for TCP's delayed ACK is two packets
>>> or 200ms. I am wondering whether this setting accords with any IETF RFCs.
>>> After consulting some RFCs, I found only some restrictions, such as
>>> that the delay must be less than 0.5 seconds (RFC 1122).
>>>
>>>
>>>
>>> I believe the 200ms figure is due to historical reasons. AFAIK it's due
>>> to the BSD delayed ACK behavior (see Stevens "TCP/IP Illustrated Volume 2",
>>> section 25.4 and figure 25.7, which describes the 200ms delayed ACK timer).
>>> Then this value was inherited by other widely-deployed OSes.
>>>
>>>
>>>
>>>
>>>
>>> I think the delayed ACK is closely related to the RTO. In an extreme
>>> scenario, if the delay is longer than the RTO, many packets will be
>>> retransmitted, wasting network resources.
>>>
>>>
>>>
>>> Yes, exactly. In theory, the RTO tries to be adaptive enough to measure
>>> any RTT variations caused by delayed ACKs, and increase the RTO in response
>>> to this. However, it's quite easy for a bulk transfer to never have any
>>> delayed ACKs for most of its lifetime, during which the RTO gradually
>>> converges toward the raw RTT value. Then when there is suddenly a delayed
>>> ACK, there can be a spurious RTO.
>>>
>>>
>>>
>>> To avoid that, many TCP stacks (at least the major open source OSes)
>>> have a hard-coded 200ms "slop factor" or "fudge factor", to try to never
>>> let their estimate of RTT variation fall below that 200ms value, to avoid
>>> this effect.
>>>
>>>
>>>
>>> So I would like to know whether any RFCs give exact values for
>>> both. And if any TCP stack is permitted to set them freely, what
>>> mechanism balances the mismatch between the RTO and the delayed ACK?
>>>
>>>
>>>
>>> I'm not aware of RFC specifications for exact values of both (delayed
>>> ACK and RTO). However, the historical precedent is very strong, and the
>>> 200ms delayed ACK value was very pronounced in Internet traces at least as
>>> recently as 2011, when I last looked at the effect. (Probably others have
>>> more recent data points for the prevalence of 200ms delayed ACKs.)
>>>
>>>
>>>
>>> At IETF 97 our team at Google presented some features we use for
>>> internal TCP traffic at Google, where the endpoints can negotiate the
>>> specific constant to use for the maximum delayed ACK from the receiver and
>>> the corresponding minimum RTT delay variation for budgeting in the RTO at
>>> the sender:
>>>
>>>
>>>
>>>   https://www.ietf.org/proceedings/97/slides/slides-97-tcpm-tcp-options-for-low-latency-00.pdf
>>>
>>>
>>>
>>> As this slide deck notes, within Google we negotiate 5ms for delayed
>>> ACKs.
>>>
>>>
>>>
>>> cheers,
>>>
>>> neal
>>>
>>>
>>>
>>
>>
>>
>