Re: [tcpm] Reply: A question about Delayed ACK and RTO

On Wed, Dec 14, 2016 at 12:21 PM, Jakob Heitz (jheitz) <jheitz@cisco.com>
wrote:

> The problem in a virtualized environment is that RTT is highly variable:
> mostly sub-millisecond, with occasional spikes of several hundred ms. And not
> just because of delayed ACK, but due to the effects of VM scheduling and load.
>

True enough. But the RTO mechanism need not try to drive the spurious RTO
rate down to zero. There is some trade-off between faster loss recovery and
an increase in spurious retransmits, and some associated cost-benefit
analysis. These days IMHO the cost of a spurious timer-based loss repair
attempt is relatively low, now that we have TLP and a number of nice undo
mechanisms (FRTO, Eifel, DSACKs). By contrast, in datacenter apps w/
sub-millisecond RTTs, there are prohibitive costs for delaying every
timer-driven loss repair for 200ms.

cheers,
neal

>
> Thanks,
> Jakob.
>
>
> On Dec 14, 2016, at 9:06 AM, Neal Cardwell <ncardwell@google.com> wrote:
>
> On Wed, Dec 14, 2016 at 11:56 AM, Jakob Heitz (jheitz) <jheitz@cisco.com>
> wrote:
>
>> Historically, the minimum RTO is 1 second and actual RTT is very rarely
>> more than 1 second, so all this RTT calculation hardly ever matters anyway.
>>
>
> That may be true historically, but for many years major TCP
> implementations (including Linux and FreeBSD) have used a minimum RTO
> closer to 200ms. And in datacenter environments even 200ms can be
> infeasibly high.
>
> neal
>
>
>>
>> Thanks,
>> Jakob.
>>
>>
>> On Dec 14, 2016, at 5:24 AM, Neal Cardwell <ncardwell@google.com> wrote:
>>
>> On Wed, Dec 14, 2016 at 4:48 AM, zhangyali (D) <zhangyali369@huawei.com>
>> wrote:
>>
>>> Hi Neal,
>>>
>>>
>>>
>>> Thanks for providing the requested information.
>>>
>>>
>>>
>>> About the delay of delayed ACK, I found another clue in a SIGCOMM paper
>>> from 1988. On page 14, one sentence reads: “The 4.5KBps senders were talking to
>>> 4.3BSD receivers which would delay an ack until 35% of the window was
>>> filled or 200 ms had passed (i.e., an ack was delayed for 5-7 packets on
>>> average).” The paper gives no reference for the 200 ms, so I guess this is
>>> where that particular delay first appears.
>>>
>>> I tried to calculate the value from the number of delayed packets, the
>>> packet size, and the sending rate. In this case, the number of delayed
>>> packets is 7, the packet size is 576 bytes (see RFC 879, from 1983), and
>>> the sending rate is 4.5 KBps. The value is 7 × 576 / 4500 s = 896 ms!
>>> Longer than 200 ms.
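
The arithmetic behind the 896 ms figure can be checked with a short sketch. It assumes the inputs stated above (7 delayed packets of 576 bytes each), takes the rate as the 4.5 KBps from the quoted paper sentence, and reads "K" as 1000; the function name is only illustrative.

```python
def ack_delay_ms(delayed_packets: int, packet_bytes: int, rate_bytes_per_s: float) -> float:
    """Time for the sender to emit `delayed_packets` packets at the given rate,
    i.e. how long the receiver waits before it has enough data to ACK."""
    return delayed_packets * packet_bytes * 1000.0 / rate_bytes_per_s


# Inputs from the message: 7 packets, 576 bytes each, 4.5 KBps (assumed 4500 bytes/s).
print(ack_delay_ms(7, 576, 4500))
# Lower end of the quoted "5-7 packets" range, for comparison.
print(ack_delay_ms(5, 576, 4500))
```

Under these assumptions both ends of the 5-7 packet range come out well above 200 ms, so at this sending rate the 200 ms timer would tend to expire first.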
>>>
>>>
>>>
>>> > However, it's quite easy for a bulk transfer to never have any
>>> > delayed ACKs for most of its lifetime, during which the RTO gradually
>>> > converges toward the raw RTT value. Then when there is suddenly a delayed
>>> > ACK, there can be a spurious RTO.
>>>
>>> I am not sure I see your point. I think the key point is that the RTT
>>> variation is dampened when packets are sent in a burst (a consequence
>>> of delayed ACK). And a spurious RTO could only happen for the last few
>>> packets, when fewer remain than the delayed-ACK packet count, right?
>>>
>>
>> Yes, there should only be a delayed ACK at the end of an application
>> chunk if there is an odd number of packets. But we would expect roughly
>> half of application chunks to have an odd number of packets. Though the
>> proportion is probably higher than that, since many application chunks are
>> just one packet (e.g. an HTTP or RPC request or response).
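
As a rough illustration of the odd-packet point, here is a minimal sketch. It assumes the receiver ACKs every second segment, that each application chunk starts with no unacknowledged segment pending, and a hypothetical MSS of 1460 bytes; none of these values come from the thread.

```python
import math


def ends_with_delayed_ack(chunk_bytes: int, mss: int = 1460) -> bool:
    """Return True if the chunk's last segment is left waiting on the
    receiver's delayed-ACK timer, i.e. the chunk has an odd segment count."""
    segments = math.ceil(chunk_bytes / mss)
    return segments % 2 == 1


# A single-packet request or response always ends on a delayed ACK.
print(ends_with_delayed_ack(300))
# Exactly two full segments: the second segment triggers an immediate ACK.
print(ends_with_delayed_ack(2920))
# Three segments: the last one waits for the timer.
print(ends_with_delayed_ack(4000))
```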
>>
>>
>>
>>> About your proposal to negotiate the delayed ACK, a potential
>>> problem is that the RTO will be longer than before because you add extra
>>> delay. As you have said, for most large flows most packets will not exceed
>>> the RTO even when the host enables delayed ACK, but once a packet is lost,
>>> its retransmission will also be delayed (i.e., by 5 ms).
>>>
>>
>> The basic idea of the proposal is to tweak the RTO calculation and turn a
>> previously existing, historically motivated 200ms fixed "slop factor" into
>> a dynamically negotiated 5ms "slop factor". In our experience that is
>> almost always a win.
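
To make the effect of the "slop factor" concrete, here is a simplified sketch. It assumes the RTO is computed as SRTT plus the larger of 4 × RTTVAR and the slop value, which approximates the behavior described in this thread rather than any particular stack's code; the sub-millisecond SRTT and RTTVAR values are hypothetical.

```python
def rto_ms(srtt_ms: float, rttvar_ms: float, slop_ms: float) -> float:
    """Simplified RTO: the variation term is never allowed below slop_ms."""
    return srtt_ms + max(4.0 * rttvar_ms, slop_ms)


# Hypothetical datacenter connection: 0.5 ms smoothed RTT, 0.1 ms variation.
srtt, rttvar = 0.5, 0.1

fixed = rto_ms(srtt, rttvar, slop_ms=200.0)      # historical fixed slop
negotiated = rto_ms(srtt, rttvar, slop_ms=5.0)   # negotiated slop
print(fixed, negotiated)
```

With these illustrative numbers the slop term dominates the RTO in both cases, so shrinking it from 200 ms to 5 ms shortens timer-driven loss recovery by roughly the same factor.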
>>
>> neal
>>
>>
>>>
>>>
>>> Best,
>>>
>>> Yali
>>>
>>> *From:* Neal Cardwell [mailto:ncardwell@google.com]
>>> *Sent:* December 13, 2016 23:18
>>> *To:* zhangyali (D) <zhangyali369@huawei.com>
>>> *Cc:* tcpm@ietf.org
>>> *Subject:* Re: [tcpm] A question about Delayed ACK and RTO
>>>
>>>
>>>
>>> On Tue, Dec 13, 2016 at 3:58 AM, zhangyali (D) <zhangyali369@huawei.com>
>>> wrote:
>>>
>>> Hi all,
>>>
>>>
>>>
>>> Recently I have been running some simulations of TCP performance in NS3. I
>>> found that the default setting of TCP’s delayed ACK is two packets
>>> or 200ms. I am wondering whether this setting accords with any IETF RFCs.
>>> But after consulting some RFCs, I found only some restrictions, such as
>>> that the delay must be less than 0.5 seconds (RFC 1122).
>>>
>>>
>>>
>>> I believe the 200ms figure is due to historical reasons. AFAIK it's due
>>> to the BSD delayed ACK behavior (see Stevens "TCP/IP Illustrated Volume 2",
>>> section 25.4 and figure 25.7, which describes the 200ms delayed ACK timer).
>>> Then this value was inherited by other widely-deployed OSes.
>>>
>>>
>>>
>>>
>>>
>>> I think the delayed ACK has a close relationship with the RTO. In an
>>> extreme scenario where the delay is longer than the RTO, many packets will
>>> be retransmitted unnecessarily, wasting network resources.
>>>
>>>
>>>
>>> Yes, exactly. In theory, the RTO tries to be adaptive enough to measure
>>> any RTT variations caused by delayed ACKs, and increase the RTO in response
>>> to this. However, it's quite easy for a bulk transfer to never have any
>>> delayed ACKs for most of its lifetime, during which the RTO gradually
>>> converges toward the raw RTT value. Then when there is suddenly a delayed
>>> ACK, there can be a spurious RTO.
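
The convergence described above can be sketched with the SRTT/RTTVAR update rules from RFC 6298. This is a simplified model: it omits the clock-granularity term and the minimum-RTO bound, and the 10 ms RTT and 40 ms ACK delay are hypothetical values chosen only for illustration.

```python
def rto_trace(samples_ms, alpha=0.125, beta=0.25, k=4):
    """Return the RTO after each RTT sample, using the RFC 6298 update rules
    without the clock-granularity term or any minimum RTO."""
    srtt = rttvar = None
    trace = []
    for r in samples_ms:
        if srtt is None:
            # First measurement: SRTT = R, RTTVAR = R / 2.
            srtt, rttvar = r, r / 2
        else:
            # RTTVAR is updated first, using the previous SRTT.
            rttvar = (1 - beta) * rttvar + beta * abs(srtt - r)
            srtt = (1 - alpha) * srtt + alpha * r
        trace.append(srtt + k * rttvar)
    return trace


# Bulk transfer: 50 samples with a steady 10 ms RTT and no delayed ACKs.
trace = rto_trace([10.0] * 50)
rto_before = trace[-1]

# The next ACK is held back by a hypothetical 40 ms delayed-ACK timer.
ack_arrival_ms = 10.0 + 40.0
spurious = ack_arrival_ms > rto_before
print(trace[0], rto_before, spurious)
```

In this model the RTO starts at three times the RTT and decays to just above the raw RTT, so the first delayed ACK arrives after the timer has already fired.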
>>>
>>>
>>>
>>> To avoid that, many TCP stacks (at least the major open source OSes)
>>> have a hard-coded 200ms "slop factor" or "fudge factor", to try to never
>>> let their estimate of RTT variation fall below that 200ms value, to avoid
>>> this effect.
>>>
>>>
>>>
>>> So I want to know whether any RFCs give exact values for
>>> both. And if any TCP stack is permitted to set them freely, what is the
>>> mechanism to balance the mismatch between RTO and delayed ACK?
>>>
>>>
>>>
>>> I'm not aware of RFC specifications for exact values of both (delayed
>>> ACK and RTO). However, the historical precedent is very strong, and the
>>> 200ms delayed ACK value was very pronounced in Internet traces at least as
>>> recently as 2011, when I last looked at the effect. (Probably others have
>>> more recent data points for the prevalence of 200ms delayed ACKs.)
>>>
>>>
>>>
>>> At IETF 97 our team at Google presented some features we use for
>>> internal TCP traffic at Google, where the endpoints can negotiate the
>>> specific constant to use for the maximum delayed ACK from the receiver and
>>> the corresponding minimum RTT delay variation for budgeting in the RTO at
>>> the sender:
>>>
>>>
>>>
>>>   https://www.ietf.org/proceedings/97/slides/slides-97-tcpm-tcp-options-for-low-latency-00.pdf
>>>
>>>
>>>
>>> As this slide deck notes, within Google we negotiate 5ms for delayed
>>> ACKs.
>>>
>>>
>>>
>>> cheers,
>>>
>>> neal
>>>
>>>
>>>
>>
>> _______________________________________________
>> tcpm mailing list
>> tcpm@ietf.org
>> https://www.ietf.org/mailman/listinfo/tcpm
>>
>>
>