Re: [tcpm] Re: Re: Re: A question about Delayed ACK and RTO

TCP would benefit greatly from an 'end of app write call' signal, or
maybe an 'end of sender buffer data' signal.

TLP, adaptive RTO at the end of bursts, end-of-burst reordering, and
maybe some other mechanisms were introduced to help with this issue.

Maybe adding an option for this signal would help, or maybe
reinterpreting the push flag.
This would require modifying both senders and receivers, so the effect
of an unmodified receiver getting the flag should be considered.

Alejandro.

On 15/12/2016 7:20 a. m., Gorry Fairhurst wrote:
>
> Just adding a few more comments here, because there has been much talk 
> about how desirable it is for the RTO to converge to the RTT, and I 
> think we need to be clear.
>
> The TCP RTO has two distinct functions:
>
> (1)  the RTO is of course the method of last resort for the recovery 
> of a lost segment (the final segment of a burst, following persistent 
> loss, etc).
>
> (2) Second, the RTO is a method to detect a path failure - where 
> something has changed significantly. As such, triggering a 
> conservative (1 or 3 seconds) RTO implies that state about the path 
> needs to be reset (RTT, perhaps invoking PMTUD, congestion state 
> variables, etc, possibly in the future falling back from using ECN, etc)
>
> For the first function, there are many ways to detect lost segments 
> (Dupacks, probe packets, recovery timers, etc). All of these act 
> faster than a conservatively set timeout. They also do not need to 
> erase collected path state, they can simply re-transmit the lost 
> segments.
>
> An RTO close to the RTT will be disastrous for some paths. This is one 
> of the reasons why people have deployed PEPs to support radio 
> technology - we should not be encouraging this. An RTO set close to 
> the RTT can produce complex interactions when the path characteristics 
> change (which does happen with propagation impairments and radio 
> resource management, on wireless, microwave satellite and other paths, 
> and also with middleboxes that attempt to control capacity/volume 
> usage). This type of path is not going away: talk of wireless links 
> with top speeds of Gbps for 5G will likely be accompanied by very 
> large variation in path characteristics, and at the other extreme 
> people in large parts of the world are still on kbps technology. If 
> the timer only recovers packets, it would simply result in 
> retransmission; if it triggers other probing, that would be 
> unfortunate. I think the IETF needs to ensure our specs work for all 
> people.
>
> What I am saying is that the second function does not need a timeout 
> period near the RTT. Robust protocols have been designed to have a 
> conservative RTO timer that is set to much more than a RTT, designed 
> to detect an unresponsive endpoint or path problem. The IETF has 
> argued - and to my knowledge still argues that Min_RTO should be one 
> second, which I think reflects this position. If the loss recovery is 
> efficient, this will trigger only infrequently (e.g., persistent loss 
> of one packet). I think this is the correct way to design TCP. It 
> motivates that we should actually strive for a one second (or so) RTO.
>
> Gorry
>
>
> On 15/12/2016 08:24, Yoshifumi Nishida wrote:
>> A question would be: when TCP has sent an odd number of packets, how
>> can it know whether more packets will come from the upper layer very
>> soon or not? Also, I'm not very sure yet that the probability is so
>> obvious. I can agree that the RTO can get close to the raw RTT value
>> under bulk transfer. But I am guessing not many apps change their
>> behavior from bulk transfer to spontaneous small-burst transfer.
>> -- 
>> Yoshi
>>
>>
>> On Wed, Dec 14, 2016 at 7:08 PM, zhangyali (D) 
>> <zhangyali369@huawei.com> wrote:
>>> Please allow me to add one more point. If the probability of a
>>> spurious RTO caused by an odd number of packets is so high, why has
>>> nobody tried to solve this problem?
>>>
>>>
>>>
>>> From: tcpm [mailto:tcpm-bounces@ietf.org] On Behalf Of zhangyali (D)
>>> Sent: 15 December 2016 9:45
>>> To: Neal Cardwell <ncardwell@google.com>
>>> Cc: tcpm@ietf.org
>>> Subject: [tcpm] Re: Re: A question about Delayed ACK and RTO
>>>
>>>
>>>
>>>> Yes, there should only be a delayed ACK at the end of an 
>>>> application chunk
>>>> if there is an odd number of packets. But we would expect roughly 
>>>> half of
>>>> application chunks to have an odd number of packets. Though the 
>>>> proportion
>>>> is probably higher than that, since many application chunks are 
>>>> just one
>>>> packet (e.g. an HTTP or RPC request or response).
>>>
>>>
>>>
>>> I agree with you that an odd number of packets may cause a spurious
>>> RTO more easily, but I think flows consisting of just one packet
>>> should not be affected by delayed ACK. AFAIK, the slow-start stage
>>> begins with one packet, and the sender sends two packets after it
>>> receives an ACK. If the first packet is held up by the delayed ACK,
>>> the ‘clock algorithm’ will break down. So the receiver should judge
>>> whether this packet is the first one in the slow-start stage and, if
>>> so, send the ACK immediately upon receiving the packet.
>>>
>>>
>>>
>>> Yali
>>>
>>>
>>>
>>> From: Neal Cardwell [mailto:ncardwell@google.com]
>>> Sent: 14 December 2016 21:24
>>> To: zhangyali (D) <zhangyali369@huawei.com>
>>> Cc: tcpm@ietf.org
>>> Subject: Re: Re: [tcpm] A question about Delayed ACK and RTO
>>>
>>>
>>>
>>> On Wed, Dec 14, 2016 at 4:48 AM, zhangyali (D) 
>>> <zhangyali369@huawei.com>
>>> wrote:
>>>
>>> Hi Neal,
>>>
>>>
>>>
>>> Thanks for providing the requested information.
>>>
>>>
>>>
>>> About the delay of delayed ACK, I found another clue in a SIGCOMM
>>> paper from 1988. On page 14, a sentence reads: “The 4.5KBps senders
>>> were talking to 4.3BSD receivers which would delay an ack until 35%
>>> of the window was filled or 200 ms had passed (i.e., an ack was
>>> delayed for 5-7 packets on average).” No reference is given for the
>>> 200 ms, so I guess this is where this particular delay appears for
>>> the first time.
>>>
>>> I tried to calculate the value based on the number of delayed
>>> packets, the packet size, and the sending rate. In this case, the
>>> number of delayed packets is 7, the packet size is 576 bytes (see
>>> RFC 879 from 1983), and the sending rate is 4.5 KBps. The value is
>>> 896 ms! Longer than 200 ms.
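
The 896 ms figure can be checked with a few lines of arithmetic. This is
only a sketch: it assumes 1 KB = 1000 bytes (so 4.5 KBps = 4500 bytes/s),
and the function name ack_delay_ms is made up for illustration.

```python
def ack_delay_ms(packets, packet_bytes, rate_bytes_per_s):
    """Time in ms for the sender to emit `packets` packets at the given rate."""
    # Multiply before dividing so the result stays exact for these inputs.
    return packets * packet_bytes * 1000 / rate_bytes_per_s


# Assumed inputs: 576-byte packets (RFC 879 default), 4.5 KBps = 4500 bytes/s.
print(ack_delay_ms(7, 576, 4500))  # 896.0
print(ack_delay_ms(5, 576, 4500))  # 640.0
```

With 5 packets instead of 7, the same formula gives 640 ms, so both ends
of the quoted 5-7 packet range come out above 200 ms at that rate.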
>>>
>>>
>>>
>>>> . However, it's quite easy for a bulk transfer to never have any 
>>>> delayed
>>>> ACKs for most of its lifetime, during which the RTO gradually 
>>>> converges
>>>> toward the raw RTT value. Then when there is suddenly a delayed 
>>>> ACK, there
>>>> can be a spurious RTO.
>>>
>>> I am not sure I see your point. I think the key point is that the
>>> RTT variation will be reduced when packets are sent in a burst (the
>>> consequence of delayed ACK). And the spurious RTO could only happen
>>> for the last few packets, whose number is smaller than the number of
>>> delayed packets, right?
>>>
>>>
>>>
>>> Yes, there should only be a delayed ACK at the end of an application 
>>> chunk
>>> if there is an odd number of packets. But we would expect roughly 
>>> half of
>>> application chunks to have an odd number of packets. Though the 
>>> proportion
>>> is probably higher than that, since many application chunks are just 
>>> one
>>> packet (e.g. an HTTP or RPC request or response).
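
The odd/even argument can be sketched in a few lines. This assumes a
receiver that ACKs every second segment and otherwise waits for its
delayed-ACK timer, and that each application chunk starts with no
unacknowledged segment pending. The helper names and the two chunk-size
mixes are hypothetical, chosen only to illustrate the point.

```python
def tail_ack_is_delayed(num_packets, ack_every=2):
    """True if the last packet of a chunk is left waiting for the delayed-ACK timer."""
    return num_packets % ack_every != 0


def delayed_fraction(chunk_sizes):
    """Fraction of chunks whose final ACK is delayed."""
    delayed = sum(1 for n in chunk_sizes if tail_ack_is_delayed(n))
    return delayed / len(chunk_sizes)


# Chunk sizes spread evenly over 1..10 packets: half are odd.
uniform = list(range(1, 11))
# Same mix plus ten single-packet requests/responses: the odd share rises.
mostly_single = [1] * 10 + list(range(1, 11))

print(delayed_fraction(uniform))        # 0.5
print(delayed_fraction(mostly_single))  # 0.75
```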
>>>
>>>
>>>
>>>
>>>
>>> About your proposal to negotiate the delayed ACK, a potential
>>> problem is that the RTO will be stretched longer than before because
>>> you add extra delay. As you have said, for most large flows most
>>> packets will not exceed the RTO even if the host enables delayed ACK,
>>> but retransmission will also be delayed (i.e., by 5 ms) once a packet
>>> is lost.
>>>
>>>
>>>
>>> The basic idea of the proposal is to tweak the RTO calculation and 
>>> turn a
>>> previously existing, historically motivated 200ms fixed "slop 
>>> factor" into a
>>> dynamically negotiated 5ms "slop factor". In our experience that is 
>>> almost
>>> always a win.
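
To make the "slop factor" idea concrete, here is a minimal sketch of an
RFC 6298-style estimator (SRTT/RTTVAR with alpha = 1/8, beta = 1/4,
K = 4) with a configurable floor on the variation term. The names
RtoEstimator and var_floor_ms are made up for illustration, the 10 ms RTT
and the ACK delays are invented inputs, and real stacks apply their floor
in their own ways. The sketch also ignores the 1-second minimum RTO that
RFC 6298 recommends.

```python
class RtoEstimator:
    """Simplified RFC 6298-style RTO estimator with a floor on the variation term."""

    ALPHA = 1 / 8
    BETA = 1 / 4
    K = 4

    def __init__(self, var_floor_ms=0.0):
        self.srtt = None
        self.rttvar = None
        self.var_floor_ms = var_floor_ms  # the "slop factor" (assumed name)

    def on_rtt_sample(self, r):
        if self.srtt is None:
            # First measurement.
            self.srtt = r
            self.rttvar = r / 2
        else:
            # RTTVAR is updated first, using the old SRTT.
            self.rttvar = (1 - self.BETA) * self.rttvar + self.BETA * abs(self.srtt - r)
            self.srtt = (1 - self.ALPHA) * self.srtt + self.ALPHA * r

    def rto(self):
        return self.srtt + max(self.K * self.rttvar, self.var_floor_ms)


def rto_after_bulk(var_floor_ms, rtt_ms=10.0, samples=200):
    """RTO after a long run of identical RTT samples (no delayed ACKs)."""
    est = RtoEstimator(var_floor_ms)
    for _ in range(samples):
        est.on_rtt_sample(rtt_ms)
    return est.rto()


def spurious_after_bulk(var_floor_ms, rtt_ms=10.0, ack_delay_ms=40.0):
    """True if one delayed ACK after the bulk phase arrives later than the RTO."""
    return rtt_ms + ack_delay_ms > rto_after_bulk(var_floor_ms, rtt_ms)


print(spurious_after_bulk(0.0))                    # True: RTO has converged to ~RTT
print(spurious_after_bulk(200.0))                  # False: RTO is 210 ms
print(spurious_after_bulk(5.0, ack_delay_ms=4.0))  # False: RTO is 15 ms
```

Under these assumptions a steady bulk transfer drives the RTO down to
about the RTT, so a 40 ms delayed ACK trips the timer. A 200 ms floor
avoids that at the cost of a 210 ms RTO, while a 5 ms floor gives a 15 ms
RTO and stays safe as long as the receiver delays ACKs by less than 5 ms.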
>>>
>>>
>>>
>>> neal
>>>
>>>
>>>
>>>
>>>
>>> Best,
>>>
>>> Yali
>>>
>>> From: Neal Cardwell [mailto:ncardwell@google.com]
>>> Sent: 13 December 2016 23:18
>>> To: zhangyali (D) <zhangyali369@huawei.com>
>>> Cc: tcpm@ietf.org
>>> Subject: Re: [tcpm] A question about Delayed ACK and RTO
>>>
>>>
>>>
>>> On Tue, Dec 13, 2016 at 3:58 AM, zhangyali (D) 
>>> <zhangyali369@huawei.com>
>>> wrote:
>>>
>>> Hi all,
>>>
>>>
>>>
>>> Recently I have been running some simulations of TCP performance in
>>> NS3. I found that the default setting of TCP’s delayed ACK is two
>>> packets or 200 ms. I am wondering whether this setting is in
>>> accordance with any IETF RFCs. But after referring to some RFCs, I
>>> only found some restrictions, such as that the delay must be less
>>> than 0.5 seconds (RFC 1122).
>>>
>>>
>>>
>>> I believe the 200ms figure is due to historical reasons. AFAIK it's 
>>> due to
>>> the BSD delayed ACK behavior (see Stevens "TCP/IP Illustrated Volume 
>>> 2",
>>> section 25.4 and figure 25.7, which describes the 200ms delayed ACK 
>>> timer).
>>> Then this value was inherited by other widely-deployed OSes.
>>>
>>>
>>>
>>>
>>>
>>> I think the delayed ACK has a close relationship with the RTO. To
>>> take an extreme scenario, if the delay is longer than the RTO, many
>>> packets will be retransmitted, which will waste a lot of network
>>> resources.
>>>
>>>
>>>
>>> Yes, exactly. In theory, the RTO tries to be adaptive enough to 
>>> measure any
>>> RTT variations caused by delayed ACKs, and increase the RTO in 
>>> response to
>>> this. However, it's quite easy for a bulk transfer to never have any 
>>> delayed
>>> ACKs for most of its lifetime, during which the RTO gradually converges
>>> toward the raw RTT value. Then when there is suddenly a delayed ACK, 
>>> there
>>> can be a spurious RTO.
>>>
>>>
>>>
>>> To avoid that, many TCP stacks (at least the major open source OSes) 
>>> have a
>>> hard-coded 200ms "slop factor" or "fudge factor", to try to never 
>>> let their
>>> estimate of RTT variation fall below that 200ms value, to avoid this 
>>> effect.
>>>
>>>
>>>
>>> So I want to know whether any RFCs give exact values for both.
>>> And if we permit any TCP stack to set them freely, what is the
>>> mechanism to balance the mismatch between RTO and delayed ACK?
>>>
>>>
>>>
>>> I'm not aware of RFC specifications for exact values of both 
>>> (delayed ACK
>>> and RTO). However, the historical precedent is very strong, and the 
>>> 200ms
>>> delayed ACK value was very pronounced in Internet traces at least as
>>> recently as 2011, when I last looked at the effect. (Probably others 
>>> have
>>> more recent data points for the prevalence of 200ms delayed ACKs.)
>>>
>>>
>>>
>>> At IETF 97 our team at Google presented some features we use for 
>>> internal
>>> TCP traffic at Google, where the endpoints can negotiate the specific
>>> constant to use for the maximum delayed ACK from the receiver and the
>>> corresponding minimum RTT delay variation for budgeting in the RTO 
>>> at the
>>> sender:
>>>
>>>
>>>
>>>
>>> https://www.ietf.org/proceedings/97/slides/slides-97-tcpm-tcp-options-for-low-latency-00.pdf 
>>>
>>>
>>>
>>>
>>> As this slide deck notes, within Google we negotiate 5ms for delayed 
>>> ACKs.
>>>
>>>
>>>
>>> cheers,
>>>
>>> neal
>>>
>>>
>>>
>>>
>>>
>>>
>>> _______________________________________________
>>> tcpm mailing list
>>> tcpm@ietf.org
>>> https://www.ietf.org/mailman/listinfo/tcpm
>>>