[tcpm] Review of draft-wang-tcpm-low-latency-opt-00
Bob Briscoe <ietf@bobbriscoe.net> Wed, 02 August 2017 15:54 UTC
From: Bob Briscoe <ietf@bobbriscoe.net>
To: Eric Dumazet <edumazet@google.com>, Yuchung Cheng <ycheng@google.com>, Wei Wang <weiwan@google.com>, Neal Cardwell <ncardwell@google.com>
Cc: tcpm IETF list <tcpm@ietf.org>
Date: Wed, 02 Aug 2017 16:54:11 +0100
Archived-At: <https://mailarchive.ietf.org/arch/msg/tcpm/pPg8FK3a3kEeBcp3_YAmJAWQpN8>
Wei, Yuchung, Neal and Eric,

I promised you, as authors of draft-wang-tcpm-low-latency-opt-00, a review. It questions the technical logic behind the draft, so I haven't bothered to give a detailed review of the wording of the draft, because that might be irrelevant if you agree with my arguments.

*1/ MAD by configuration?*

    o  If the user does not specify a MAD value, then the implementation
       SHOULD NOT specify a MAD value in the Low Latency option.

That sentence triggered my "anti-human-intervention" reflex. My train of thought went as follows:

* Let's consider what advice we would give on what MAD value ought to be configured.
* You say that MAD can be smaller in DCs. So I assume your advice would be that MAD should depend on RTT {Note 1} and clock granularity {Note 2}.
* So why configure one value of MAD for all RTTs? That only makes sense in DC environments where the range of RTTs is small.
* However, for the range of RTTs on the public Internet, why not calculate MAD from RTT and granularity, then standardize the calculation so that both ends arrive at the same result when starting from the same RTT and granularity parameters? (The sender and receiver might measure different smoothed RTT (SRTT) values, but they will converge as the flow progresses.)

Then the receiver only needs to communicate its clock granularity to the sender, and the fact that it is driving MAD off its SRTT. Then the sender can use a formula for RTO derived from the value of MAD that it calculates the receiver will be using. Then its RTO will be completely tailored to the RTT of the flow.

Note: There are two different uses for the min RTO that need to be separated:

a) Before an initial RTT value has been measured, to determine the RTO during the 3WHS.
b) Once either end has measured the RTT for a connection.

(a) needs to cope with the whole range of possible RTTs, whereas (b) is the subject of this email, because it can be tailored for the measured RTT.

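As a rough sketch of what such a standardized calculation might look like, the Python below computes the receiver's MAD from SRTT, the sender's window and the receiver's clock granularity, using the formula from {Note 1} at the end of this email. The function and parameter names are made up for illustration, and k=2 is only the untested guess from that note, not a measured value.

```python
def receiver_mad_us(srtt_us, window_segments, granularity_us, ack_factor=2, k=2):
    """Sketch of MAD_r = max(k * d_r * R_r / W_s, G_r), times in microseconds.

    srtt_us         -- smoothed RTT (R)
    window_segments -- sender's window in segments (W), as estimated by the receiver
    granularity_us  -- receiver's clock granularity (G)
    ack_factor      -- delayed ACK factor (d), e.g. 2 for ACKing every other packet
    k               -- hypothetical allowance for packet bunching
    """
    if window_segments <= 0:
        raise ValueError("window must be at least one segment")
    # Average time between ACKs when the flow is not app-limited.
    expected_ack_gap_us = ack_factor * srtt_us / window_segments
    # MAD can never usefully be finer than the clock that implements it.
    return max(k * expected_ack_gap_us, granularity_us)


if __name__ == "__main__":
    # The example from {Note 1}: R=100us, d=2, W=8, G=1us.
    print(receiver_mad_us(srtt_us=100.0, window_segments=8, granularity_us=1.0))
```

If both ends ran the same function with the same inputs, the sender could predict the receiver's MAD without it being carried in an option; only the granularity would need to be communicated.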
*2/ The problem, and its prevalence*

With gradual removal of bufferbloat and more prevalent usage of CDNs, typical base RTTs on the public Internet now make the value of minRTO and of MAD look silly.

As can be seen above, the problem is indeed that each end only has partial knowledge of the config of the other end. However, the problem is not just that MAD needs to be communicated to the other end so it can be hard-coded to a lower value. The problem is that MAD is hard-coded in the first place.

The draft needs to say how prevalent the problem is (on the public Internet) where the sender has to wait for the receiver's delayed ACK timer at the end of a flow or between the end of a volley of packets and the start of the next.

The draft also needs to say what tradeoff is considered acceptable between a residual level of spurious retransmissions and lower timeout delay. Eliminating all spurious retransmissions is not the goal.

The draft also needs to say that introducing a new TCP Option is itself a problem (on the public Internet), because of middleboxes, particularly proxies. Therefore a solution that does not need a new TCP Option would be preferable.

Perhaps the solution for communicating timestamp resolution in draft-scheffenegger-tcpm-timestamp-negotiation-05 (which cites draft-trammell-tcpm-timestamp-interval-01) could be modified to also communicate:

* TCP's clock granularity (closely related to TCP timestamp resolution),
* and the fact that the host is calculating MAD as a function of RTT and granularity.

Then the existing timestamp option could be repurposed, which should drastically reduce deployment problems.

*3/ Only DC?*

All the related work references are solely in the context of a DC. Please include refs about this problem in a public Internet context. You will find there is a pretty good search engine at www.google.com.

The only non-DC ref I can find about minRTO is [Psaras07], which is mainly about a proposal to apply minRTO if the sender expects the next ACK to be delayed. Nonetheless, the simulation experiment in Section 5.1 provides good evidence for how RTO latency is dependent on uncertainty about the MAD that the other end is using.

[Psaras07] Psaras, I. & Tsaoussidis, V., "The TCP Minimum RTO Revisited," In: Proc. 6th Int'l IFIP-TC6 Conference on Ad Hoc and Sensor Networks, Wireless Networks, Next Generation Internet NETWORKING'07, pp. 981-991, Springer-Verlag (2007)
https://www.researchgate.net/publication/225442912_The_TCP_Minimum_RTO_Revisited

*4/ Status*

Normally, I wouldn't want to hold up a draft that has been proven over years of practice, such as the technique in low-latency-opt, which has been proven in Google's DCs over the last few years, whereas my ideas are just that: ideas, not proven. However, the technique in low-latency-opt has only been proven in DC environments where the range of RTTs is limited. So, now that you are proposing to transplant it onto the public Internet, it also only has the status of an unproven idea.

To be clear, as it stands, I do not think low-latency-opt is applicable to the public Internet.

*5/ Nits*

These nits depart from my promise not to comment on details that could become irrelevant if you agree with my idea. Hey, whatever,...

S.3.5:

    RTO <- SRTT + max(G, K*RTTVAR) + max(G, max_ACK_delay)

My immediate reaction to this was that G should not appear twice. However, perhaps you meant them to be G_s and G_r (sender and receiver) respectively. {Note 2}

S.3.5 & S.5: It seems unnecessary to prohibit values of MAD greater than the default. Some companies are already investing in commercial public space flight programmes, so TCP might need to routinely support RTTs that are longer than typical, not just shorter.

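To make the first nit concrete, here is a minimal sketch of the S.3.5 formula with the two granularities split into G_s and G_r, as I suspect was intended. The names are mine, K=4 is assumed from RFC 6298, and the SRTT and RTTVAR inputs are invented purely to show the arithmetic.

```python
K = 4  # RTTVAR multiplier; assumed here to be the RFC 6298 value


def rto_us(srtt_us, rttvar_us, g_sender_us, g_receiver_us, max_ack_delay_us):
    """RTO <- SRTT + max(G_s, K*RTTVAR) + max(G_r, max_ACK_delay), in microseconds."""
    # The sender's clock granularity bounds the variance term...
    variance_term = max(g_sender_us, K * rttvar_us)
    # ...while the receiver's granularity bounds how short its ACK delay can be.
    ack_delay_term = max(g_receiver_us, max_ack_delay_us)
    return srtt_us + variance_term + ack_delay_term


if __name__ == "__main__":
    # Hypothetical flow: SRTT=100us, RTTVAR=10us, both clocks at 1us granularity.
    print(rto_us(100, 10, 1, 1, 50))    # MAD of 50us, as in the {Note 1} example
    print(rto_us(100, 10, 1, 1, 5000))  # MAD of 5ms
```

With these made-up inputs, the MAD term dominates the RTO whenever MAD is far larger than the RTT, which is the reason for wanting MAD tailored to the measured RTT.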
Cheers

Bob

*{Note 1}*: On average, if not app-limited, the time between ACKs will be d_r*R_r/W_s where:

    R is SRTT
    d is the delayed ACK factor, e.g. d=2 for ACKing every other packet
    W is the window in units of segments
    subscripts X_r or X_s denote receiver or sender for the half-connection.

So as long as the receiver can estimate the varying value of W at the sender, the receiver's MAD could be

    MAD_r = max(k*d_r*R_r / W_s, G_r)

The factor k (lower case) allows for some bunching of packets, e.g. due to link layer aggregation or the residual effects of slow-start, which leaves some bunching even if SS uses pacing. Let's say k=2, but it would need to be checked empirically.

For example, take R=100us, d=2, W=8 and G=1us. Given d*R/W = 25us, MAD could be perhaps 50us (i.e. k=2). k might need to be greater, but there would certainly be no need for MAD to be 5ms, which is perhaps 100 times greater than necessary.

*{Note 2}*: Why is there no field in the Low Latency option to communicate receiver clock granularity to the sender?

Bob

-- 
________________________________________________________________
Bob Briscoe                               http://bobbriscoe.net/