[tcpm] Review of draft-wang-tcpm-low-latency-opt-00

Bob Briscoe <ietf@bobbriscoe.net> Wed, 02 August 2017 15:54 UTC

From: Bob Briscoe <ietf@bobbriscoe.net>
To: Eric Dumazet <edumazet@google.com>, Yuchung Cheng <ycheng@google.com>, Wei Wang <weiwan@google.com>, Neal Cardwell <ncardwell@google.com>
Cc: tcpm IETF list <tcpm@ietf.org>
Archived-At: <https://mailarchive.ietf.org/arch/msg/tcpm/pPg8FK3a3kEeBcp3_YAmJAWQpN8>

Wei, Yuchung, Neal and Eric, as authors of 
draft-wang-tcpm-low-latency-opt-00,

I promised a review. It questions the technical logic behind the draft, 
so I haven't bothered to give a detailed review of the wording of the 
draft, because that might be irrelevant if you agree with my arguments.

*1/ MAD by configuration?*

    o  If the user does not specify a MAD value, then the implementation
       SHOULD NOT specify a MAD value in the Low Latency option.

That sentence triggered my "anti-human-intervention" reflex. My train of 
thought went as follows:

* Let's consider what advice we would give on what MAD value ought to be 
configured.
* You say that MAD can be smaller in DCs. So I assume your advice would 
be that MAD should depend on RTT {Note 1} and clock granularity {Note 2}.
* So why configure one value of MAD for all RTTs? That only makes sense 
in DC environments where the range of RTTs is small.
* However, for the range of RTTs on the public Internet, why not 
calculate MAD from RTT and granularity, then standardize the calculation 
so that both ends arrive at the same result when starting from the same 
RTT and granularity parameters? (The sender and receiver might measure 
different smoothed (SRTT) values, but they will converge as the flow 
progresses.)

Then the receiver only needs to communicate its clock granularity to the 
sender, and the fact that it is driving MAD off its SRTT. Then the 
sender can use a formula for RTO derived from the value of MAD that it 
calculates the receiver will be using. Then its RTO will be completely 
tailored to the RTT of the flow.

Note: There are two different uses for the min RTO that need to be 
separated:
     a) Before an initial RTT value has been measured, to determine the 
RTO during the 3WHS.
     b) Once either end has measured the RTT for a connection.
(a) needs to cope with the whole range of possible RTTs, whereas (b) is 
the subject of this email, because it can be tailored for the measured RTT.

*2/ The problem, and its prevalence*

With gradual removal of bufferbloat and more prevalent usage of CDNs, 
typical base RTTs on the public Internet now make the value of minRTO 
and of MAD look silly.

As can be seen above, the problem is indeed that each end only has 
partial knowledge of the config of the other end.
However, the problem is not just that MAD needs to be communicated to 
the other end so it can be hard-coded to a lower value.
The problem is that MAD is hard-coded in the first place.

The draft needs to say how prevalent the problem is (on the public 
Internet) where the sender has to wait for the receiver's delayed ACK 
timer at the end of a flow or between the end of a volley of packets and 
the start of the next.

The draft also needs to say what tradeoff is considered acceptable 
between a residual level of spurious retransmissions and lower timeout 
delay. Eliminating all spurious retransmissions is not the goal.

The draft also needs to say that introducing a new TCP Option is itself 
a problem (on the public Internet), because of middleboxes, particularly 
proxies. Therefore a solution that does not need a new TCP Option would 
be preferable.

Perhaps the solution for communicating timestamp resolution in 
draft-scheffenegger-tcpm-timestamp-negotiation-05 (which cites 
draft-trammell-tcpm-timestamp-interval-01) could be modified to also 
communicate:
* TCP's clock granularity (closely related to TCP timestamp resolution),
*  and the fact that the host is calculating MAD as a function of RTT 
and granularity.
Then the existing timestamp option could be repurposed, which should 
drastically reduce deployment problems.

*3/ Only DC?*

All the related work references are solely in the context of a DC. Pls 
include refs about this problem in a public Internet context. You will 
find there is a pretty good search engine at www.google.com.

The only non-DC ref I can find about minRTO is [Psaras07], which is 
mainly about a proposal to apply minRTO if the sender expects the next 
ACK to be delayed. Nonetheless, the simulation experiment in Section 5.1 
provides good evidence for how RTO latency is dependent on uncertainty 
about the MAD that the other end is using.

[Psaras07] Psaras, I. & Tsaoussidis, V., "The TCP Minimum RTO 
Revisited," In: Proc. 6th Int'l IFIP-TC6 Conference on Ad Hoc and Sensor 
Networks, Wireless Networks, Next Generation Internet NETWORKING'07 
pp.981-991 Springer-Verlag (2007)
https://www.researchgate.net/publication/225442912_The_TCP_Minimum_RTO_Revisited

*4/ Status*

Normally, I wouldn't want to hold up a draft that has been proven over 
years of practice, such as the technique in low-latency-opt, which has 
been proven in Google's DCs over the last few years, whereas my ideas 
are just that: ideas, not proven. However, the technique in 
low-latency-opt has only been proven in DC environments where the range 
of RTTs is limited. So, now that you are proposing to transplant it onto 
the public Internet, it also only has the status of an unproven idea.

To be clear, as it stands, I do not think low-latency-opt is applicable 
to the public Internet.


*5/ Nits*

These nits depart from my promise not to comment on details that could 
become irrelevant if you agree with my idea. Hey, whatever...

S.3.5:

	RTO <- SRTT + max(G, K*RTTVAR) + max(G, max_ACK_delay)

My immediate reaction to this was that G should not appear twice. 
However, perhaps you meant them to be G_s and G_r (sender and receiver) 
respectively. {Note 2}
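
To make the G_s / G_r reading concrete, here is a minimal sketch of the S.3.5 formula with the two granularities kept separate. It assumes that reading is what the draft intends, that K=4 as in RFC 6298, and that all times are in microseconds. The function name and the sample values are purely illustrative and come from neither the draft nor any implementation.

```python
def rto_us(srtt, rttvar, g_sender, g_receiver, max_ack_delay, k=4):
    """RTO <- SRTT + max(G_s, K*RTTVAR) + max(G_r, max_ACK_delay).

    All arguments are in microseconds. g_sender and g_receiver are the
    (assumed distinct) clock granularities of the two ends.
    """
    variance_term = max(g_sender, k * rttvar)          # sender-side floor
    ack_delay_term = max(g_receiver, max_ack_delay)    # receiver-side floor
    return srtt + variance_term + ack_delay_term


# Illustrative values only: 100us SRTT, 10us RTTVAR, 1us clocks, 50us MAD.
print(rto_us(srtt=100, rttvar=10, g_sender=1, g_receiver=1, max_ack_delay=50))
```

With a coarse receiver clock (say g_receiver=1000) the last term is dominated by G_r rather than by MAD, which is why the sender would need to know the receiver's granularity.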

S.3.5 & S.5: It seems unnecessary to prohibit values of MAD greater than 
the default. Some companies are already investing in commercial public 
space flight programmes, so TCP might need to routinely support RTTs 
that are longer than typical, not just shorter.


Cheers



Bob

*{Note 1}*: On average, if not app-limited, the time between ACKs will 
be d_r*R_r/W_s where:
    R is SRTT
    d is the delayed ACK factor, e.g. d=2 for ACKing every other packet
    W is the window in units of segments
    G is the clock granularity
    subscripts X_r or X_s denote receiver or sender for the half-connection.

So as long as the receiver can estimate the varying value of W at the 
sender, the receiver's MAD could be
     MAD_r = max(k*d_r*R_r / W_s, G_r).
The factor k (lower case) allows for some bunching of packets e.g. due 
to link layer aggregation or the residual effects of slow-start, which 
leaves some bunching even if SS uses pacing. Let's say k=2, but it would 
need to be checked empirically.

For example, take R=100us, d=2, W=8 and G = 1us.
Given d*R/W = 25us, MAD could be perhaps 50us (i.e. k=2). k might need 
to be greater, but there would certainly be no need for MAD to be 5ms, 
which is perhaps 100 times greater than necessary.
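
The arithmetic in Note 1 can be checked with a short sketch. This is only an illustration of the formula as written above, not a tested algorithm: the function name is made up, times are in microseconds, and k=2 is the untested guess from the text.

```python
def mad_us(srtt_us, window_segments, d=2, k=2, granularity_us=1):
    """MAD_r = max(k * d_r * R_r / W_s, G_r), in microseconds.

    srtt_us          R: smoothed RTT
    window_segments  W: sender window in segments (receiver's estimate)
    d                delayed ACK factor (2 = ACK every other packet)
    k                bunching allowance (2 is a guess, to be checked empirically)
    granularity_us   G: receiver clock granularity
    """
    if window_segments <= 0:
        raise ValueError("window must be at least one segment")
    mean_ack_interval = d * srtt_us / window_segments
    return max(k * mean_ack_interval, granularity_us)


mad = mad_us(srtt_us=100, window_segments=8)   # R=100us, d=2, W=8, G=1us
print(mad)          # 50.0
print(5000 / mad)   # 100.0: a 5ms MAD is 100 times larger
```

Note that the granularity floor takes over as soon as G exceeds k*d*R/W, so a host with a 1ms clock would still end up with MAD = 1ms in this example.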

*{Note 2}*: Why is there no field in the Low Latency option to 
communicate receiver clock granularity to the sender?


Bob

-- 
________________________________________________________________
Bob Briscoe                               http://bobbriscoe.net/