Re: [tcpm] WGLC for draft-ietf-tcpm-rack-08 - 3 issues

Gorry Fairhurst <gorry@erg.abdn.ac.uk> Mon, 06 April 2020 15:46 UTC

To: Michael Tuexen <tuexen@fh-muenster.de>, tcpm IETF list <tcpm@ietf.org>
References: <3D4D034B-7A72-4313-8FB6-CB689A167E91@fh-muenster.de> <BF4360DD-04B2-4D4C-9E55-314E9793D447@ericsson.com>
From: Gorry Fairhurst <gorry@erg.abdn.ac.uk>
Cc: David Black <David.Black@dell.com>
Message-ID: <70b26bc9-bd29-0712-e189-14a9f556902d@erg.abdn.ac.uk>
Date: Mon, 06 Apr 2020 16:46:09 +0100
In-Reply-To: <BF4360DD-04B2-4D4C-9E55-314E9793D447@ericsson.com>
Archived-At: <https://mailarchive.ietf.org/arch/msg/tcpm/NbzDnLn_WXSNkzPsAMrZd4ljyxE>

I reviewed the latest version of RACK as a part of the TCPM WGLC and 
this email contains 3 points I think are more important for the WG to 
consider.

(1) Why re-specify the DupACK Threshold?

Why does RACK need to say this, effectively overriding any future 
change to the DupACK threshold:
“then RACK SHOULD honor the classic 3-DUPACK rule for initiating fast 
recovery.”.
I suggest:
“then RACK SHOULD honor the
        classic DUPACK rule for fast recovery as specified in [RFC5681].”
and later, where the draft says:
“and use of the 3-DUPACK rule”, I suggest “DUPACK Rule [RFC5681]” instead.
- If that spec changes, I think RACK should change too. I do not see the 
reason to restrict this to the constant three for the purpose of RACK.

—
(2) Can the document be clearer about how RTTs are used and what happens 
on paths where the RTT varies significantly?

“The RACK reordering window MUST be bounded and this bound SHOULD
        be one round trip.”
- Which round trip estimate? Is it the smoothed RTT, the RTO, or the last 
measured RTT? I think this needs to be explained, because it also 
appears later.
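
To illustrate why the choice matters, here is a minimal sketch of the 
[RFC6298] estimator fed with RTT samples from a path whose RTT swings 
widely. The sample values and the function name are hypothetical; only 
the update rules and constants are taken from [RFC6298]. It is not the 
draft's algorithm, just a way to see how far apart the three candidate 
"round trip" values can end up.

```python
# Sketch of the RFC 6298 estimator, showing how the last measured RTT,
# the smoothed RTT and the RTO can differ on a path with variable RTT.
ALPHA = 1 / 8   # RFC 6298 gain for SRTT
BETA = 1 / 4    # RFC 6298 gain for RTTVAR
K = 4           # RFC 6298 variance multiplier
MIN_RTO = 1.0   # RFC 6298 recommended lower bound on RTO, in seconds


def rfc6298(samples, granularity=0.001):
    """Return (last_rtt, srtt, rto) after feeding RTT samples (seconds)."""
    srtt = rttvar = None
    for r in samples:
        if srtt is None:
            # First measurement.
            srtt = r
            rttvar = r / 2
        else:
            # Subsequent measurements: RTTVAR is updated before SRTT.
            rttvar = (1 - BETA) * rttvar + BETA * abs(srtt - r)
            srtt = (1 - ALPHA) * srtt + ALPHA * r
    rto = max(MIN_RTO, srtt + max(granularity, K * rttvar))
    return samples[-1], srtt, rto


# Hypothetical samples (seconds) from a path with widely varying RTT.
samples = [0.050, 0.200, 0.060, 0.250, 0.055]
last_rtt, srtt, rto = rfc6298(samples)
print(f"last={last_rtt:.3f}s srtt={srtt:.3f}s rto={rto:.3f}s")
```

With these samples the three values differ substantially, so a bound of 
"one round trip" gives quite different reordering windows depending on 
which estimate is meant.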

“  Since an ACK can also acknowledge retransmitted data packets, and
    retransmissions can be spurious, the sender must take care to avoid
    spurious inferences.  For example, if the sender were to use timing
    information from a spurious retransmission, the RACK.rtt could be
    vastly underestimated.”
- ‘must take care’? Does that mean “needs to”? If not, what does “take 
care” really mean?
- Why ‘vastly’? I would have thought that was possible, but doesn’t 
the problem emerge when any underestimate is present? Please clarify by 
removing “vastly” or explaining this.
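
A small worked example of the point about "vastly": if an ACK that was 
really triggered by the original transmission is matched against a 
spurious retransmission, the RTT sample is short by exactly the time 
between the two transmissions. The timings below are hypothetical and 
in milliseconds; the helper name is mine, not the draft's.

```python
def rtt_sample_ms(ack_time_ms, send_time_ms):
    """RTT sample as the sender would compute it for a given send time."""
    return ack_time_ms - send_time_ms


ORIGINAL_SENT_MS = 0
ACK_ARRIVED_MS = 100  # ACK actually triggered by the original transmission

true_rtt = rtt_sample_ms(ACK_ARRIVED_MS, ORIGINAL_SENT_MS)

# Underestimate for spurious retransmissions sent at different times.
errors = {}
for rexmit_sent_ms in (10, 50, 90):
    mistaken = rtt_sample_ms(ACK_ARRIVED_MS, rexmit_sent_ms)
    errors[rexmit_sent_ms] = true_rtt - mistaken
    print(f"rexmit at {rexmit_sent_ms} ms: sample {mistaken} ms, "
          f"underestimate {errors[rexmit_sent_ms]} ms")
```

The underestimate grows with how late the retransmission was sent, but 
it is present in every case, not only in the extreme one.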
—
    “Use the RTT measurements obtained via [RFC6298] or [RFC7323] to
    update the estimated minimum RTT in RACK.min_RTT.”
- I don’t see that either of these specifications defines a minimum RTT. 
I understand that QUIC doesn’t have a way to maintain an accurate 
minimum RTT. How does a sender determine the minimum RTT?
- Considering robustness to path changes, how does the minimum RTT take 
into account the variation of the path RTT?
- How does this relate to RTO Consider 
(https://tools.ietf.org/html/draft-ietf-tcpm-rto-consider-10), which 
appears to place strong demands on timers tracking variance and path 
changes?
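
To make the path-change question concrete, here is a sketch of one 
common way to track a minimum RTT: a minimum over a sliding time 
window. This is an assumption for illustration only; the class name, 
the 10 second window and the RTT values are hypothetical, and I am not 
claiming this is what the draft specifies.

```python
from collections import deque


class WindowedMinRtt:
    """Minimum RTT over the last `window_s` seconds (illustrative only)."""

    def __init__(self, window_s):
        self.window_s = window_s
        self._samples = deque()  # (timestamp, rtt), rtt increasing left to right

    def update(self, now, rtt):
        # Older samples that are not smaller than the new one can never
        # be the minimum again, so drop them.
        while self._samples and self._samples[-1][1] >= rtt:
            self._samples.pop()
        self._samples.append((now, rtt))
        # Expire samples that have fallen out of the window.
        while self._samples[0][0] < now - self.window_s:
            self._samples.popleft()
        return self._samples[0][1]


# Hypothetical path change: RTT is 20 ms until t=4 s, then 80 ms.
f = WindowedMinRtt(window_s=10)
trace = [f.update(t, 20 if t < 5 else 80) for t in range(20)]
print(trace)
```

In this sketch the filter keeps reporting the old 20 ms minimum for the 
whole window after the path RTT has risen to 80 ms, which is the kind 
of behaviour I would like the document to discuss.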
—
“In such cases RACK may not detect losses from
    ACK events and the recovery would then resort to the (slower) TLP or
    RTO timer-based recovery.  However, such events should be rare and
    the connection would pick up the new minimum RTT when the recovery
    ends, so the sender can avoid repeated similar failures.”
- I’d strongly suggest changing that “may” to “could” to be really clear 
this is not permitting something.
- I’m also not sure how a sender would know that “such events should be 
rare” - isn’t that dependent on path characteristics?
—
“To be more robust to reordering, RACK uses a more conservative RTT 
value to decide if an unacknowledged packet should be considered lost, 
RACK.rtt”
- Why is this not normative? I think this should be either a SHOULD or 
a MUST.

There are suggestions that much of this may be covered in various 
places, and that it may only need editorial work. As I noted before in 
the working group, I do like the idea of RACK, but I do have concerns 
about what happens on paths where there is a large variation in the 
RTT. With DUPACK-based recovery, the variability extends the recovery 
time and the RTO inflates; both are safe. I am not confident that a 
sender using RACK would avoid being overly aggressive over a path with 
a widely varying RTT.

This does not seem safe unless a variance term is included in the RTT, 
or is used as a condition for enabling the method.

I really do think this has to be explained and addressed.

—
(3)

There are places where the text explicitly motivates a change to the 
network forwarding behaviour, almost saying that this is an allowed 
outcome of deploying RACK for TCP.

Section 4 in particular is a concern:

" The reordering behavior of networks can evolve (over years) in
    response to the behavior of transport protocols and applications, as
    well as the needs of network designers and operators.  From a network
    or link designer's viewpoint, parallelization (eg. link bonding) is
    the easiest way to get a network to go faster.  Therefore their main
    constraint on speed is reordering, and there is pressure to relax
    that constraint.  If RACK becomes widely deployed, the underlying
    networks may introduce more reordering for higher throughput. "
and later:
"  This handles reordering caused by path divergence in small time
    scales (reordering within the round-trip time of the shortest path),
    which should tolerate much of the reordering from link bonding,
    multipath routing, or link-layer out-of-order delivery."

As explained in TSVWG (within the context of L4S), my own opinion is 
that it would be good to have greater tolerance to reordering and that 
the presence of paths with significant reordering is a motivation for 
that work.

Beyond that, I would urge much caution: writing anything that could 
suggest RACK enables greater reordering has implications for a wide range 
of IETF protocols and would represent a very significant change in 
advice for the network. I would even suggest that such advice is wrong, 
until methods such as RACK have been deployed in transports as a whole, 
not just TCP. I also think this sort of advice and reasoning does not 
belong within the specification of a protocol mechanism.

Gorry