[tcpm] draft-ietf-tcpm-rtorestart-04

Hi,

I've read -04 and I have a couple of questions and suggestions:

* Section 4: The "total number of outstanding and previously unsent segments" is the sum of two variables, right? If so, why not use the term "sum"?

* Section 4: I also wonder if the text should define "previously unsent segments". The term "previously unsent data" is very well-known in the RFC series. A quick search revealed that "previously unsent segments" is usually only used to refer to situations in which segments with this property have indeed been transmitted. Is my understanding correct that draft-ietf-tcpm-rtorestart-04 uses this term differently? I think here it refers to data that will be sent *in the future*. In that case, I wonder if it is really clear how the sender can determine the number of "previously unsent segments". For instance, that number may depend on the segmentation strategy of the sender, in particular for applications that use many small write() calls. In this case a TCP stack may, e.g., decide to send partial segments. I am not sure if the actual number of segments created from the "unsent data" is always known in advance. Or am I missing something?

* Section 5.1: "Given RTOR's ability to only work when it is beneficial for the loss recovery process, it is suitable as a system-wide default mechanism for TCP traffic." I think other text in the draft asks for further experimentation regarding the actual trade-off between benefit and risk. Given that, I think another wording should be used (e.g., "Given that RTOR is a mostly conservative algorithm", ...). Given that this is not a PS document, I'd also prefer a more careful phrasing regarding the recommendation as system-wide default (e.g., "it is suitable for experimentation as system-wide default").

* Section 5.2: As mentioned in the last face-to-face meeting, spurious timeouts negatively affect the network by needlessly sending data. This negative effect is not mentioned in this section. Why?

* Section 5.2: Mobile networks have a variable RTT, including, e.g., longer delays during handovers (either without or with transient loss during the handover). A lot of past work on spurious timeout recovery has been motivated by these delay spikes. Reducing the RTO almost certainly has an impact in such environments; I think performance can either increase or decrease. If no data on that is available, mobile networks seem to me an area where more experimentation would be particularly useful, right?

* Section 5.2: Is there any data on the risk of spurious timeouts in networks where RTT and the minimum RTO are of the same order of magnitude? For instance, in satellite networks the RTT can be huge.

* Section 5.2: "However, with respect to RTOR spurious timeouts are only a problem for applications transmitting multiple bursts of data within a single flow." I think this section should be reworded and it should more explicitly discuss the impact of RTOR on HTTP/1.1 and HTTP/2.0 traffic, including e.g. adaptive video streaming. They all transmit "multiple bursts of data within a single flow". To me, the current sentence could imply that RTOR would be a "problem" for a vast majority of Internet traffic. A more explicit discussion on how RTOR affects HTTP/1.1 and HTTP/2.0 would IMHO make a lot of sense.

* Section 5.3: Given that not all stacks are segment-based, this section seems really relevant. However, the description of how to determine outstanding segments is vague ("it is possible to exactly determine..."). I think this could be reworded to provide more guidance to implementers. The same applies to the calculation of "previously unsent segments"; for instance, the text does not explain what happens if the "number of bytes in the send queue" is not a multiple of the SMSS (see also above regarding the definition of this variable).

* Section 6: "In contrast, RTOR is trying to make the RTO more appropriate in cases where there is no need to be overly cautious." I think this sentence should use more neutral language. I think whether an RTO value is "appropriate" is impossible to know (in advance), and whether a more aggressive timeout is "cautious" or not may depend on the network under consideration. For instance, an alternative would be to compare the complexity of RTOR and TLP.

* Section 9: I wonder if an attacker could try to send certain ACK patterns to increase the risk of timeouts, e.g., at a web server, leveraging the smaller RTO value. If successful, the resulting RTOs could significantly reduce application performance. Such an attacker could probably cause much more harm by other means, so this may not be a significant security risk. But the current security analysis is very short.
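
To illustrate the Section 4 and Section 5.3 comments above, here is a minimal Python sketch of the segment accounting. The function names, the SMSS of 1460 bytes, and the threshold of four segments are my own illustrative assumptions, not text from the draft. The only parts taken from this thread are the sum of outstanding and previously unsent segments and the rule not to restart when "RTO - T_earliest" <= 0.

```python
SMSS = 1460      # assumed sender maximum segment size in bytes (illustrative)
THRESHOLD = 4    # assumed restart threshold in segments (hypothetical value)


def unsent_segments(unsent_bytes, smss=SMSS, count_partial=True):
    """Estimate how many segments the unsent bytes in the send queue become.

    count_partial=True rounds up (a trailing partial segment counts as one);
    count_partial=False rounds down (only full-sized segments count).
    """
    if count_partial:
        return (unsent_bytes + smss - 1) // smss
    return unsent_bytes // smss


def timer_value_on_ack(rto, t_earliest, outstanding, unsent_bytes,
                       count_partial=True):
    """Return the value the retransmission timer would be restarted with.

    t_earliest is read here as the time elapsed since the earliest
    outstanding segment was sent (an assumption about the draft's meaning).
    """
    total = outstanding + unsent_segments(unsent_bytes,
                                          count_partial=count_partial)
    remaining = rto - t_earliest
    if total < THRESHOLD and remaining > 0:
        return remaining   # restart with the shortened timeout
    return rto             # conventional restart


# Two outstanding segments plus 2000 unsent bytes (not a multiple of SMSS).
print(timer_value_on_ack(1.0, 0.25, 2, 2000, count_partial=True))
print(timer_value_on_ack(1.0, 0.25, 2, 2000, count_partial=False))
```

Under these assumptions, rounding up gives a total of four segments and a conventional restart (1.0 s), while rounding down gives three segments and the shortened value (0.75 s). The rounding rule alone decides whether RTOR applies, which is why I think the text should spell it out.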

Thanks

Michael

> -----Original Message-----
> From: tcpm [mailto:tcpm-bounces@ietf.org] On Behalf Of Per Hurtig
> Sent: Tuesday, October 28, 2014 12:18 AM
> To: tcpm@ietf.org
> Subject: Re: [tcpm] I-D Action: draft-ietf-tcpm-rtorestart-04.txt
> 
> Hi all,
> 
> we have now updated the draft to address the outstanding comments. The
> main changes from last version are:
> 
>    o  Changed the algorithm to allow RTOR when there is unsent data
>       available, but the cwnd does not allow transmission.
>    o  Changed the algorithm to not trigger if "RTO - T_earliest" <= 0,
>       to avoid that ACKs to previous retransmissions trigger premature
>       timeouts.
>    o  Made minor adjustments throughout the document to adjust for the
>       algorithmic changes.
>    o  Improved the wording throughout the document.
> 
> 
> for experimental results and a Linux implementation, please visit:
> http://riteproject.eu/resources/rto-restart/
> 
> 
> 
> Cheers,
> Per
> 
> On 2014-10-28 00:11, internet-drafts@ietf.org wrote:
> >
> > A New Internet-Draft is available from the on-line Internet-Drafts
> > directories. This draft is a work item of the TCP Maintenance and
> > Minor Extensions Working Group of the IETF.
> >
> >          Title           : TCP and SCTP RTO Restart
> >          Authors         : Per Hurtig
> >                            Anna Brunstrom
> >                            Andreas Petlund
> >                            Michael Welzl
> >          Filename        : draft-ietf-tcpm-rtorestart-04.txt
> >          Pages           : 13
> >          Date            : 2014-10-27
> >
> > Abstract:
> >     This document describes a modified algorithm for managing the
> >     TCP and SCTP retransmission timers that provides faster loss
> >     recovery when there is a small amount of outstanding data for a
> >     connection.  The modification, RTO Restart (RTOR), allows the
> >     transport to restart its retransmission timer more aggressively
> >     in situations where fast retransmit cannot be used.  This
> >     enables faster loss detection and recovery for connections that
> >     are short-lived or application-limited.
> >
> >
> > The IETF datatracker status page for this draft is:
> > https://datatracker.ietf.org/doc/draft-ietf-tcpm-rtorestart/
> >
> > There's also a htmlized version available at:
> > http://tools.ietf.org/html/draft-ietf-tcpm-rtorestart-04
> >
> > A diff from the previous version is available at:
> > http://www.ietf.org/rfcdiff?url2=draft-ietf-tcpm-rtorestart-04
> >
> >
> > Please note that it may take a couple of minutes from the time of
> > submission until the htmlized version and diff are available at
> > tools.ietf.org.
> >
> > Internet-Drafts are also available by anonymous FTP at:
> > ftp://ftp.ietf.org/internet-drafts/
> >
> > _______________________________________________
> > tcpm mailing list
> > tcpm@ietf.org
> > https://www.ietf.org/mailman/listinfo/tcpm
> >
> 