[tsvwg] Questions on draft-han-tsvwg-cc-00

"Scharf, Michael (Nokia - DE/Stuttgart)" <michael.scharf@nokia.com> Thu, 15 March 2018 23:17 UTC

From: "Scharf, Michael (Nokia - DE/Stuttgart)" <michael.scharf@nokia.com>
To: "draft-han-tsvwg-cc@ietf.org" <draft-han-tsvwg-cc@ietf.org>
CC: "tcpm@ietf.org" <tcpm@ietf.org>, "tsvwg@ietf.org" <tsvwg@ietf.org>, "iccrg@irtf.org" <iccrg@irtf.org>
Date: Thu, 15 Mar 2018 23:17:06 +0000
Archived-At: <https://mailarchive.ietf.org/arch/msg/tsvwg/BMp-FrTppoqPkAms_n9qEE3NTnQ>

Authors, all,

I have read draft-han-tsvwg-cc-00. Below I have listed a number of questions, which I believe would have to be addressed when discussing such a mechanism in the IETF or IRTF.

This e-mail is strictly limited to the content of draft-han-tsvwg-cc-00. As the draft does not specify how the CIR and PIR will actually be guaranteed in the Internet, or how OAM signaling will work at Internet scale, I will not comment here on these assumptions, except regarding requirements that strictly follow from the content of the I-D. The technical, economic, and regulatory aspects of the assumptions are not in scope of TCPM, and they need to be discussed and solved elsewhere.

Questions on draft-han-tsvwg-cc-00:

1/ The document seems to implicitly assume that network resources are reserved for *every* single TCP connection, right?

* If that assumption is correct, it has to be spelt out explicitly in the text and it has to be noted that the underlying technology has to provide these capabilities *for every single* TCP connection.
* Otherwise sentences like "after a TCP session is successfully initiated its congestion window (cwnd) jumps to CIR" would not make sense, as multiple TCP connections within a traffic aggregate policed by CIR/PIR could all start to send at CIR in parallel, which would trigger massive congestion.
* As an example, in my reading draft-han-6man-in-band-signaling-for-transport-qos-00 would also allow reservations, e.g., for aggregates of multiple TCP connections. Such an operation mode does not seem to be compatible with the mechanism suggested in this I-D, as far as I understand. So the requirements have to be made explicit.
* Also, sentences such as "it is assumed that in bandwidth guaranteed networks there have been network resources (bandwidths, queues etc.) dedicated to the TCP flows" have to be corrected to specify that for the mechanism in this draft to work correctly, the resources have to be guaranteed to every single TCP connection, not multiple "flows".

2/ Why does the document not rely on ECN (and not even reference ECN)?

* For instance, the following requirement "It is important that OAM needs to be able to detect if any device's buffer depth has exceeded the pre-configured threshold, as this is an indication of potential congestion and packet drop" could possibly be solved by ECN, no?
* Even if another OAM mechanism could be used in addition, a comprehensive TCP congestion control specification would also have to cover the reaction to ECN marks, as well as the potential combination of the feedback results. Why is this missing?
* Or would the document mandate that ECN MUST NOT be enabled for TCP connections using this congestion control mechanism?

3/ Why does the document assume that congestion windows are calculated in segments and not in bytes?

* RFC 5681 as well as many other RFCs calculate CWND in bytes.
* However, I believe equations such as "MinBandwidthWND = CIR * RTT/MSS" or "MaxBandwidthWND = PIR * RTT/MSS" would return a window counted in MSS-sized segments.
* Apart from the mismatch with the TCP standards, this sort of equation might also require a discussion on how to deal with integer division.
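
To illustrate the unit and rounding question, here is a minimal Python sketch. It assumes that CIR is given in bit/s and the RTT in milliseconds, which is an assumption and not something restated from the draft. The function names and the example values (10 Mbit/s, 25 ms, MSS of 1460 bytes) are hypothetical.

```python
def window_in_bytes(rate_bps: int, rtt_ms: int) -> int:
    """Bandwidth-delay product in bytes, the unit RFC 5681 uses for cwnd."""
    return rate_bps * rtt_ms // 8000  # bits * ms -> bytes


def window_in_segments(rate_bps: int, rtt_ms: int, mss_bytes: int) -> int:
    """Window in whole MSS-sized segments; integer division truncates."""
    return window_in_bytes(rate_bps, rtt_ms) // mss_bytes


# Hypothetical example values, not taken from the draft.
cir_bps, rtt_ms, mss = 10_000_000, 25, 1460

exact_bytes = window_in_bytes(cir_bps, rtt_ms)
segments = window_in_segments(cir_bps, rtt_ms, mss)
# Bytes of the bandwidth-delay product that are lost to truncation.
lost_bytes = exact_bytes - segments * mss
```

With these values the byte-based window is 31250 bytes, while the segment-based window is 21 segments (30660 bytes), so 590 bytes are lost to truncation. Whether to truncate or round up is exactly the kind of detail the specification would have to state.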

4/ How does the mechanism deal with IP and TCP header overhead?

* TCP window sizes are about the TCP bytestream, while the actual IP packets sent by a TCP/IP stack will include an IP and a TCP header. If one neglects the IP and TCP headers in the congestion window calculation, the resulting bit rate at the IP layer will be larger than the CIR and PIR seen in the TCP layer. This could result in packet drops if CIR and PIR are enforced, e.g., on the IP packet length.
* How will this problem be solved? Note that TCP (and also IP) can include header options, which result in variable header sizes. The number of TCP options can be different for each TCP segment. How does this congestion control mechanism correctly handle the headers and the options in IP and TCP headers?
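
The size of the overhead can be sketched as follows. This is a rough illustration only: it assumes full-sized segments, IPv4 without IP options (20-byte header), a 20-byte base TCP header, and optionally the TCP timestamps option padded to 12 bytes. The function name and the 10 Mbit/s payload rate are hypothetical.

```python
def ip_layer_rate_bps(payload_rate_bps: int, payload_bytes: int,
                      header_bytes: int) -> float:
    """Bit rate seen at the IP layer when the TCP payload is sent at
    payload_rate_bps and every packet carries payload_bytes of data
    plus header_bytes of IP and TCP headers."""
    return payload_rate_bps * (payload_bytes + header_bytes) / payload_bytes


# Hypothetical example: the window is sized so that the payload rate equals
# a PIR of 10 Mbit/s, and packets are 1500 bytes at the IP layer.
pir_bps = 10_000_000

# 20 bytes IPv4 + 20 bytes TCP, no options: 1460 bytes of payload.
rate_no_options = ip_layer_rate_bps(pir_bps, 1460, 40)

# Same packet size with 12 bytes of TCP options: 1448 bytes of payload.
rate_with_timestamps = ip_layer_rate_bps(pir_bps, 1448, 52)
```

In both cases the IP-layer rate exceeds the 10 Mbit/s payload rate by roughly 3 percent, and the excess grows when options are added or when segments are smaller than the MSS.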

5/ How does the document deal with RTT variations? Is the assumption that the RTT is constant?

* As far as I can tell from experiments, the RTT estimation is important when applying a rate to window-based congestion control, which is what this document does.
* Equations such as "MinBandwidthWND = CIR * RTT/MSS" or "MaxBandwidthWND = PIR * RTT/MSS" only provide a window equivalent to the bandwidth-delay product of the path if the RTT sample is a correct prediction of the actual delay that the segments in flight will experience. How does the mechanism suggested in this document correctly predict the future RTT of the segments that are sent by the sender at a given point in time?
* As an example, assume that the RTT at time t=10s is determined as 80ms. Assume PIR = 10 Mbps and neglect the window-unit and header-overhead issues raised in questions 3/ and 4/ above. Then this document would probably assume that MaxBandwidthWND=100 kB is the bandwidth-delay product of the path between t=10s and t=10.08s, i.e., the maximum amount of outstanding data that can be sent in that time without drops (or without exceeding PIR). But assume that the actual round-trip delay of segments has dropped to 40ms after the last RTT measurement, which means that the maximum bandwidth-delay product of the path at t=10s+epsilon is only 50 kB. As a result, 50 kB out of the congestion window would likely be dropped between t=10s and t=10.08s. And, due to the wrong RTT value, the effective data rate of the sender could even be 20 Mbps if the RTT mismatch is not detected immediately or, e.g., if an EWMA delays the update of the weighted RTT parameter to the actual value.
* So how does the proposed scheme actually determine a window that meets the statement "This means a TCP sender is never allowed to send data at a rate larger than PIR" if the RTT is not constant? Does this assume rate pacing in the TCP sender for each TCP connection?
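
The numbers in the example above can be reproduced with a few lines of Python. This is a sketch under the same simplifications (window in bytes, headers neglected) and additionally assumes a window-limited sender without pacing, so that one full window is sent per actual RTT. The helper names are hypothetical.

```python
def bdp_bytes(rate_bps: int, rtt_ms: int) -> int:
    """Bandwidth-delay product in bytes for a rate in bit/s and an RTT in ms."""
    return rate_bps * rtt_ms // 8000


def window_limited_rate_bps(window_bytes: int, rtt_ms: int) -> int:
    """Rate of a sender that emits one full window per round-trip time."""
    return window_bytes * 8000 // rtt_ms


pir_bps = 10_000_000       # PIR = 10 Mbps
measured_rtt_ms = 80       # RTT sample used to size the window
actual_rtt_ms = 40         # RTT actually experienced by segments in flight

stale_window = bdp_bytes(pir_bps, measured_rtt_ms)   # window from the old sample
actual_bdp = bdp_bytes(pir_bps, actual_rtt_ms)       # what the path can hold at PIR
excess_bytes = stale_window - actual_bdp
effective_rate = window_limited_rate_bps(stale_window, actual_rtt_ms)
```

This yields a window of 100000 bytes against an actual bandwidth-delay product of 50000 bytes, i.e., 50000 bytes of excess and an effective rate of 20 Mbps, twice the PIR.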

6/ How is it ensured that OAM alarms will reach the TCP sender in time in all possible "random failure" cases?

* As far as I understand, the following statement "When a sender receives the third duplicated ACK, but no previous OAM congestion alarm has been received, then it is considered that a segment is lost due to random failure not congestion. In this case the cwnd is not changed." mandates that an OAM alarm is received prior to the third duplicate ACK *in all potential cases* of congestion. If the OAM alarm got lost or delayed, this condition would imply that cwnd is not changed even though a segment has been dropped due to congestion, which would be a violation of fundamental Internet congestion control principles.
* Please expand on how this document ensures that cwnd will be reduced in all potential cases when a packet gets dropped due to congestion, and what requirements on the propagation of OAM alarms follow from that. Of specific interest are effects such as drops of packets that carry the OAM information, reordering of packets, asymmetric routing in forward and reverse direction, use of multiple paths in parallel (ECMP), and the like. If the document makes assumptions about the path, such as in-order packet delivery, these assumptions need to be spelt out explicitly.
* I understand that the OAM could be solved in different ways and the solution is independent of this document. But this document has to comprehensively specify all technical requirements that the OAM mechanism has to meet in order to ensure that every single packet drop due to congestion always results in a cwnd reduction. Otherwise the algorithm has to change as it does not safely prevent congestion.
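
The failure case can be made explicit by enumerating the combinations. The following Python sketch models only the rule quoted above (no cwnd reduction on the third duplicate ACK unless an OAM alarm was received before). The class and function names are hypothetical, and the model deliberately ignores how large the reduction would be.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class LossEvent:
    congestion_drop: bool       # ground truth, not visible to the sender
    alarm_before_dupack: bool   # OAM alarm arrived before the third duplicate ACK


def sender_reduces_cwnd(event: LossEvent) -> bool:
    # Rule as quoted: without a prior OAM alarm, cwnd is not changed.
    return event.alarm_before_dupack


def is_unsafe(event: LossEvent) -> bool:
    # Unsafe: a congestion drop that does not lead to a cwnd reduction.
    return event.congestion_drop and not sender_reduces_cwnd(event)


all_events = [LossEvent(c, a) for c, a in product([True, False], repeat=2)]
unsafe_events = [e for e in all_events if is_unsafe(e)]
```

Of the four combinations, exactly one is unsafe: a congestion drop whose alarm is lost or arrives late. The OAM requirements would have to rule out that combination on every path.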

7/ What is the expected performance benefit? Are there situations in which performance will be worse than standard TCP congestion control?

* The document does not contain any data on potential improvements or deteriorations as compared to the TCP standard congestion control. I assume that such data will be presented to explain why this mechanism is proposed, and what benefits it has.
* As I have experimented with similar mechanisms quite a bit, I believe there will be cases in which this congestion control mechanism will perform worse than a TCP sender fully compliant with RFC 5681, when using the same network path with CIR and PIR guarantees. I believe this document should analyze these cases and reason why a worse performance than standard TCP congestion control will be acceptable. IMHO this issue will specifically apply in cases when PIR is significantly larger than CIR, and if the RTT is large. As far as I can see, this draft mandates to start the data transfer in congestion avoidance at CIR rate, which means that it can take many RTTs until the sender reaches the PIR. In contrast, RFC 5681 will run slow-start, and RFC 5681 states that the "initial value of ssthresh SHOULD be set arbitrarily high". This means that the TCP sender can reach PIR within a few RTTs and thus can send with full PIR speed, while a sender using draft-han-tsvwg-cc-00 will send with a much lower speed of CIR+epsilon. For short and medium-sized data transfers, IMHO it can happen that the congestion control according to RFC 5681 will significantly outperform the mechanism suggested in this document, i.e., it will complete data transfers orders of magnitude faster even without any knowledge about CIR and PIR. Have the authors compared this mechanism to the performance of RFC 5681?
* Also, have the authors compared the performance of this mechanism with that of modern TCP stacks, which often use RFC 6928 (IW 10) and RFC 8312 (CUBIC)? In what cases does the suggested congestion control perform better? I ask this because I performed experiments 10 years ago with congestion control schemes that have some similarity to what is suggested here, and they also used knowledge about the path properties. In those experiments, it turned out to be quite difficult to design an algorithm that uses knowledge about the path (such as CIR/PIR) and that outperforms CUBIC in combination with IW 10, even if such a stack is totally unaware of the path. This has been discussed, e.g., in ICCRG: https://www.ietf.org/proceedings/73/slides/iccrg-2.pdf. As context, "more-start" in those slides is somewhat similar to what is proposed in this I-D (but applied to CUBIC), while the "initial-start" graphs roughly correspond to what was later specified in RFC 6928 (IW 10) and RFC 8312 (CUBIC).
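
The ramp-up argument can be sketched with an idealized model. The Python code below assumes a loss-free path, a window that doubles once per RTT in slow-start, and a window that grows by one segment per RTT in congestion avoidance. That the draft uses this standard additive increase is an assumption here, and all values (CIR = 10 Mbit/s, PIR = 100 Mbit/s, RTT = 100 ms, MSS = 1460 bytes, IW = 10 segments) are hypothetical.

```python
def segments_for_rate(rate_bps: int, rtt_ms: int, mss_bytes: int) -> int:
    """Window in whole segments that corresponds to a rate over one RTT."""
    return rate_bps * rtt_ms // 8000 // mss_bytes


def rtts_additive(start: int, target: int) -> int:
    """RTTs needed when the window grows by one segment per RTT."""
    rtts, window = 0, start
    while window < target:
        window += 1
        rtts += 1
    return rtts


def rtts_doubling(start: int, target: int) -> int:
    """RTTs needed when the window doubles every RTT (idealized slow-start)."""
    rtts, window = 0, start
    while window < target:
        window *= 2
        rtts += 1
    return rtts


# Hypothetical path parameters, not taken from the draft.
rtt_ms, mss = 100, 1460
cir_window = segments_for_rate(10_000_000, rtt_ms, mss)    # window at CIR
pir_window = segments_for_rate(100_000_000, rtt_ms, mss)   # window at PIR

rtts_from_cir = rtts_additive(cir_window, pir_window)      # start at CIR, +1 per RTT
rtts_slow_start = rtts_doubling(10, pir_window)            # IW 10, doubling per RTT
```

With these values the additive ramp from the CIR window (85 segments) to the PIR window (856 segments) takes 771 RTTs, while idealized slow-start from 10 segments gets there in 7 RTTs. The model only counts RTTs until the PIR-equivalent window is reached. It does not account for the data sent in the meantime or for losses at the end of slow-start, so it is not a substitute for the measurements asked for above.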

8/ For what application traffic patterns is this mechanism proposed?

* The document states in Section 3 "... with the development of new applications, such as AR/VR". What properties do such applications have to leverage the mechanism suggested in this document? Is it possible to characterize what the "new" requirements are, and how the suggested algorithm meets these requirements?
* Is it suggested to apply this congestion control to real-time media traffic over TCP? If so, what would be the benefit of using TCP in general, and of this specific mechanism, compared to congestion control algorithms designed for such traffic (e.g., those from the RMCAT working group)?

This list of questions is not comprehensive, but I'll stop here.

Regarding potential next steps of this document in the TCPM working group, I believe that the applicable TCPM charter wording is: "In addition, TCPM may document alternative TCP congestion control algorithms that are known to be widely deployed, and that are considered safe for large-scale deployment in the Internet." Until these prerequisites are fulfilled in the Internet, in my view the document cannot be adopted in TCPM. Research could be performed in the IRTF, e.g., in ICCRG.

Thanks

Michael (with no hat on)
