Re: [tcpm] [tsvwg] draft-han-tsvwg-cc

Toerless Eckert <tte@cs.fau.de> Fri, 16 March 2018 04:02 UTC

Date: Fri, 16 Mar 2018 05:02:09 +0100
From: Toerless Eckert <tte@cs.fau.de>
To: "Scharf, Michael (Nokia - DE/Stuttgart)" <michael.scharf@nokia.com>
Cc: Thomas Nadeau <tnadeau@lucidvision.com>, "tcpm@ietf.org" <tcpm@ietf.org>, Yingzhen Qu <yingzhen.qu@huawei.com>
Archived-At: <https://mailarchive.ietf.org/arch/msg/tcpm/APtM624q8-MloZ0iPd1b4YRiZnU>
Subject: Re: [tcpm] [tsvwg] draft-han-tsvwg-cc

On Thu, Mar 15, 2018 at 11:42:08PM +0000, Scharf, Michael (Nokia - DE/Stuttgart) wrote:
> Authors, all,
> 
> I have read draft-han-tsvwg-cc-00. Below I have listed a number of questions, which I believe would have to be addressed when discussing such a mechanism in the IETF or IRTF.

Thanks, much appreciated. I am not a listed author, but let me give you
some of my thoughts.

I started trying to answer your questions individually, but IMHO that
became too unstructured and long. So let me instead try to set some
context, and please tell me whether you agree or disagree with the
individual points. These are, btw., just my personal opinions;
the authors may disagree.

Fundamentally, to comply with the existing IETF QoS architecture,
this method should be used in traffic classes with DS code
points intended to be used with admission control/bandwidth
management, such as AF or EF (I forget the others, but
we'll look them up). Definitely not BE.

Note: None of the standards RFCs about those traffic classes
say whether they're intended for the Internet or not.
Effectively they are almost never deployed on an Internet service,
but on what we would call "controlled networks". But the RFCs don't
say this.

Note 2: All those traffic class RFCs are also quite unspecific
about the details of those bandwidth management/traffic control
mechanisms. You asked for more details about, e.g., our
in-band signaling. On one hand I'd love to discuss it as
much as possible, but just for the purpose of this CC draft,
I think we should really only discuss those details of
the bandwidth management/admission control that would
impact the CC approach. See below for one example I have (the ECN parameter).

For example, the 10-year-old draft-lochin-ietf-tsvwg-gtfrc
was spec'ed with AF as the example and, as far as I understood,
got fairly positive WG feedback (it just died because the authors
changed jobs/priorities). Given that it was TFRC-based, it was
not primarily targeting TCP.

I would very much like to avoid changing the DSCP/PHB behavior
of any classes, so this cannot go into BE as a target
solution. I would like to have a sentence like
"any use in BE or other non-admission-controlled code points
MUST stay within controlled networks".

The reason to even mention this is how SPs abuse DSCPs:
inelastic (non-congestion-controlled) video from a
service provider to its subscribers often just uses
DSCP 0 out of laziness (overprovisioning). I mostly
know this for multicast, but also for VoD unicast (UDP).
Even more so for voice (all inelastic codecs). So,
if this approach is useful to those types of constrained
networks... oh well...

If I were to come up with a more complete TCP stack and API
description, I would probably default these types of flows to
send with an AF class, and if the SP resets these to 0 instead
of filtering them... see below.

Having said this, the whole discussion about CC details
is of course still the core, and all your questions are
valid:

I think we should definitely define a circuit breaker
based on TCP behavior, following the principles of RFC 8084.

Something like:
- No TCP replies for more than N secs: break the circuit
- Only loss indications for more than M secs: fall back
  to CIR=0 behavior or break the circuit (app-dependent).

Maybe also based on other TCP-derived error recognition.
Suggestions welcome. RFC 8084 is so new that I have not seen good
practices that could be applied to TCP. And I guess it is unlikely
that others have thought about this, given that there was no
minimum-CIR inelastic behavior for TCP before.
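
To make this concrete, here is a minimal sketch of such a
TCP-feedback-only circuit breaker; the thresholds N_SECS/M_SECS and
all names are placeholders of mine, not anything taken from RFC 8084
or the draft:

    # Hypothetical sketch only: an RFC 8084-style circuit breaker driven
    # purely by TCP feedback. Thresholds and names are illustrative.
    import time

    N_SECS = 10.0   # no TCP replies for this long -> break the circuit
    M_SECS = 5.0    # only loss indications for this long -> CIR=0 or break

    class TcpCircuitBreaker:
        def __init__(self, allow_cir_zero_fallback=True):
            now = time.monotonic()
            self.last_ack = now          # last time any ACK arrived
            self.last_clean_ack = now    # last ACK without a loss indication
            self.allow_cir_zero_fallback = allow_cir_zero_fallback
            self.broken = False
            self.cir_zero = False

        def on_ack(self, loss_indicated):
            now = time.monotonic()
            self.last_ack = now
            if not loss_indicated:
                self.last_clean_ack = now
                self.cir_zero = False    # clean feedback again: recover

        def poll(self):
            # Call periodically from the sender's timer.
            now = time.monotonic()
            if now - self.last_ack > N_SECS:
                self.broken = True       # no TCP replies at all
            elif now - self.last_clean_ack > M_SECS:
                if self.allow_cir_zero_fallback:
                    self.cir_zero = True # only loss indications for M secs
                else:
                    self.broken = True
            return self.broken, self.cir_zero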

The circuit breaker MUST IMHO be strict enough that it would
pass Internet/BE muster - because of the unintended
case where an SP resets the DSCP to 0 (instead of filtering
the packets) and these flows unintentionally get into the
Internet without any real admission control. I don't think
we have this case as a mandatory requirement (against screwed-up SPs)
in TSV review, but it would be really nice to meet it.

ECN was IMHO just an oversight of the -00. ECN MUST be
used unchanged from existing recommendations when sending
with more than CIR.

When sending at less than CIR, it should be subject
to a (binary) parameter whether to use ECN
as an explicit congestion OAM indication, resulting
in a faster circuit break / revert to CIR=0, or whether
to ignore it completely (and rely on the loss/other-TCP-signal
based circuit breaker above).
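
As a strawman (the parameter name and the reaction values below are
mine, not the draft's), that choice boils down to something like:

    # Strawman ECN-CE reaction keyed on whether the flow currently
    # sends above or below its CIR. ecn_below_cir_as_oam is the binary
    # parameter discussed above; everything here is illustrative.
    def react_to_ecn_ce(current_rate_bps, cir_bps, ecn_below_cir_as_oam, breaker):
        if current_rate_bps > cir_bps:
            return "standard-backoff"  # above CIR: unchanged ECN behavior
        if ecn_below_cir_as_oam:
            breaker.cir_zero = True    # treat CE as a congestion OAM signal
            return "revert-to-cir0"
        return "ignore"                # e.g. shared queue with >CIR traffic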

The parameter really depends on how the DSCP is set up:

When you end up in a single queue / the same drop
priority as flows sending more than CIR,
then you will see congestion even though you're
sending below your admission-approved CIR, so
you need to ignore ECN. This is not the ideal case,
but it would happen if you only use external,
e.g. SDN/PCEP, admission control schemes without, e.g.,
our in-band signaling (mileage with RSVP would vary
based on how it controls the forwarding plane).

I would definitely default to ECN=on, so that an
integration with external bandwidth management has to explicitly
turn it off when it is known to break the scheme.

To re-summarize:

- Only use on bandwidth management compatible traffic classes
- Constrain to controlled network otherwise
- no redefinition of intended use of traffic classes
  (i think, need to revisit some AF details)
- a circuit breaker making it safe against
  ERRONEOUS use across non-controlled networks/Internet/BE
- ECN unchanged above CIR; ECN-as-OAM a (binary) parameter below CIR

Of course, lots of your points are still unanswered,
but I hope this is a good start.

Cheers
    toerless

P.S.: You said no hat on, but I still see the pointy TCP wizard hat.

Authors, all,

I have read draft-han-tsvwg-cc-00. Below I have listed a number of questions, which I believe would have to be addressed when discussing such a mechanism in the IETF or IRTF.

This e-mail is strictly limited to the content of draft-han-tsvwg-cc-00. As the draft specifies neither how the CIR and PIR will actually be guaranteed in the Internet, nor how OAM signaling will work at Internet scale, I will not comment here on these assumptions, except regarding requirements that strictly follow from the content of the I-D. The technical, economic, and regulatory aspects of the assumptions are not in scope for TCPM and need to be discussed and solved elsewhere.

Questions on draft-han-tsvwg-cc-00:

1/ The document seems to implicitly assume that network resources are reserved for *every* single TCP connection, right?


  *   If that assumption is correct, it has to be spelt out explicitly in the text and it has to be noted that the underlying technology has to provide these capabilities *for every single* TCP connection.
  *   Otherwise sentences like "after a TCP session is successfully  initiated its congestion window (cwnd) jumps to CIR" would not make sense, as multiple TCP connections within a traffic aggregate policed by CIR/PIR could all start to send at CIR in parallel, which would trigger massive congestion.
  *   As an example, in my reading draft-han-6man-in-band-signaling-for-transport-qos-00 would also allow reservations, e.g., for aggregates of multiple TCP connections. Such an operation mode seems not to be compatible with the mechanism suggested in this I-D, as far as I understand. So the requirements have to be made explicit.
  *   Also, sentences such as "it is assumed that in bandwidth guaranteed networks there have been network resources (bandwidths, queues etc.) dedicated to the TCP flows" have to be corrected to specify that for the mechanism in this draft to work correctly, the resources have to be guaranteed to every single TCP connection, not multiple "flows".

2/  Why does the document not rely on ECN (and not even reference ECN)?


  *   For instance, the following requirement "It is important that OAM needs to be able to detect if any device's  buffer depth has exceeded the pre-configured threshold, as this is an indication of potential congestion and packet drop" could possibly be solved by ECN, no?
  *   Even if another OAM mechanism could be used in addition, a comprehensive TCP congestion control specification would also have to cover the reaction to ECN marks, as well as the potential combination of the feedback signals. Why is this missing?
  *   Or would the document mandate that ECN MUST NOT be enabled for TCP connections using this congestion control mechanism?

3/ Why does the document assume that congestion windows are calculated in segments and not in bytes?


  *   RFC 5681 as well as many other RFCs calculate CWND in bytes.
  *   However, I believe equations such as "MinBandwidthWND = CIR * RTT/MSS" or "MaxBandwidthWND = PIR * RTT/MSS" would return a window counted in MSS-sized segments.
  *   Apart from the mismatch with the TCP standards, this sort of equation might also require a discussion of how to deal with integer division (a small worked example follows after this list).
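
To make the unit and rounding question concrete, here is a small worked example with made-up numbers (all names are illustrative):

    # Worked example with made-up numbers; names are illustrative.
    import math

    cir_bps = 10_000_000      # 10 Mbit/s
    rtt = 0.080               # 80 ms
    mss = 1460                # bytes

    bdp_bytes = cir_bps / 8 * rtt    # 100000.0 bytes -> RFC 5681-style cwnd
    wnd_segments = bdp_bytes / mss   # ~68.49 "segments" per the draft's formula

    print(bdp_bytes, math.floor(wnd_segments), math.ceil(wnd_segments))
    # 100000.0 68 69 -> flooring stays at or below CIR, ceiling can exceed it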

4/ How does the mechanism deal with IP and TCP header overhead?


  *   TCP window sizes are about the TCP bytestream, while the actual IP packets sent by a TCP/IP stack also include an IP and a TCP header. If one neglects the IP and TCP headers in the congestion window calculation, the resulting IP packet rate will be larger than the CIR and PIR seen at the TCP layer. This could result in packet drops if CIR and PIR are enforced, e.g., on IP packet length.
  *   How will this problem be solved? Note that TCP (and also IP) can include header options, which results in variable header sizes. The number of TCP options can be different for each TCP segment. How does this congestion control mechanism correctly handle the IP and TCP headers and their options? (A rough illustration of the overhead follows after this list.)
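
A rough illustration of the size of this effect, with made-up numbers:

    # Rough header-overhead illustration; all numbers are made up.
    mss = 1460                        # TCP payload bytes per full segment
    headers = 20 + 20 + 12            # IPv4 + TCP + a timestamp option
    wire_bytes = mss + headers        # 1512 bytes of IP packet per segment

    cir_bps = 10_000_000              # CIR enforced on IP packet length
    # If cwnd is derived from CIR without accounting for headers, the
    # resulting IP-level rate exceeds the policer by wire_bytes/mss:
    print(cir_bps * wire_bytes / mss) # ~10356164 bit/s, i.e. ~3.6% above CIR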

5/ How does the document deal with RTT variations? Is the assumption that the RTT is constant?


  *   As far as I can tell from experiments, RTT estimation is important when mapping a rate to a window in window-based congestion control, which is what this document does.
  *   Equations such as "MinBandwidthWND = CIR * RTT/MSS" or "MaxBandwidthWND = PIR * RTT/MSS" only provide a window equivalent to the bandwidth-delay product of the path if the RTT sample is a correct prediction of the actual delay that the segments in flight will experience. How does the mechanism suggested in this document correctly predict the future RTT of the segments that are sent by the sender at a given point in time?
  *   As an example, assume that the RTT at time t=10s is determined as 80ms. Assume PIR = 10 Mbps and neglect the questions 3/ and 4/. Then this document would probably assume that MaxBandwidthWND=100 kB is the bandwidth-delay product of the path between t=10s and t=10.08s, i.e., the maximum amount of outstanding data that can be sent in that time without drops (or without exceeding PIR). But assume that the actual round-trip delay of segments has dropped to 40ms after the last RTT measurement, which means that the maximum bandwidth-delay product of the path at t=10s+epsilon is only 50 kB. As a result, 50 kB out of the congestion window would likely be dropped between t=10s and t=10.08s. And, due to the wrong RTT value, the effective data rate of the sender could even be 20 Mbps if the RTT mismatch is not detected immediately, or, e.g., if an EWMA delays the update of the weighted RTT parameter to the actual value.
  *   So how does the proposed scheme indeed determine a window that meets the statement "This means a TCP sender is never allowed to send data at a rate larger than PIR" if the RTT is not constant? Does this assume rate pacing in the TCP sender for each TCP connection? (A minimal pacing sketch follows after this list.)
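
If the answer is rate pacing, a minimal per-connection pacing rule (my sketch, not something the draft specifies) would have to cap the instantaneous rate, not just the window:

    # Hypothetical pacing rule: cap the send rate at min(cwnd/SRTT, PIR)
    # so that a stale, too-large RTT sample cannot push the wire rate
    # above PIR. All names are illustrative.
    def pacing_interval_s(cwnd_bytes, srtt_s, pir_bps, segment_bytes):
        window_rate = cwnd_bytes / srtt_s       # bytes/s implied by cwnd
        rate = min(window_rate, pir_bps / 8.0)  # never exceed PIR
        return segment_bytes / rate             # seconds between segments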

6/ How is it ensured that OAM alarms will reach the TCP sender in time in all possible "random failure" cases?


  *   As far as I understand, the following statement "When a sender receives the third duplicated ACK, but no previous OAM congestion alarm has been received, then it is considered that a segment is lost due to random failure not congestion.  In this case the cwnd is not changed." mandates that an OAM alarm is received prior to the third duplicate ACK *in all potential cases* of congestion. If the OAM alarm got lost or delayed, this condition would imply that cwnd is not changed even though a segment has been dropped due to congestion, which would be a violation of fundamental Internet congestion control principles (the rule is paraphrased in code after this list).
  *   Please expand on how this document ensures that cwnd will be reduced in all potential cases when a packet gets dropped due to congestion, and what requirements on the propagation of OAM alarms follow from that. Of specific interest are effects such as drops of the packets carrying the OAM information, reordering of packets, asymmetric routing in forward and reverse direction, use of multiple paths in parallel (ECMP), and the like. If the document makes assumptions about the path, such as in-order packet delivery or the like, these assumptions need to be spelt out explicitly.
  *   I understand that the OAM could be solved in different ways and the solution is independent of this document. But this document has to comprehensively specify all technical requirements that the OAM mechanism has to meet in order to ensure that every single packet drop due to congestion always results in a cwnd reduction. Otherwise the algorithm has to change as it does not safely prevent congestion.
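
For reference, the quoted rule boils down to something like the following paraphrase (mine, not the draft's text); the comments mark exactly the race described above:

    # Paraphrase of the quoted rule. If the OAM alarm is lost or arrives
    # after the 3rd duplicate ACK, a congestion drop is silently
    # reclassified as "random failure" and cwnd is left unchanged.
    def on_third_dup_ack(cwnd, oam_congestion_alarm_seen):
        if oam_congestion_alarm_seen:
            return cwnd // 2   # treat as congestion loss (reaction illustrative)
        return cwnd            # assume random loss: cwnd unchanged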

7/ What is the expected performance benefit? Are there situations in which performance will be worse than standard TCP congestion control?


  *   The document does not contain any data on potential improvements or deteriorations as compared to standard TCP congestion control. I assume that such data will be presented to explain why this mechanism is proposed, and what benefits it has.
  *   As I have experimented with similar mechanisms quite a bit, I believe there will be cases in which this congestion control mechanism will perform worse than a TCP sender fully compliant with RFC 5681, when using the same network path with CIR and PIR guarantees. I believe this document should analyze these cases and explain why a worse performance than standard TCP congestion control will be acceptable. IMHO this issue will specifically apply in cases where PIR is significantly larger than CIR and the RTT is large. As far as I can see, this draft mandates starting the data transfer in congestion avoidance at the CIR rate, which means that it can take many RTTs until the sender reaches the PIR. In contrast, RFC 5681 will run slow start, and RFC 5681 states that the "initial value of ssthresh SHOULD be set arbitrarily high". This means that the TCP sender can reach PIR within a few RTTs and thus can send at full PIR speed, while a sender using draft-han-tsvwg-cc-00 will send at a much lower speed of CIR+epsilon. For short and medium-sized data transfers, IMHO it can happen that congestion control according to RFC 5681 will significantly outperform the mechanism suggested in this document, i.e., it will complete data transfers orders of magnitude faster even without any knowledge about CIR and PIR (some back-of-the-envelope numbers follow after this list). Have the authors compared this mechanism to the performance of RFC 5681?
  *   Also, have the authors compared the performance of this mechanism to that of a modern TCP stack, which often uses RFC 6928 (IW 10) and RFC 8312 (CUBIC)? In what cases does the suggested congestion control perform better? I ask this because, 10 years ago, I performed experiments with congestion control schemes that have some similarity to what is suggested here, and they also used knowledge about the path properties. In those experiments, it turned out to be quite difficult to design an algorithm that uses knowledge about the path (such as CIR/PIR) and that outperforms CUBIC in combination with IW 10, even if such a stack is totally unaware of the path. This has been discussed e.g. in ICCRG: https://www.ietf.org/proceedings/73/slides/iccrg-2.pdf. As context, the "more-start" scheme in those slides is somewhat similar to what is proposed in this I-D (but applied to CUBIC), while the "initial-start" graphs roughly correspond to what was later specified in RFC 6928 (IW 10) and RFC 8312 (CUBIC).
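
Back-of-the-envelope numbers for the CIR-start versus slow-start gap, with made-up values, assuming congestion avoidance grows cwnd by roughly one MSS per RTT and slow start doubles it per RTT from IW 10:

    # Made-up numbers; assumes +1 MSS per RTT in congestion avoidance
    # and doubling per RTT in slow start from IW 10.
    import math

    cir_bps, pir_bps = 10_000_000, 1_000_000_000  # 10 Mbit/s CIR, 1 Gbit/s PIR
    rtt, mss = 0.100, 1460                        # 100 ms RTT, MSS in bytes

    cir_wnd = cir_bps / 8 * rtt / mss             # ~86 segments
    pir_wnd = pir_bps / 8 * rtt / mss             # ~8562 segments

    ca_rtts = pir_wnd - cir_wnd                   # ~8476 RTTs from CIR to PIR
    ss_rtts = math.log2(pir_wnd / 10)             # ~9.7 RTTs from IW 10

    print(round(ca_rtts * rtt), round(ss_rtts * rtt, 1))  # ~848 s vs ~1.0 s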

8/ For what application traffic patterns is this mechanism proposed?


  *   The document states in Section 3 "... with the development of new applications, such as AR/VR". What properties do such applications have to leverage the mechanism suggested in this document? Is it possible to characterize what the "new" requirements are, and how the suggested algorithm meets these requirements?
  *   Is it suggested to apply this congestion control to real-time media traffic over TCP? If so, what would be the benefit of using TCP in general, and of this specific mechanism, compared to congestion control algorithms designed for such traffic (e.g., in the RMCAT working group)?

This list of questions is not comprehensive, but I'll stop here.

Regarding potential next steps of this document in the TCPM working group, I believe that the applicable TCPM charter wording is: "In addition, TCPM may document alternative TCP congestion control algorithms that are known to be widely deployed, and that are considered safe for large-scale deployment in the Internet." Until these prerequisites are fulfilled in the Internet, in my view the document cannot be adopted in TCPM. Research could be performed in the IRTF, e.g., in ICCRG.

Thanks

Michael (with no hat on)