Re: [tcpm] [tsvwg] draft-han-tsvwg-cc

"Scharf, Michael (Nokia - DE/Stuttgart)" <michael.scharf@nokia.com> Fri, 16 March 2018 13:44 UTC

From: "Scharf, Michael (Nokia - DE/Stuttgart)" <michael.scharf@nokia.com>
To: Toerless Eckert <tte@cs.fau.de>
CC: Thomas Nadeau <tnadeau@lucidvision.com>, "tcpm@ietf.org" <tcpm@ietf.org>, Yingzhen Qu <yingzhen.qu@huawei.com>
Date: Fri, 16 Mar 2018 13:44:48 +0000
Message-ID: <VI1PR0701MB255800F95712D680DECC977C93D70@VI1PR0701MB2558.eurprd07.prod.outlook.com>
References: <20180316040208.GA9492@faui40p.informatik.uni-erlangen.de>
In-Reply-To: <20180316040208.GA9492@faui40p.informatik.uni-erlangen.de>
Archived-At: <https://mailarchive.ietf.org/arch/msg/tcpm/YpZ9WaFLRF_Vv-bWT5axqBCu8B4>
Subject: Re: [tcpm] [tsvwg] draft-han-tsvwg-cc
List-Id: TCP Maintenance and Minor Extensions Working Group <tcpm.ietf.org>

> > Authors, all,
> >
> > I have read draft-han-tsvwg-cc-00. Below I have listed a number of
> questions, which I believe would have to be addressed when discussing such
> a mechanism in the IETF or IRTF.
> 
> Thanks, much appreciated. Not a listed author, but let me give you some of
> my thoughts.
> 
> I started trying to answer your questions individually, but it became IMHO
> too unstructured and long. So let me instead try some context setting, and please
> tell me if you agree or disagree with the individual points. This is, by the way, just my
> personal opinion; the authors may disagree.

Your points do not seem to address my questions on why draft-han-tsvwg-cc-00 would be better than running today's TCP congestion control over the same scenario, and how it would be ensured that the congestion control works properly.

> 
> Fundamentally, to comply with the existing IETF QoS architecture, this method
> should be used in traffic classes with DS code points intended to be used
> with admission control/bandwidth management, such as AF or EF
> (I forgot the others, but we'll look them up). Definitely not BE.

My reading of the document is that it assumes that CIR and PIR apply to a single TCP connection and that CIR is guaranteed *for each connection*. So the admission control and bandwidth management must apply to each TCP connection using this congestion control scheme, not to traffic aggregates.

This specifically means that TCP connections will have to be rejected by the network if they intend to use this congestion control mechanism but the CIR cannot be guaranteed, or the network would have to ensure that the congestion control falls back to TCP Reno in that case. This has to be specified; otherwise the congestion control is not safe.
 
> Note: None of the standards RFCs about those traffic classes mention
> whether they're intended for the Internet or not.
> Effectively, they are almost never deployed on an Internet service, but on
> what we would call "controlled networks". But they don't say this.
>
> Note 2: All those traffic class RFCs are also quite unspecific about the details of
> those bandwidth management/traffic control mechanisms. You asked for
> more details about e.g. our in-band signaling. On one hand I'd love to discuss
> it as much as possible, but just for the purpose of this CC draft, I think we
> should really only discuss those details of bandwidth
> management/admission control that would impact the CC approach. See
> below for one example I have (ECN parameter).
> 
> For example, the 10-year-old draft-lochin-ietf-tsvwg-gtfrc was spec'ed with
> the example of AF and, as far as I understood it, got fairly positive WG
> feedback (it just died because of the authors changing jobs/priorities). (Given
> that it was TFRC-based, it was not primarily targeting TCP.)
> 
> I would very much like to avoid changing the DSCP/PHB behavior of any
> classes, so this cannot go into BE as a target solution. I would like to have a
> sentence like "any use in BE or other non-admission-controlled code points
> MUST stay within controlled networks".
> 
> The reason to even mention this is just how SPs abuse DSCPs:
> inelastic video (non-congestion-controlled) from service providers to their
> subscribers often just uses DSCP 0 out of laziness (overprovisioning). I mostly
> know this for multicast, but also for VoD unicast (UDP).
> Even more so for voice (all inelastic codecs). So, if this approach is useful to
> those types of constrained networks... oh well...
> 
> If I were to come up with a more complete TCP stack and API description, I
> would probably default these types of flows to send with an AF class, and if
> the SP resets these to 0 instead of filtering them... see below.
> 
> Having said this, the whole discussion about CC details is of course still core
> and all your questions are
> valid:
> 
> I think we definitely should define a circuit breaker based on TCP behavior,
> based on the principles of RFC8084.

The TCP congestion control already takes care of this, no?
 
> Something like:
> - No TCP replies for more than N secs, break circuit
> - Only loss indication for more than M secs, fall back
>   to CIR=0 behavior or break circuit (app based).

This is basically the TCP RTO.
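To make the overlap with the retransmission timer concrete, here is a minimal Python sketch of the RTO computation specified in RFC 6298 (SRTT/RTTVAR smoothing, 1 s lower bound, exponential back-off on expiry). It is only an illustration: the 1 ms clock granularity and the 60 s cap are values assumed for the sketch (RFC 6298 permits any maximum of at least 60 s).

```python
from typing import Optional

# RFC 6298 constants
ALPHA = 1 / 8   # gain for SRTT
BETA = 1 / 4    # gain for RTTVAR
K = 4
MIN_RTO = 1.0   # seconds; RFC 6298 rounds RTO up to 1 s
MAX_RTO = 60.0  # seconds; assumed cap (RFC 6298 allows any cap >= 60 s)


class RtoEstimator:
    """Retransmission timeout estimator following the RFC 6298 rules."""

    def __init__(self, clock_granularity: float = 0.001) -> None:
        self.g = clock_granularity          # assumed timer granularity G
        self.srtt: Optional[float] = None
        self.rttvar: Optional[float] = None
        self.rto = 1.0                      # initial RTO before any RTT sample

    def on_rtt_sample(self, r: float) -> float:
        if self.srtt is None or self.rttvar is None:
            # First measurement
            self.srtt = r
            self.rttvar = r / 2
        else:
            # Subsequent measurements: RTTVAR is updated using the old SRTT
            self.rttvar = (1 - BETA) * self.rttvar + BETA * abs(self.srtt - r)
            self.srtt = (1 - ALPHA) * self.srtt + ALPHA * r
        self.rto = min(MAX_RTO, max(MIN_RTO, self.srtt + max(self.g, K * self.rttvar)))
        return self.rto

    def on_timeout(self) -> float:
        # Exponential back-off each time the retransmission timer expires
        self.rto = min(MAX_RTO, self.rto * 2)
        return self.rto
```

For example, a first 80 ms sample yields an RTO of 1 s because of the lower bound, and every expiry without an ACK doubles it, so a path that stops returning ACKs is throttled without any additional mechanism.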

> Maybe also based on other TCP-derived error recognition.
> Suggestions welcome. RFC8084 is so new, I have not seen good practices
> that could be applied to TCP. And I guess it is unlikely that others have thought
> about this, given that there was no minimum-CIR inelastic behavior for TCP.

If the network guarantees a minimum CIR to a single TCP connection, the TCP congestion control will fully use this CIR as long as there is no congestion, in particular if one uses a modern TCP stack with high-speed congestion control, efficient error recovery (SACK), and possibly even ECN (to avoid packet drops if PIR is reached, which can make sense for latency-sensitive apps). So an inelastic application that needs at least CIR should work perfectly with existing TCP over that path with CIR guarantees, right?

And the same TCP congestion control will also work very well if for some reason CIR cannot be guaranteed in the network. This can occur even in a network that is able to guarantee CIR and PIR, e.g., if there is a network misconfiguration. In that case, the app will need to throttle down its sending rate, but as the network does not deliver CIR, there is no alternative to reducing the rate anyway.

> The circuit breaker MUST IMHO be so strict that it would pass Internet/BE
> muster, because of the unintended case where an SP resets the DSCP to 0
> (instead of filtering the packets) and these flows unintentionally get into the
> Internet without any real admission control. I don't think we have this case
> as a mandatory requirement (against screwed-up SPs) in TSV review, but it
> would be really nice to meet.
> 
> ECN was IMHO just an oversight in the -00. ECN MUST be used unchanged
> from existing recommendations when sending with more than CIR.
> 
> When sending with less than CIR, it should be subject to a (binary) parameter
> whether or not to use ECN as an explicit congestion OAM indication resulting
> in a faster circuit breaker/revert to CIR=0, or whether to ignore it completely
> (and use the above loss/other TCP parameter based circuit breaker).

That seems to be a different algorithm than what is currently in the draft. The starting point for a discussion about congestion control is a complete specification and measurement results. This could possibly also be done at research conferences.
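For reference, the ECN rule as proposed in the quoted text can be sketched as below. This is a hypothetical illustration only: the enum, the function, and the `use_ecn_below_cir` flag are names invented here and are not taken from the draft, and treating a rate exactly equal to CIR as "below CIR" is an assumption.

```python
from enum import Enum


class EcnReaction(Enum):
    STANDARD = "standard"            # react as existing ECN recommendations require
    CIRCUIT_BREAK = "circuit_break"  # treat CE as congestion OAM: revert to CIR=0
    IGNORE = "ignore"                # rely on the loss-based circuit breaker only


def reaction_to_ce_mark(send_rate_bps: float, cir_bps: float,
                        use_ecn_below_cir: bool) -> EcnReaction:
    """Hypothetical per-mark decision following the rule proposed in the thread."""
    if send_rate_bps > cir_bps:
        # Above CIR: ECN is used unchanged from existing recommendations
        return EcnReaction.STANDARD
    # At or below CIR: behavior selected by a binary configuration parameter
    return EcnReaction.CIRCUIT_BREAK if use_ecn_below_cir else EcnReaction.IGNORE
```

Note that in the IGNORE branch a CE mark is disregarded even though it signals that a router on the path is congested.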
 
> The parameter really depends on how the DSCP is set up:
> 
> When you end up in a single queue / same drop priority with flows sending
> more than CIR, then you will see congestion even though you're sending
> below your admission-approved CIR, so you need to ignore ECN. This is not
> an ideal case, but it would happen if you only use external
> (e.g. SDN/PCEP) admission control schemes without e.g.
> our in-band signaling (mileage with RSVP would vary based on how it controls
> the forwarding plane).

I cannot follow why it would be appropriate to ignore ECN, as in that case a router is clearly congested.

As far as I recall as a "TCP wizard", the acronyms "SDN" and "PCEP" have not been mentioned in TCPM so far, although PCEP is a TCP-based protocol. As this has not come up in TCPM so far, I would assume that the use of "SDN" and "PCEP" technology could also work with the existing TCP standards. What am I missing?

> I would definitely default to ECN=on, to ensure that any integration with
> external bandwidth management has to explicitly turn it off when it is known
> to break the scheme.
> 
> To re-summarize:
> 
> - Only use on bandwidth management compatible traffic classes

No, as far as I understand, draft-han-tsvwg-cc requires bandwidth management per TCP connection.

> - Constrain to controlled network otherwise

No to "otherwise"

> - no redefinition of intended use of traffic classes
>   (I think; need to revisit some AF details)

No, off the top of my head, per-TCP-connection admission control would require changes to AF (but I don't recall for sure).

> - circuit breaker making it safe against
>   ERRONEOUS use across non-controlled/Internet/BE.

No, a circuit breaker is not needed in TCP as long as applications are elastic. For totally inelastic traffic, the first question would be why to use TCP at all, and I don't see this answered in the drafts.

> - ECN > CIR, ECN as parameter < CIR

No, at first sight that does not make sense but I would need to see a full spec, measurement, etc. 

Thanks

Michael


> Of course, there are lots of still unanswered points from your side, but I hope
> this is a good start.
> 
> Cheers
>     toerless
> 
> P.S.: You said no hat on, but I still see the pointy TCP wizard hat.
> 
> Authors, all,
> 
> I have read draft-han-tsvwg-cc-00. Below I have listed a number of
> questions, which I believe would have to be addressed when discussing such
> a mechanism in the IETF or IRTF.
> 
> This e-mail is strictly limited to the content of draft-han-tsvwg-cc-00. As the
> draft does not specify how the CIR and PIR will actually be guaranteed in the
> Internet, as well as how OAM signaling will work at Internet scale, I will not
> comment here on these assumptions, except regarding requirements that
> strictly follow from the content of the I-D. The technical, economic, and
> regulatory aspects of the assumptions are not in scope of TCPM and they
> need to be discussed and solved elsewhere.
> 
> Questions on draft-han-tsvwg-cc-00:
> 
> 1/ The document seems to implicitly assume that network resources are
> reserved for *every* single TCP connection, right?
> 
> 
>   *   If that assumption is correct, it has to be spelt out explicitly in the text
> and it has to be noted that the underlying technology has to provide these
> capabilities *for every single* TCP connection.
>   *   Otherwise sentences like "after a TCP session is successfully initiated its
> congestion window (cwnd) jumps to CIR" would not make sense, as multiple
> TCP connections within a traffic aggregate policed by CIR/PIR could all start to
> send with CIR in parallel, which would trigger massive congestion.
>   *   As an example, in my reading
> draft-han-6man-in-band-signaling-for-transport-qos-00 would also allow
> reservations e.g. for aggregates of multiple TCP connections. Such an
> operation mode does not seem to be compatible with the suggested mechanism
> in this I-D, as far as I understand. So the requirements have to be made explicit.
>   *   Also, sentences such as "it is assumed that in bandwidth guaranteed
> networks there have been network resources (bandwidths, queues etc.)
> dedicated to the TCP flows" have to be corrected to specify that for the
> mechanism in this draft to work correctly, the resources have to be
> guaranteed to every single TCP connection, not multiple "flows".
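The aggregate problem can be illustrated with a little arithmetic. The sketch below assumes a hypothetical reservation for an aggregate and N connections that each jump to a window equivalent to the full CIR; it also shows the per-connection admission check that this reading of the draft implies. All function names and numbers are hypothetical.

```python
from typing import List


def aggregate_overload_factor(reserved_aggregate_bps: float,
                              cir_per_connection_bps: float,
                              n_connections: int) -> float:
    """Offered load relative to the reservation if every connection starts at CIR."""
    offered_bps = cir_per_connection_bps * n_connections
    return offered_bps / reserved_aggregate_bps


def admit_connection(reserved_aggregate_bps: float,
                     admitted_cirs_bps: List[float],
                     new_cir_bps: float) -> bool:
    """Per-connection admission: accept only if all guaranteed CIRs still fit."""
    return sum(admitted_cirs_bps) + new_cir_bps <= reserved_aggregate_bps
```

With a 10 Mbit/s aggregate reservation and four connections that each assume a 10 Mbit/s CIR, the offered load at start-up is four times the reservation.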
> 
> 2/  Why does the document not rely on ECN (or even reference it)?
> 
> 
>   *   For instance, the following requirement "It is important that OAM needs
> to be able to detect if any device's  buffer depth has exceeded the pre-
> configured threshold, as this is an indication of potential congestion and
> packet drop" could possibly be solved by ECN, no?
>   *   Even if another OAM mechanism were used in addition, a
> comprehensive TCP congestion control specification would also have to cover
> the reaction to ECN marks, as well as the potential combination of
> feedback results. Why is this missing?
>   *   Or would the document mandate that ECN MUST NOT be enabled for
> TCP connections using this congestion control mechanism?
> 
> 3/ Why does the document assume that congestion windows are calculated
> in segments and not in bytes?
> 
> 
>   *   RFC 5681 as well as many other RFCs calculate CWND in bytes.
>   *   However, I believe equations such as "MinBandwidthWND = CIR *
> RTT/MSS" or "MaxBandwidthWND = PIR * RTT/MSS" would return a window
> counted in MSS segments.
>   *   Apart from the mismatch with the TCP standards, this sort of equation
> might also require a discussion on how to deal with integer division.
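As an illustration of the unit issue, the sketch below computes the window in bytes (as RFC 5681 counts cwnd) and then in whole segments, showing that the rounding direction matters. It assumes CIR is given in bit/s and the RTT in seconds (if the draft means byte/s, the division by 8 disappears); the function names are hypothetical.

```python
import math


def min_bandwidth_wnd_bytes(cir_bps: float, rtt_s: float) -> float:
    """CIR * RTT expressed in bytes, the unit RFC 5681 uses for cwnd."""
    return cir_bps * rtt_s / 8


def min_bandwidth_wnd_segments(cir_bps: float, rtt_s: float,
                               mss_bytes: int, round_up: bool) -> int:
    """Same window in whole segments; the rounding direction must be specified."""
    segments = min_bandwidth_wnd_bytes(cir_bps, rtt_s) / mss_bytes
    # Rounding up can exceed the rate; rounding down leaves part of it unused
    return math.ceil(segments) if round_up else math.floor(segments)
```

With CIR = 10 Mbit/s, RTT = 80 ms, and MSS = 1460 bytes, the window is 100 kB, i.e. about 68.5 segments: 68 segments (99,280 bytes) stay below CIR, while 69 segments (100,740 bytes) exceed it.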
> 
> 4/ How does the mechanism deal with IP and TCP header overhead?
> 
> 
>   *   TCP window sizes are about the TCP bytestream, while the actual IP
> packets sent by a TCP/IP stack will include an IP and TCP header. If one
> neglects the IP and TCP headers in the congestion window calculation, the
> resulting IP packet rate will be larger than the CIR and PIR seen in the TCP
> layer. This could result in packet drops if CIR and PIR are enforced e.g. for IP
> packet length.
>   *   How will this problem be solved? Note that TCP (and also IP) can include
> header options, which results in variable header sizes. The number of TCP
> options can be different for each TCP segment. How does this congestion
> control mechanism correctly handle the headers and the options in IP and
> TCP headers?
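A rough sketch of the overhead: if the TCP payload rate equals CIR, the IP-level rate is larger by the factor (payload + headers) / payload. The example values (a 1448-byte payload with a 20-byte IPv4 header, a 20-byte TCP header, and a 12-byte timestamp option) are assumptions for illustration. Since headers can vary per segment, one conservative option is to budget for the worst case (IPv4 and TCP headers can each grow to 60 bytes); this is only one possible approach, not something the draft specifies.

```python
def ip_rate_bps(payload_rate_bps: float, payload_bytes: int, header_bytes: int) -> float:
    """IP-layer rate produced when TCP payload is sent at payload_rate_bps."""
    return payload_rate_bps * (payload_bytes + header_bytes) / payload_bytes


def max_payload_rate_bps(ip_cir_bps: float, payload_bytes: int, header_bytes: int) -> float:
    """Largest TCP payload rate that keeps the IP-layer rate at or below CIR."""
    return ip_cir_bps * payload_bytes / (payload_bytes + header_bytes)


WORST_CASE_HEADER_BYTES = 60 + 60  # maximum IPv4 header + maximum TCP header
```

With these numbers, a payload rate equal to a 10 Mbit/s CIR corresponds to roughly 10.36 Mbit/s at the IP layer. Smaller segments make the relative overhead larger still.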
> 
> 3/ How does the document deal with RTT variations? Is the assumption that
> the RTT is constant?
> 
> 
>   *   As far as I can tell from experiments, the RTT estimation is important
> when applying a rate to window-based congestion control, which is what this
> document does.
>   *   Equations such "MinBandwidthWND = CIR * RTT/MSS" or
> "MaxBandwidthWND = PIR * RTT/MSS" only provide a window equivalent to
> the bandwidth-delay product of the path if the RTT sample is a correct
> prediction of the actual delay that the segments in flight will experience. How
> does the mechanism suggested in this document correctly predict the future
> RTT of the segments that are sent by the sender at a given point in time?
>   *   As an example, assume that the RTT at time t=10s is determined as 80ms.
> Assume PIR = 10 Mbps and neglect the questions 3/ and 4/. Then this
> document would probably assume that MaxBandwidthWND=100 kB is the
> bandwidth delay product of the path during t=10s and t=10.08s, i.e., the
> maximum amount of outstanding data that can be sent in that time without
> drops (or exceeding PIR). But assume that the actual round-trip delay of
> segments has dropped to 40ms after the last RTT measurement, which means
> that the maximum bandwidth delay product of the path at t=10s+epsilon is
> only 50 kB. As a result, 50 kB out of the congestion window would likely be
> dropped during t=10s and t=10.08s. And, due to the wrong RTT value, the
> effective data rate of the sender could even be 20 Mbps, if the RTT mismatch
> is not detected immediately, or, e.g., if EWMA will delay the update of the
> weighted RTT parameter to the actual value.
>   *   So how does the proposed scheme indeed determine a window that
> meets the statement "This means a TCP sender is never allowed to send data
> at a rate larger than PIR" if the RTT is not constant? Does this assume rate
> pacing in the TCP sender for each TCP connection?
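The arithmetic of this example, plus rate pacing as one possible way to bound the sending rate independently of the RTT estimate, can be sketched as follows. This is an illustration under the example's assumptions (PIR in bit/s, 1500-byte packets assumed for pacing), not a proposal for the draft; the function names are hypothetical.

```python
def window_from_rate_bytes(rate_bps: float, rtt_s: float) -> float:
    """Window equivalent to rate * RTT, in bytes (rate in bit/s)."""
    return rate_bps * rtt_s / 8


def effective_rate_bps(cwnd_bytes: float, actual_rtt_s: float) -> float:
    """Rate a window-limited sender achieves when the real RTT is actual_rtt_s."""
    return cwnd_bytes * 8 / actual_rtt_s


def pacing_interval_s(packet_bytes: int, rate_bps: float) -> float:
    """Gap between packet departures that keeps a paced sender at rate_bps."""
    return packet_bytes * 8 / rate_bps


PIR = 10e6
cwnd = window_from_rate_bytes(PIR, 0.080)   # about 100 kB, from the 80 ms sample
rate = effective_rate_bps(cwnd, 0.040)      # about 20 Mbit/s if the RTT is really 40 ms
gap = pacing_interval_s(1500, PIR)          # 1.2 ms between 1500-byte packets
```

With pacing, the departure rate is derived directly from PIR, so an outdated RTT sample changes how much data is outstanding but not the maximum sending rate.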
> 
> 4/ How is it ensured that OAM alarms will reach the TCP sender in time in all
> possible "random failure" cases?
> 
> 
>   *   As far as I understand, the following statement "When a sender receives
> the third duplicated ACK, but no previous OAM congestion alarm has been
> received, then it is considered that a segment is lost due to random failure
> not congestion.  In this case the cwnd is not changed." mandates that an
> OAM alarm is received prior to the third duplicate ACK *in all potential cases*
> of congestion. If the OAM alarm got lost or delayed, this condition would
> imply that cwnd is not changed even though a segment has been dropped due to
> congestion, which would be a violation of fundamental Internet congestion
> control principles.
>   *   Please expand on how this document ensures that cwnd will be reduced
> in all potential cases when a packet gets dropped due to congestion, and
> what requirements on the OAM alarms propagation follow from that. Of
> specific interest are effects such as packet drops of packets relevant for the
> OAM information, reordering of packets, asymmetric routing in forward and
> reverse direction, use of multiple paths in parallel (ECMP), and the like. If the
> document makes assumption about the path such as in-order packet delivery
> or the like, these assumptions need to be spelt out explicitly.
>   *   I understand that the OAM could be solved in different ways and the
> solution is independent of this document. But this document has to
> comprehensively specify all technical requirements that the OAM
> mechanism has to meet in order to ensure that every single packet drop due
> to congestion always results in a cwnd reduction. Otherwise the algorithm
> has to change as it does not safely prevent congestion.
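To illustrate why a lost or delayed OAM alarm matters, the sketch compares the quoted draft rule (third duplicate ACK with no prior alarm: cwnd unchanged) with the fast retransmit reaction in RFC 5681, Section 3.2 (ssthresh = max(FlightSize / 2, 2*SMSS), cwnd = ssthresh + 3*SMSS). The scenario, a congestion drop whose alarm never arrives, and the numbers are assumptions.

```python
from typing import Tuple


def rfc5681_fast_retransmit(flight_size: int, smss: int) -> Tuple[int, int]:
    """Reaction to the third duplicate ACK per RFC 5681, Section 3.2 (bytes)."""
    ssthresh = max(flight_size // 2, 2 * smss)
    cwnd = ssthresh + 3 * smss  # inflated by the three segments that left the network
    return ssthresh, cwnd


def draft_reaction_without_alarm(cwnd: int) -> int:
    """Quoted draft rule: no prior OAM alarm, so the loss is treated as random."""
    return cwnd


SMSS = 1460
cwnd = flight_size = 100 * SMSS  # a congestion drop, but the OAM alarm was lost
```

In this scenario RFC 5681 reduces the window to 77,380 bytes, while the quoted rule keeps sending with 146,000 bytes into the congested queue.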
> 
> 5/ What is the expected performance benefit? Are there situations in which
> performance will be worse than standard TCP congestion control?
> 
> 
>   *   The document does not contain any data of potential improvements or
> deteriorations as compared to the TCP standard congestion control. I assume
> that such data will be presented to explain why this mechanism is proposed,
> and what benefits it has.
>   *   As I have experimented with similar mechanisms quite a bit, I believe
> there will be cases in which this congestion control mechanism will perform
> worse than a TCP sender fully compliant to RFC 5681, when using the same
> network path with CIR and PIR guarantees. I believe this document should
> analyze these cases and reason why a worse performance than standard TCP
> congestion control will be acceptable. IMHO this issue will specifically apply in
> cases when PIR is significantly larger than CIR, and if the RTT is large. As far as
> I can see, this draft mandates to start the data transfer in congestion
> avoidance at CIR rate, which means that it can take many RTTs until the
> sender reaches the PIR. In contrast, RFC 5681 will run slow-start, and RFC
> 5681 states that the "initial value of ssthresh SHOULD be set arbitrarily high".
> This means that the TCP sender can reach PIR within a few RTTs and thus can
> send with full PIR speed, while a sender using draft-han-tsvwg-cc-00 will
> send with a much lower speed CIR+epsilon. For short and medium-sized data
> transfers, IMHO it can happen that the congestion control according to RFC
> 5681 will significantly outperform the mechanism suggested in this
> document, i.e., it will complete data transfers orders of magnitude faster
> even without any knowledge about CIR and PIR. Have the authors compared
> this mechanism to the performance of RFC 5681?
>   *   Also, have the authors compared the performance of this mechanism
> to a modern TCP stack, which often uses RFC 6928 (IW 10) and RFC
> 8312 (CUBIC)? In what cases does the suggested congestion control have better
> performance? I ask this because I performed experiments 10 years ago
> with congestion control schemes that have some similarity to what is
> suggested here, and they also used knowledge about the path properties. In
> those experiments, it turned out to be quite difficult to design an algorithm
> that uses knowledge about the path (such as CIR/PIR) and that outperforms
> CUBIC in combination with IW 10, even if such a stack is totally unaware of the
> path. This has been discussed e.g. in ICCRG
> https://www.ietf.org/proceedings/73/slides/iccrg-2.pdf. As context,
> "more-start" in those slides is somewhat similar to what is proposed in this I-D
> (but applied to CUBIC), while the "initial-start" graphs somewhat correspond
> to what was later specified in RFC 6928 (IW 10) and RFC 8312 (CUBIC).
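A back-of-the-envelope sketch of the ramp-up argument, under a deliberately idealized model: no losses, no delayed ACKs, cwnd doubling every RTT in slow start and growing by one segment per RTT in congestion avoidance (Reno-like growth is assumed for the draft's congestion avoidance). The CIR, PIR, RTT, and MSS values are assumptions, not measurements.

```python
import math


def window_segments(rate_bps: int, rtt_ms: int, mss_bytes: int) -> float:
    """Bandwidth-delay product in segments (integer bytes first to avoid float noise)."""
    return (rate_bps * rtt_ms // 8000) / mss_bytes


def rtts_slow_start(iw_segments: float, target_segments: float) -> int:
    """RTTs until cwnd, doubling every RTT from the initial window, reaches the target."""
    rtts, cwnd = 0, iw_segments
    while cwnd < target_segments:
        cwnd *= 2
        rtts += 1
    return rtts


def rtts_congestion_avoidance(start_segments: float, target_segments: float) -> int:
    """RTTs until cwnd, growing by one segment per RTT, reaches the target."""
    return max(0, math.ceil(target_segments - start_segments))


MSS, RTT_MS = 1460, 100
cir_wnd = window_segments(1_000_000, RTT_MS, MSS)    # about 8.6 segments at 1 Mbit/s
pir_wnd = window_segments(10_000_000, RTT_MS, MSS)   # about 85.6 segments at 10 Mbit/s
print(rtts_slow_start(10, pir_wnd))                   # 4 (IW 10 as in RFC 6928)
print(rtts_congestion_avoidance(cir_wnd, pir_wnd))    # 78 (starting at CIR)
```

With a 100 ms RTT, that is roughly 0.4 s versus 7.8 s in this idealized model; CUBIC growth, losses, and the slow-start overshoot beyond PIR would all change the numbers.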
> 
> 6/ For what application traffic patterns is this mechanism proposed?
> 
> 
>   *   The document states in Section 3 "... with the development of new
> applications, such as AR/VR". What properties do such applications have to
> leverage the mechanism suggested in this document? Is it possible to
> characterize what the "new" requirements are, and how the suggested
> algorithm meets these requirements?
>   *   Is it suggested to apply this congestion control to real-time media traffic
> over TCP? If so, what would be the benefit of using TCP in general and of the
> specific mechanism compared to congestion control algorithms for such traffic
> (e.g., those from the RMCAT working group)?
> 
> This list of questions is not comprehensive, but I'll stop here.
> 
> Regarding potential next steps of this document in the TCPM working group,
> I believe that the applicable TCPM charter wording is: "In addition, TCPM may
> document alternative TCP congestion control algorithms that are known to
> be widely deployed, and that are considered safe for large-scale deployment
> in the Internet." Until these prerequisites are fulfilled in the Internet, in my
> view the document cannot be adopted in TCPM. Research could be
> performed in the IRTF, e.g., in ICCRG.
> 
> Thanks
> 
> Michael (with no hat on)
