Re: [tcpm] Questions on draft-han-tsvwg-cc-00

"Scharf, Michael (Nokia - DE/Stuttgart)" <michael.scharf@nokia.com> Sun, 18 March 2018 23:39 UTC

From: "Scharf, Michael (Nokia - DE/Stuttgart)" <michael.scharf@nokia.com>
To: Lin Han <Lin.Han@huawei.com>
CC: "tcpm@ietf.org" <tcpm@ietf.org>, "tsvwg-chairs@ietf.org" <tsvwg-chairs@ietf.org>, "'iccrg@irtf.org'" <iccrg@irtf.org>, Thomas Nadeau <tnadeau@lucidvision.com>, Yingzhen Qu <yingzhen.qu@huawei.com>
Thread-Topic: Questions on draft-han-tsvwg-cc-00
Thread-Index: AdO8pkvfd4lsbPnoT7CbLbNzdShcRwBnm4KAAAy5PwAADe51sAALJbCwAA6pEJAABBV+4A==
Date: Sun, 18 Mar 2018 23:38:49 +0000
Message-ID: <AM5PR0701MB25470F197ABD05A1374CDCC493D50@AM5PR0701MB2547.eurprd07.prod.outlook.com>
References: <AM5PR0701MB2547AA3C16E849FDFED8857093D00@AM5PR0701MB2547.eurprd07.prod.outlook.com> <EB58F23F-C561-4D9D-A926-43ED428F36D5@huawei.com> <27982D64-95F7-48DB-AADE-F8D3015CF790@huawei.com> <1D30AF33624CDD4A99E8C395069A2A162CDBC402@sjceml521-mbs.china.huawei.com> <AM5PR0701MB2547979A2B7D9D3CE40475E893D50@AM5PR0701MB2547.eurprd07.prod.outlook.com> <1D30AF33624CDD4A99E8C395069A2A162CDBC911@sjceml521-mbs.china.huawei.com>
In-Reply-To: <1D30AF33624CDD4A99E8C395069A2A162CDBC911@sjceml521-mbs.china.huawei.com>
Archived-At: <https://mailarchive.ietf.org/arch/msg/tcpm/IRVTcsNdhOPs2CiTDnbl5WoNJRw>

Thanks a lot for confirming that in draft-han-6man-in-band-signaling-for-transport-qos “more study and research need to be done” for applying the mechanism e.g. to Ethernet, which seems to me a pretty common technology to transport TCP connections. So there seem to be pretty fundamental gaps that need research. Research can e.g. be published at conferences or in journals.

Furthermore, it is obvious that a rate policer could be applied in the host in addition to the proposed algorithm. But I still believe that the proposed algorithm itself will result in data rates larger than PIR in the scenarios I have explained (drops of the RTT), i.e., it does not meet the objective that “a TCP sender is never allowed to send data at a rate larger than PIR”. Depending on how it is implemented, an additional rate policer for PIR may drop packets locally inside the host, because the congestion control algorithm sends out too many packets. That would be very inefficient. If your objective is a congestion control algorithm that results in a rate of at most PIR, your algorithm needs to change, IMHO.

Also, I have said before that I believe TCP Reno can outperform your algorithm for the same path with CIR/PIR. If you don’t believe it, I suggest looking at the transfer times of medium-sized transfers with CIR << PIR over a larger RTT, and comparing the performance of TCP Reno with your algorithm. IMHO you will find quite a number of cases in which TCP Reno will complete such data transfers significantly faster than the algorithm in draft-han-tsvwg-cc-00. I have already explained the root cause earlier. This effect should occur even if there is no PIR capping. It means that a user could still request the guaranteed resources, but just use the existing TCP congestion control with them, and would get better throughput and/or response times over the resources that were requested. In that case, what benefit would a user have from using draft-han-tsvwg-cc-00?

Finally, I cannot parse the figure in the e-mail below; this is useless without labels and the like. A statement “We will have more comparison test results (CC vs new CC) in the future” is somewhat concerning. Before proposing an algorithm for IETF standardization, I believe that such a proposal must be comprehensively tested in simulations and measurements with real TCP stacks to ensure that the algorithm is properly designed, works as expected, and indeed has benefits over what already exists. None of this is clear for draft-han-tsvwg-cc-00.

Michael

From: Lin Han [mailto:Lin.Han@huawei.com]
Sent: Sunday, March 18, 2018 12:09 PM
To: Scharf, Michael (Nokia - DE/Stuttgart) <michael.scharf@nokia.com>
Cc: tcpm@ietf.org; tsvwg-chairs@ietf.org; 'iccrg@irtf.org' <iccrg@irtf.org>; Thomas Nadeau <tnadeau@lucidvision.com>; Yingzhen Qu <yingzhen.qu@huawei.com>
Subject: RE: Questions on draft-han-tsvwg-cc-00



From: Scharf, Michael (Nokia - DE/Stuttgart) [mailto:michael.scharf@nokia.com]
Sent: Sunday, March 18, 2018 12:09 AM
To: Lin Han <Lin.Han@huawei.com>
Cc: tcpm@ietf.org; tsvwg-chairs@ietf.org; 'iccrg@irtf.org' <iccrg@irtf.org>; Thomas Nadeau <tnadeau@lucidvision.com>; Yingzhen Qu <yingzhen.qu@huawei.com>
Subject: RE: Questions on draft-han-tsvwg-cc-00

Some follow-up remarks inline [ms]

Michael

From: Lin Han [mailto:Lin.Han@huawei.com]
Sent: Sunday, March 18, 2018 12:40 AM
To: Scharf, Michael (Nokia - DE/Stuttgart) <michael.scharf@nokia.com>
Cc: tcpm@ietf.org; tsvwg-chairs@ietf.org; 'iccrg@irtf.org' <iccrg@irtf.org>; Thomas Nadeau <tnadeau@lucidvision.com>; Yingzhen Qu <yingzhen.qu@huawei.com>
Subject: FW: Questions on draft-han-tsvwg-cc-00

Hi, Scharf

For some reason, I did not receive this thread; maybe my filter settings have a problem.

See below for more clarification, marked with [LH].

Regards

Lin

From: "Scharf, Michael (Nokia - DE/Stuttgart)" <michael.scharf@nokia.com>
Date: Thursday, March 15, 2018 at 4:17 PM
To: "draft-han-tsvwg-cc@ietf.org" <draft-han-tsvwg-cc@ietf.org>
Cc: "tcpm@ietf.org" <tcpm@ietf.org>, "tsvwg@ietf.org" <tsvwg@ietf.org>, "iccrg@irtf.org" <iccrg@irtf.org>
Subject: Questions on draft-han-tsvwg-cc-00
Resent-From: <alias-bounces@ietf.org>
Resent-To: <yingzhen.qu@huawei.com>, <tnadeau@lucidvision.com>
Resent-Date: Thursday, March 15, 2018 at 4:17 PM

Authors, all,

I have read draft-han-tsvwg-cc-00. Below I have listed a number of questions, which I believe would have to be addressed when discussing such a mechanism in the IETF or IRTF.

This e-mail is strictly limited to the content of draft-han-tsvwg-cc-00. As the draft does not specify how the CIR and PIR will actually be guaranteed in the Internet, or how OAM signaling will work at Internet scale, I will not comment here on these assumptions, except regarding requirements that strictly follow from the content of the I-D. The technical, economic, and regulatory aspects of the assumptions are not in scope of TCPM and they need to be discussed and solved elsewhere.

Questions on draft-han-tsvwg-cc-00:

1/ The document seems to implicitly assume that network resources are reserved for *every* single TCP connection, right?


  *   If that assumption is correct, it has to be spelt out explicitly in the text and it has to be noted that the underlying technology has to provide these capabilities *for every single* TCP connection.
  *   Otherwise sentences like “after a TCP session is successfully initiated its congestion window (cwnd) jumps to CIR” would not make sense, as multiple TCP connections within a traffic aggregate policed by CIR/PIR could all start to send with CIR in parallel, which would trigger massive congestion.
  *   As an example, in my reading draft-han-6man-in-band-signaling-for-transport-qos-00 would also allow reservations e.g. for aggregates of multiple TCP connections. Such an operation mode seems not to be compatible with the suggested mechanism in this I-D, as far as I understand. So the requirements have to be made explicit.
  *   Also, sentences such as “it is assumed that in bandwidth guaranteed networks there have been network resources (bandwidths, queues etc.) dedicated to the TCP flows” have to be corrected to specify that for the mechanism in this draft to work correctly, the resources have to be guaranteed to every single TCP connection, not multiple “flows”.
[YQ]: no, it doesn’t assume that network resources are reserved for EVERY single TCP connection. It assumes when a TCP connection uses this proposed congestion control, network resources need to be reserved. The TCP example used in draft-han-6man-in-band-signaling-for-transport-qos-00 is per-flow based, and the congestion control draft also assumes the resources are reserved per-flow. We will add this clarification in the next version of the draft.

[LH] The draft-han-tsvwg-cc-00 is about the CC algorithm for “NEW TCP” set up by draft-han-6man-in-band-signaling-for-transport-qos (I call it the “in-band signaling method” below) or any other method that can provide a CIR/PIR QoS service for the “NEW TCP”.
As we said in the in-band signaling method, the new in-band signaling method is not supposed to be used for applications whose throughput requirement can be satisfied by normal TCP (Reno, CUBIC, etc.). This is because the new method is more costly than normal TCP and involves network devices. Of course, an SP may charge for it, but this is a business issue.
In the “in-band signaling method”, we do talk about three levels of granularity for the signaling, but only the “Flow level in-band signaling” is discussed in detail. So, for this new CC draft, we can say each “NEW TCP” session or flow will have a PIR/CIR.
Having said that, we can summarize as follows:

  1.  In a real deployment, we will use the in-band signaling method to set up “NEW TCP”, and each “NEW TCP” will have an associated PIR/CIR, with the CIR guaranteed end-to-end for the session. For all these “NEW TCP” sessions, we suggest using the CC method described in draft-han-tsvwg-cc-00, since the old CC does not adapt well to the traffic behavior and network characteristics of “NEW TCP”; i.e., the minimum rate is guaranteed at the network level to pass without any congestion, the given maximum rate PIR cannot be exceeded, etc.
  2.  In the same network, there will certainly be many more regular TCP sessions. These still use the old CC algorithm.
  3.  The bandwidth resource is shared between “NEW TCP” sessions and old TCP sessions in a simple way:

     *   The total reserved bandwidth for all “NEW TCP” sessions is sum(CIR).
     *   The reserved bandwidth sum(CIR) can still be used for old TCP if the total “NEW TCP” real bandwidth usage is less than sum(CIR).
     *   If the total “NEW TCP” real bandwidth usage is equal to or less than sum(CIR), “NEW TCP” will always obtain the required bandwidth from the link's unused bandwidth and/or from the bandwidth that old TCP sessions are using.
     *   If the total “NEW TCP” real bandwidth usage is greater than sum(CIR), how much “NEW TCP” can use depends on the remaining bandwidth of the link.

[ms] As far as I can see, draft-han-6man-in-band-signaling-for-transport-qos alone is not sufficient to guarantee the assumptions of this algorithm. IP packets using this congestion control mechanism could be dropped due to congestion at devices that are not routers and do not process the IP or TCP header (e.g., a simple unmanaged Ethernet switch). So, the draft must explicitly mention that an IP signaling mechanism such as draft-han-6man-in-band-signaling-for-transport-qos is absolutely not sufficient for enabling this TCP congestion control. Depending on the technology used below IP on the different IP hops, further mechanisms must be used. It could also make sense to exclude, in a deployment section, those sub-IP technologies for which there is currently no known technique to meet the requirements of deploying the algorithm. Such a list of technologies would probably be useful to network administrators, as they have to ensure that a network using this algorithm does not include any link with such a technology.

[LH] What you said is correct: the “draft-han-6man-in-band-signaling-for-transport-qos” only provides a method for an IP network to support QoS for an upper layer like TCP, but it does not help if the IP network is tunneled through a non-IP network, like MPLS or an Ethernet switch. This topic needs further study and is mentioned in Section 5.4 “Heterogeneous network” of “draft-han-6man-in-band-signaling-for-transport-qos”.


2/  Why does the document not rely on ECN (and not even reference ECN)?


  *   For instance, the following requirement “It is important that OAM needs to be able to detect if any device's  buffer depth has exceeded the pre-configured threshold, as this is an indication of potential congestion and packet drop” could possibly be solved by ECN, no?
  *   Even if another OAM mechanism could be used in addition, a comprehensive TCP congestion control specification would also have to cover the reaction to ECN marks, as well as the potential combination of feedback results. Why is this missing?
  *   Or would the document mandate that ECN MUST NOT be enabled for TCP connections using this congestion control mechanism?
[YQ]: This is a good point. We haven’t thought about combining OAM and ECN together. Will need to do some research and figure it out.

[LH] We just don’t want to interfere with the current ECN usage; i.e., on the same device, “NEW TCP” can coexist with all current TCP variations, and the new CC does not impact them.

[ms] The “co-existence” would have to be specified, including the reaction to ECN marks.

[LH] We will specify this; we don’t want to interfere with ECN marking and its algorithm.

3/ Why does the document assume that congestion windows are calculated in segments and not in bytes?


  *   RFC 5681 as well as many other RFCs calculate CWND in bytes.
  *   However, I believe equations such as “MinBandwidthWND = CIR * RTT/MSS” or “MaxBandwidthWND = PIR * RTT/MSS” would return a window counted in MSS segments.
  *   Apart from the mismatch with the TCP standards, this sort of equation might also require a discussion on how to deal with integer division.
[YQ]: You’re right, indeed in this draft the congestion windows are calculated in segments. We’ll add some calculations in bytes as well, or describe how to make the conversion.
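
As a rough illustration of the bytes-versus-segments point, the sketch below computes the same window both ways. It assumes CIR is given in bit/s and the RTT in milliseconds (the message does not state the units); the function names and the example values are hypothetical, not taken from the draft.

```python
# Sketch only: units (bit/s, ms) and all names/values are assumptions.

def window_bytes(rate_bps: int, rtt_ms: int) -> int:
    """Bandwidth-delay product in bytes (integer arithmetic, rounded down)."""
    return rate_bps * rtt_ms // 8000  # bit/s * ms -> bytes

def window_segments(rate_bps: int, rtt_ms: int, mss_bytes: int) -> int:
    """The same window counted in whole MSS-sized segments (rounded down)."""
    return window_bytes(rate_bps, rtt_ms) // mss_bytes

# Hypothetical example values.
CIR_BPS = 2_000_000
RTT_MS = 80
MSS_BYTES = 1460

min_wnd_bytes = window_bytes(CIR_BPS, RTT_MS)
min_wnd_segs = window_segments(CIR_BPS, RTT_MS, MSS_BYTES)
# Bytes of the bandwidth-delay product lost by rounding down to whole segments.
rounding_loss = min_wnd_bytes - min_wnd_segs * MSS_BYTES
print(min_wnd_bytes, min_wnd_segs, rounding_loss)
```

Counting in bytes avoids the rounding loss entirely; counting in segments requires the specification to say whether the division rounds down or up.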


4/ How does the mechanism deal with IP and TCP header overhead?


  *   TCP window sizes are about the TCP bytestream, while the actual IP packets sent by a TCP/IP stack will include an IP and TCP header. If one neglects the IP and TCP headers in the congestion window calculation, the resulting IP packet rate will be larger than the CIR and PIR seen in the TCP layer. This could result in packet drops if CIR and PIR are enforced e.g. for IP packet length.
  *   How will this problem be solved? Note that TCP (and also IP) can include header options, which results in variable header sizes. The number of TCP options can be different for each TCP segment. How does this congestion control mechanism correctly handle the headers and the options in IP and TCP headers?
[YQ]: good point. The more accurate way is: the size of real packets that got transmitted on the wire should be used to calculate how many packets can be transmitted using the bandwidth reserved. We’ll add this in the next version.
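
To make the header-overhead question concrete, here is a minimal sketch of how much the on-wire rate exceeds the payload rate, and of a payload window sized against on-wire packet length. It assumes fixed header sizes per packet (20 bytes IP plus 20 bytes TCP by default), which is exactly the simplification the question warns about, since options can vary per segment. All names and values are hypothetical.

```python
# Sketch only: assumes every packet carries a full MSS and the same header size.

def wire_rate_bps(payload_rate_bps: int, mss_bytes: int, header_bytes: int = 40) -> float:
    """On-wire IP rate that results from sending payload at payload_rate_bps."""
    return payload_rate_bps * (mss_bytes + header_bytes) / mss_bytes

def payload_window_bytes(rate_bps: int, rtt_ms: int, mss_bytes: int,
                         header_bytes: int = 40) -> int:
    """Payload window whose full-sized packets, headers included, fit rate * RTT."""
    budget_bytes = rate_bps * rtt_ms // 8000          # on-wire byte budget per RTT
    full_packets = budget_bytes // (mss_bytes + header_bytes)
    return full_packets * mss_bytes

PIR_BPS = 10_000_000  # hypothetical
RTT_MS = 80           # hypothetical

# Payload sent at PIR produces more than PIR on the wire.
print(wire_rate_bps(PIR_BPS, 1460))
# Window that respects PIR on the wire, without and with 12 bytes of TCP options.
print(payload_window_bytes(PIR_BPS, RTT_MS, 1460))
print(payload_window_bytes(PIR_BPS, RTT_MS, 1448, header_bytes=52))
```

With per-segment variation in options, a real sender would have to account for the actual size of each transmitted packet rather than use a constant.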

3/ How does the document deal with RTT variations? Is the assumption that the RTT is constant?


  *   As far as I can tell from experiments, the RTT estimation is important when applying a rate to window-based congestion control, which is what this document does.
  *   Equations such as “MinBandwidthWND = CIR * RTT/MSS” or “MaxBandwidthWND = PIR * RTT/MSS” only provide a window equivalent to the bandwidth-delay product of the path if the RTT sample is a correct prediction of the actual delay that the segments in flight will experience. How does the mechanism suggested in this document correctly predict the future RTT of the segments that are sent by the sender at a given point in time?
  *   As an example, assume that the RTT at time t=10s is determined as 80ms. Assume PIR = 10 Mbps and neglect the questions 3/ and 4/. Then this document would probably assume that MaxBandwidthWND=100 kB is the bandwidth-delay product of the path between t=10s and t=10.08s, i.e., the maximum amount of outstanding data that can be sent in that time without drops (or exceeding PIR). But assume that the actual round-trip delay of segments has dropped to 40ms after the last RTT measurement, which means that the maximum bandwidth-delay product of the path at t=10s+epsilon is only 50 kB. As a result, 50 kB out of the congestion window would likely be dropped between t=10s and t=10.08s. And, due to the wrong RTT value, the effective data rate of the sender could even be 20 Mbps, if the RTT mismatch is not detected immediately, or, e.g., if EWMA delays the update of the weighted RTT parameter to the actual value.
  *   So how does the proposed scheme determine a window that meets the statement “This means a TCP sender is never allowed to send data at a rate larger than PIR” if the RTT is not constant? Does this assume rate pacing in the TCP sender for each TCP connection?
[YQ]: No, RTT is not assumed to be constant in this draft. It is calculated using RTT = a * old_RTT + (1 - a) * new_RTT   (0 < a < 1)   (1)
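
The numbers in the example above, combined with the smoothing formula (1), can be checked with a short sketch. The value a = 0.875 is an assumption (the reply gives no value); it is chosen to match the conventional 7/8 weight on the old estimate. Function names are hypothetical.

```python
# Sketch only: a = 0.875 is an assumed smoothing weight, not a value from the draft.

def ewma_rtt(old_rtt_ms: float, sample_ms: float, a: float) -> float:
    """Formula (1): RTT = a * old_RTT + (1 - a) * new_RTT."""
    return a * old_rtt_ms + (1 - a) * sample_ms

def effective_rate_bps(window_bytes: int, actual_rtt_ms: float) -> float:
    """Rate of a window-limited sender: one window per actual round trip."""
    return window_bytes * 8 * 1000 / actual_rtt_ms

PIR_BPS = 10_000_000
stale_rtt_ms = 80      # RTT the sender still believes
actual_rtt_ms = 40     # RTT the path now has

# Window sized from the stale RTT: PIR * RTT in bytes.
max_wnd_bytes = PIR_BPS * stale_rtt_ms // 8000
print(max_wnd_bytes, effective_rate_bps(max_wnd_bytes, actual_rtt_ms))

# Count how many 40 ms samples the EWMA needs before it is within 5 ms of the truth.
a = 0.875
smoothed = float(stale_rtt_ms)
samples = 0
while smoothed - actual_rtt_ms > 5:
    smoothed = ewma_rtt(smoothed, actual_rtt_ms, a)
    samples += 1
print(samples)
```

Until the smoothed estimate converges, a window computed as PIR * RTT stays larger than the path's actual bandwidth-delay product at PIR, so a purely window-limited sender exceeds PIR unless it is also paced.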

[ms] And, as explained in my example, the statement “This means a TCP sender is never allowed to send data at a rate larger than PIR” is not met by the current design of the algorithm. I have provided an actual example where the algorithm breaks. Please explain how the algorithm ensures that a TCP sender never sends more than PIR over links with variable RTT.

[LH] Not allowing the rate to exceed the PIR regulates the traffic at ingress and reduces traffic bursts. This will effectively reduce the buffer depth and the worst-case latency.
After per-flow shaping, we will color the traffic according to its rate: GREEN (rate<CIR), YELLOW (CIR<rate<PIR), and RED (rate>PIR). GREEN traffic will always get scheduled at the egress side after the IP forwarding process; YELLOW traffic will be scheduled if the egress buffer depth is less than some pre-configured value (e.g., 70%), and will be dropped if the buffer depth exceeds the pre-configured value.
We could simply do nothing after the rate exceeds CIR, but this would make the PIR parameter meaningless. We use PIR and CIR instead of CIR only because this is the traditional definition from a QoS point of view. We have already simplified the definition and removed some parameters like CB (committed burst) and ECB (exceeding committed burst), etc.
If we want to have the behavior of Reno, the user can simply set the PIR to a very high value or even to the link bandwidth; the traffic will then get to the equilibrium point caused by congestion before reaching its PIR. If the community thinks this is a more reasonable solution (do not cap the rate to PIR at ingress), we can easily change the rule in hardware and in the CC algorithm accordingly.
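
The coloring and scheduling rule described in this reply can be sketched as follows. This is a simplification under stated assumptions: it classifies by a measured rate rather than per packet (deployed meters usually use token buckets), it treats the boundary cases rate = CIR and rate = PIR as GREEN and YELLOW respectively (the message leaves them open), and it assumes RED traffic is dropped, which the message does not state explicitly. All names are hypothetical.

```python
# Sketch only: rate-based classification; boundary handling and the RED action
# are assumptions, not taken from the message.

def classify_color(rate_bps: float, cir_bps: float, pir_bps: float) -> str:
    """Color traffic by its rate relative to CIR and PIR."""
    if rate_bps <= cir_bps:
        return "GREEN"
    if rate_bps <= pir_bps:
        return "YELLOW"
    return "RED"

def may_forward(color: str, buffer_fill: float, threshold: float = 0.70) -> bool:
    """Egress decision: GREEN always, YELLOW only below the buffer threshold."""
    if color == "GREEN":
        return True
    if color == "YELLOW":
        return buffer_fill < threshold
    return False  # RED: assumed to be dropped

CIR, PIR = 2_000_000, 10_000_000  # hypothetical values
for rate, fill in [(1_000_000, 0.9), (5_000_000, 0.5), (5_000_000, 0.8), (12_000_000, 0.1)]:
    color = classify_color(rate, CIR, PIR)
    print(rate, fill, color, may_forward(color, fill))
```

Under this rule only the GREEN share is unconditionally protected; whether YELLOW traffic gets through depends on the egress buffer state at that moment.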

4/ How is it ensured that OAM alarms will reach the TCP sender in time in all possible “random failure” cases?


  *   As far as I understand, the following statement “When a sender receives the third duplicated ACK, but no previous OAM congestion alarm has been received, then it is considered that a segment is lost due to random failure not congestion.  In this case the cwnd is not changed.” mandates that an OAM alarm is received prior to the third duplicate ACK *in all potential cases* of congestion. If the OAM alarm got lost or delayed, this condition would imply that cwnd is not changed even though a segment has been dropped due to congestion, which would be a violation of fundamental Internet congestion control principles.
  *   Please expand on how this document ensures that cwnd will be reduced in all potential cases when a packet gets dropped due to congestion, and what requirements on the OAM alarm propagation follow from that. Of specific interest are effects such as drops of packets relevant for the OAM information, reordering of packets, asymmetric routing in forward and reverse direction, use of multiple paths in parallel (ECMP), and the like. If the document makes assumptions about the path such as in-order packet delivery or the like, these assumptions need to be spelt out explicitly.
  *   I understand that the OAM could be solved in different ways and the solution is independent of this document. But this document has to comprehensively specify all technical requirements that the OAM mechanism has to meet in order to ensure that every single packet drop due to congestion always results in a cwnd reduction. Otherwise the algorithm has to change as it does not safely prevent congestion.
[YQ]: If an OAM alarm for congestion is lost, cwnd is currently not changed. The chance of packet loss due to congestion might then increase, so an OAM alarm should be received soon. But I agree more design/considerations should be added here.

[LH] If the OAM alarm message is lost, we lose the capability to distinguish random failure from congestion loss, and we will treat the loss as congestion loss. This is no worse than current Reno.

[ms] This is not what is written in the draft so the algorithm will need to change. As a side note, the IETF has tried many years to find a solution of how to distinguish congestion loss and random failures. If the algorithm wants to make that distinction, it has to explain how all congestion packet losses will always be detected.

[LH] We will make this clear in the document.
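
The concern in this exchange can be made concrete with a small sketch of the rule as quoted from the draft: on the third duplicate ACK, cwnd is left unchanged unless an OAM congestion alarm was received beforehand. How the draft reduces cwnd when an alarm was seen is not given in this thread, so the Reno-like halving with a floor at the minimum window below is purely an assumption, and the function name is hypothetical.

```python
# Sketch only: the reduction applied when an alarm was seen (halving, floored at
# min_wnd) is an assumed placeholder, not the draft's specified behavior.

def on_third_dupack(cwnd: int, min_wnd: int, oam_alarm_seen: bool) -> int:
    """Return the new cwnd (in segments) after the third duplicate ACK."""
    if oam_alarm_seen:
        # Loss attributed to congestion: reduce the window (assumed reaction).
        return max(cwnd // 2, min_wnd)
    # No alarm seen: the quoted rule treats the loss as random failure.
    return cwnd

# Congestion loss whose OAM alarm was lost or delayed: no reduction at all.
print(on_third_dupack(cwnd=100, min_wnd=20, oam_alarm_seen=False))
# Same loss with the alarm delivered in time.
print(on_third_dupack(cwnd=100, min_wnd=20, oam_alarm_seen=True))
```

The first call shows the failure mode under discussion: the sender's reaction depends entirely on whether the alarm arrives before the third duplicate ACK.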

5/ What is the expected performance benefit? Are there situations in which performance will be worse than standard TCP congestion control?


  *   The document does not contain any data of potential improvements or deteriorations as compared to the TCP standard congestion control. I assume that such data will be presented to explain why this mechanism is proposed, and what benefits it has.
  *   As I have experimented with similar mechanisms quite a bit, I believe there will be cases in which this congestion control mechanism will perform worse than a TCP sender fully compliant to RFC 5681, when using the same network path with CIR and PIR guarantees. I believe this document should analyze these cases and reason why a worse performance than standard TCP congestion control will be acceptable. IMHO this issue will specifically apply in cases when PIR is significantly larger than CIR, and if the RTT is large. As far as I can see, this draft mandates to start the data transfer in congestion avoidance at CIR rate, which means that it can take many RTTs until the sender reaches the PIR. In contrast, RFC 5681 will run slow-start, and RFC 5681 states that the “initial value of ssthresh SHOULD be set arbitrarily high”. This means that the TCP sender can reach PIR within few RTTs and thus can send with full PIR speed, while a sender using draft-han-tsvwg-cc-00 will send with a much lower speed CIR+epsilon. For short and medium-sized data transfers, IMHO it can happen that the congestion control according to RFC 5681 will significantly outperform the mechanism suggested in this document, i.e., it will complete data transfers orders of magnitude faster even without any knowledge about CIR and PIR. Have the authors compared this mechanism to the performance of RFC 5681?
  *   Also, have the authors compared the performance of this mechanism to a modern TCP stack, which often uses RFC 6928 (IW 10) and RFC 8312 (CUBIC)? In what cases does the suggested congestion control have a better performance? I ask this because I performed experiments 10 years ago with congestion control schemes that have some similarity to what is suggested here, and they also used knowledge about the path properties. In those experiments, it turned out to be quite difficult to design an algorithm that uses knowledge about the path (such as CIR/PIR) and that outperforms CUBIC in combination with IW 10, even if such a stack is totally unaware of the path. This has been discussed e.g. in ICCRG https://www.ietf.org/proceedings/73/slides/iccrg-2.pdf. As context, “more-start” in this document is somewhat similar to what is proposed in this I-D (but applied to CUBIC), while the “initial-start” graphs somehow correspond to what was later specified in RFC 6928 (IW 10) and RFC 8312 (CUBIC).
[YQ]: Totally agree. This is not a one-size-fits-all solution; it has its usage limitations. Detailed comparisons and suggestions on when to use this algorithm will be added later.

[LH] That cannot be. As I said for question one, the new CC is just an improvement over the current TCP Reno when used for “NEW TCP”. It adapts TCP Reno to the new behavior of “NEW TCP”; the logic is very straightforward.

[ms] As I have shown in my example, there will be cases in which this proposed congestion control is *not* an improvement over TCP Reno. There will be cases in which it transfers data significantly slower than TCP Reno, and this will affect interactive applications that need e.g. very fast response times. An application designer will observe that even TCP Reno can give better performance over the same network. For some cases I think one can even calculate explicitly how much slower this algorithm is as compared to TCP Reno, and how much worse application performance will be; the effect and its root cause are so simple that one does not even need any implementation for calculating how much worse this algorithm is than TCP Reno. I believe that application developers would be interested in understanding when (and why) this algorithm is slower than a standard-compliant TCP over the same network.
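
A back-of-the-envelope version of that calculation might look like the sketch below. It counts round trips for a loss-free transfer under two window schedules: slow start from a small initial window (doubling per RTT), and a start at the CIR window with congestion-avoidance growth of one segment per RTT, both capped at the PIR window. The model ignores losses, delayed ACKs, and pacing, and assumes the path delivers up to PIR without drops. All parameter values and names are hypothetical.

```python
# Sketch only: idealized per-RTT window model, no loss, hypothetical parameters.

def rtts_to_send(total_segments: int, initial_wnd: int, max_wnd: int, grow) -> int:
    """Number of round trips needed to send total_segments, one window per RTT."""
    sent, rounds = 0, 0
    wnd = max(1, min(initial_wnd, max_wnd))
    while sent < total_segments:
        sent += wnd
        rounds += 1
        wnd = min(grow(wnd), max_wnd)
    return rounds

MSS = 1460
RTT_MS = 200
CIR_BPS = 1_000_000
PIR_BPS = 100_000_000
TRANSFER_BYTES = 10_000_000

cir_wnd = CIR_BPS * RTT_MS // 8000 // MSS      # CIR window in segments
pir_wnd = PIR_BPS * RTT_MS // 8000 // MSS      # PIR window in segments
segments = (TRANSFER_BYTES + MSS - 1) // MSS   # transfer size in segments, rounded up

# Slow start from 3 segments, doubling each RTT.
slow_start_rtts = rtts_to_send(segments, 3, pir_wnd, lambda w: 2 * w)
# Start at the CIR window, grow by one segment per RTT.
cir_start_rtts = rtts_to_send(segments, cir_wnd, pir_wnd, lambda w: w + 1)

print(slow_start_rtts, cir_start_rtts)
print(slow_start_rtts * RTT_MS / 1000, cir_start_rtts * RTT_MS / 1000)  # seconds
```

With CIR << PIR and a large RTT, as in these assumed values, linear growth from the CIR window needs many more round trips to finish than exponential growth from a much smaller initial window.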

[LH] See the explanation for question 3.
I guess this is the major point of misunderstanding in our document; we need to make the document clearer in this regard.
The new CC is not meant to compete with Reno in throughput in every case, since it does not need to; it tries to make Reno more adaptive to the new traffic behavior enforced by the QoS. The QoS itself provides completely different behavior from traditional TCP CC; see below (PIR=CIR; B1 and B2 are new TCP set up by in-band signaling without changes to the CC; n1 and n2 are Reno).
It seems that the in-band signaling method can have “better throughput” than traditional TCP, but is that surprising? No, of course not: QoS will be better than best effort! The better throughput is not caused by the new CC; it is caused by QoS. Directly comparing the new CC with Reno for throughput does not make sense ☺ The new CC means nothing without the per-flow QoS.
The picture below also shows that the new TCP will always provide the expected throughput no matter how much bandwidth is used by other traditional TCP. We will have more comparison test results (CC vs new CC) in the future.

[Figures image001.png and image002.png not included: two plots of Rate (Kbps) versus Time (s)]



6/ For what application traffic patterns is this mechanism proposed?


  *   The document states in Section 3 “… with the development of new applications, such as AR/VR”. What properties do such applications have to leverage the mechanism suggested in this document? Is it possible to characterize what the “new” requirements are, and how the suggested algorithm meets these requirements?
  *   Is it suggested to apply this congestion control to real-time media traffic over TCP? If so, what would be the benefit of using TCP in general and of the specific mechanism compared to congestion control algorithm for such traffic (e.g., the RMCAT working group)?
[YQ] Same answer as above: the exact traffic patterns that can benefit from this algorithm will be studied in more detail and provided in a later version of the draft.

[ms] I assume that “later version” means “next version”. It is pointless to further discuss a congestion control algorithm without details about workloads and the algorithm performance, resulting from simulations, actual measurement from implementations, etc. Such data can also be published at research conferences or in research journals prior to presenting it in the IRTF or IETF.

Michael


This list of questions is not comprehensive, but I’ll stop here.

Regarding potential next steps of this document in the TCPM working group, I believe that the applicable TCPM charter wording is: “In addition, TCPM may document alternative TCP congestion control algorithms that are known to be widely deployed, and that are considered safe for large-scale deployment in the Internet.” Until these prerequisites are fulfilled in the Internet, in my view the document cannot be adopted in TCPM. Research could be performed in the IRTF, e.g., in ICCRG.

Thanks

Michael (with no hat on)