Re: [Rdma-cc-interest] Congestion Control for Large Scale Data Center Networks

"Roni Even (A)" <roni.even@huawei.com> Wed, 25 September 2019 04:37 UTC

From: "Roni Even (A)" <roni.even@huawei.com>
To: "Black, David" <David.Black@dell.com>, "rdma-cc-interest@ietf.org" <rdma-cc-interest@ietf.org>
Date: Wed, 25 Sep 2019 04:36:48 +0000
Archived-At: <https://mailarchive.ietf.org/arch/msg/rdma-cc-interest/3vwMB3EFFBonLJNDLtXm8_uqj80>
Subject: Re: [Rdma-cc-interest] Congestion Control for Large Scale Data Center Networks

Hi David,
I agree that "fairness" is important; at least in DCs we can see both RoCEv2 (over UDP) and TCP traffic. We are also not looking for a new congestion control algorithm, but rather at which existing ones we can use in this environment and how to implement them. We are also looking at HPCC https://liyuliang001.github.io/publications/hpcc.pdf and RCP http://yuba.stanford.edu/~nanditad/thesis-NanditaD.pdf. The DC network is not the general Internet: it is a managed network and may also be a single network domain.

As for the work in TSVWG, of course I looked at it, and in my view dualq-coupled is orthogonal to the congestion information I mention here.
As for SCE, it does not describe the congestion behavior on the sender side. I also looked at https://tools.ietf.org/id/draft-heist-tsvwg-sce-one-and-two-flow-tests-00.txt, and it is clear that at high throughput SCE flows back off when competing, because they are notified earlier than non-SCE flows, which wait for drops or CE marks.

BTW: this is part of the draft I intend to submit

Roni Even


From: Black, David [mailto:David.Black@dell.com]
Sent: Tuesday, September 24, 2019 4:50 PM
To: Roni Even (A); rdma-cc-interest@ietf.org
Subject: RE: Congestion Control for Large Scale Data Center Networks

Hi Roni,

> it will be good if the IETF will be able to recommend an e2e congestion protocol that will allow interoperability between vendors and that will leverage the information from the network .

... and I'd like $10 million in small unmarked bills, please :-).

Seriously, IMHO, the goal for the network ought to be robustness to the variety of current congestion control protocols, not all of which coexist well in the same switch queue. In other words, I think the better goal to pursue is "to provide fairness between different applications that may use different congestion algorithms", although I would suggest avoiding the word "fairness" here, as it is typically used to mean "TCP fairness", whereas something more general is wanted. In contrast, pursuit of a single e2e congestion protocol that will be universally adopted appears unreasonable and unachievable at this juncture.

There is important work going on in the TSVWG WG that will improve congestion control protocol coexistence and help achieve low latency (which is not possible in all cases, for traffic-mix reasons that I won't explain here for brevity).

I suggest reviewing the TSVWG drafts on L4S and SCE, as well as the presentation materials from the Montreal TSVWG meetings.  Here are some pointers:

  *   L4S drafts (@ https://datatracker.ietf.org/wg/tsvwg/documents/): draft-ietf-tsvwg-{l4s-arch,aqm-dualq-coupled,ecn-l4s-id}
  *   SCE drafts: Start with draft-morton-tsvwg-sce - the other two draft-morton-* drafts are somewhat related, but the sce draft is the core draft.
  *   TSVWG presentation material from Montreal: https://datatracker.ietf.org/meeting/105/session/tsvwg
We (TSVWG WG) would greatly appreciate some more data center and RDMA "eyes" on this material, as it is highly relevant to data centers.

Thanks, --David (TSVWG WG co-chair)

From: Rdma-cc-interest <rdma-cc-interest-bounces@ietf.org> On Behalf Of Roni Even (A)
Sent: Tuesday, September 24, 2019 3:08 AM
To: rdma-cc-interest@ietf.org
Subject: [Rdma-cc-interest] Congestion Control for Large Scale Data Center Networks


Hi,
We held side meetings at the last two IETF meetings on better congestion control for data centers, with a good number of participants.

The high throughput of data center networks requires good congestion control, which should provide low latency, fast convergence, and high link utilization. Since multiple applications with different requirements may run on the DC network, it is important to provide fairness between different applications that may use different congestion algorithms. An important issue from the user perspective is achieving a short Flow Completion Time (FCT).

It is clear that we are not going to make changes to RoCE, but we would still like to look at good e2e congestion control. Several approaches have already been published in different papers (examples are DCQCN, HPCC, RCP), but it would be good if the IETF were able to recommend an e2e congestion protocol that allows interoperability between vendors and leverages the information from the network.

We would like to have a side meeting in Singapore and intend to submit a draft that discusses the different options, in the hope that interested people can collaborate on a common direction.

Regards
Roni Even