Re: [Rdma-cc-interest] Congestion Control for Large Scale Data Center Networks

"Black, David" <David.Black@dell.com> Thu, 26 September 2019 20:02 UTC

From: "Black, David" <David.Black@dell.com>
To: "Roni Even (A)" <roni.even@huawei.com>, "rdma-cc-interest@ietf.org" <rdma-cc-interest@ietf.org>
CC: "Black, David" <David.Black@dell.com>
Thread-Topic: Congestion Control for Large Scale Data Center Networks
Thread-Index: AdVypdasUsSCz31jTEKThKlG/zI4pAANoukwAB8E4lAAUyvlMA==
Date: Thu, 26 Sep 2019 20:01:35 +0000
Message-ID: <CE03DB3D7B45C245BCA0D2432779493630714FFA@MX307CL04.corp.emc.com>
References: <6E58094ECC8D8344914996DAD28F1CCD23D6B64E@DGGEMM506-MBX.china.huawei.com> <CE03DB3D7B45C245BCA0D243277949363070F8D1@MX307CL04.corp.emc.com> <6E58094ECC8D8344914996DAD28F1CCD23D6B9F5@DGGEMM506-MBX.china.huawei.com>
In-Reply-To: <6E58094ECC8D8344914996DAD28F1CCD23D6B9F5@DGGEMM506-MBX.china.huawei.com>
Archived-At: <https://mailarchive.ietf.org/arch/msg/rdma-cc-interest/kvLo7EsuH3rguWp-EiObsUbJGXI>
Subject: Re: [Rdma-cc-interest] Congestion Control for Large Scale Data Center Networks

I look forward to seeing the draft and will comment further then ...

Thanks, --David

From: Roni Even (A) <roni.even@huawei.com>
Sent: Wednesday, September 25, 2019 12:37 AM
To: Black, David; rdma-cc-interest@ietf.org
Subject: RE: Congestion Control for Large Scale Data Center Networks


Hi David,
I agree that "fairness" is important; in DCs, at least, we can see RoCEv2 over UDP alongside TCP traffic. We are also not looking for a new congestion control algorithm, but rather at which existing ones we can use in this environment and how to implement them. We are also looking at HPCC (https://liyuliang001.github.io/publications/hpcc.pdf) and RCP (http://yuba.stanford.edu/~nanditad/thesis-NanditaD.pdf). The DC network is not the general Internet; it is a managed network and may also be a single network domain.

As for the work in TSVWG, of course I looked at it, and in my view dualq-coupled is orthogonal to the congestion information I mention here.
As for SCE, it does not describe the sender-side congestion behavior. I also looked at https://tools.ietf.org/id/draft-heist-tsvwg-sce-one-and-two-flow-tests-00.txt, and it is clear that at high throughput SCE flows back off when competing with non-SCE flows, because they get earlier notification while the non-SCE flows wait for drops or CE marks.

BTW: this is part of the draft I intend to submit

Roni Even


From: Black, David [mailto:David.Black@dell.com]
Sent: Tuesday, September 24, 2019 4:50 PM
To: Roni Even (A); rdma-cc-interest@ietf.org
Subject: RE: Congestion Control for Large Scale Data Center Networks

Hi Roni,

> it will be good if the IETF will be able to recommend an e2e congestion protocol that will allow interoperability between vendors and that will leverage the information from the network .

... and I'd like $10 million in small unmarked bills, please :-).

Seriously, IMHO, the goal for the network ought to be robustness to the variety of current congestion control protocols, not all of which coexist well in the same switch queue. In other words, I think the better goal to pursue is "to provide fairness between different applications that may use different congestion algorithms", although I would suggest avoiding the word "fairness" here, as it's typically used to mean "TCP fairness", whereas something more general is wanted. In contrast, pursuit of a single e2e congestion protocol that will be universally adopted appears unreasonable and unachievable at this juncture.

There is important work going on in the TSVWG WG that will improve congestion control protocol coexistence and help achieve low latency (which is not possible in all cases, for traffic mix reasons that I won't explain here for brevity).

I suggest reviewing the TSVWG drafts on L4S and SCE, as well as the presentation materials from the Montreal TSVWG meetings.  Here are some pointers:

  *   L4S drafts (@ https://datatracker.ietf.org/wg/tsvwg/documents/): draft-ietf-tsvwg-{l4s-arch,aqm-dualq-coupled,ecn-l4s-id}
  *   SCE drafts: Start with draft-morton-tsvwg-sce - the other two draft-morton-* drafts are somewhat related, but the sce draft is the core draft.
  *   TSVWG presentation material from Montreal: https://datatracker.ietf.org/meeting/105/session/tsvwg

We (TSVWG WG) would greatly appreciate some more data center and RDMA "eyes" on this material, as it is highly relevant to data centers.

Thanks, --David (TSVWG WG co-chair)

From: Rdma-cc-interest <rdma-cc-interest-bounces@ietf.org> On Behalf Of Roni Even (A)
Sent: Tuesday, September 24, 2019 3:08 AM
To: rdma-cc-interest@ietf.org
Subject: [Rdma-cc-interest] Congestion Control for Large Scale Data Center Networks



Hi,
We had side meetings at the last two IETF meetings about better congestion control for data centers, with a good number of participants.

Given the high throughput of data center networks, a good congestion control for data centers should provide low latency, fast convergence and high link utilization. Since multiple applications with different requirements may run on the DC network, it is important to provide fairness between different applications that may use different congestion algorithms. An important issue from the user perspective is to achieve short Flow Completion Time (FCT).

It is clear that we are not going to make changes to RoCE, but we still would like to look at good e2e congestion control. Currently there are multiple published works presented in different papers (examples are DCQCN, HPCC, RCP), but it will be good if the IETF will be able to recommend an e2e congestion protocol that will allow interoperability between vendors and that will leverage the information from the network .

We would like to have a side meeting in Singapore and intend to submit a draft that will discuss the different options, hoping that the interested people can collaborate on a common direction.

Regards
Roni Even