[Rdma-cc-interest] draft minutes of IETF-105 Side meeting on Large Scale Data Center HPC/RDMA Networks
Paul Congdon <paul.congdon@tallac.com> Mon, 12 August 2019 21:55 UTC
From: Paul Congdon <paul.congdon@tallac.com>
Date: Mon, 12 Aug 2019 07:13:38 -0700
Message-ID: <CAAMqZPu-JSoK10dU+LfXHqh_CBzaKroA9k_PpQUauaTjLCWj8g@mail.gmail.com>
To: rdma-cc-interest@ietf.org
Archived-At: <https://mailarchive.ietf.org/arch/msg/rdma-cc-interest/dlUj-mwkZxsrEMzp2XWyOHrJDOM>
Subject: [Rdma-cc-interest] draft minutes of IETF-105 Side meeting on Large Scale Data Center HPC/RDMA Networks
I've uploaded the initial minutes of the side meeting to the following public location:

https://mentor.ieee.org/802.1/dcn/19/1-19-0062-01-ICne-ietf0105-sidemeeting-minutes.pdf

I've also put a copy here in text form for quick review. Please send me any updates or corrections you see. Sorry for the duplication for some, as I have also copied the non-wg email list. Please sign up for the list if you have not already done so:

https://www.ietf.org/mailman/listinfo/rdma-cc-interest

IETF-105 Side Meeting: Large Scale Data Center HPC/RDMA
Monday, July 22, 2019
8:30AM – 9:45AM

Attendees (who signed the blue sheet or were recognized):

First      Last           Affiliation
Hirochika  Asai           Preferred Networks / WIDE Project
David      Black          Dell
Randy      Bush           Arrcus
Xavier     de Foy         InterDigital
Jesús      Escudero       Universidad de Castilla-La Mancha
Roni       Even           Huawei
Randy      Haagens        Microsoft
Jianfei    He             Huawei
Russ       Housley        Vigil Security, LLC
Rachel     Huang          Huawei
Georgios   Karagiannis    Huawei Technologies Dusseldorf GmbH
Younghan   Kim            SSU
Kee-Cheon  Kim
Wangbong   Lee            ETRI
Aini       Li             Huawei
Peng       Liu            China Mobile
Sonum      Mathur         Viasat
David      Melman         Marvell
Jan        Metzke         BSI
Tal        Mizrahi        Huawei
Yoshifumi  Nishida        GE Global Research / Keio Research Institute
Keyur      Patel          Arrcus
Fengwei    Qin            China Mobile
Richard    Scheffenegger  NetApp
Marcus     Sun            Huawei
KJ         Sun
Sowmini    Varadhan       Microsoft
Stephan    Wenger         Tencent America
Hua Ru     Yang           Huawei
Hyunsik    Yang           IISTRC
Xiang      Yu             Huawei
Shuai      Zhao           Tencent
Yan        Zhuang         Huawei
Ning       Zong           Huawei

Details:

1. The meeting organizer, Paul Congdon, presented the IETF Note Well reminder and bashed the agenda. No changes were made to the agenda, but it was suggested to include additional technical approaches being pursued in the IETF, such as LSVR.

2. The slide material presented at the meeting is available at:
https://mentor.ieee.org/802.1/dcn/19/1-19-0061-01-ICne-ietf-sidemeeting.pdf

3. The slides from IETF-105 HotRFC that announced this side meeting are available at:
https://datatracker.ietf.org/meeting/105/materials/slides-105-hotrfc-7-strategies-to-drastically-improve-congestion-control-in-high-performance-data-centers-next-steps-for-rdma-00

4. Jesús Escudero presented strategies to drastically improve congestion control in high performance DC and the next steps for RDMA. The slides are included in the above link. It was suggested to have an interactive discussion about the proposed solutions in the final slide and to consider their feasibility and applicability to scaling RDMA and HPC networks.

5. David Black provided some background on RDMA protocols running over IP. He clarified that in practice, RDMA networks often use PFC, but in principle, it is not required for all transports. David provided the history of iWARP and RoCEv2 and the industry momentum behind RoCEv2. He indicated that iWARP is, perhaps, superior, especially when it comes to congestion control, but the industry has adopted RoCE. RoCEv2 is not an IETF protocol.

6. Randy Bush asked if we are aware of NDP, the best paper at SIGCOMM 2017. Randy pointed out that the IETF, like the research community, has a long history of doing such work, and that it would be nice to see work in this area. It was asked if there was any sensitivity to addressing such transports in the IETF.

7. David Black asked if there were any NIC vendors in the audience. There was one in the room and one on the phone. The data rates in the data center require hardware offload support.

8. In the slides, it was suggested that perhaps a new UDP-based transport for data centers would be interesting to consider. One of the key requirements is that it would need to be hardware offloadable. The slides allude to a current trend to run more applications over UDP for low latency and high efficiency.

9. Keyur Patel pointed out that congestion occurs in switches and that many switch vendors are already implementing proprietary extensions to address this. He would like to see a problem definition, because people are currently building solutions, and he wondered what could be done beyond what the switch vendors are currently doing. Paul Congdon pointed out that we are a standards community and defining interoperable solutions is the whole point. Proprietary vendor solutions are neither standards based nor necessarily interoperable between vendors.

10. Being able to identify the type of congestion (e.g. incast versus in-network) could be valuable. The congestion mitigation approach might vary based upon where we are in the topology and what type of congestion the switch is experiencing. A switch's position in the topology can be configured, but more configuration is not desirable and could be error prone. An automated protocol to identify the location of a switch in the overall topology could be valuable.

11. One of the suggested improvements is to provide more congestion information within the headers. Jeff pointed out that overlay networks, with their rich and extensible headers, can help. Overlays are being used more and more now.

12. Roni Even pointed out that RoCE is driven by Mellanox. They were invited to attend this side meeting but did not attend. Roni felt the lack of attendance and participation is political and that Mellanox prefers to contribute to IBTA, which is a closed working group where they have control.

13. Yan Zhuang presented the material for two drafts:
https://tools.ietf.org/html/draft-zhh-tsvwg-open-architecture-00
https://tools.ietf.org/html/draft-yueven-tsvwg-dccm-requirements-00.html
The slides are included in the complete side meeting slide deck referenced above.

14. David Black asked whether any switch in the network can communicate directly with a NIC. The answer is yes: in the proposed architecture, the switch will send messages back to the source. The increasing use of overlays creates a challenge here. RDMA doesn’t currently do much with overlays; however, that may change. The headers the switch sees may not be the headers of the source/destination of the RDMA traffic. The messages sent back from the switch to the NIC need to have enough data for the source NIC to be able to unwrap and decode the overlay.

15. There was a question about the drafts and what is next for them. David Black, as TSVWG chair, explained that the drafts are being presented in the side meeting to better understand their content and the interest in them. Roni Even asked which is the appropriate place to progress these drafts: ICCRG or TSVWG.

16. Paul Congdon asked how the transport-agnostic congestion signaling works in conjunction with a TCP-based transport that also has congestion signals. Since the switch will be signaling directly to the NIC, it will be the responsibility of the source to combine and mix the signals appropriately.

17. There are some simulation results that show the effectiveness of the proposals in the drafts. David Black suggested the results should be shared with ICCRG because it has congestion control expertise and TSV looks to ICCRG for this guidance.

18. Paul Congdon closed the meeting with an observation that we have more people attending this side meeting, in part due to the favorable scheduling at the IETF. There is a request to create an IETF mailing list for this work. POST MEETING NOTE: the email list has been created and is called rdma-cc-interest@ietf.org. The listserv sign-up and info is available at:
https://www.ietf.org/mailman/listinfo/rdma-cc-interest

19. Jeff pointed out that NVMe over RDMA solutions are coming and the NIC will no longer be the bottleneck, putting more pressure and congestion on the network. David Black, as author of NVMe over Fabrics using TCP, pointed out that NVMe over TCP (without RDMA) is new and generating a lot of interest.

20. Paul Congdon asked if there are other applications, besides RDMA, in the data center that might benefit from low latency and high throughput, perhaps other control traffic (e.g. simple server-to-server REST APIs). Roni Even pointed out that we should look for other applications, but it will all depend on the tradeoffs and requirements.

21. The side meeting was adjourned at approximately 9:40AM.