[iccrg] Data Center Congestion Control side meeting plans

Paul Congdon <paul.congdon@tallac.com> Mon, 11 November 2019 19:15 UTC

From: Paul Congdon <paul.congdon@tallac.com>
Date: Mon, 11 Nov 2019 11:15:13 -0800
To: iccrg@irtf.org

Hello ICCRG:

I believe many of you will have an interest in this.  I have reserved a
side meeting room at IETF-106 to continue discussions on congestion control
in modern data center networks.  This message is to solicit feedback on the
suggested topics and agenda.

Tuesday, November 19

8:30AM - 9:45AM

Room: VIP A

There has been some discussion about where to take this group and what
activities should spin out from it: should we expand the current focus of
ICCRG, or are these topics appropriate for a new research group in the IRTF?
At IETF-106 there are a number of topics that people would like to discuss.
In particular, where can the IETF/IRTF discuss how NICs can be designed for
CC in HPC/RDMA/AI DCNs, and how the network can participate to improve
performance?

I'm working on a more detailed agenda for the side meeting, but here are
some current thoughts; feedback is welcome.

1. How NICs can be designed for better CC in the HPC/RDMA/AI DCN

Discuss feedback on a draft under development on OpenCC:
https://datatracker.ietf.org/doc/draft-zhh-tsvwg-open-architecture/; a
framework for flexible establishment of congestion control algorithms
implemented by NICs and the network.  Some experimental results are
expected to be presented.  The goal is to discuss the ideas with stakeholders
(customers, NIC vendors, switch vendors) and explore what could/should
be standardized.

2. How does the network participate in CC for HPC/RDMA/AI DCN?

There are a few items for discussion.

2.1 AI ECN

Discuss feedback on
https://datatracker.ietf.org/doc/draft-zhuang-tsvwg-ai-ecn-for-dcn/.  The
idea is to use AI for adaptive configuration of the network, which is a hard
problem.  How is the necessary information collected from the devices to
build models, and what could/should be standardized here as well?

2.2 Network Fast Feedback

Discuss follow-on feedback on
https://tools.ietf.org/html/draft-even-iccrg-dc-fast-congestion-00 which is
expected to be introduced in ICCRG on Monday.  The draft surveys
state-of-the-art congestion controllers, both those in use and those from
research, and poses a number of questions for discussion.  What should be
researched, and what could/should be standardized going forward?

2.3 Mixing RDMA and TCP traffic

These two traffic types, with their differing congestion controllers, are
known not to coexist well in the same traffic class.  There may be some
analysis data to share on this topic.  A goal would be to discuss network
approaches for mitigating the impact of the two on each other.

3. Metrics for HPC/RDMA/AI networks

Are the current metrics and scales appropriate for HPC/RDMA/AI networks?
HPC and storage networks tend to use IOPS as a key measure, and latency
requirements can be on the order of 10 µs, very different from Internet
latency and throughput measures.  Should there be a draft on metric
requirements for DCNs?  Can we work with real customers to define some
well-known scenarios and metrics for HPC/RDMA/AI DCNs?