[iccrg] Data Center Congestion Control side meeting plans

Paul Congdon <paul.congdon@tallac.com> Mon, 11 November 2019 19:15 UTC

From: Paul Congdon <paul.congdon@tallac.com>
Date: Mon, 11 Nov 2019 11:15:13 -0800
Message-ID: <CAAMqZPuxEx=KuEh8F0-5WHZOjhSdXO9f7==5mC4Tw00kzYkKAg@mail.gmail.com>
To: iccrg@irtf.org
Archived-At: <https://mailarchive.ietf.org/arch/msg/iccrg/gqQQ25rTbZuA-ekVSx3OX7YhPcE>
Subject: [iccrg] Data Center Congestion Control side meeting plans

Hello ICCRG:


I believe many of you will have an interest in this.  I have reserved a
side meeting room at IETF-106 to continue discussions on congestion control
in modern data center networks.  This message is to solicit feedback on the
suggested topics and agenda.


Tuesday, November 19

8:30AM - 9:45AM

Room: VIP A


There has been some discussion about where to take this group and what
activities should spin out from it: should we expand the current focus in
ICCRG, or are these topics appropriate for a new research group in IRTF?  At
IETF-106 there are a number of topics that people would like to discuss.
In particular, where can the IETF/IRTF discuss how NICs can be
designed for CC in HPC/RDMA/AI DCN, and how the network can participate
to improve performance?


I'm working on a more detailed agenda for the side meeting, but here are
some current thoughts.  Feedback is welcome.



1. How NICs can be designed for better CC in the HPC/RDMA/AI DCN


Discuss feedback on a draft under development on OpenCC:
https://datatracker.ietf.org/doc/draft-zhh-tsvwg-open-architecture/.  It
describes a framework for flexible establishment of congestion control
algorithms implemented by NICs and the network.  The expectation is there
will be some experimental results.  The goal is to discuss the ideas with
stakeholders (customers, NIC vendors, switch vendors) and explore what
could/should be standardized.



2. How does the network participate in CC for HPC/RDMA/AI DCN?


There are a few items for discussion.


2.1 AI ECN


Discuss feedback on
https://datatracker.ietf.org/doc/draft-zhuang-tsvwg-ai-ecn-for-dcn/.  The
idea is to use AI for adaptive configuration of the network, which is a hard
problem.  How is the necessary information collected from the devices to
form models, and what could/should be standardized here as well?


2.2 Network Fast Feedback


Discuss follow-on feedback on
https://tools.ietf.org/html/draft-even-iccrg-dc-fast-congestion-00, which is
expected to be introduced in ICCRG on Monday.  The draft discusses
state-of-the-art congestion controllers, both those in use and those from
research, and poses a number of questions for discussion.  What is to be
researched and what could/should be standardized going forward?


2.3 Mixing RDMA and TCP traffic


These two traffic types with their differing congestion controllers are
known to not play well with one another in the same traffic class.  There
may be some analysis data to share on this topic.  A goal would be to
discuss network approaches for mitigating the impact of the two on each
other.



3. Metrics for HPC/RDMA/AI networks


Are the current metrics and scales appropriate for HPC/RDMA/AI networks?
HPC and storage networks tend to use IOPS as a key measure, and the latency
requirements can be on the order of 10us, which is very different from
Internet latency and throughput measures.  Should there be a draft on metric
requirements for DCNs?  Can we work with real customers to define
some well-known scenarios and metrics for HPC/RDMA/AI DCNs?