Re: [Rdma-cc-interest] Side meeting plans at IETF-106

Lars Eggert <lars@eggert.org> Tue, 12 November 2019 11:35 UTC

Hi,

On 2019-11-11, at 21:11, Paul Congdon <paul.congdon@tallac.com> wrote:
> Tuesday, November 19
> 8:30AM - 9:45AM
> Room: VIP A

I'll try and make the meeting, modulo NomCom duties. (Please all, send your feedback about candidates to NomCom now!)

Because I am not fully sure I can make it, here is some written feedback on the various agenda items:

> 1. How NICs can be designed for better CC in the HPC/RDMA/AI DCN
> 
> Discuss feedback on a draft under development on OpenCC: https://datatracker.ietf.org/doc/draft-zhh-tsvwg-open-architecture/; a framework for flexible establishment of congestion control algorithms implemented by NICs and the network.  The expectation is there will be some experiment results.  The goal is to discuss the ideas with stakeholders (customers, NIC vendors, switch vendors) and explore what could/should be standardized.

There seem to be three things here:

A. A modular NIC offload interface. It's unclear whether the IETF is the right home for this. Also, see "Restructuring Endpoint Congestion Control" (Narayan et al.) for another direction.

B. Mixing and matching different CCs in one network over time. At datacenter latencies, you really want to prevent even small-scale hiccups caused by interactions between different CCs, so I wonder whether it would be sufficient to slice the NIC so that all traffic within a given slice is handled by one CC. That seems more tractable.

C. The architectural idea to move away from in-band CC signaling from the network to the endpoints. There isn't much in the document that motivates this, and nothing about potential issues (e.g., loss of fate-sharing).

A comment on this document that is really about this entire effort: we should just give up on RoCE. Mellanox has no interest in opening it, and I am therefore unwilling to spend cycles thinking about it.

> 2. How does the network participate in CC for HPC/RDMA/AI DCN?
> 
> There are a few items for discussion.
> 
> 2.1 AI ECN
> 
> Discuss feedback on https://datatracker.ietf.org/doc/draft-zhuang-tsvwg-ai-ecn-for-dcn/.  The idea is to use AI for adaptive configuration of the network - a hard problem.  How is necessary information collected from the devices to form models and what could/should be standardized here as well?

I don't see a proposal here; I don't even see a concrete problem statement. This is yet another "let's throw AI at it" three-pager.

> 2.2 Network Fast Feedback 
> 
> Discuss follow-on feedback on https://tools.ietf.org/html/draft-even-iccrg-dc-fast-congestion-00 which is expected to be introduced in ICCRG on Monday.  The draft discusses the state-of-the-art congestion controllers in use and from research, and poses a number of questions for discussion. What is to be researched and what could/should be standardized going forward?

This is the beginning of a survey, but it misses a ton of related work, especially from academia: HOMA, pFabric, HULL, D3, PDQ, pHost, NDP, etc.

> 2.3 Mixing RDMA and TCP traffic
> 
> These two traffic types with their differing congestion controllers are known to not play well with one another in the same traffic class.  There may be some analysis data to share on this topic.  A goal would be to discuss network approaches for mitigating the impact of the two on each other.

When you say RDMA, you mean RoCE? Separate RoCE into a slice and move on. It's pointless to try and optimize for coexistence with a protocol that can change willy-nilly.

> 3. Metrics for HPC/RDMA/AI networks
> 
> Are the current metrics and scales appropriate for HPC/RDMA/AI networks?  HPC and Storage networks tend to use IOPS as a key measure and the latency requirements can be on the order of 10us; much different than Internet latency and throughput measures.  Should there be a draft on metric requirements for DCN networks?   Can we work with real customers to define some well-known scenarios and metrics for HPC/RDMA/AI DCNs.

Whose "current metrics and scales"? Papers on DC mechanisms certainly define appropriate metrics. Obviously Internet scales don't work, but who is using those?

Lars