[Rdma-cc-interest] Data Center Congestion Control

Paul Congdon <paul.congdon@tallac.com> Mon, 03 August 2020 19:57 UTC

MIME-Version: 1.0
From: Paul Congdon <paul.congdon@tallac.com>
Date: Mon, 03 Aug 2020 12:57:11 -0700
Message-ID: <CAAMqZPuK0pNa6Mk0f3NL8mkWZ=t6iAbpxGA1YQiD5nE=KtjDVA@mail.gmail.com>
To: rdma-cc-interest@ietf.org
Content-Type: multipart/alternative; boundary="000000000000f02fa305abfe8bb2"
Archived-At: <https://mailarchive.ietf.org/arch/msg/rdma-cc-interest/vZRgo97dAcDhSRgSGYG5OLy1byE>
Subject: [Rdma-cc-interest] Data Center Congestion Control
List-Id: Congestion Control for Large Scale HPC/RDMA Data Centers <rdma-cc-interest.ietf.org>

Hello RDMA CC list participants,

There were two interesting presentations at IETF-108 related to Data Center
Congestion Control.  A couple of discussion points came up that have
surfaced before, so I thought it might be worth highlighting them for
ongoing discussion.

The two presentations in the second session of TSVWG were:

1. HPCC++: Enhanced High Precision Congestion Control - link
<https://datatracker.ietf.org/meeting/108/materials/slides-108-tsvwg-sessb-81-hpcc-enhanced-high-precision-congestion-control>
2. PFC free low delay control protocol - link
<https://datatracker.ietf.org/meeting/108/materials/slides-108-tsvwg-sessb-82-pfc-free-low-delay-control-protocol>

Both of these presentations discussed improving congestion control for
RoCEv2 (as well as other protocols), which resurfaced a couple of issues:

1. RoCEv2 is specified by IBTA, so the IETF should not propose 'specific'
changes to this protocol.
2. If work were to be done in the IETF, where should it begin: ICCRG or
TSVWG?

I offer the following comments on these issues:

1. Agreed that RoCEv2 and its DCQCN are under the stewardship of IBTA.
Perhaps what is really needed from the IETF is a 'data center' focused CC
approach that any UDP-based protocol can use - in the 'data center'. This
was briefly discussed several IETFs ago in a side meeting.  Maybe some of
the tenets of that protocol could come from the recent presentations and
include:
   - Use of in-band signaling (i.e. IOAM) to provide more precise
congestion information to end-points, so that a CC can react quickly and
accurately (neither overreacting nor underreacting) - motivated by HPCC.
   - Specify how ECN thresholds can be fluid/dynamic to allow protocols to
start fast and stabilize quickly without congestion - motivated by LDCP.

2. The leadership of IETF have consistently stated that congestion control
ideas should/could start off in ICCRG and then move to TSVWG when/if a
standards track draft is needed.
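
To make the first bullet under point 1 a bit more concrete, here is a rough
sketch of how an end-point might turn per-hop in-band telemetry into a
window update. It is loosely modeled on the utilization-based update
described in the published HPCC work, heavily simplified, and is not taken
from either presentation. The field names, the 0.95 target utilization and
the additive step are all assumptions chosen for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class HopTelemetry:
    """Per-hop data a switch might stamp into a packet (hypothetical fields)."""
    queue_bytes: float   # egress queue backlog seen by the packet
    tx_rate_bps: float   # measured output rate of the egress port
    link_bps: float      # capacity of the egress link


def hop_utilization(hop: HopTelemetry, base_rtt_s: float) -> float:
    """Normalized load: queue backlog relative to the BDP, plus rate share."""
    bdp_bytes = hop.link_bps / 8.0 * base_rtt_s
    return hop.queue_bytes / bdp_bytes + hop.tx_rate_bps / hop.link_bps


def next_window(window_bytes: float,
                hops: List[HopTelemetry],
                base_rtt_s: float,
                target_util: float = 0.95,       # assumed target, not normative
                additive_bytes: float = 1000.0   # assumed additive step
                ) -> float:
    """Scale the window by the most loaded hop, then add a small increment."""
    bottleneck = max(hop_utilization(h, base_rtt_s) for h in hops)
    # Floor the estimate so an idle path cannot cause a division by zero.
    bottleneck = max(bottleneck, 1e-3)
    return window_bytes / (bottleneck / target_util) + additive_bytes


if __name__ == "__main__":
    path = [
        HopTelemetry(queue_bytes=0.0, tx_rate_bps=40e9, link_bps=100e9),
        HopTelemetry(queue_bytes=125_000.0, tx_rate_bps=100e9, link_bps=100e9),
    ]
    # The second hop is saturated with a full BDP of queue, so the window shrinks.
    print(next_window(200_000.0, path, base_rtt_s=10e-6))
```

The point of the sketch is only that precise per-hop load lets the sender
jump to roughly the right window in one step, instead of probing with a
one-bit signal over many round trips.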

Just some thoughts... Would love to hear what others have to say.

Paul