Re: [tsvwg] a new method of congestion control

rs.ietf@gmx.at Fri, 09 June 2023 19:22 UTC

Message-ID: <578e0834-3d50-8b8e-4ca4-1613c211d35a@gmx.at>
Date: Fri, 09 Jun 2023 21:21:51 +0200
From: rs.ietf@gmx.at
Reply-To: rs.ietf@gmx.at
To: Dave Taht <dave.taht@gmail.com>, tsvwg IETF list <tsvwg@ietf.org>
References: <CAA93jw6TJEciW8QhgbSe=0ZTk6njhpxMTQ3ETxzy73hhcP0yAw@mail.gmail.com>
In-Reply-To: <CAA93jw6TJEciW8QhgbSe=0ZTk6njhpxMTQ3ETxzy73hhcP0yAw@mail.gmail.com>
Archived-At: <https://mailarchive.ietf.org/arch/msg/tsvwg/htExV1kZonQiG18z9QiYCptMoGc>
Subject: Re: [tsvwg] a new method of congestion control
List-Id: Transport Area Working Group <tsvwg.ietf.org>

To me, it appears that Nvidia is great at marketing:

For RoCE alone, they have reinvented at least three different "better than anything before" CC algorithms (I believe the current one is a variation of a delay-based congestion controller that avoids some of the signal noise by relying on a very hardware-centric implementation, plus auto-magic tuning optimizations when end hosts communicate via compliant switches).
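To illustrate what "delay-based" means here, below is a toy window controller. It is NOT Nvidia's algorithm (which I have not seen); every parameter value (target delay, gains, window sizes) is a hypothetical choice for illustration. It smooths raw RTT samples with an EWMA (the 1/8 weight borrows the SRTT gain from RFC 6298), which is one simple software way to suppress the measurement noise that a hardware-centric design would instead avoid at the source. While the smoothed RTT stays under the target, the window grows additively; above it, the window is cut multiplicatively in proportion to the excess delay, with a cap on how much a single cut may remove.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DelayBasedCC:
    """Toy delay-based congestion window controller (hypothetical parameters).

    Each call to on_rtt_sample() is treated as one RTT's worth of feedback.
    """

    target_delay_us: float = 25.0   # assumed target RTT, in microseconds
    additive_increase: float = 1.0  # packets added per sample below target
    beta: float = 0.8               # assumed gain for the proportional decrease
    max_decrease: float = 0.5       # never cut more than 50% in one step
    ewma_gain: float = 0.125        # weight of a new RTT sample (RFC 6298 alpha)
    cwnd: float = 10.0              # congestion window, in packets
    min_cwnd: float = 1.0           # floor for the window
    srtt_us: Optional[float] = None # smoothed RTT; None until the first sample

    def on_rtt_sample(self, rtt_us: float) -> float:
        # Smooth the raw RTT to damp per-packet jitter ("signal noise").
        if self.srtt_us is None:
            self.srtt_us = rtt_us
        else:
            self.srtt_us += self.ewma_gain * (rtt_us - self.srtt_us)

        if self.srtt_us <= self.target_delay_us:
            # Queues look short: probe for more bandwidth.
            self.cwnd += self.additive_increase
        else:
            # Queues are building: back off in proportion to the excess delay,
            # but never by more than max_decrease in a single step.
            excess = (self.srtt_us - self.target_delay_us) / self.srtt_us
            factor = max(1.0 - self.beta * excess, 1.0 - self.max_decrease)
            self.cwnd = max(self.cwnd * factor, self.min_cwnd)
        return self.cwnd
```

For example, three 20 µs samples grow the window from 10 to 13 packets; a following 100 µs sample pulls the smoothed RTT only to 30 µs and trims the window to about 11.27 packets rather than collapsing it, which is the point of filtering the delay signal.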

AI "workloads" should be differentiated into training and inference. The former is usually characterized by very high bandwidth demands (e.g. uncompressed video, or video compressed with a low-complexity codec) with data ingested significantly faster than real time, but in every case with a very low-latency response requirement.

The latter has much lower bandwidth requirements (e.g. sensor data arriving in real time) but still has low-latency requirements.
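A back-of-the-envelope calculation shows how far apart these two cases can be. The figures below are purely illustrative assumptions, not measurements of any real AI deployment: 1080p video at 24-bit RGB and 30 fps for training, ingested 10x faster than real time, versus a hypothetical sensor producing 64-byte samples at 1 kHz for inference.

```python
def uncompressed_video_bps(width: int, height: int, bytes_per_pixel: int, fps: float) -> float:
    """Raw bit rate of an uncompressed video stream, in bits per second."""
    return width * height * bytes_per_pixel * 8 * fps


def ingest_bps(stream_bps: float, speedup: float) -> float:
    """Rate needed to ingest recorded data `speedup` times faster than real time."""
    return stream_bps * speedup


def sensor_bps(sample_bytes: int, rate_hz: float) -> float:
    """Bit rate of a fixed-size sensor sample stream, in bits per second."""
    return sample_bytes * 8 * rate_hz


# Assumed example parameters (hypothetical, for illustration only).
realtime = uncompressed_video_bps(1920, 1080, 3, 30.0)  # 1080p, 24-bit RGB, 30 fps
training = ingest_bps(realtime, 10.0)                   # ingested 10x faster than real time
sensor = sensor_bps(64, 1000.0)                         # 64-byte samples at 1 kHz

print(f"video, real time:  {realtime / 1e9:.2f} Gbit/s")  # 1.49 Gbit/s
print(f"training, 10x:     {training / 1e9:.2f} Gbit/s")  # 14.93 Gbit/s
print(f"inference sensor:  {sensor / 1e3:.0f} kbit/s")    # 512 kbit/s
```

Under these assumptions, training ingest needs roughly 15 Gbit/s per stream while the sensor stream needs about 0.5 Mbit/s, a gap of more than four orders of magnitude, yet both still want low latency.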

I wouldn't hold my breath, and would only be interested in learning about the scalability limits and real-world issues of these improvements created behind closed doors...

Richard



On 05.06.2023 at 02:53, Dave Taht wrote:
> announced by nvidia here:
> https://www.zdnet.com/article/nvidia-unveils-new-kind-of-ethernet-for-ai-grace-hopper-superchip-in-full-production/
>
> I have no idea what an AI workload looks like, familiarity with a ton
> of DC l2 protocols, and there are hints in this post about telemetry.
>
> Anyone have clue here?
>