Re: [ippm] [tsvwg] [iccrg] New Internet Draft: Congestion Signaling (CSIG)

Jai Kumar <jai.kumar@broadcom.com> Tue, 20 February 2024 21:02 UTC

From: Jai Kumar <jai.kumar@broadcom.com>
Date: Tue, 20 Feb 2024 13:02:08 -0800
Message-ID: <CAHuowCMnztviFPRPNDM5AOOsBTDFTVU7_e4SrRVtWmMfA86g3A@mail.gmail.com>
To: Sebastian Moeller <moeller0@gmx.de>
Cc: Tom Herbert <tom@herbertland.com>, Christian Huitema <huitema@huitema.net>, tsvwg <tsvwg@ietf.org>, IETF IPPM WG <ippm@ietf.org>, Nandita Dukkipati <nanditad@google.com>, iccrg@irtf.org, Naoshad Mehta <naoshad@google.com>, ccwg@ietf.org, Abhiram Ravi <abhiramr@google.com>
Archived-At: <https://mailarchive.ietf.org/arch/msg/ippm/CnZLGDMxWdBdqNIkNzWVa4remCY>
List-Id: IETF IP Performance Metrics Working Group <ippm.ietf.org>

Thank you for a lively discussion.

I just want to add a few more points in favor of using CSIG in L2.

As we trend towards high-performance Ethernet fabrics for HPC and
AI/ML clusters, innovations are happening in both link layer and
layer 3 headers. Among many others, the key requirements include:
- minimal and constant overhead (in CSIG it is a fixed 16-bit TPID, whereas,
if I recall correctly, IOAM needs a GRE header encapsulation for IPv4
packets and a Hop-by-Hop option for IPv6)
- independence from the layer 3 transport, so that it can work in legacy IP
networks or in optimized networks where some form of compressed headers
MAY be present.
- low-latency HW implementation. AI/ML clusters are latency sensitive for
inference and demand switch latency below 0.20 us.
- decoupling of congestion signaling from the transport. One such example is
the Ultra Ethernet Transport proposed in the UEC. Reflecting the signal back
to the source for source-based congestion control is within the purview of
the transport, as and when needed.
- a fixed offset with no header variance between v4 and v6 frames, for both
low latency and future-proofing against any innovation happening in these
layers. Note that these clusters are not L2 bridged domains; they are mostly
L3 routed domains. Although VXLAN tunnels in AI/ML or HPC clusters are
discussed, they are still a couple of years away, so the argument about
variable layer 2 headers is a weak one.
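To illustrate the fixed-offset point, here is a minimal Python sketch of finding an L2 tag identified by a 16-bit TPID. It assumes a 4-byte tag (TPID plus a 16-bit value) placed right after the MAC addresses, like an 802.1Q tag; the TPID value, the tag layout, and the names add_csig/parse_csig are hypothetical and are not taken from the CSIG draft.

```python
import struct

# HYPOTHETICAL placeholder: 0x88B5 is the IEEE 802 "local experimental"
# EtherType, used here only for illustration. It is NOT the CSIG TPID.
CSIG_TPID = 0x88B5
MAC_ADDRS_LEN = 12  # destination MAC (6 bytes) + source MAC (6 bytes)


def add_csig(frame: bytes, value: int) -> bytes:
    """Insert an assumed 4-byte tag (16-bit TPID + 16-bit value) right after
    the MAC addresses, where a VLAN tag would go."""
    tag = struct.pack("!HH", CSIG_TPID, value)
    return frame[:MAC_ADDRS_LEN] + tag + frame[MAC_ADDRS_LEN:]


def parse_csig(frame: bytes):
    """Return (tag_value, inner_ethertype); tag_value is None if untagged.

    The check happens at the same byte offset whether the payload is IPv4
    or IPv6, so no L3 header parsing is needed to find the signal."""
    if len(frame) < MAC_ADDRS_LEN + 2:
        raise ValueError("frame too short")
    (ethertype,) = struct.unpack_from("!H", frame, MAC_ADDRS_LEN)
    if ethertype != CSIG_TPID:
        return None, ethertype
    if len(frame) < MAC_ADDRS_LEN + 6:
        raise ValueError("truncated tag")
    value, inner = struct.unpack_from("!HH", frame, MAC_ADDRS_LEN + 2)
    return value, inner


# Usage: zeroed MACs followed by zeroed (fake) IPv4 and IPv6 headers.
macs = bytes(12)
ipv4_frame = macs + struct.pack("!H", 0x0800) + bytes(20)
ipv6_frame = macs + struct.pack("!H", 0x86DD) + bytes(40)
for f in (ipv4_frame, ipv6_frame):
    print(parse_csig(add_csig(f, 0x1234)))
```

The tag costs the same four bytes for both frame types, and the lookup offset does not depend on the L3 header, which is the property the bullet above argues for.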

I think the draft makes it clear that the applicability of the proposal is
DC-focused.

Tom,
Also, there was an earlier discussion and presentation on the HW
complexity and the merits/demerits of encapsulating a v4 packet in an
IOAM GRE header, and more. It should be in the archives.

Best,
-Jai




On Tue, Feb 20, 2024 at 12:35 PM Sebastian Moeller <moeller0@gmx.de> wrote:

> Hi Tom,
>
>
> > On 20. Feb 2024, at 21:26, Tom Herbert <tom@herbertland.com> wrote:
> >
> > On Tue, Feb 20, 2024 at 12:09 PM Christian Huitema <huitema@huitema.net>
> wrote:
> >>
> >>
> >>
> >> On 2/20/2024 9:55 AM, Sebastian Moeller wrote:
> >>>> That's more of a statement of security and not feasibility. There's
> simply no security in the Internet, so we cannot trust or validate that
> anonymous intermediate nodes are going to write correct information. Any
> plain text in a packet on the Internet is subject to inspection and
> modification if the data isn't authenticated, and in the worst case this
> could be a DoS vector by writing bad information.
> >>> [SM3] Indeed, but e.g. for TCP you would need to know a lot about the
> most recent packet to be able to play games, no? So either you are on path
> and already can drop/duplicate packets at will or you are off path but
> still need a recent enough veridical packet to be able to cause mischief, no?
> (I might be insufficiently creative in attack vectors)
> >>
> >> I am analyzing congestion control information using the framework of
> >> "honest signals". In human communication, "honest signals" are those
> >> that cannot be easily faked by the communicator. For example, smiling is
> >> not really an honest signal, because it is easy to fake; blushing, on the
> >> other hand, is hard to fake.
> >>
> >> When it comes to Internet-wide congestion control, we have pretty much
> >> the same issue. Networks may want to fool the application for a variety
> >> of reasons, and may start faking congestion signals. Some of these
> >> signals are hard to fake. End-to-end data rate, for example: slowing a
> >> specific stream of packets is hard to fake, so measuring the end-to-end
> >> data rate is a pretty good indication of the state of the network.
> >> End-to-end RTT is also a rather honest signal: yes, routers could put
> >> some specific packets in a slow queue, but that requires resources.
> >>
> >> Packet losses almost belong in that category. They are not hard to fake;
> >> routers could play favorites and selectively drop packets with a certain
> >> profile. But dropping too many packets affects the "quality rating" of a
> >> provider, so there is some pressure to not fake it. That pressure is
> >> probably one of the reasons behind bufferbloat. The main problem with
> >> packet loss as a signal is that losses may have causes other than
> >> congestion.
> >>
> >> ECN is not really an honest signal. Setting a bit in a packet header does
> >> not require much effort, so routers could do that to play
> >> favorites. In fact, past bugs in some networks caused almost all packets
> >> to be marked as CE. Using ECN is very nice when you can trust it, but
> >> end nodes should probably do that cautiously, detecting for example a
> >> sudden rise in ECN marks rather than reacting to an average value.
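Christian's suggestion of reacting to a sudden rise in CE marks rather than to an average can be sketched as follows. This is a minimal Python illustration, not a method from this thread or from any RFC: the class name CeSpikeDetector, the per-window CE counts, and the gain and margin values are all assumptions chosen for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CeSpikeDetector:
    """Hypothetical detector: compares each window's CE-marked fraction with
    a slowly moving baseline and flags only a sudden rise above it."""
    gain: float = 1 / 16    # assumed EWMA weight of each new window
    margin: float = 0.2     # assumed rise over baseline that counts as "sudden"
    baseline: Optional[float] = None

    def update(self, ce_marked: int, total: int) -> bool:
        if total <= 0:
            return False  # no packets in this window, nothing to judge
        frac = ce_marked / total
        if self.baseline is None:
            self.baseline = frac  # the first window seeds the baseline
            return False
        spike = frac - self.baseline > self.margin
        self.baseline += self.gain * (frac - self.baseline)
        return spike


# A path that marks about 1% of packets, then jumps to 40%.
det = CeSpikeDetector()
flags = [det.update(ce, n) for ce, n in [(1, 100)] * 10 + [(40, 100)]]
print(flags)  # ten False values, then True

# A buggy path that marks every packet CE never shows a "sudden rise".
buggy = CeSpikeDetector()
print([buggy.update(100, 100) for _ in range(5)])  # all False
```

How a sender should respond once a spike is flagged is left open here; the sketch only shows the detection side.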
> >>
> >> ECN is just one bit. There is always a temptation to do a better ECN
> >> with many more bits. For example, CE directs a sender to slow down. It
> >> would be nice to have a corresponding "All clear" signal telling the
> >> senders that they can speed up. L4S attempts to do that by modulating
> >> the CE bit, so that a low frequency kinda indicates "all clear", while a
> >> high frequency says "slow down", and gives some indication of how much.
> >> Suddenly, one bit becomes several bits, just spread over many packets.
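How a single CE bit turns into a multi-bit signal spread over many packets can be illustrated with the DCTCP-style estimator of RFC 8257, the kind of scalable response that L4S senders such as TCP Prague are derived from. In this minimal Python sketch the sender keeps an EWMA alpha of the fraction of CE-marked bytes per window and cuts cwnd by alpha/2; g = 1/16 is the commonly used gain, while the helper names, the starting alpha of 1.0, and the 1000-byte windows are assumptions for the example.

```python
def update_alpha(alpha: float, marked_bytes: float, acked_bytes: float,
                 g: float = 1 / 16) -> float:
    """Once per window: alpha <- (1 - g) * alpha + g * F, where F is the
    fraction of acknowledged bytes that carried a CE mark."""
    frac = marked_bytes / acked_bytes if acked_bytes > 0 else 0.0
    return (1 - g) * alpha + g * frac


def reduce_cwnd(cwnd: float, alpha: float) -> float:
    """On congestion: shrink cwnd in proportion to the estimated extent."""
    return cwnd * (1 - alpha / 2)


def steady_alpha(mark_fraction: float, windows: int = 200,
                 alpha: float = 1.0) -> float:
    """Hypothetical helper: feed many windows with a constant marking rate."""
    for _ in range(windows):
        alpha = update_alpha(alpha, mark_fraction * 1000, 1000)
    return alpha


# Low marking frequency ~ "nearly all clear"; high frequency ~ "slow down a lot".
for rate in (0.05, 0.5):
    a = steady_alpha(rate)
    print(f"marking {rate:.0%}: alpha={a:.3f}, "
          f"cwnd 100 -> {reduce_cwnd(100, a):.1f}")
```

Alpha converges to the marking fraction, so a 5% marking rate costs the sender about 2.5% of its window, while 50% marking halves the reduction factor to 0.75.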
> >>
> >> The idea of adding more bits in packet headers is not exactly new -- see
> >> for example Quick-Start for TCP and IP by Sally Floyd et al., RFC 4782,
> >> January 2007. The problem is that the more bits you add, the more you exacerbate
> >> issues of trust, and also risks of bugs. "Many more bits" may work in a
> >> controlled environment, but I really do not see that working on the
> >> whole Internet.
> >
> > Hi Christian,
> >
> > Do you know what the state of ECN deployment over the Internet is?
>
> [SM5] Nobody knows exactly, but quite a lot of Linux servers use the Linux
> defaults and will use ECN if the client negotiates it. In my qdisc
> statistics I routinely see not only drops, but also CE marks logged (from
> my AQM).... so believe it or not, ECN over the internet mostly works...
>
>
> > It seems to me that if someone is sending to an arbitrary host over the
> > Internet they're already pretty much accepting "best effort" service:
>
> [SM5] Not sure about the US, but that is all ISPs offer to end users over
> here... but ECN works well even over a best effort internet access link in
> my personal experience.
>
> > long latencies and with potentially high variance, such that getting
> > fine-grained congestion information from intermediate routers, even
> > if it's honest, probably doesn't add much to the information that we
> > can derive from packet loss or measuring RTT with no additional
> > mechanisms or implementation.
>
> [SM5] How can you come to that conclusion without ever trying?
>
>
> > The situation is very different in a limited domain which could
> > include large service provider networks.
>
> [SM5] Indeed, papers discussing 'better congestion signalling' often come
> out of those environments. But IMHO that is not because these methods only
> help in those environments, but because this is where people are willing
> to spend the money to test and implement potential solutions.
>
> Sebastian
>
> > In that case more information
> > is good, it's easier to provide security so we can trust the
> > information, and we're not restricted to just one or two bits of
> > information to carry the information in a packet. This is also where I
> > see host-to-network signaling being useful-- this allows applications
> > to request QoS for their packets.
>
> >
> > Tom
> >
> >>
> >> -- Christian Huitema
>
>
>
