Re: [ippm] [tsvwg] [iccrg] New Internet Draft: Congestion Signaling (CSIG)

Tom Herbert <tom@herbertland.com> Tue, 20 February 2024 21:54 UTC

From: Tom Herbert <tom@herbertland.com>
Date: Tue, 20 Feb 2024 13:53:51 -0800
Message-ID: <CALx6S36YK-M7cFZvB7grH4cHe4ZmOtpGMrj+JHquUv-bG5h=Zg@mail.gmail.com>
To: Jai Kumar <jai.kumar@broadcom.com>
Cc: Sebastian Moeller <moeller0@gmx.de>, Christian Huitema <huitema@huitema.net>, tsvwg <tsvwg@ietf.org>, IETF IPPM WG <ippm@ietf.org>, Nandita Dukkipati <nanditad@google.com>, Naoshad Mehta <naoshad@google.com>, Abhiram Ravi <abhiramr@google.com>
Archived-At: <https://mailarchive.ietf.org/arch/msg/ippm/MHtvziP0m0v847sc8ADusJwYTVs>
Subject: Re: [ippm] [tsvwg] [iccrg] New Internet Draft: Congestion Signaling (CSIG)

On Tue, Feb 20, 2024 at 1:02 PM Jai Kumar <jai.kumar@broadcom.com> wrote:
>
> Thank you for a lively discussion.
>
> I just want to add a few more points in favor of using CSIG in L2.
>
> As we are trending towards high performance Ethernet fabric for HPC and AI/ML clusters, there are innovations happening in both link layer and layer 3 headers. Among many, some of the key requirements are:
> - minimal and constant overhead (in CSIG it is a fixed 16-bit TPID, whereas for IOAM, if I recall, IPv4 packets use a GRE header encapsulation and IPv6 packets use a Hop-by-Hop option)

Please see the solution I proposed (again not IOAM). This could be
done effectively as a fixed size header consisting of a 14 byte Ethernet
header, a 40 byte IPv6 header, and an 8 byte HBH Options header containing
a CSIG option. As I mentioned, hardware can identify these headers from
fields at fixed offsets, and otherwise assume that all the fields that
need to be processed are at fixed offsets in the packet.
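
To make the fixed-offset argument concrete, here is a minimal sketch in Python. It assumes an untagged Ethernet header (14 bytes), the standard IPv6 fixed header (40 bytes), and an 8 byte Hop-by-Hop Options header holding exactly one option. The option type 0x3E and the 4 byte option payload are placeholders invented for illustration; no CSIG option type or layout is defined in this thread. All function and constant names are hypothetical.

```python
import struct

ETH_LEN = 14             # untagged Ethernet header
IPV6_LEN = 40            # IPv6 fixed header
HBH_LEN = 8              # Hop-by-Hop Options header carrying one option

ETHERTYPE_IPV6 = 0x86DD
NEXTHDR_HBH = 0          # IPv6 Next Header value for Hop-by-Hop Options
NEXTHDR_NONE = 59        # IPv6 "No Next Header"
CSIG_OPT_TYPE = 0x3E     # placeholder option type (assumption, not assigned)
CSIG_OPT_LEN = 4         # placeholder payload length (assumption)

# Every field the match needs sits at a constant offset from the frame start.
OFF_ETHERTYPE = 12
OFF_IPV6_NEXTHDR = ETH_LEN + 6
OFF_HBH = ETH_LEN + IPV6_LEN
OFF_HBH_EXT_LEN = OFF_HBH + 1
OFF_OPT_TYPE = OFF_HBH + 2
OFF_OPT_LEN = OFF_HBH + 3
OFF_OPT_DATA = OFF_HBH + 4
FIXED_LEN = ETH_LEN + IPV6_LEN + HBH_LEN


def matches_fixed_format(frame: bytes) -> bool:
    """Return True if the frame has the fixed Ethernet/IPv6/HBH layout."""
    if len(frame) < FIXED_LEN:
        return False
    (ethertype,) = struct.unpack_from("!H", frame, OFF_ETHERTYPE)
    return (
        ethertype == ETHERTYPE_IPV6
        and frame[OFF_IPV6_NEXTHDR] == NEXTHDR_HBH
        and frame[OFF_HBH_EXT_LEN] == 0   # Hdr Ext Len 0 means 8 bytes total
        and frame[OFF_OPT_TYPE] == CSIG_OPT_TYPE
        and frame[OFF_OPT_LEN] == CSIG_OPT_LEN
    )


def read_signal(frame: bytes) -> bytes:
    """Read the option payload; only valid after matches_fixed_format()."""
    return bytes(frame[OFF_OPT_DATA:OFF_OPT_DATA + CSIG_OPT_LEN])


def build_example_frame(signal: bytes) -> bytes:
    """Build a zero-addressed test frame in the fixed layout."""
    eth = bytes(12) + struct.pack("!H", ETHERTYPE_IPV6)
    ipv6 = struct.pack("!IHBB", 6 << 28, HBH_LEN, NEXTHDR_HBH, 64) + bytes(32)
    hbh = bytes([NEXTHDR_NONE, 0, CSIG_OPT_TYPE, CSIG_OPT_LEN]) + signal
    return eth + ipv6 + hbh
```

The match is five constant-offset comparisons, and a frame that fails any of them (for example one carrying a VLAN tag, which shifts every later offset) simply falls through to the normal parsing path.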

> - independence from layer 3 transport, so that it can work in legacy IP networks or in optimized networks where there MAY be some form of compressed headers present.

By legacy, I assume you mean IPv4? There are some approaches that
could be applied (draft-herbert-ipv4-eh or maybe a lightweight
encapsulation protocol).

Also, while independence from L3 transport might be a goal, as I
already pointed out there are at most only two L3 protocols we need to
worry about: IPv6 and IPv4. If the information is L2 then we have to
map that into the different L2 technologies-- so IMO, avoiding
dependence on L2 would be a better goal to constrain complexity of the
solution. This is also true for the upper layer >=L4. Right now the
draft is dependent on L4 or higher to reflect the signal. If we put
this information in L3, then we again only have to worry how to do
this for the two IP protocols.

> - low latency hw implementation. AI/ML clusters are latency sensitive for inference and demand <.20us switch latency.

Please look at the format I proposed. I don't offhand see anything
that would make processing any slower than putting the information in
L2. Again, everything needed could be at fixed offsets and mapping the
packet to the format for processing is a matter of comparing a few
fields in the parsing buffer (and most of those fields are probably
already checked for normal forwarding anyway).

> - decoupling of congestion signaling from transport. One such example is the Ultra Ethernet Transport proposed in UEC. Reflection for source based congestion control is in the purview of the transport, as and when needed.

But reflecting the information doesn't decouple congestion signaling
from transport. This is the reason why the reflected signal should be
in L3.

> - fixed offset and constant header variance for v4 or v6 frames for both low latency and future proofing for any innovation happening in these layers. Note that these clusters are not L2 bridged domains and are mostly L3 routed domains.

This is where I don't understand how an L2 solution can work
end-to-end. If clusters are L3 routed domains then how does
information in an L2 header get propagated through an L3 switch?
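
A small Python sketch of the concern, under stated assumptions: it models a plain routed hop that discards the incoming L2 header and builds a fresh one for the egress link. The tag layout (a 16-bit TPID followed by 2 bytes of signal) and the TPID value 0x88B5 are placeholders for illustration, not the format from the draft, and the function names are hypothetical. Hop-limit handling is omitted.

```python
import struct

TAG_TPID = 0x88B5        # placeholder TPID for an L2 signal tag (assumption)
ETHERTYPE_IPV6 = 0x86DD


def split_l2(frame: bytes):
    """Return (l2_tag_or_None, ethertype, l3_packet) for a frame."""
    (first,) = struct.unpack_from("!H", frame, 12)
    if first == TAG_TPID:
        tag = frame[14:16]                       # 2 bytes of tag data
        (ethertype,) = struct.unpack_from("!H", frame, 16)
        return tag, ethertype, frame[18:]
    return None, first, frame[14:]


def l3_forward(frame: bytes, next_hop_mac: bytes, egress_mac: bytes) -> bytes:
    """Routed hop: the old L2 header, tag included, is not carried over."""
    _tag, ethertype, l3_packet = split_l2(frame)
    new_l2 = next_hop_mac + egress_mac + struct.pack("!H", ethertype)
    return new_l2 + l3_packet
```

In this model the tag survives only if the router is taught to copy it onto the egress frame at every hop, whereas anything carried inside the L3 packet passes through unchanged.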

> Though we talk about VXLAN tunnels in AI/ML or HPC clusters, that is still a couple of years away. So the argument of variable layer 2 headers is a weak argument.

But in that argument you're assuming the use of the protocol is just
in high performance AI/ML or HPC clusters. We get much better utility
out of developing protocols with a variety of use cases. For instance,
I think CSIG will be very high value in 6G networks which will have
different characteristics and probably heterogeneous L2 in the path.

>
> I think the draft makes it clear that applicability of the proposal is DC focussed.
>

Sure, a lot of great protocols start with a narrow use case :-)

> Tom,
> Also, there was an earlier discussion and presentation about the HW complexity and merits/demerits of encapsulating a v4 packet in an IOAM GRE header and more. It should be present in the archives.

Still haven't found it. Maybe I'm looking in the wrong archives?

Tom

>
> Best,
> -Jai
>
>
>
>
> On Tue, Feb 20, 2024 at 12:35 PM Sebastian Moeller <moeller0@gmx.de> wrote:
>>
>> Hi Tom,
>>
>>
>> > On 20. Feb 2024, at 21:26, Tom Herbert <tom@herbertland.com> wrote:
>> >
>> > On Tue, Feb 20, 2024 at 12:09 PM Christian Huitema <huitema@huitema.net> wrote:
>> >>
>> >>
>> >>
>> >> On 2/20/2024 9:55 AM, Sebastian Moeller wrote:
>> >>>> That's more of a statement of security and not feasibility. There's simply no security in the Internet, so we cannot trust or validate that anonymous intermediate nodes are going to write correct information. Any plain text in a packet on the Internet is subject to inspection and modification if the data isn't authenticated, and in the worst case this could be a DoS vector by writing bad information.
>> >>> [SM3] Indeed, but e.g. for TCP you would need to know a lot about the most recent packet to be able to play games, no? So either you are on path and already can drop/duplicate packets at will or you are off path but still need a recent enough veridical packet to be able to cause mischief, no? (I might be insufficiently creative in attack vectors)
>> >>
>> >> I am analyzing congestion control information using the framework of
>> >> "honest signals". In human communication, "honest signals" are those
>> >> that cannot be easily faked by the communicator. For example, smiling is
>> >> not really an honest signal, because it is easy to fake; blushing, on the
>> >> other hand, is hard to fake.
>> >>
>> >> When it comes to Internet wide congestion control, we have pretty much
>> >> the same issue. Networks may want to fool the application for a variety
>> >> of reasons, and may start faking congestion signals. Some of these
>> >> signals are hard to fake. End to end data rate for example: slowing a
>> >> specific stream of packets is hard to fake; measuring the end to end
>> >> data rate is a pretty good indication of the state of the network. End
>> >> to end RTT is also a rather honest signal: yes, routers could put some
>> >> specific packets in a slow queue, but that requires resources.
>> >>
>> >> Packet losses almost belong in that category. They are not hard to fake,
>> >> routers could play favorites and selectively drop packets with a certain
>> >> profile. But dropping too many packets affects the "quality rating" of a
>> >> provider, so there is some pressure to not fake it. That pressure is
>> >> probably one of the reasons behind bufferbloat. The main problem with
>> >> packet loss as a signal is that losses may have other causes than
>> >> congestion.
>> >>
>> >> ECN is not really an honest signal. Setting a bit in a packet header does
>> >> not require a lot of effort, so routers could do that to play
>> >> favorites. In fact, past bugs in some networks caused almost all packets
>> >> to be marked as CE. Using ECN is very nice when you can trust it, but
>> >> end nodes should probably do that cautiously, detecting for example a
>> >> sudden rise in ECN marks rather than reacting to an average value.
>> >>
>> >> ECN is just one bit. There is always a temptation to do a better ECN
>> >> with many more bits. For example, CE directs a sender to slow down. It
>> >> would be nice to have a corresponding "All clear" signal telling the
>> >> senders that they can speed up. L4S attempts to do that by modulating
>> >> the CE bit, so that a low frequency kinda indicates "all clear", while a
>> >> high frequency says "slow down", and gives some indication of how much.
>> >> Suddenly, one bit becomes several bits, just spread over many packets.
>> >>
>> >> The idea of adding more bits in packet headers is not exactly new -- see
>> >> for example TCP Quick-Start by Sally Floyd et al., RFC 4782, January
>> >> 2007. The problem is that the more bits you add, the more you exacerbate
>> >> issues of trust, and also risks of bugs. "Many more bits" may work in a
>> >> controlled environment, but I really do not see that working on the
>> >> whole Internet.
>> >
>> > Hi Christian,
>> >
>> > Do you know what the state of ECN deployment over the Internet is?
>>
>> [SM5] Nobody knows exactly, but quite a lot of Linux servers use the Linux defaults and will use ECN if the client negotiates it. In my qdisc statistics I routinely see not only drops, but also CE marks logged (from my AQM).... so believe it or not, ECN over the internet mostly works...
>>
>>
>> > It seems to me that if someone is sending to an arbitrary host over the
>> > Internet they're already pretty much accepting "best effort" service:
>>
>> [SM5] Not sure about the US, but that is all ISPs offer to end users over here... but ECN works well even over a best effort internet access link in my personal experience.
>>
>> > long latencies and with potentially high variance, such that getting
>> > fine-grained congestion information from intermediate routers, even
>> > if it's honest, probably doesn't add much to the information that we
>> > can derive from packet loss or measuring RTT with no additional
>> > mechanisms or implementation.
>>
>> [SM5] How can you come to that conclusion without ever trying?
>>
>>
>> > The situation is very different in a limited domain which could
>> > include large service provider networks.
>>
>> [SM5] Indeed, papers discussing 'better congestion signalling' often come out of those environments. But IMHO not because these methods only help in those environments, but because this is where people are willing to spend the money to test and implement potential solutions.
>>
>> Sebastian
>>
>> > In that case more information
>> > is good, it's easier to provide security so we can trust the
>> > information, and we're not restricted to just one or two bits of
>> > information to carry the information in a packet. This is also where I
>> > see host-to-network signaling being useful-- this allows applications
>> > to request QoS for their packets
>>
>> >
>> > Tom
>> >
>> >>
>> >> -- Christian Huitema
>>
>>
>