Re: [MBONED] [Msr6] MSR6 BOF 3rd Issue Category: More details are requested about the large scale use cases, including issue 8-11

Dino Farinacci <farinacci@gmail.com> Mon, 24 October 2022 02:54 UTC

From: Dino Farinacci <farinacci@gmail.com>
In-Reply-To: <Y1X2kvbLv0qXtD8z@faui48e.informatik.uni-erlangen.de>
Date: Sun, 23 Oct 2022 19:54:09 -0700
Cc: Yisong Liu <liuyisong@chinamobile.com>, msr6@ietf.org, pim@ietf.org, BIER WG <bier@ietf.org>, mboned@ietf.org, Stig Venaas <stig@venaas.com>, hooman.bidgoli@nokia.com
Message-Id: <DDD735E2-0930-4CB8-8992-E3E74C715D16@gmail.com>
References: <011701d8e361$88780710$99681530$@chinamobile.com> <D0BA8841-BA90-4DF5-AAE5-A0113D4F17C7@gmail.com> <02fc01d8e537$6037c7e0$20a757a0$@chinamobile.com> <1A893DF5-816E-4D09-AAC6-065BBD1BD409@gmail.com> <Y1X2kvbLv0qXtD8z@faui48e.informatik.uni-erlangen.de>
To: Toerless Eckert <tte@cs.fau.de>
Archived-At: <https://mailarchive.ietf.org/arch/msg/mboned/J5f-fTXTtqL-6TwX-41CGXjo9hA>
Subject: Re: [MBONED] [Msr6] MSR6 BOF 3rd Issue Category: More details are requested about the large scale use cases, including issue 8-11

> Think of a DC with 10,000 nodes and consider stateless
> multicast source routing with 10,000 addressable destinations. The 1280 min
> MTU for Ethernet is not a concern in such a network. Even if it

It is a concern if no user data is in the packet.

> costs 50% or more overhead on e.g. a 1280 byte packet payload, cutting

It's going to cost 1250 bytes for 10,000 destinations.

> the addressing down to 5,000 and sending two packets across the network
> would incur more bandwidth on it.

Sorry, your argument makes no sense.
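
For concreteness, the arithmetic behind the 1250-byte figure (a rough sketch assuming a flat one-bit-per-destination bitstring and the 1280-byte IPv6 minimum MTU, not any specific header format):

    # Back-of-the-envelope overhead for a flat per-destination bitstring
    # (assumption: one bit per addressable node, no compression, no sets).
    DESTINATIONS = 10_000
    MIN_IPV6_MTU = 1280                  # bytes, the IPv6 minimum link MTU

    bitstring_bytes = DESTINATIONS // 8  # 1250 bytes just for the bitmap
    room_left = MIN_IPV6_MTU - bitstring_bytes
    print(bitstring_bytes, room_left)    # 1250, 30: essentially no room for user
                                         # data once the IPv6 and extension
                                         # headers are counted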

> Splitting multicast across multiple packets also brings up the unfairness
> concern of differential latency, with synchronization decided by the "last-receiver"
> (highest) propagation latency.

But if you don't have to split it up across two packets, it is better for the user. You CANNOT argue this point. You might say wasting data-packet bandwidth to eliminate control state is a good tradeoff, but it clearly is not. And you won't be able to convince anyone of this point.

If you want people to take msr6 seriously, you have to make good obvious tradeoffs.

> And some key applications in DCs may actually want to send lowest-latency traffic
> to thousands of receivers.  Consider parallel compute application worker
> management, like those customers have used since the early 2000s in DCs.
> Those packets may today need to go to thousands of parallel instances and
> for fastest synchronization they should arrive at all of them at the same,
> fastest time.

We can talk about low-latency solutions once you give up the need to put so much state in a data packet. That is a different topic, and your data-plane bloat won't solve it either.

> Or think of high-volume multicast apps distributing content, where
> flow completion time is the key factor. Even if that multicasts just to 100
> receivers out of 10,000: if you cannot predict the subset at deployment
> time, then you could end up having to send between 4 and 40 times the traffic
> to different receiver subsets, because 10,000/256 (bitstring size) = 40,
> and there is at least one of the 100 receivers in each Set Identifier (bitstring).
> That's a whole new layer of traffic-management pain for DCs and
> for the application owner: performance could vary from best case to up to 8 times
> (40/5) slower. And yes: maybe I could MacGyver a solution with the SI and entropy
> fields and the available ECMP on the DC routers, where I ECMP load-split the
> traffic for different SIs, but that would be highly complex and difficult to generalize.
> And it still wouldn't reduce the overall fabric load that changes by receiver set.
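
For concreteness, a rough check of those numbers (my assumptions, not from the thread: 10,000 receivers split evenly across 40 Set Identifiers of 250 addresses each, and a 100-receiver subset chosen uniformly at random):

    # Expected number of Set Identifiers, and hence packet copies, touched
    # by a random 100-receiver subset (hypothetical even split into 40 SIs).
    N, PER_SI, K = 10_000, 250, 100
    NUM_SI = N // PER_SI                       # 40

    # P(a given SI contains none of the K receivers), hypergeometric
    p_empty = 1.0
    for i in range(K):
        p_empty *= (N - PER_SI - i) / (N - i)

    expected_si = NUM_SI * (1 - p_empty)
    print(f"~{expected_si:.1f} of {NUM_SI} SIs hit")   # ~36.9 of 40, close to
                                                       # the 40x worst case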

Note that if a packet is delivered on a state-based delivery tree, with no source route, and a receiver downstream on the distribution tree joins while the packet is traveling down the tree, that new receiver can still get the packet. With a source route, packets already in flight won't get delivered to that receiver. So you will have higher join latency and missed opportunities to deliver in-flight packets to the new receiver.
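
A toy way to see that difference (a purely illustrative model, not any particular protocol; all names are made up):

    # With tree state, the receiver set is looked up when the packet is forwarded;
    # with a source-encoded receiver list, the set is frozen at send time.
    def deliver_via_tree_state(packet, current_members):
        return set(current_members())            # late joiners are included

    def deliver_via_source_route(packet):
        return set(packet["encoded_receivers"])  # late joiners are missed

    members = {"r1", "r2"}
    pkt = {"encoded_receivers": frozenset(members)}  # encoded at the head end

    members.add("r3")  # r3 joins while pkt is still in flight

    print(deliver_via_tree_state(pkt, lambda: members))  # {'r1', 'r2', 'r3'}
    print(deliver_via_source_route(pkt))                 # {'r1', 'r2'}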

> Of course, one would certainly like a stateless source routing header
> design that does not require to carry the 10,000 bit receiver information

"like"? How about it's a strong requirement for obvious reasons.

> if the addressed set is actually smaller (such as 200). And there are proposals for
> that (dynamic source-route-header size based on size of receiver set).
> See, e.g., draft-eckert-msr6-rbs.

So you will have multiple solutions depending on group size? That is a bad tradeoff too. And what happens when you go from 200 receivers to 201? Is there a major shift to a different solution?

That sounds far worse than what we experienced switching from shared-tree to source-tree.

> Wrt the receiver tracking: remember that in end-to-end applications
> only the sender may need to be involved in calculating the receivers
> (no network control plane harmed!).

Then you have the same head-end replication problem at the source that we have today with CDNs. You are just moving the problem, not solving it, compared to the model where a source just sends packets to a group address.

Today, multicast sending from a source CANNOT get more efficient. You just send one UDP packet to a multicast group. That is pretty simple in my mind and can't get any simpler. So anything you change will add overhead and is of course a non-starter.
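
For comparison, this is everything a native multicast sender has to do today (a minimal sketch; the group address, port, and hop limit are illustrative, not from the thread):

    # One sendto() to a group address: no receiver list, no per-receiver state.
    import socket

    sock = socket.socket(socket.AF_INET6, socket.SOCK_DGRAM)
    sock.setsockopt(socket.IPPROTO_IPV6, socket.IPV6_MULTICAST_HOPS, 16)
    sock.sendto(b"payload", ("ff3e::1234", 5001, 0, 0))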

> Those host nodes typically can do a shi.load of compute in high-speed
> CPU cache when they are DC servers.  In the mentioned parallel compute
> worker management, this would for example be a dynamic subset
> calculation of 10,000 parallel workers based on ongoing performance telemetry.
> I bet those existing large-scale distributed compute apps already have to
> spend orders of magnitude more compute than it takes to convert any such
> subset into a bitstring.
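
(For reference, the conversion being described is indeed cheap; a sketch assuming a flat 10,000-entry receiver ID space, with made-up IDs:)

    # Encode a receiver subset as a flat bitstring.
    def to_bitstring(receivers, size=10_000):
        bits = bytearray((size + 7) // 8)      # 1250 bytes for 10,000 IDs
        for r in receivers:
            bits[r // 8] |= 1 << (r % 8)
        return bytes(bits)

    header = to_bitstring({17, 4095, 9999})
    print(len(header))                         # 1250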

They can do better and faster by not doing this.

Dino