Re: [MBONED] [Msr6] MSR6 BOF 3rd Issue Category: More details are requested about the large scale use cases, including issue 8-11

Dirk Trossen <dirk.trossen@huawei.com> Thu, 03 November 2022 12:11 UTC

From: Dirk Trossen <dirk.trossen@huawei.com>
To: Toerless Eckert <tte@cs.fau.de>
CC: "Gengxuesong (Geng Xuesong)" <gengxuesong=40huawei.com@dmarc.ietf.org>, Dino Farinacci <farinacci@gmail.com>, Jeffrey Zhang <zzhang@juniper.net>, "Xiejingrong (Jingrong)" <xiejingrong=40huawei.com@dmarc.ietf.org>, BIER WG <bier@ietf.org>, "msr6@ietf.org" <msr6@ietf.org>, "mboned@ietf.org" <mboned@ietf.org>, "pim@ietf.org" <pim@ietf.org>
Thread-Topic: [Msr6] MSR6 BOF 3rd Issue Category: More details are requested about the large scale use cases, including issue 8-11
Thread-Index: AQHY7m9JD3lrN7R9G0uzVxHA8OIhC64s8J8AgAACHoCAACUggIAAAHzQgAADUYCAAAA+IA==
Date: Thu, 03 Nov 2022 12:11:10 +0000
Message-ID: <083b99b512cd43adbc8066245c21a258@huawei.com>
References: <0d2e78fefe9e4cef87c52493b7fefc80@huawei.com> <BL0PR05MB56528FCEF7FDE262F633A24FD4329@BL0PR05MB5652.namprd05.prod.outlook.com> <C10FBD6A-E651-49BB-B2EC-0C04FC966C4A@gmail.com> <Y1/nUmnoYQhTn7OO@faui48e.informatik.uni-erlangen.de> <15F231E4-1D93-4531-AEA1-B4DC06F25A69@gmail.com> <Y2HqfVIOKKeDfdF0@faui48e.informatik.uni-erlangen.de> <d96fb2881de0476f8a6368f9c821c124@huawei.com> <078a861004164f45a600ded9a816140b@huawei.com> <Y2Oq2RqFUE0vtjUT@faui48e.informatik.uni-erlangen.de> <262df32c359c4a2f86cc6a4e3b54ad5e@huawei.com> <Y2OuCbZaipvDmyFV@faui48e.informatik.uni-erlangen.de>
In-Reply-To: <Y2OuCbZaipvDmyFV@faui48e.informatik.uni-erlangen.de>
Archived-At: <https://mailarchive.ietf.org/arch/msg/mboned/eR7-cNnAK_pzeNzHxBHDu_4aTOc>
Subject: Re: [MBONED] [Msr6] MSR6 BOF 3rd Issue Category: More details are requested about the large scale use cases, including issue 8-11

See inline.

-----Original Message-----
From: Toerless Eckert <tte@cs.fau.de> 
Sent: 03 November 2022 13:03
To: Dirk Trossen <dirk.trossen@huawei.com>
Cc: Gengxuesong (Geng Xuesong) <gengxuesong=40huawei.com@dmarc.ietf.org>; Dino Farinacci <farinacci@gmail.com>; Jeffrey Zhang <zzhang@juniper.net>; Xiejingrong (Jingrong) <xiejingrong=40huawei.com@dmarc.ietf.org>; BIER WG <bier@ietf.org>; msr6@ietf.org; mboned@ietf.org; pim@ietf.org
Subject: Re: [Msr6] MSR6 BOF 3rd Issue Category: More details are requested about the large scale use cases, including issue 8-11

On Thu, Nov 03, 2022 at 11:55:21AM +0000, Dirk Trossen wrote:
> [DOT] well, it's following the same semantic but the realization of IP multicast prevents you from doing this at high dynamicity of the group forming. If you really intend on doing something like what the HTTP multicast response draft outlines, IP multicast won't do, indeed. 

I think this is the core novelty of source-routing of multicast packets, and still extremely valuable to document. But it is likely even more valuable to document it for app developers/researchers whose applications (such as in data centers) could benefit from it than for the IETF.
[DOT] indeed, hence the FRRM draft, which uses the previous HTTP multicast response as an example realization (over BIER and possibly also MSR6). The last part of your statement is key: if pulled to the application, the multicast 'savings' translate into reduced server load and immediate server ingress BW need. That's not explicitly pulled out in the draft but was always key to demos we gave on this in previous efforts.

Except, of course, that those researchers would want router/switch platforms that support BIER/MSR6 well enough that they can actually build working prototype applications on top of it.
[DOT] As is also mentioned in the draft, a simpler (single-domain) version of such a source-routing capability, namely SDN-based forwarding (misusing the IPv6 address for the bitmask and using SDN-compliant wildcard matching for the multicast replication, all OF-compliant), was demonstrated several times and also trialed in a number of EU-funded 5G trials. Applications ranged from simple video OTT retrieval to AR/VR to SW download. The trials were specifically done by application developers, utilizing the trial network (then deployed in the UK, Spain and Italy). When bringing this to the IETF back in 2018 (I think), it was our idea to move those concepts from the SDN space into the Internet with technologies like BIER.
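
To make the bitmask-in-address idea a bit more concrete, here is a minimal sketch. It is not taken from the draft or from the trial code: the prefix 2001:db8::, the use of the low 64 bits for the bitmask, the link numbering and all function names are assumptions for illustration only. Each link gets one bit, the sender sets the bits of the links the packet should traverse, and every switch holds one wildcard (value/mask) rule per local port.

```python
import ipaddress

# Hypothetical domain prefix; the low 64 bits are assumed to carry the bitmask.
PREFIX = int(ipaddress.IPv6Address("2001:db8::"))

def encode_destination(link_ids):
    """Build an IPv6 destination address with one bit set per link to traverse."""
    bitmask = 0
    for link_id in link_ids:
        if not 0 <= link_id < 64:
            raise ValueError("link id does not fit into the 64-bit bitmask")
        bitmask |= 1 << link_id
    return ipaddress.IPv6Address(PREFIX | bitmask)

def wildcard_rules(local_ports):
    """One (value, mask, port) rule per local port: match if the link's bit is set."""
    return [(1 << link_id, 1 << link_id, port)
            for port, link_id in local_ports.items()]

def output_ports(dst, rules):
    """Replicate to every port whose masked match succeeds."""
    addr = int(dst)
    return sorted(port for value, mask, port in rules if (addr & mask) == value)

# A switch whose ports 1, 2, 3 attach to links 3, 7 and 12 (made-up numbers).
rules = wildcard_rules({1: 3, 2: 7, 3: 12})
dst = encode_destination([3, 12, 40])
ports = output_ports(dst, rules)
print(ports)
```

Here the packet is replicated to ports 1 and 3 only; link 40 belongs to some other switch and link 7 was not selected by the sender. No per-group state is needed in the switch, only one rule per port.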

Greg had mentioned some BIER host stack, but i failed to find it.
Some packaging of BIER/MSR6 on routers (e.g.: Tuebingen/BUPT work on P4) plus such a host stack would be what might attract more researchers to experiment with apps using it.

> Sure. Effectively the BIER architecture with flow-overlay is an instance of this, and BIER didn't invent it. The MVPN architecture itself is also the same, only the terminology is different. 
> [DOT] That's what the FRRM draft (and to a larger extent the mentioned paper submission) tries to address, namely consolidating the semantics and then showing how the various realizations go about implementing it, i.e., it is trying to establish a single lens to look across while investigating the differences in realizing the specific solutions.

Ack.

Cheers
    Toerless

> > Best,
> > 
> > Dirk
> > 
> > -----Original Message-----
> > From: Msr6 <msr6-bounces@ietf.org> On Behalf Of Gengxuesong (Geng
> > Xuesong)
> > Sent: 03 November 2022 10:29
> > To: Toerless Eckert <tte@cs.fau.de>; Dino Farinacci 
> > <farinacci@gmail.com>
> > Cc: Jeffrey Zhang <zzhang@juniper.net>; Xiejingrong (Jingrong) 
> > <xiejingrong=40huawei.com@dmarc.ietf.org>; BIER WG <bier@ietf.org>; 
> > msr6@ietf.org; mboned@ietf.org; pim@ietf.org
> > Subject: Re: [Msr6] MSR6 BOF 3rd Issue Category: More details are 
> > requested about the large scale use cases, including issue 8-11
> > 
> > Hi Toerless and all,
> > 
> > Thanks a lot for mentioning the document https://datatracker.ietf.org/doc/draft-ietf-bier-multicast-http-response/ , which I think is a very good reference on how to connect a group of receivers that request the same content to a multicast service. Although this is currently done through a proxy rather than the host, it shows a valid approach to realizing this function in the host-initiated multicast case with MSR.
> > 
> > Best
> > Xuesong
> > 
> > -----Original Message-----
> > From: Msr6 [mailto:msr6-bounces@ietf.org] On Behalf Of Toerless 
> > Eckert
> > Sent: Wednesday, November 2, 2022 11:57 AM
> > To: Dino Farinacci <farinacci@gmail.com>
> > Cc: Jeffrey Zhang <zzhang@juniper.net>; Xiejingrong (Jingrong) 
> > <xiejingrong=40huawei.com@dmarc.ietf.org>; BIER WG <bier@ietf.org>; 
> > msr6@ietf.org; mboned@ietf.org; pim@ietf.org
> > Subject: Re: [Msr6] MSR6 BOF 3rd Issue Category: More details are 
> > requested about the large scale use cases, including issue 8-11
> > 
> > On Mon, Oct 31, 2022 at 01:59:31PM -0700, Dino Farinacci wrote:
> > > Let me make one more point. It is so easy to originate a packet from anywhere, with all the BFs lit up. What will happen? It's a lot easier to do this and DoS attack the data-plane than to use a control-plane which is harder, but not impossible, to attack.
> > 
> > Control plane, especially interdomain is about the hardest to defend against attacks. That is why BGP security is the hardest ongoing issue of the Internet architecture.
> > 
> > And of course, we do have war stories for Multicast on this. Unicast operators/vendors in the IETF in the 90s quite carefully refused to have MSDP be a BGP feature, out of security concerns about not knowing what it would do to the Internet - and fearing the worst.
> > 
> > MSDP was then (predictably ?) the first protocol that brought down a good part of the Internet control plane when it was attacked UNINTENTIONALLY:
> > 
> > DDoS IP address scanning attack software did not exclude IP 
> > multicast addresses - whoever wrote the software did not know what 
> > multicast is, they really only wanted to exploit hosts, not attack 
> > the Internet infrastructure ?! ;-),
> > 
> > This caused unlimited MSDP (SA) messages to flood the multicast enabled Internet core (most research networks USA/Europe, some AsiaPC). I had to do all the functional specs for all the state limiters and since then educate customers on how to configure all those nerd-knobs on every MSDP router.
> > 
> > I am not sure anymore if these events preceded downgrading MSDP to experimental RFC or if they happened later. It still took maybe 15 years since that time to deprecate the IP Multicast (ASM!) control plane for the Internet - RFC8815.
> > 
> > I did continue to see the same problems intradomain with MSDP/Anycast-RP long after those interdomain (research community) experiments, also overloading intradomain service provider cores. In some cases we could not even figure out why, because there is so little diagnostic and event tracing to do a root cause analysis - because the whole PIM-SM/MSDP control plane is too complex.
> > There have been quite long and detailed service provider guideline docs for all the things needed to harden intradomain IP multicast deployments - all based on control plane issues.
> > 
> > Ok, we of course want SSM, but state overload is exactly the same issue, indeed it does add the issue of creating state attacks in the absence of multicast traffic. Only difference to MSDP is that you have to trigger those attacks via IGMP/MLD "joins" and not by sending IP multicast data packets.
> > Not a challenge to intentional attackers.
> > 
> > I have been working on state securing on almost every single multicast control plane protocol, especially PIM/IGMP/MLD, and tried to educate customers how to apply those hardening configs through almost 10 years of educational talks.
> > Especially in single, supposedly controlled domains (enterprise). And there still is IMHO no really good automated defaults for all those necessary control plane state limits in commercial router products. All IP multicast deployments i have seen are based on a combination of trusted hosts/participants, very limited multicast functionality enabled, absence of enough attackers, expert use of all those state limiters, praying - and the fact that there is today a lot more money to be made attacking hosts/content vs. infrastructure (something i of course fear to be a thing one should never rely on, which is why i am doing ANIMA to protect the infra).
> > 
> > From all this experience with securing IP multicast deployments, i do consider myself to be somewhat of an expert on the topic, and my high level takeaway is that securing the BIER/MSR6 forwarding plane actually does make things easier over only receiver join based IP multicast. Yes, you will need to do policing of the "bitstrings" at ingress, but the beauty is that it's one single place for a source, whereas in join-based, it is a harder, distributed problem, so it does have the potential to actually even simplify some of these security/hardening issues.
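
As an illustration of what policing of bitstrings at ingress could look like, here is a minimal sketch. It is an assumption-laden toy and not any router's actual feature: the per-source `allowed` mask, the `strict` flag and the function name are all hypothetical. The ingress node compares the bitstring a source put into the packet against the set of egress bits that source is permitted to address.

```python
def police_bitstring(bitstring, allowed, strict=True):
    """Return the bitstring to forward, or None if the packet is to be dropped.

    bitstring: bits set by the sender (one bit per egress router).
    allowed:   bits this sender is permitted to address (hypothetical policy).
    strict:    drop on any violation if True, otherwise clear the extra bits.
    """
    disallowed = bitstring & ~allowed
    if disallowed and strict:
        return None
    return bitstring & allowed

allowed = 0b0000_1111                      # sender may reach egress bits 0..3
ok = police_bitstring(0b0000_0101, allowed)
flood = police_bitstring(0b1111_1111, allowed)            # "all bits lit up"
trimmed = police_bitstring(0b1111_1111, allowed, strict=False)
print(ok, flood, trimmed)
```

The check is a single AND per packet at one place (the ingress for that source), which is the contrast to join-based state that has to be limited on every router along the tree.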
> > 
> > > If not controlled, unlike SR, a general source-routing approach is loaded with security issues.
> > 
> > High level, you are arguing that control plane state is more 
> > trustworthy or better controlled than packet header state, but IP 
> > multicast control plane state is exactly the proof of how to 
> > proliferate through the network untrustworthy user/application 
> > created state. And we don't even have working congestion management 
> > techniques for it. Which we would do with packet headers state as in BIER/MSR6.
> > That is the gist of what alas we so far failed to write up 
> > succinctly 
> > (https://datatracker.ietf.org/doc/draft-ietf-bier-multicast-http-response/
> > is one so far expired attempt to document it, we need to 
> > revive that topic better, Dirk Trossen has started this a bit 
> > again...).
> > 
> > Nobody says that in the most general-purpose deployments of BIER and 
> > even more so MSR6, these packets headers should be allowed to be 
> > generated by untrusted application code. There are easily workable 
> > models on how to limit this function to trusted control plane 
> > entities
> > (example: trusted container OS level generation of headers). But 
> > there is a great future power for multicast in this, when you do 
> > consider and design hosts+network as one big distributed system and 
> > build it with trusted applications - something that is core to the 
> > success of all the hyperscaler DCN. Yes, like with BIER, our likely 
> > first candidate for
> > MSR6 is intradomain in large SP networks, but DCN are immediately a very logical beneficiary of the same technologies.
> > 
> > Cheers
> >     Toerless
> > 
> > > Dino
> > > 
> > > > On Oct 31, 2022, at 8:18 AM, Toerless Eckert <tte@cs.fau.de> wrote:
> > > > 
> > > > On Fri, Oct 28, 2022 at 09:52:38AM -0700, Dino Farinacci wrote:
> > > >> It sounds like a compromise could be to use the bitfield concept introduced by BIER and put those in an IPv6 packet. But the bits describe the oif-list like BIER does and not the receiver hosts. Then you bring the scale down to "the number of interfaces that lead to the edges of this domain".
> > > > 
> > > > Dino:
> > > > 
> > > > I think MSR6 has two orthogonal set of challenges/solutions:
> > > > 
> > > > One is scalability. Thats about better source-route/tree 
> > > > encoding than flat bitstrings. e.g.: draft-eckert-bier-rbs. This 
> > > > is independent IMHO of what encap to choose, e.g.: RFC8296 or 
> > > > something meant to support IPv6 only networks, compliant with
> > > > RFC8200 source routing like SRH.
> > > > 
> > > > The other is exactly such an IPv6 encapsulation. Like we already 
> > > > have two for unicast - SRH and RFC6554. IMHO, to really allow 
> > > > stateless source-routed IP multicast, that source-routing header 
> > > > needs to include the IP destination (multicast group) address. 
> > > > See
> > > > e.g.: draft-eckert-msr6-rbs
> > > > 
> > > >> Arguably you can use the flow-label field for this so the header DOES NOT need to be larger. There are plenty of flow-label bits. But you are still going to have a control-plane, like BIER does. But maybe you can reuse that control-plane.
> > > > 
> > > > To get down to only the bits of the neighbors, the 
> > > > source-routing needs to include these bits for every router on 
> > > > the distribution tree that you want to send the packet on.  This 
> > > > is more than flow-label, but this is exactly what RBS does.
> > > > 
> > > > >> It's not clear at all to me that operators would deploy this rather than just using an overlay, where, when used properly, there is no multicast state in the underlay.
> > > > 
> > > > > One origin story of BIER is of course exactly to provide 
> > > > > improvements over such solutions by service providers:
> > > > > Ingress-replication. And of course, which solution to pick 
> > > > > is a matter of performance criteria. But stateless 
> > > > > source-routing that drives replication in the intervening hops, 
> > > > > as BIER and all derived ideas do, is still stateless in the core. And it is similar in operations to steering as in SR (MPLS or SRv6). Aka:
> > > > > it's a lot closer in concept to unicast and that's always what in 
> > > > > the past has made multicast solutions more amenable to operators.
> > > > 
> > > > (but this is really not a novel argument for MSR6, but of course equally for BIER).
> > > > 
> > > > > Cheers
> > > >   Toerless
> > > > 
> > > >> Dino
> > > >> 
> > > >>> On Oct 28, 2022, at 7:46 AM, Jeffrey (Zhaohui) Zhang <zzhang@juniper.net> wrote:
> > > >>> 
> > > >>> Hi,
> > > >>> 
> > > >>> Here is how I see it - I don't see one solution for all but separate solutions already exist or proposed with *IPv6-agnostic* data plane.
> > > >>> 
> > > >>> For "Stateless using MSR6 and still efficient in packet size when POSSIBLE" - BIERin6 or CGM2/RBS.
> > > >>> For "Stateful using MSR6 with proper control-plane when MUST" 
> > > >>> - P2MP tunnels (mLDP/RSVP/SR-P2MP)
> > > >>> 
> > > >>> I guess the crux is the following (the debate that we have a couple of years ago):
> > > >>> - "what is IPv6 native" - is it only if the BIER/RBS header is put into an IPv6 extension header?
> > > >>> - "must it be IPv6 native" - BIERin6, which is "transport" (L2 or tunnel or IPv4/IPv6) agnostic, just can't do the job?
> > > >>> 
> > > >>> Jeffrey
> > > >>> 
> > > >>> 
> > > >>> 
> > > >>> -----Original Message-----
> > > >>> From: MBONED <mboned-bounces@ietf.org> On Behalf Of 
> > > >>> Xiejingrong
> > > >>> (Jingrong)
> > > >>> Sent: Friday, October 28, 2022 3:02 AM
> > > >>> To: Dino Farinacci <farinacci@gmail.com>; Toerless Eckert 
> > > >>> <tte@cs.fau.de>
> > > >>> Cc: BIER WG <bier@ietf.org>; msr6@ietf.org; mboned@ietf.org; 
> > > >>> pim@ietf.org
> > > >>> Subject: Re: [MBONED] [Msr6] MSR6 BOF 3rd Issue Category: More 
> > > >>> details are requested about the large scale use cases, 
> > > >>> including issue 8-11
> > > >>> 
> > > >>> 
> > > >>> 
> > > >>> Hi,
> > > >>> 
> > > >>> I got a time to read through this thread just now, and found the discussions very frank and very interesting.
> > > > >>> If I understand it correctly, Dino's point is that efficient packet size is the highest priority in a multicast solution design choice; stateless and other "benefits" are all secondary.
> > > >>> Can we expect a solution that is:
> > > >>> Stateless using MSR6 and still efficient in packet size when POSSIBLE ?
> > > >>> Stateful using MSR6 with proper control-plane when MUST ?
> > > >>> 
> > > >>> Thanks
> > > >>> Jingrong
> > > >>> 
> > > >>> 
> > > >>> 
> > > >>> -----Original Message-----
> > > >>> From: Msr6 [mailto:msr6-bounces@ietf.org] On Behalf Of Dino 
> > > >>> Farinacci
> > > >>> Sent: Tuesday, October 25, 2022 10:31 AM
> > > >>> To: Toerless Eckert <tte@cs.fau.de>
> > > >>> Cc: Yisong Liu <liuyisong@chinamobile.com>; msr6@ietf.org; 
> > > >>> pim@ietf.org; BIER WG <bier@ietf.org>; mboned@ietf.org; Stig 
> > > >>> Venaas <stig@venaas.com>; hooman.bidgoli@nokia.com
> > > >>> Subject: Re: [Msr6] MSR6 BOF 3rd Issue Category: More details 
> > > >>> are requested about the large scale use cases, including issue
> > > >>> 8-11
> > > >>> 
> > > > >>> Toerless, a packet without source routing gives more packet space to the user than one that has source routing. Everything else is secondary and not relevant to this basic point.
> > > >>> 
> > > >>> Solve the problem, whatever you think it is, with a control-plane.
> > > >>> 
> > > >>> Dino
> > > >>> 
> > > >>>> On Oct 24, 2022, at 9:27 AM, Toerless Eckert <tte@cs.fau.de> wrote:
> > > >>>> 
> > > >>>> On Sun, Oct 23, 2022 at 07:54:09PM -0700, Dino Farinacci wrote:
> > > >>>>>> Think of a DC with 10,000 nodes and considering stateless 
> > > >>>>>> multicast source routing with 10,000 addressable destinations.
> > > >>>>>> 1280 min MTU for ethernet is not a concern in such a network. 
> > > >>>>>> Even if it
> > > >>>>> 
> > > >>>>> It is a concern if no user data is in the packet.
> > > >>>> 
> > > > >>>> Rephrasing: Requiring a larger MTU than 1500 to support such 
> > > > >>>> an option is not a limiting factor for such a type of 
> > > > >>>> controlled network with next-generation hardware (do you 
> > > > >>>> remember when we had to muck around with 64kbyte jumbo packet 
> > > > >>>> support for HEPnet networks in the 1990s and even in the early 2000s - aka: networks with larger MTU have been around forever).
> > > >>>> 
> > > >>>>>> costs 50% or more overhead on e.g. a 1280 byte packet 
> > > >>>>>> payload, cutting
> > > >>>>> 
> > > > >>>>> It's going to cost 1250 bytes for 10,000 destinations.
> > > >>>> 
> > > > >>>> Right. And i provided the calculation of how the overall amount 
> > > > >>>> of traffic even with such a large header would be lower than 
> > > > >>>> replicating the packet and sending it multiple times with a smaller header to reach all destinations.
> > > >>>> 
> > > > >>>> BIER for example itself is spec'ed to support up to 4096 
> > > > >>>> bits, based on worst case/largest deployment case 
> > > > >>>> considerations as of 10 years ago. That might not be required 
> > > > >>>> for the current SP-WAN deployments, but obviously, when we started BIER, we 
> > > > >>>> also looked into broader use-case candidates than what's currently the BIER deployment focus.
> > > > >>>> Thinking of another factor 2 as a possible maximum is not a big stretch of the imagination.
> > > >>>> 
> > > > >>>>>> the addressing down to 5,000 and sending two packets across 
> > > >>>>>> the network would be more bandwidth incurred on it.
> > > >>>>> 
> > > >>>>> Sorry, your argument makes no sense.
> > > >>>> 
> > > > >>>> Payload 1280 bytes. Total number of egress routers: 10,000
> > > > >>>> Sending one packet with 10,000 bit bitstring:      1250+1280 =  2530 bytes
> > > > >>>> Sending two packets with 5,000 bit bitstring:   2*(1280+625) =  3810 bytes
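
The arithmetic above can be reproduced with a few lines. This is only a sketch under the same simplifications as the email: a flat bitstring with one bit per egress router, the full payload repeated in every copy, and no other header bytes counted. The function names are made up for this example.

```python
def header_bytes(num_bits):
    """Bytes needed for a flat bitstring of num_bits bits (rounded up)."""
    return (num_bits + 7) // 8

def total_bytes(payload, egress_routers, num_packets):
    """Bytes sent by the source when the egress set is split evenly over
    num_packets copies, each carrying the payload plus its own bitstring."""
    bits_per_packet = -(-egress_routers // num_packets)  # ceiling division
    return num_packets * (payload + header_bytes(bits_per_packet))

one_packet = total_bytes(1280, 10_000, 1)
two_packets = total_bytes(1280, 10_000, 2)
print(one_packet, two_packets)   # 2530 3810
```

Splitting halves the header per packet but repeats the 1280-byte payload, which is why the two-packet variant comes out larger.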
> > > >>>> 
> > > >>>>>> Splitting multicast across multiple packets also brings up 
> > > >>>>>> the unfairness concern of differential latency and the synchronization deciding "last-receiver"
> > > >>>>>> highest latency propagation latency.
> > > >>>>> 
> > > >>>>> But if you don't have to split it up across two packets, it is better for the user.
> > > >>>> 
> > > >>>> You are making my point. Of course i know how you do not want 
> > > >>>> to, because you are arguing for stateful multicast solutions at scale.
> > > >>>> 
> > > > >>>>> You CANNOT argue this point. You might say wasting data packet bandwidth to eliminate control state is a good tradeoff, but it clearly is not. And you won't be able to convince anyone of this point.
> > > >>>> 
> > > > >>>> Customers are accepting the overhead of source routing 
> > > > >>>> headers over managing forwarding state in networks. This applies both to unicast (SR vs. RSVP-TE) and multicast (BIER/MSR6).
> > > > >>>> Customers have eliminated stateful solutions such as 
> > > > >>>> RSVP-TE in favor of that. They have replaced stateful multicast with ingress replication to avoid state in the core.
> > > >>>> 
> > > > >>>> These customers are looking at the overall traffic savings. 
> > > > >>>> The reference is unicast, not stateful multicast. If i take a 
> > > > >>>> unicast solution sending to 1000 to 10000 receivers it 
> > > > >>>> requires 1000x...10000x of the payload size. If i give them a 
> > > > >>>> stateless multicast solution that requires 1x...3x of the payload size because of the header, that IS preferable over introducing a whole new stateful service into the network.
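
A small sketch of that comparison, with the same simplifications as elsewhere in this thread (flat bitstring, one bit per receiver, unicast headers ignored). The function names are hypothetical; the numbers are just the 1280-byte payload example from above.

```python
def unicast_factor(receivers):
    """Ingress replication: the source sends the payload once per receiver."""
    return float(receivers)

def stateless_multicast_factor(payload, receivers):
    """One packet carrying the payload plus a flat bitstring header."""
    header = (receivers + 7) // 8
    return (payload + header) / payload

for n in (1_000, 10_000):
    print(n, unicast_factor(n), round(stateless_multicast_factor(1280, n), 2))
```

For a 1280-byte payload this gives roughly 1.1x at 1,000 receivers and just under 2x at 10,000, versus 1000x and 10000x for unicast replication at the source.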
> > > >>>> 
> > > > >>>> Larger MTUs btw. have been common in many controlled networks forever.
> > > > >>>> Remember all the requirements for HEPnet in the 1990s with 
> > > > >>>> networks using up to 64k IP MTU ? Nowadays all those DC networks with RoCE also use larger MTU.
> > > >>>> 
> > > > >>>> Of course, thinking of a header size of 1k is a stretch 
> > > > >>>> today, so most of my colleagues also think that 1/10th of 
> > > > >>>> this is a reasonable limit, but i think that's just too much 
> > > > >>>> grounded in backward fears and not comparing the actual 
> > > > >>>> use-case benefits, especially simplicity, predictability, minimum latency/jitter. Those are going to be the criteria of interest going forward.
> > > >>>> 
> > > >>>>> If you want people to take msr6 seriously, you have to make good obvious tradeoffs.
> > > >>>> 
> > > > >>>> But let's make sure we do not assume that tradeoffs for an 
> > > > >>>> SP-WAN based on today's router hardware limit the ability to make the best tradeoffs in a DC 5 years down the road.
> > > >>>> 
> > > >>>> [ I think we've seen this in the opposite direction with 
> > > >>>> IPv6, where i think everybody was happy to see min MTU raised 
> > > >>>> to 1280 over IPv4, and the Internet was happy to get the
> > > >>>> 64 bit routing address space (tongue in cheek ;-), except 
> > > >>>> that "everybody" didn't include all those IoT and other 
> > > >>>> controlled networks, that since then had to almost start an 
> > > >>>> IETF area of their own to come up with all those workaround 
> > > >>>> to make IPv6 better fit those networks (header compression, 
> > > >>>> fragmentation, routing etc.). ]
> > > >>>> 
> > > > >>>> I just don't want this to
> > > > >>>> happen for MSR6, but i want MSR6 to be scalable across a 
> > > > >>>> wider range of networks, especially at the higher end in its 
> > > > >>>> core design. That's why i am happily being provocative here 
> > > > >>>> with the source-routing size to have us think further. 
> > > > >>>> Especially when the IETF is (because of 
> > > > >>>> participation) mostly looking at the SP-WAN market in the west, which alas is not moving much, and ignoring that in countries like China there are still a lot more scale requirements to solve. WAN and DC.
> > > >>>> 
> > > > >>>>>> And some key applications in DCs may actually want to send 
> > > >>>>>> lowest-latency traffic to thousands of receivers.  Consider 
> > > >>>>>> parallel compute application worker management, like those customers have used since the early 200x in DC.
> > > >>>>>> Those packets may today need to go to thousands of parallel 
> > > >>>>>> instances and for fastest synchronization they should 
> > > >>>>>> arrive at all of them at the same, fastest time.
> > > >>>>> 
> > > >>>>> We can talk about low latency solutions once you give up the need to put so much state in a data packet. That is a different topic, and your data plane bloat won't solve either.
> > > >>>> 
> > > > >>>> I respectfully disagree. Being able to send with an equal 
> > > > >>>> latency in the usec range a packet to multiple destinations 
> > > > >>>> WITHOUT prior establishment of multicast state is one core 
> > > > >>>> benefit of stateless multicast, and the data overhead of that is just a quantifiable cost that can easily be judged vs. the alternative (stateful) based on use-case.
> > > >>>> 
> > > >>>>> Note if a packet is delivered on a state based delivery tree, with no source-route, and you have a joiner downstream on the distribution tree that joins "at the same time as the packet is traveling down the tree", that new receiver can get the packet.
> > > >>>> 
> > > > >>>> A high dynamic rate of join/leaves is a myth that we 
> > > > >>>> succumbed to when designing multicast RMT solutions in the 
> > > > >>>> IETF. In fact it is the stateless multicast with source 
> > > > >>>> routing that is the key enabler of scalable/rate-adaptive 
> > > > >>>> multicast transport solutions.  See 
> > > > >>>> draft-ietf-bier-multicast-http-response
> > > > >>>> (we'll have to rewrite this).
> > > >>>> 
> > > >>>>> With a source-route, any existing packets won't get delivered to that receiver. So you will have high join latency and missed opportunities to deliver packets already in flight to the new receiver.
> > > >>>> 
> > > > >>>> This observation does not apply to applications where the 
> > > >>>> sender knows best who needs to get what. Which ultimately is 
> > > >>>> the case in almost all multicast applications. DASH/ABR over 
> > > >>>> multicast (see
> > > >>>> above) or distributed coordination via multicast.
> > > >>>> 
> > > > >>>> Even without adaptive video, but most boring MPEG IPTV, the 
> > > > >>>> receiver driven joins were a complete pain: Channel zapping 
> > > > >>>> where you really don't want to receive duplicate traffic even 
> > > > >>>> for short periods (limited receiver link BW), but you also don't want to join the multicast as soon as possible, but get unicast until the next GOP.
> > > > >>>> This switchover is done sooooo much easier with source-routing.
> > > >>>> 
> > > >>>> Even when it's not the case, join propagation times are in 
> > > >>>> the order of msec per hop, vs. typically much shorter packet 
> > > >>>> forwarding times host-to-host in networks up to metro size.
> > > >>>> 
> > > >>>>>> Of course, one would certainly like a stateless source 
> > > >>>>>> routing header design that does not require to carry the
> > > >>>>>> 10,000 bit receiver information
> > > >>>>> 
> > > >>>>> "like"? How about it's a strong requirement for obvious reasons.
> > > >>>> 
> > > >>>> I should have said "unnecessary 10,000 bits". Aka: in the
> > > >>>> same way that customers were happy to start SRv6 with
> > > >>>> 8*16=128 byte SRH steering data, they also would like to see
> > > >>>> CRH. I was thinking of the same for bitstrings:
> > > >>>> if all i have is a 10,000 bit "flat" bitstring, customers can
> > > >>>> make the easy calculation how this is e.g. 2x..3x of the
> > > >>>> payload data in the DC, so they'll go with it. But when it's
> > > >>>> clear that we could compress the header significantly when
> > > >>>> the number of receivers is more sparse, then they would of
> > > >>>> course want that (RBS).
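The header-size arithmetic in the paragraph above can be sketched roughly as follows. The 10,000 bit flat bitstring, the 8*16 byte SRH figure and the 200-receiver sparse set come from the thread; the fixed-width index list is only a hypothetical stand-in for "some sparse encoding" and is not the RBS encoding from draft-eckert-msr6-rbs.

```python
def flat_bitstring_bytes(total_receivers: int) -> int:
    """One bit per possible receiver, rounded up to whole bytes."""
    return (total_receivers + 7) // 8


def index_list_bytes(total_receivers: int, addressed: int) -> int:
    """Hypothetical sparse encoding (assumption, not RBS):
    one fixed-width index per addressed receiver."""
    bits_per_index = max(1, (total_receivers - 1).bit_length())
    return (addressed * bits_per_index + 7) // 8


srh_bytes = 8 * 16                       # SRH steering data from the thread
flat = flat_bitstring_bytes(10_000)      # flat bitstring for 10,000 receivers
sparse = index_list_bytes(10_000, 200)   # 200 addressed receivers

print(f"SRH: {srh_bytes} B, flat: {flat} B, sparse index list: {sparse} B")
```

Under these assumptions the flat bitstring costs the same no matter how few receivers are addressed, while even a naive sparse encoding shrinks with the size of the receiver set.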
> > > >>>> 
> > > >>>>>> if the addressed set is actually smaller (such as 200). And 
> > > >>>>>> there are proposals for that (dynamic source-route-header size based on size of receiver set).
> > > >>>>>> See e.g: draft-eckert-msr6-rbs.
> > > >>>>> 
> > > >>>>> So you will have multiple solutions for group size? That is a bad tradeoff too. And what happens when you go from 200 receivers to 201, is there a major shift to a different solution?
> > > >>>> 
> > > >>>> [ There are working groups that had to claim the use-case was 
> > > >>>> simple and clear and a single solution easily feasible to get 
> > > >>>> chartered. And then they evolved into completely different 
> > > >>>> use-cases and a wide range of alternative solution component 
> > > >>>> pieces, but never the original use-case ;-) ]
> > > >>>> 
> > > >>>> What we have with MSR6 are different candidate proposals, and
> > > >>>> one of the core parts of the work of the proposed WG is to
> > > >>>> figure out what the best compromise is for the first WG- and
> > > >>>> industry-adopted mechanisms.
> > > >>>> 
> > > >>>> Obviously, like Mr. Feynman, i would like a unified
> > > >>>> source-routing-header theory of the universe. To this end,
> > > >>>> we want for example to investigate whether the RBS option
> > > >>>> could also have favourable performance metrics over
> > > >>>> destination-only source-routing (aka: like BIER; MSR6 docs
> > > >>>> call this the "BE" mode, Best Effort).
> > > >>>> Even though RBS at its core is just a much better evolution
> > > >>>> for tree engineering over BIER-TE.
> > > >>>> 
> > > >>>> If this fully unified option is insufficient, we most likely
> > > >>>> would arrive at one option for BE and one for tree
> > > >>>> engineering (TE), aka: maybe RBS for TE and flat-bitstring
> > > >>>> for BE. And beside all the IPv6 forwarding plane specifics,
> > > >>>> we might just increase the maximum supported header size for
> > > >>>> both based on hardware capability in future routers (aka:
> > > >>>> today we have 512 byte examinable packet header in routers,
> > > >>>> then it likely could be at least 1024). But that's personal
> > > >>>> conjecture.
> > > >>>> 
> > > >>>>> That sounds far worse than what we experienced switching from shared-tree to source-tree.
> > > >>>> 
> > > >>>> I may have worked with more customers having pain with that 
> > > >>>> across more different HW accelerated platforms than you did, 
> > > >>>> so my mileage may vary. But i have a hard time thinking these two technology aspects could be compared fairly in the same ballpark.
> > > >>>> 
> > > >>>> IMHO, packet header size is an easily quantified cost vs.
> > > >>>> benefit evaluation for the customers. Operation of RPT/SPT
> > > >>>> switchover is an ugly technology detail nobody should need to
> > > >>>> understand, but if you're operating a PIM-SM network you MUST
> > > >>>> understand all its bloody intricacies. And then there are the
> > > >>>> customers threatening you with a lot of money and explaining
> > > >>>> to you that their definition of IP datagram forwarding is:
> > > >>>> no loss, no jitter, no reorder. And you know how you can
> > > >>>> build multicast to do it, and it's actually a nice way to
> > > >>>> justify the high cost of your product. Just RPT/SPT is not
> > > >>>> the way (but source-routing is one such option ;-).
> > > >>>> 
> > > >>>>>> Wrt to the receiver tracking: Remember that in end-to-end 
> > > >>>>>> applications only the sender may need to be involved in 
> > > >>>>>> calculating the receivers, (No network control plane harmed!).
> > > >>>>> 
> > > >>>>> Then you have the same problem with head-end replication at the source, as we do today with CDNs. You are just moving the problem and not solving the problem where a source just sends packets to a group address.
> > > >>>> 
> > > >>>> The source is the one sending the source-routed multicast
> > > >>>> packet.
> > > >>>> 
> > > >>>>> Today, multicast sending from a source CANNOT get more efficient. You just send one UDP packet to a multicast group. That is pretty simple in my mind and can't get any simpler. So anything you change will add overhead and is of course a non-starter.
> > > >>>> 
> > > >>>> We did outsource the source discovery from network layer to
> > > >>>> application land when we introduced SSM, because the network
> > > >>>> was doing a shitty job at scaling for that (RPT/SPT just
> > > >>>> being one part of the problem).
> > > >>>> 
> > > >>>> We're moving membership management out of the network with
> > > >>>> source-routing. Whether it's done locally in the source
> > > >>>> application or with the help of a controller, it is so much
> > > >>>> easier to NOT do it in the network where routers always have
> > > >>>> a constrained control plane, and not do it distributed in 20
> > > >>>> hops along a path, but only in source or controller, both of
> > > >>>> which can much more easily have an arbitrary amount of CPU
> > > >>>> compared to network devices - and a source especially only
> > > >>>> needs to bother about its own traffic.
> > > >>>> 
> > > >>>> Let's also remind the rest of the audience here that it is
> > > >>>> hopefully fair to say that this critique of yours is not
> > > >>>> specific to MSR6 but also applies to BIER.
> > > >>>> 
> > > >>>>>> Those host nodes typically can do a shi.load of compute in 
> > > >>>>>> high-speed CPU cache when they are DC servers.  In the 
> > > >>>>>> mentioned parallel compute worker management, this would 
> > > >>>>>> for example be a dynamic subset calculation of 10,000 parallel workers based on ongoing performance telemetry.
> > > >>>>>> I bet those existing large-scale distributed compute apps 
> > > >>>>>> already have to spend orders of magnitude more compute than 
> > > >>>>>> converting any such subset into a bitstring.
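Converting such a worker subset into a flat bitstring, as mentioned above, could look roughly like this. A minimal sketch only: zero-based worker indices, "bit position = worker index" and big-endian byte order are assumptions made for illustration, not the BIER or MSR6 wire encoding, and `subset_to_bitstring` is a hypothetical helper name.

```python
def subset_to_bitstring(selected, total: int) -> bytes:
    """Set one bit per selected worker index (assumed zero-based)."""
    bits = 0
    for idx in selected:
        if not 0 <= idx < total:
            raise ValueError(f"worker index {idx} out of range")
        bits |= 1 << idx
    # Pad to a whole number of bytes; byte order is an assumption.
    return bits.to_bytes((total + 7) // 8, "big")


# Example: every 50th worker out of 10,000, i.e. 200 addressed workers.
workers = set(range(0, 10_000, 50))
header = subset_to_bitstring(workers, 10_000)
print(len(header), "bytes,", sum(bin(b).count("1") for b in header), "bits set")
```

The loop is linear in the number of selected workers, which is small next to the telemetry processing that chooses the subset in the first place.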
> > > >>>>> 
> > > >>>>> They can do better and faster by not doing this.
> > > >>>> 
> > > >>>> First of all, this is not true for native source-routing apps
> > > >>>> like i mentioned above, where the application sender can now
> > > >>>> perfectly manage things like sending exactly only what's
> > > >>>> needed during channel zapping of a receiver (receiver in old
> > > >>>> TV channel source-routing header, followed by unicast cached
> > > >>>> reference frames of new TV channel, followed by receiver in
> > > >>>> new TV channel source-routing header). Likewise for all the
> > > >>>> similar examples for other applications in the DC.
> > > >>>> 
> > > >>>> Btw: It's one of the big pains that we never so far got into
> > > >>>> writing down more elaborately all those application benefits
> > > >>>> of source-routing. Alas, this is because no application
> > > >>>> developer can really buy routers with BIER today to
> > > >>>> experiment with.
> > > >>>> 
> > > >>>> Secondly: Even when we have "good-old-ip-multicast"
> > > >>>> applications with receiver joins, the overall solution
> > > >>>> complexity (network+application) goes down by moving to SSM,
> > > >>>> and it goes down even further when we do the receiver
> > > >>>> tracking in the SSM sender.
> > > >>>> 
> > > >>>> Cheers
> > > >>>>  Toerless
> > > >>>> 
> > > >>>>> Dino
> > > >>>>> 
> > > >>>> 
> > > >>>> --
> > > >>>> ---
> > > >>>> tte@cs.fau.de
> > > >>> 
> > > >>> --
> > > >>> Msr6 mailing list
> > > >>> Msr6@ietf.org
> > > >>> https://www.ietf.org/mailman/listinfo/msr6
> > > >>> 
> > > >>> _______________________________________________
> > > >>> MBONED mailing list
> > > >>> MBONED@ietf.org
> > > >>> https://www.ietf.org/mailman/listinfo/mboned
> > > > 
> 