Re: [Bier] ASIC restrictions

Tony Przygienda <tonysietf@gmail.com> Tue, 15 November 2022 13:08 UTC

MIME-Version: 1.0
References: <81ff0e3b3bad4e6a892e3aa005aa9e9a@huawei.com> <CA+wi2hN2UpY4ZX51ofWfXDoPu3vW+8zZLtL4LDXrH2sh605Osw@mail.gmail.com> <Y3NllOEy2JdJ2NOZ@faui48e.informatik.uni-erlangen.de>
In-Reply-To: <Y3NllOEy2JdJ2NOZ@faui48e.informatik.uni-erlangen.de>
From: Tony Przygienda <tonysietf@gmail.com>
Date: Tue, 15 Nov 2022 14:07:40 +0100
Message-ID: <CA+wi2hPg8xKkEQzyvNYsTT82hcKVh79Pdi70d9bbsXD+2KLSbw@mail.gmail.com>
To: Toerless Eckert <tte@cs.fau.de>
Cc: Vasilenko Eduard <vasilenko.eduard=40huawei.com@dmarc.ietf.org>, "bier@ietf.org" <bier@ietf.org>
Content-Type: multipart/alternative; boundary="000000000000a42f4405ed820c7b"
Archived-At: <https://mailarchive.ietf.org/arch/msg/bier/_WwICXxZ6yMsX2BP15bt2AKi2c0>
Subject: Re: [Bier] ASIC restrictions

Well, roughly agreeing with Toerless, but a couple more observations:

1. the problem with SRH kind of thingies is not only the de-facto
unlimited size. Compression is bad already (complex, complex stuff on the
die, and if you have a bug it's mostly the end of the world compared to a
software bug in the control plane), but the fact that SRH-ilk stuff can
show up at _any_ offset is what drives silicon folks bats. They _love_
fixed offsets, which can be implemented in gates very, very, very
efficiently (i.e. high throughput at low power); anything that is variable
format, variable offset is anathema to those guys, so it needs to be
avoided. Hence e.g. the interim in BIER ages ago where feedback from
Juniper silicon folks was a strong "put the bitmask last since it varies
in length" ;-) Thankfully that has been adopted.

2. recirculation is a complex beast. _If_ the chip does NOT allow for
mucking around with most of the header/payload after turnaround, it's
mostly really just descriptors being circled around. That's decently
cheap. _If_ you want to muck with the header deeply (which in case of BIER
and definitely SRH is the case) then you start to shuffle around payload
copies, full header copies, whatever, and _THAT_ is killing BW on the die.
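
A minimal sketch of why "bitmask last" helps, assuming the RFC 8296 BIER
header layout (12 bytes of fixed fields, then the BitString whose length is
encoded in the BSL field). The function and dictionary key names are
illustrative, not taken from any implementation:

```python
import struct

FIXED_LEN = 12  # assumption: three 32-bit words precede the BitString


def parse_bier_header(buf: bytes) -> dict:
    """Read the fixed fields at constant offsets, then locate the payload."""
    w0, w1, w2 = struct.unpack_from("!III", buf, 0)
    bsl = (w1 >> 20) & 0xF                   # BitString length code
    bitstring_len = (1 << (bsl + 5)) // 8    # 2^(BSL+5) bits, in bytes
    return {
        "bift_id": w0 >> 12,                 # top 20 bits of word 0
        "ttl": w0 & 0xFF,
        "bsl": bsl,
        "proto": (w2 >> 16) & 0x3F,
        "bfir_id": w2 & 0xFFFF,
        "bitstring": buf[FIXED_LEN:FIXED_LEN + bitstring_len],
        # Only this offset depends on the variable-length part.
        "payload_offset": FIXED_LEN + bitstring_len,
    }
```

Every field above except the payload offset sits at the same position for
any BitString length, which is the property the silicon folks asked for.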

It all depends heavily on the specific die/vendor/chip architecture of
course, so take all of that as a rough ballpark outline of reality.

-- tony

On Tue, Nov 15, 2022 at 11:10 AM Toerless Eckert <tte@cs.fau.de> wrote:

> Thanks, Vasili, Tony
>
> My experience has been that you need to throw really big money
> opportunities in front of product marketing, which then goes to chip
> development (or the OEM) and requests better hardware.
>
> I lived through that when platforms/chip designers complained bitterly
> about moving from IPv4 to IPv6 with TCAM or MTRIE lookups, and of course,
> when i wanted to have IPv6 (S,G) lookups (256 bits!) they complained
> bitterly again. Until luckily someone asked for IPv6 5-tuple (unicast)
> ACLs, which required even longer lookups and as a result enabled (S,G)
> multicast lookups as well.
>
> And i am not going to repeat all the details of the challenges that we as
> an industry had with High Energy Physics jumbo packets and the like.
> Several similar stories.
>
> So, it is definitely very prudent to be able to show the value of any new
> system with the currently feasible lookups such as 256 bits (which is why
> we concentrated our initial simulation on that), but also to have the
> system be able to scale up to longer lookups, so big customers/use-cases
> can demand support in future generations. Aka: i would not want to bet on
> such a strict "longer" lookup requirement as IPv6 had when IPv4 was
> standard. But once there are use-cases that make money and they go into a
> hardware churn phase, there are always opportunities to get more/longer
> lookups if shown beneficial.
>
> Wrt SRH: I still would like to write a draft explaining how i could see an
> SRH-equivalent stateless multicast header, and we can discuss the length
> aspect then. In general, i wonder, when we are talking in SPRING about
> compressing the steering header elements, why we could not also consider
> compressing the base header. The IoT community has done a lot more
> compression of IPv6 (and transport) over the years, but they can throw a
> lot of per-packet CPU at the problem, whereas in our case, we may just be
> happy with all of IPv6 except for shorter addresses for a limited domain.
> Just saying.
>
> Wrt recirculation: Whether or not that is actually something you want to
> avoid very much depends on the chip design. It does seem to be undesirably
> expensive on Tofino, but for well-built chips, where you have some N
> packets throughput, you design forwarding such that you can input some
> e.g. N * 1.1, and any recirculated packet just takes away from that input
> bandwidth. For sustained throughput, you would not even need 1.1, but just
> 1.0, because ultimately you cannot send out more than N packets anyhow,
> whether you receive and send N unicast packets, or you receive N/100
> multicast packets, each of which is replicated 100 times. You still do N
> packet (header) processing, without or with recirculation. 1.1 is only
> needed so that, while packets are recirculating, enough arriving (unicast)
> packets can be processed simultaneously that there is not (too much)
> head-of-line blocking.
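
The pass-counting argument quoted above can be written down as a toy model.
It assumes one header-processing pass per emitted copy and ignores how a
particular chip actually implements recirculation; the function names and
the headroom parameter are illustrative only:

```python
def passes_needed(unicast_in: float, mcast_in: float, fanout: int) -> float:
    """Header passes per unit time, assuming one pass per emitted copy."""
    return unicast_in + mcast_in * fanout


def fits_pass_budget(unicast_in: float, mcast_in: float, fanout: int,
                     n: float, headroom: float = 1.0) -> bool:
    """True if the offered load fits a pipeline doing n * headroom passes."""
    return passes_needed(unicast_in, mcast_in, fanout) <= n * headroom
```

With n = 1000, either 1000 unicast packets or 10 multicast packets with a
fanout of 100 use exactly 1000 passes, which is the N versus N/100 case in
the quoted paragraph.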
>
> It is quite common though to have replication through lookup of multicast
> tables, such as presented in Steffen/Michael's P4 presentation, but i am
> not too sure how much post-processing is then still possible. One has to
> remember that every BIER packet copy can have a different bitstring,
> because of the bit clear operation, which is specific to each copy.
> Usually, the only post-processing on platforms with multicast table lookup
> is then limited to some L2 adjacencies. If Tofino can actually still do a
> per-packet rewrite of the bitstring (as required by BIER) after such a
> replication, then that would speak for Tofino. If Tofino cannot do that,
> then that would speak for BIER-TE, because BIER-TE only requires an
> ingress clearing of bits, common for all packet copies (except for one
> crazy optional feature).
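
To make the "different bitstring per copy" point concrete, here is a small
sketch of the BIER forwarding loop as described in RFC 8279, with the
bitstring held as a Python int. The table shape (bit position mapped to a
forwarding bit mask and a neighbor name) is a simplification chosen for
illustration:

```python
def bier_forward(bitstring: int,
                 bift: dict[int, tuple[int, str]]) -> list[tuple[str, int]]:
    """Return (neighbor, per-copy bitstring) pairs for one incoming packet.

    bift maps a 1-based bit position to (F-BM, neighbor).
    """
    copies = []
    remaining = bitstring
    while remaining:
        lowest = remaining & -remaining      # isolate lowest set bit
        position = lowest.bit_length()       # 1-based bit position
        entry = bift.get(position)
        if entry is None:
            remaining &= ~lowest             # no route: drop this bit
            continue
        fbm, neighbor = entry
        # The copy keeps only the bits reachable via this neighbor ...
        copies.append((neighbor, remaining & fbm))
        # ... and those bits are cleared before looking at the next one.
        remaining &= ~fbm
    return copies
```

Each copy leaves with its own masked bitstring, so a replication engine
that can only rewrite L2 adjacencies after the copy is not sufficient here.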
>
> Cheers
>     Toerless
>
> On Fri, Nov 11, 2022 at 11:32:28AM +0000, Tony Przygienda wrote:
> > yes, this has been discussed many times, but in 2 quips
> >
> > 1. high end chips at least are not really limited by the overall size but
> > by the size they can put on the "really, really fast packet header
> > processing scratchboard". And then you recirculate if you blow out that
> > size. Which halves the throughput, and hence the problem. That, as you
> > have seen in e.g. the P4 work that was presented, makes things that work
> > fine in smaller labs and networks not feasible in large deployments in
> > terms of economics. Making the scratchboard wider is very expensive in
> > terms of die/power etc. obviously, so it needs really strong economic
> > drivers.
> > 2. BIER introduced a way to deal with the problem via the concept of sets
> > (and subdomains) and that's about the best engineering tradeoff on high
> > end chips that's viable. On good chips replicating a packet three times
> > instead of twice is still much cheaper than recirculating from an
> > engineering perspective. In very loose terms the explanation lies in the
> > fact that holding stuff on-chip is very expensive, pushing it out of the
> > box is significantly cheaper (in a sense timexdelay buffer is way
> > cheaper than on-die buffer)
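
A short sketch of the set mechanism mentioned in point 2, assuming the
BFR-id to (set identifier, bit position) mapping from RFC 8279. The helper
names are illustrative:

```python
def si_and_bit(bfr_id: int, bsl: int) -> tuple[int, int]:
    """Map a BFR-id to (set identifier, 1-based bit position)."""
    if bfr_id < 1 or bsl < 1:
        raise ValueError("bfr_id and bsl must be positive")
    return (bfr_id - 1) // bsl, (bfr_id - 1) % bsl + 1


def copies_needed(bfer_ids: list[int], bsl: int) -> int:
    """Packets the ingress must send: one per distinct set identifier."""
    return len({si_and_bit(i, bsl)[0] for i in bfer_ids})
```

With a 256-bit bitstring, receivers 1, 300 and 600 fall into three sets, so
the ingress sends three packets with short headers instead of one packet
whose header no longer fits the fast path.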
> >
> > BIER has been built as an architecture by folks that were fighting those
> > silicon things from the early days of IP technology, and many of them
> > _way_ before ;-)
> >
> > --- tony
> >
> > On Thu, Nov 10, 2022 at 6:40 PM Vasilenko Eduard <vasilenko.eduard=40huawei.com@dmarc.ietf.org> wrote:
> >
> > > Dear multicast experts,
> > >
> > > I am not subscribed to this alias, I am not even a multicast expert, I
> > > am a stranger here.
> > >
> > > When I heard today the comment that 1280B+1280B is a challenge from
> > > the MTU point of view, I reacted that it is not the biggest problem; a
> > > much bigger problem would be the overall size of all headers that a
> > > particular chip could process (no way for 1280 bytes, never!).
> > >
> > > Toerless Eckert asked me to put my comment here.
> > >
> > > Well, I believed that this is well known and somewhat obvious.
> > >
> > > Some routing switches (for DC/Cloud) still have a restriction of
> > > 128 bytes for all headers (including MAC, VLAN, GTP, SRv6, etc).
> > >
> > > Some high-high-end Telco routers are capable of processing 384 bytes;
> > > that is probably the upper limit now (again, for all L2-L4 headers).
> > >
> > > When I saw the first BIER IETF presentation in 2016, it immediately
> > > came to my mind that "not all vendors would be capable of implementing
> > > this".
> > >
> > > It would be impolite to name names here (at least 2); I cannot speak
> > > for other vendors.
> > >
> > > I have seen a 1k-router network (and many in the range between 256 and
> > > 512 PEs), but 1k is already 128 bytes just for the BIER bit-field.
> > >
> > > The problem would be especially bad in combination with SRv6, which
> > > could have an SRH header of up to 208 bytes.
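
The arithmetic in the two quoted paragraphs above can be checked with a few
lines. This sketch uses only the numbers quoted in this thread (1k
receivers, 208-byte SRH, 128- and 384-byte parser budgets); the helper
names are made up, and other headers (Ethernet, IPv6, the fixed BIER
fields) are deliberately left out:

```python
def bitstring_bytes(receivers: int) -> int:
    """Bytes needed for one bit per receiver, rounded up to whole bytes."""
    return (receivers + 7) // 8


def fits_header_budget(header_sizes: list[int], budget: int) -> bool:
    """True if the summed header sizes fit the chip's parser budget."""
    return sum(header_sizes) <= budget


bier_bits = bitstring_bytes(1024)   # 128 bytes for a 1k-router network
srh = 208                           # SRH size quoted in the thread
```

The bitstring alone consumes a 128-byte budget, and bitstring plus SRH is
336 bytes, which fits under 384 only before any other header is counted.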
> > >
> > > Of course, it is possible to break E2E, split for Areas/Domains, and
> > > stitch by Gateways.
> > >
> > > And you discussed today how to localize bit patterns; it is
> > > probably some sort of automatic split into Areas (I did not read the
> > > draft).
> > >
> > > I always believed that "header size" is BIER's primary problem.
> > > Hence, it was probably discussed here many, many times.
> > >
> > > Sorry if I stepped on something well discussed again. As I said: I am
> > > a stranger. I am just passing by. Sorry for the point if it is a triple
> > > duplicate.
> > >
> > >
> > >
> > > Your architecture is so nice (stateless), I like it very much. And I
> > > am sure that my employer has no problem with header sizes. BIER
> > > forever :-)
> > >
> > >
> > >
> > > Best Regards
> > >
> > > Eduard Vasilenko
> > >
> > > Senior Architect
> > >
> > > Europe Standardization & Industry Development Department
> > >
> > > Tel: +7(985) 910-1105, +7(916) 800-5506
> > >
> > >
>
>
>
> --
> ---
> tte@cs.fau.de
>