Re: [Int-area] New Version Notification for draft-li-int-aggregation-00.txt

Toerless Eckert <tte@cs.fau.de> Mon, 28 February 2022 17:40 UTC

Date: Mon, 28 Feb 2022 18:40:19 +0100
From: Toerless Eckert <tte@cs.fau.de>
To: Tony Li <tony.li@tony.li>
Cc: int-area@ietf.org
Message-ID: <Yh0JAwD9Gv5w4/ep@faui48e.informatik.uni-erlangen.de>
References: <164367925561.21687.13323438769934745511@ietfa.amsl.com> <A5236BE8-2499-4E45-8B06-C131C4324611@tony.li> <YhiNEDhMoo2HRVPz@faui48e.informatik.uni-erlangen.de> <45325980-F4EC-483E-9D02-CBB208A3EDA4@tony.li> <YhkT+iYy/VVVoZNp@faui48e.informatik.uni-erlangen.de> <3F72EFE8-10ED-4185-A001-D5C06B14A862@tony.li>
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"
Content-Disposition: inline
Content-Transfer-Encoding: 8bit
In-Reply-To: <3F72EFE8-10ED-4185-A001-D5C06B14A862@tony.li>
Archived-At: <https://mailarchive.ietf.org/arch/msg/int-area/0wsrMywqVPpN1pMoH-oL6P-9yNE>
Subject: Re: [Int-area] New Version Notification for draft-li-int-aggregation-00.txt
Precedence: list

Thanks, Tony,

Inline

On Fri, Feb 25, 2022 at 01:17:17PM -0800, Tony Li wrote:
> > On Feb 25, 2022, at 9:38 AM, Toerless Eckert <tte@cs.fau.de> wrote:
> > I just ran against control plane resource limitations in products way more
> > often during the decades than i felt necessary knowing what control plane
> > performance would be possible with appropriately scaled CPU/memory.
> 
> Well, here’s the problem: Internet usage is decoupled from all other applications. Data centers, even some of the hyperscalers, do not have the resource requirements of Internet routing. And yes, this is exacerbated by the policies and practices of ISPs.

Haha. I will keep this paragraph for citation when i am again faced
with arguing how solutions for private networks do NOT have to meet
all the same requirements as those for the Internet. And that it is
NOT appropriate to invalidate this statement by stating that traffic
COULD and SHOULD always be able to leak between the two and hence
private networks need to inherit all Internet requirements. This is
official IETF policy as can be seen in TSV review and specs like
RFC8084.

> It’s to the point where the resource and cost differential between ’normal’ products and ISP products have forked into two separate product variants. And even then, continuing to scale the ISP products is challenging.

> As a vendor, I should be happy about this: it all goes to better margins. However, as an engineer, I’m aghast. Most of this is waste. If we paid more attention to efficiency, we would be in a much better place. Unfortunately, efficiency is not sexy and cool.

I don't think this is true (efficiency not being sexy). The whole PE/P structuring is
also about efficiency.  On another thread, Jeff Tantsura just yesterday reminded
us of that principle: expensive packet processing is best done off-path, and
hop-by-hop forwarding should be cost/energy optimized.

Now, when i listen to Geoff Huston's pitches, such as in last year's
DINRG meeting (https://www.youtube.com/watch?v=1kbSbVjb1ZU starting at 14 mins),
it seems to me as if we might be about to see "peak Internet (transit)",
aka: more and more traffic will be for a limited number of edge-DC services
from big OTTs, and an ever-declining percentage of traffic will be
to the "wherever else" locations in the Internet.

In this environment, i would, in the same way as you propose to avoid
unnecessary routes by using shorter prefixes, go all the way to the
"0.0.0.0/0" (default) prefix in my high-speed edge network, and then
redesign my "full routing table" Internet edge/transit devices to be
cost optimized for the special Internet requirements: Highly parallelized
BGP on an off-the-shelf Intel/AMD blade server plus a forwarding plane
with the aforementioned FIB entry optimizations.

That's at least IMHO the reference model against which your proposal
should measure up in terms of cost/benefit. Because it's clear that
your proposal would reduce routes. But how would we measure the cost/benefit?

> > Maybe it's not as bad now as it was in the recent past given how
> > Moore's law is changing, but my past experience was that dedicated
> > route processor boards could not compete in price and life-cycle agility
> > with general purpose data-center servers, but for Internet BGP, there
> > is IMHO no reason (other than desires for revenue), to NOT use general
> > purpose data center servers for Internet BGP routing tables. 
> 
> The volumes are ten orders of magnitude lower, so that’s no surprise. And just having routing without direct coupling to a forwarding plane restricts you to off-box applications, which is missing a key part of the problem.

The off-the-shelf blade server to run routing on can perfectly well
have a 100Gbps ethernet into the forwarding chassis. I just don't think
we should expect the price/overhead of that control plane instance to
be anything more than commodity CPU/memory pricing.

> But while this is interesting, it is also irrelevant to the matter at hand: we can be more efficient in our routing if we want to.

Sure. And we can be more flexible in our non-Internet addressing and a lot
of other innovation we IMHO should do, but there too we are being faced
with rightful questions about cost/benefit of comparable/other approaches.

> > If some of this is happening but in your opinion not a rational response
> > that would make it important IMHO to be discussed in the document.
> 
> The point is to make routing more efficient, not to talk about FIB compression mechanisms.

I am sure registries/operators would like to see a quantitative cost/benefit
comparison.

> > The more aggregation you want to support, the more geographic structure
> > you introduce into the addressing space and the more you will also
> > run the risk of creating detriments against cross-geographic
> > shortcut links (oh no, a peering with you would raise the cost of
> > my routing table undesirably...). Aka we're trying to compare
> > capex cost for potentially overpriced CPU/RAM in routers with the
> > cost of operational processes in registries and operators. That's
> > a difficult comparison.
> 
> And I’m not trying to force anyone one way or another. I’m simply documenting the tool.

Btw: It might be useful to go to OPSAWG to present and get feedback
on how to set up such aggregated routes at some particular
location in the Internet, because unfortunately to me it is not
quite clear how easy this would be in the face of this highly meshed
peering we have today.

> > Hmm. Then i misread your draft. It did sound as if there is an ask
> > in your document against registries to improve how they assign address
> > block such that better geographic aggregation than today was enabled.
> 
> Only for geographic addressing that is not already covered by per-registry block allocation. Largely, that means allocating blocks to Australia. :)

See above. Practical examples of how/where it would be applied and how
much gain it would give in the face of more and more arbitrary
peering would be nice to see.

> >> That is simply not true. /8 is not somehow less optimal than /16. It depends on the topology, not the prefix length.
> > 
> > The shorter the prefix length of an aggregate, the more longer aggregates it
> > could have, right ?
> 
> That’s irrelevant. We are not concerned with potential downsides. The question is how can we improve efficiency.
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
???

> > In general we would not want a longer prefix because we want the aggregation
> > (with your proposal).
> 
> We do not want to carry longer prefixes when they point to the same next hops.  They add no value. And unfortunately, they are all too common.  How about we stop carrying them?

Probably best done by practical operational recommendations for operators
on how and when to insert those aggregates, and actually what the pros/cons
are (e.g.: how to exclude longer poison/blackhole routes from being aggregated,
because you probably still want/need those to escape).
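
To make the discussion concrete, here is a minimal sketch (not taken from
the draft) of the local check being discussed: drop a more-specific when
its closest covering prefix has the same next hop, unless the operator has
marked it as one that must escape. The function name, the dict-based RIB
and the `keep` set are all hypothetical and only for illustration.

```python
import ipaddress

def suppress_redundant(rib, keep=frozenset()):
    """rib maps prefix string -> next hop (hypothetical, simplified RIB).

    Returns a new dict without more-specifics whose closest covering
    prefix has the same next hop. Prefixes in `keep` (e.g. operator-tagged
    blackhole routes) are never suppressed.
    """
    nets = {ipaddress.ip_network(p): nh for p, nh in rib.items()}
    keep_nets = {ipaddress.ip_network(p) for p in keep}
    result = {}
    for net, nh in nets.items():
        # All less-specific prefixes in the table that cover this one.
        covering = [c for c in nets
                    if c != net and c.version == net.version
                    and net.subnet_of(c)]
        if covering and net not in keep_nets:
            # The closest covering prefix is the longest one.
            parent = max(covering, key=lambda c: c.prefixlen)
            if nets[parent] == nh:
                continue  # same next hop as the cover: adds no value
        result[str(net)] = nh
    return result

rib = {
    "10.0.0.0/8": "A",
    "10.1.0.0/16": "A",   # redundant: covered by /8 with same next hop
    "10.2.0.0/16": "B",   # different next hop, must stay
    "10.2.3.0/24": "A",   # closest cover is the /16 via B, must stay
    "10.1.1.1/32": "A",   # tagged to escape, stays although redundant
}
print(suppress_redundant(rib, keep={"10.1.1.1/32"}))
```

Note that this only compares against the closest covering prefix, which is
what keeps the 10.2.3.0/24 entry from being swallowed by the /8.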

Do we even have a YANG model to operationally express these procedures ?
I remember CLI configs, but haven't followed YANG ;-(

In any case, OPSAWG would IMHO be a good place to present and get feedback.
It might even be a better home for such work than INTAREA (no strong opinion).

Public shaming portals are also helpful, e.g.: where routing tables
are examined and worst offenders are named. Now you've got to find someone
brave enough to do this ;-)

There is also pre-established practice i think to already filter prefixes
longer than some max-length if i am not mistaken. So operators will ask
why/where that is not sufficient.
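
For comparison, a max-length filter can be sketched as below. The limits
of /24 for IPv4 and /48 for IPv6 are assumed here as commonly used
values, not something taken from the draft, and `accept` is a
hypothetical helper name.

```python
import ipaddress

# Assumed limits per IP version; operators choose their own.
MAX_LEN = {4: 24, 6: 48}

def accept(prefix, max_len=MAX_LEN):
    """Return True if the prefix is not longer than the configured limit."""
    net = ipaddress.ip_network(prefix)
    return net.prefixlen <= max_len[net.version]

for p in ["192.0.2.0/24", "192.0.2.128/25", "2001:db8::/48", "2001:db8::/64"]:
    print(p, "accepted" if accept(p) else "filtered")
```

Such a filter looks only at the length of each prefix. A /24 that is a
more-specific of a /16 with the same next hop still passes it, which is
the case the aggregation proposal is about.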

As an operator i also would love to have features that i think do not exist.
Such as alerts (YANG PUSH or the like) whenever i do see longer routes
"from the Internet" for a prefix i did aggregate. And of course there
are different cases to consider.
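
The detection part of such an alert could look like the sketch below,
assuming the aggregates i configured and the received routes are
available as plain prefix strings. How the alert would then be delivered
(YANG PUSH or otherwise) is left open; `more_specific_alerts` is a
hypothetical name.

```python
import ipaddress

def more_specific_alerts(aggregates, received):
    """Return (aggregate, more_specific) pairs for every received prefix
    that is strictly longer than, and covered by, a configured aggregate."""
    aggs = [ipaddress.ip_network(a) for a in aggregates]
    alerts = []
    for r in received:
        net = ipaddress.ip_network(r)
        for agg in aggs:
            if (net.version == agg.version and net != agg
                    and net.subnet_of(agg)):
                alerts.append((str(agg), str(net)))
    return alerts

print(more_specific_alerts(
    ["203.0.113.0/24"],
    ["203.0.113.128/25", "198.51.100.0/24", "203.0.113.0/24"],
))
```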

> > So now _if_ someone wants a longer prefix to go
> > on a path where it would violate the aggregation, we call it traffic engineering.
> > But what's the operational mechanism by which one would decide this is
> > a permitted instance of traffic engineering or a bug / misconfiguration
> > in the aggregation scheme ?
> 
> First off, this is not about defining laws or rules.  This is not about ‘permitted’ or ‘bugs’.
> 
> The point is that a single operator can look and see that all of the next hops of a prefix and its more specifics are aligned. That operator can then choose to aggregate. It’s that simple.  Yes, traffic flow may be affected. The operator needs to decide whether or not that is problematic for them.

Sure. And a BCP guidelines RFC  (opsarea ?) would go a lot further than an architectural
definition of new terms... (IMHO...).

But again: i still wish there was some quantitative analysis of possible benefits.

Cheers
    Toerless

> Tony

-- 
---
tte@cs.fau.de
