Re: [Anima] ANIMA when there is a system-wide issue

Toerless Eckert <tte@cs.fau.de> Tue, 23 February 2021 11:04 UTC

Date: Tue, 23 Feb 2021 12:04:28 +0100
From: Toerless Eckert <tte@cs.fau.de>
To: Michael Richardson <mcr+ietf@sandelman.ca>
Cc: Anima WG <anima@ietf.org>
Message-ID: <20210223110428.GA57749@faui48f.informatik.uni-erlangen.de>
References: <136aa329-41a5-8b65-ef9e-fadf089696eb@gmail.com> <704b66e9-d41c-f7e9-7e4b-f2d934ec9158@gmail.com> <PR3PR07MB68265F26A2CFB818D9CFDFBCF3C30@PR3PR07MB6826.eurprd07.prod.outlook.com> <20210128160356.GB54347@faui48f.informatik.uni-erlangen.de> <17274.1611866107@localhost> <20210211201910.GA48871@faui48f.informatik.uni-erlangen.de> <18842.1613083225@localhost> <20210216174315.GB48871@faui48f.informatik.uni-erlangen.de> <4218.1613756735@localhost>
MIME-Version: 1.0
Content-Type: text/plain; charset="iso-8859-1"
Content-Disposition: inline
Content-Transfer-Encoding: 8bit
In-Reply-To: <4218.1613756735@localhost>
User-Agent: Mutt/1.10.1 (2018-07-13)
Archived-At: <https://mailarchive.ietf.org/arch/msg/anima/jlQIKT68kjXwQhP5Fu14vs_putU>
Subject: Re: [Anima] ANIMA when there is a system-wide issue
Precedence: list

On Fri, Feb 19, 2021 at 12:45:35PM -0500, Michael Richardson wrote:
> 
> EXECSUM: we need IEEE802.1X political clue.
> 
> The WG should adopt my l2-friendly-acp document as a problem statement
> document, and then we should propose a solution which can be discussed
> outside of the IETF.

Personally i feel i would like to see a good amount of the explanations and
clarifications that we exchanged in this thread to first go into this or
any other draft before i personally would feel it's appropriate for WG
adoption. Let me see if i can find some time to help.

> You are asking L2 switches, which don't speak any L3 protocols in their
> forwarding engine, to do filtering on L3 things, namely L3 destination
> multicast addresses.

Actually, for such a switch to become useful, we do expect more, and maybe
it would help to write down those different "more" options that we think we
understand. Aka: in the most simple form, we expect that the switch is
able to do ACP processing in software - and then the packet filtering
is the only "HW" requirement.

I am still fundamentally struggling with the operational model of the
ACP when it can have significantly different forwarder speeds. But that's
because i saw perfect deployment opportunities dwindle because of implementations
that were only in software.

For example, consider using the ACP for the RFC8368 use case and then going
through the calculation of the maximum performance required by a controller
to update/change configs across a wide range of devices while having all
this traffic go through a switch in the NOC (close to the controller)
that only implements ACP in CPU software forwarding.
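
A rough sketch of that calculation, as a lower bound only: it assumes all
controller traffic crosses a single forwarder and ignores protocol and
encryption overhead. The function name and every number below are
illustrative placeholders, not measurements from any real device.

```python
def rollout_seconds(devices: int, bytes_per_device: int, forwarder_bps: float) -> float:
    """Lower bound on the time to push bytes_per_device to every device
    when all of that traffic crosses one forwarder limited to forwarder_bps."""
    total_bits = devices * bytes_per_device * 8
    return total_bits / forwarder_bps


# All values are hypothetical placeholders chosen only to show the arithmetic.
DEVICES = 10_000
CONFIG_BYTES = 1_000_000   # assumed 1 MB pushed per device
SOFTWARE_BPS = 50e6        # assumed CPU software-forwarded ACP throughput
HARDWARE_BPS = 10e9        # assumed hardware-forwarded throughput

slow = rollout_seconds(DEVICES, CONFIG_BYTES, SOFTWARE_BPS)
fast = rollout_seconds(DEVICES, CONFIG_BYTES, HARDWARE_BPS)
print(f"software path: {slow:.0f} s, hardware path: {fast:.0f} s")
```

With these placeholder inputs the software path is slower by the ratio of the
two throughputs (200x), which is the kind of gap an operator would want to
know about before placing such a switch next to the controller.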

So i am wondering if we would even have a good plan for how to solve this
issue. For example: Could i define the ACP such that during enrollment,
the registrar could determine that the switch sucks (too slow) for the
network and ask it to turn into an ACP host instead of an ACP router
(and data-plane switch) ? That to me looks like one possible option
to let operators decide. And it's a great option really only available
with L2 switches (not routers).

I think this would be easy to define except for this missing
linkage between BRSKI and ACP: If i wanted to push this info to
such a device today (ACP router vs. ACP host), i'd have to come
up with a new flag in the AcpNodeName. Which may be ok, and maybe
even helpful security-wise (consider any RPL info from such a system
except for its own route to be an error and ignore it...  ?)

But having some parameters that can be given at the tailend of
BRSKI to the node would be lovely too. I think we discussed this
in the past.

>     >> While this is deterministically mapped to a unique L2 multicast MAC,
> 
>     tte> What is "this" ? Sorry.... can not parse again.
> 
> The L3 multicast address is mapped on ethernet to a deterministic L2
> multicast address.  So, an L2 switch could filter on that, but unless it then
> examines the L3 address, it could catch future L3 multicasts that happen to
> map to the same L2 multicast.  When we allocated the ACP l3 multicast
> destination, we did not make any attempt to make it a unique L2 target.
> Assuming that there are no existing conflicts, to avoid the above, we would
> need to allocate all the other L3 multicast targets that map to the same L2
> multicast target.

Right. Any implementation of what i described in ACP would need to
software forward any L3 multicast address traffic that is aliasing to
the DULL GRASP L3 multicast address. Upon quick browsing i think it's
not even mentioned in RFC4541 (problem for every IGMP snooping implementation).
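
To make the aliasing concrete, here is a minimal sketch of the standard
IPv6-multicast-to-Ethernet mapping (33:33 followed by the low-order 32 bits
of the group address, RFC 2464). It takes the DULL GRASP group to be
ff02::13 (ALL_GRASP_NEIGHBORS in RFC 8990); the function names and the other
example groups are made up for illustration.

```python
import ipaddress


def ipv6_mcast_to_mac(addr: str) -> str:
    """Map an IPv6 multicast address to its Ethernet MAC (RFC 2464):
    33:33 followed by the low-order 32 bits of the address."""
    ip = ipaddress.IPv6Address(addr)
    if not ip.is_multicast:
        raise ValueError(f"{addr} is not an IPv6 multicast address")
    low32 = ip.packed[-4:]
    return ":".join(f"{b:02x}" for b in b"\x33\x33" + low32)


DULL_GRASP = "ff02::13"  # ALL_GRASP_NEIGHBORS, assumed per RFC 8990


def aliases_dull_grasp(addr: str) -> bool:
    """True if addr is a different L3 group that maps to the same L2 MAC."""
    same_mac = ipv6_mcast_to_mac(addr) == ipv6_mcast_to_mac(DULL_GRASP)
    different_group = ipaddress.IPv6Address(addr) != ipaddress.IPv6Address(DULL_GRASP)
    return same_mac and different_group


# Example groups other than ff02::13 are arbitrary, picked to show the effect.
for group in ("ff02::13", "ff05::13", "ff02::1:0:13", "ff02::1:ff00:13"):
    print(group, ipv6_mcast_to_mac(group), aliases_dull_grasp(group))
```

A switch that filters only on the MAC 33:33:00:00:00:13 would also catch
ff05::13 and ff02::1:0:13 here, which is why those would have to be
software forwarded after looking at the L3 destination.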

>     tte> In my past implementation experience, the punt option that always
>     tte> works is one where you punt a specific ethertype.
>     >> Great, so one of the options I proposed is to have a new ethertype for
>     >> IPv6-DULL messages, as you also say below.
> 
>     tte> My point was that its not only about the discovery packets. We also
>     tte> want to make sure that the ACP secure channel packets could also go
>     tte> across blocked ports.
> 
> Yes, that's a different problem, but I agree it is related.
> 
> For the L2 SDN that does not use STP because it does not want blocked ports,
> but rather wants to use all the bandwidth, the problem is to keep the ACP DULL
> multicast from causing loops.

Uhmm.. not clear. Forget ACP DULL... You have an ethernet without
STP but with redundant paths. How do you avoid loops ? Do you use one of
the IEEE SPF alternatives to STP ?

>     mcr> I think that the IEEE/IETF liaison can get us an ethertype, given an
>     mcr> appropriate STD track draft. Tell me if you think this is within
>     mcr> the ANIMA WG's charter.
> 
>     tte> Different encap to make ANI more resilient. Sounds to me like
>     tte> perfectly in scope of BRSKI/ACP extensions charter point.
> 
> Good, we agree.
> 
>     tte> I just think we shouldn't rush this but think it through well:
> 
>     tte> First of all, relying on a separate MAC address for ACP is the most
>     tte> easy way out.  But it's not clear to me if this works on every
>     tte> hardware. On the other hand, i can see how i would always have a
>     tte> separate MAC address for ACP if for example i have classic PC
>     tte> hardware and implement ACP on the BMC (e.g.: as extension to
>     tte> OpenBMC). But many switches/routers also have a MAC address pool
>     tte> they can use.
> 
> Having the ACP on the (Open)BMC is definitely a killer use, and I hope to get there soon.

Last time i checked, OpenBMC git was very confusing, seemed like mostly
Facebook internal adoption, but i couldn't figure out any option i could
easily buy individually as an experimentation platform.

> One might still want the ACP running even in the context of a BMC user who
> shoots eirself in the foot.
> The Linux kernel gives one the "macvlan" which is effectively a kind of
> bridge (actually mutually exclusive with being in a bridge).
> The macvlan gets a kernel allocated randomized mac address, and can be moved
> into a network namespace and effectively hidden, however, there does not seem
> to be a way to keep the physical interface from being marked down.

Right. Ideally you would have SR-IOV to create a PCI-bus level disjoint
ethernet interface for the ACP. Alas, today, like MACsec, this is an
option only on high-end Ethernet PCI controllers. Or else you have to muck
around in the Linux kernel to create protection against unintended shutdowns.

>     tte> When it comes to encapsulating across a new ethertype, we need to make
>     tte> it extensible, that's an IEEE requirement for such a scarce resource. So
>     tte> we might even want to ask around IETF for similar use-cases to have
>     tte> candidate other values for a selector field for example.
> 
>     tte> Need to read through EAP over Ethernet to check what we could
>     tte> share. I forgot all about it.  But ultimately, it's going to be a
>     tte> small "selector-header" on top of the new ethertype that we need to
>     tte> define.
> 
> You want chapter 11 of 802.1X-2020.
> Table 11-3 lists the 9 EAPOL types used.
> No equivalent to IANA Considerations exists, so I think that it would require a
> revision by the IEEE to allocate a code.  That would really be enough.

Right. I didn't mean to use EAPOL. I meant to document all the arguments why NOT
to use it, but then also reuse all the reusable ideas that we need.

>     tte> I can see at least two selectors: ACP-secure-channel and Discovery
>     tte> (DULL-GRASP).
> 
> Both are IPv6 with LL src/dst, different protocols, so as long as we can
> carry IPv6, then we can do both.
> 
>     tte> I have not thought about use-cases for e.g.: just BRSKI without ACP.
> 
> BRSKI can use the DULL GRASP channel to get access to a Join Proxy.
> If used without ACP, then the certificate enrollment could well lead to use
> of EAP to connect.  That's an important connection.
> 
>     tte> If you read up on the public slides i did eternities ago for my
>     tte> prior employer about that vendor's (somewhat proprietary) ACP
>     tte> implementation in the context of service provider ethernet services,
>     tte> where a customer L2-ethernet service is multiplexed across a service
>     tte> provider L2VPN service, then those type of service-edge nodes have
>     tte> all type of pain for discovery protocols such as CDP/LLDP, e.g.: for
>     tte> their own instance and for the customer instance. So LLDP has
>     tte> explicit provisions for this, and i have to find time to swap in the
>     tte> context of that stuff again. Much easier if we had in-person IETFs
>     tte> and someone in the know like Norm Finn could remind me ;-))
> 
> Yes, I recognize that.  There is a table 11-1 in the 802.1X document that
> deals with this.  It seems broken that an end system has to know whether it's
> in some kind of L2-service to do its announcements, etc.
> 
>     tte> As in: lets make sure that our selector field header could support
>     tte> the same flexibility for those use-cases. At least to the extent
>     tte> that we know for a rev0 spec how many "reserved/ignore" bits we need
>     tte> and then fill them in later.
> 
> My reading is that they accomplish this via table 11-1, and if we were to
> build upon EAPOL, that it would all come for free.

Yepp... more to digest for me here.
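
To have something concrete to poke at, a minimal sketch of what a
"selector-header" on top of a new ethertype could look like: one selector
byte plus one reserved byte that is sent as zero and ignored on receipt.
None of this is from a spec. The selector code points, the header layout and
the function names are hypothetical, and 0x88B5 (the IEEE 802 Local
Experimental EtherType) is only a stand-in for a value that would have to be
assigned.

```python
import struct
from enum import IntEnum

# Stand-in only: 0x88B5 is the IEEE 802 Local Experimental EtherType.
ETHERTYPE_PLACEHOLDER = 0x88B5

# dst MAC, src MAC, ethertype, selector byte, reserved byte (16 bytes total)
HEADER_FORMAT = "!6s6sHBB"
HEADER_LEN = struct.calcsize(HEADER_FORMAT)


class Selector(IntEnum):
    # Hypothetical code points for the two selectors named in the thread.
    DULL_GRASP = 1
    ACP_SECURE_CHANNEL = 2


def build_frame(dst_mac: bytes, src_mac: bytes, selector: Selector, payload: bytes) -> bytes:
    """Ethernet header + 2-byte selector header + payload (e.g. an IPv6 packet)."""
    if len(dst_mac) != 6 or len(src_mac) != 6:
        raise ValueError("MAC addresses must be 6 bytes")
    reserved = 0  # sent as zero so the bits can be given meaning later
    header = struct.pack(HEADER_FORMAT, dst_mac, src_mac,
                         ETHERTYPE_PLACEHOLDER, int(selector), reserved)
    return header + payload


def parse_frame(frame: bytes):
    """Return (selector, payload), or None for other ethertypes or unknown selectors."""
    if len(frame) < HEADER_LEN:
        return None
    _dst, _src, ethertype, sel, _reserved = struct.unpack(HEADER_FORMAT, frame[:HEADER_LEN])
    if ethertype != ETHERTYPE_PLACEHOLDER:
        return None
    try:
        return Selector(sel), frame[HEADER_LEN:]  # reserved byte is ignored
    except ValueError:
        return None
```

The point of the reserved byte is only to show the "reserved/ignore" idea
from above: a rev0 receiver ignores it, so later revisions can fill it in
without breaking older implementations.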

Cheers
    toerless

> --
> Michael Richardson <mcr+IETF@sandelman.ca>   . o O ( IPv6 IøT consulting )
>            Sandelman Software Works Inc, Ottawa and Worldwide

-- 
---
tte@cs.fau.de
