[Anima] some implementor comments on RFC8994
Michael Richardson <mcr+ietf@sandelman.ca> Wed, 22 September 2021 21:20 UTC
From: Michael Richardson <mcr+ietf@sandelman.ca>
To: anima@ietf.org
cc: Minerva-project@lists.sandelman.ca
Date: Wed, 22 Sep 2021 17:20:03 -0400
Message-ID: <27729.1632345603@localhost>
Archived-At: <https://mailarchive.ietf.org/arch/msg/anima/NPegpCKfF2fwQqd0E4gYEvV3gGw>
As some of you know, I have been working on an RFC8994 (ACP) implementation since Nov. 2020. About 700 hours, internally funded, with bursts of effort followed by weeks of it being lower priority. The code is in Rust, and it's my first major project in Rust, but with sub-components in C (OpenswanX: IKEv2) and C++ (Unstrung: RPL). It is not yet integrated with my RFC8995 code; that comes next.

I wanted to share a few thoughts. Since TL;DR, I'm putting my discussion points first, and then the summary of architecture. Some things belong in the IPsecME WG, but I'll post that separately at some point. This posting is from notes I wrote up at:
https://github.com/AnimaGUS-minerva/connect/tree/main/doc

I've managed to assemble all the pieces, and I see DIO/DAOs in the overlay network, but when I add a third node, things fail. Dr. Atwood and his students want to play with this code. I am now convinced that either my plan for using VTI/fwmark with the Linux XFRM kernel stack will not work, or I'm using it wrong, and I have to go back to a simpler test case. Things fall apart when I try to make the second tunnel. This email is not about that part.

1) The high-level IPsec policy that we use is essentially:

    myid=  certificate othername=rfc8994....@example
    hisid= certificate othername=*

But it is also very specific about the IP addresses. In Openswan terminology:

    ::/0===fe80::d4d4:9aff:fe37:72c0[E=rfc8994+fd739fc23c3440112233445500000300+@acp.example.com]
      ...fe80::e0d0:4eff:fee4:79d6[E=*]===::/0

First note: it is very odd to create a policy that accepts any identity on the right (remote) while actually locking the right-hand side IP address down. In general, this creates a template policy which is then instantiated as remote peers arrive. Even when locked down to a single IP address, there could still be multiple peers on the right because of IPv4 NAPT44.
Normally, such a template cannot be initiated, but some changes were made to enable this template to initiate if the right-hand IP address is present. Upon reflection, an override to the policy system that stops the wildcard from creating a template might have been better. This change is being considered, and I've hacked it in as a test. The template plus a single instance works, but is kind of a mess. I think that this will help both ends to know that the policy is active. Otherwise, the initial responding side sees that the TEMPLATE is not active, even though an instance of it is.

**SPEC THOUGHTS**

An alternative is that the GRASP announcement could be extended to include the DN of the announcing node. This could be done by adding a fourth element to the locator: the DER-encoded DN of the announcing node. There are privacy implications to this, as the DN contains the assigned ULA prefix for the node.

2) On a flat (ACP-unaware) LAN with a bunch of ACP nodes, the resulting set of tunnels could be O(n^2), to no real advantage. All systems essentially share the same fate, being on the same L2 fabric. An example of such a scenario is a cabinet full (~40 to ~80) of hypervisor systems with an ACP system running in the hypervisor and/or in the BMC. (This assumes that the L2 fabrics between cabinets are not bridged, which they frequently are, via all manner of BGP-based DC stuff that the IETF RTG area has worked on.)

It's enough that each node connects to 3 or 4 other systems. At the RPL level, for 80 nodes in a cabinet, a fan-out of 2 or 3 means a depth of about 5, which seems just fine to me. Otherwise, given no other feedback (i.e. "ETX"), RPL would just find the node with the lowest rank, and that node would have 79 children. Of course, if the ToR switch was GRASP/ACP aware, that's exactly what we'd want, as that would exactly follow the physical topology. But in that case, the ToR switch would really be there to support such a load.
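The cabinet arithmetic in point 2 can be checked with a small sketch. The function names here are hypothetical and not part of the implementation described in this message; the depth calculation assumes an idealized, completely filled tree with the root at depth 0, which a real RPL DODAG will not match exactly.

```rust
/// Number of point-to-point tunnels if every pair of `n` nodes is connected.
fn full_mesh_tunnels(n: u64) -> u64 {
    n * n.saturating_sub(1) / 2
}

/// Smallest depth (root at depth 0) of a completely filled tree with the
/// given fan-out that can hold at least `n` nodes. Idealized assumption:
/// every parent takes exactly `fan_out` children before a new level starts.
fn min_depth(n: u64, fan_out: u64) -> u32 {
    assert!(fan_out >= 2, "fan-out must be at least 2");
    let mut capacity = 1u64; // nodes placed so far (just the root)
    let mut level_width = 1u64; // nodes on the deepest level so far
    let mut depth = 0u32;
    while capacity < n {
        level_width *= fan_out;
        capacity += level_width;
        depth += 1;
    }
    depth
}

fn main() {
    let n = 80;
    println!("full mesh of {} nodes: {} tunnels", n, full_mesh_tunnels(n));
    for fan_out in [2, 3, 4] {
        println!("fan-out {}: depth {}", fan_out, min_depth(n, fan_out));
    }
}
```

Under these assumptions a full mesh of 80 nodes needs 3160 tunnels, while a fan-out of 3 needs a depth of 4 and a fan-out of 2 needs a depth of 6, which brackets the estimate of about 5.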
We need two things:

a) Some way to translate something from IKEv2 into an ETX that RPL can deal with. IKEv2 is going to send Dead Peer Detection (DPD) messages across the IKEv2 PARENT SA. We could do some kind of round-trip calculation there. Of course, if DPD messages ever get lost, then that would go into the ETX calculation.

b) An ability for a responding IKEv2 to say, perhaps in the R1 message, that the node is already at ideal capacity, and that perhaps bringing a tunnel up *now* isn't the best choice. In that case, the I2/R2 would proceed without a CHILD SA proposal (no tunnel). It's better to do this at R1, even though that isn't authenticated (yet!), because that avoids allocating CHILD SA resources that won't get used. I think that IKEv2 has some extensions involving gateway overcapacity that might help us here.

3) I had a lot of trouble with simultaneous initiation from both ends. This will be the subject of an email to the IPsecME WG. RFC7296 has text, but it needs slight clarification. I don't quite understand why I had such issues, but I think that the templates are partly responsible. That is, the occurrences should have been rare, but I ran into them all the time, so I had to solve the problem. There are still some issues that I haven't quite figured out, and I've "solved" the problem by having some nodes never initiate.

4) I also noticed that there is a race condition between seeing the GRASP AN_ACP and setting up the policy. Node A says, "AN_ACP", "I'm here". Node B sees it, and initiates to Node A. But node A hasn't seen node B's AN_ACP yet, so node A hasn't got a policy to talk to node B yet. The result of the B->A attempt is an IKEv2 authorization/authentication failure. Then node A will see node B's AN_ACP, install a policy, and initiate from A->B, and everything is fine.

What I could do is remove the very specific remoteip= in the policy, and have a less specific policy that accepts a connection for remoteid=* from any IPv6-LL.
I'm not really crazy about that solution. I'm not actually sure that there is a problem. The issue is noise.

5) Whether I create ethernet pairs (for physical interfaces that are part of a bridge, which is common for hypervisors), or macvlan interfaces [which are bridges down inside the Linux kernel, and so conflict with them], I get randomized L2 addresses. Thus I get IIDs which change each time. The VTI interfaces that I create have randomized IIDs as well. The result is that "up-arrow-return" to restart a test results in new IPv6 LL addresses each time. This is not exactly a problem for any of the protocols, but it is annoying for testing.

I haven't figured out how to clean up after failing interfaces/tunnels yet. My tunnel interfaces are named "acp_XXX" [for incrementing XXX], and I fear that while a system runs, there will be more than 999 interfaces coming and going. I am thinking about having it be acp_HHHH, where HHHH is the last 16 bits of the peer's IPv6 address. But conflicts will occur at some point and mess stuff up. I could go to all 64 bits of the IID.

A reason to try to keep the same interface name in the ACP for tunnels that connect to the same peer host is network monitoring. It's relatively easy with SNMP/YANG to make nice graphs for interfaces that keep the same name, even if the ifindex changes.

--
Michael Richardson <mcr+IETF@sandelman.ca>   . o O ( IPv6 IøT consulting )
           Sandelman Software Works Inc, Ottawa and Worldwide
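The acp_HHHH naming idea from point 5 can be sketched as follows. This is only an illustration: the helper names and the `MAX_IFNAME_LEN` constant are hypothetical, and the peer address is the link-local address from the policy example in point 1. One assumption worth stating: Linux limits interface names to 15 characters (IFNAMSIZ is 16 including the terminating NUL), so a plain hex encoding of all 64 bits of the IID after "acp_" would not fit.

```rust
use std::net::Ipv6Addr;

/// Longest interface name Linux accepts (IFNAMSIZ is 16, including the NUL).
const MAX_IFNAME_LEN: usize = 15;

/// Hypothetical helper: "acp_HHHH" from the last 16 bits of the peer address.
fn acp_name_16(peer: Ipv6Addr) -> String {
    format!("acp_{:04x}", peer.segments()[7])
}

/// Hypothetical helper: "acp_" followed by all 64 bits of the IID in hex.
fn acp_name_64(peer: Ipv6Addr) -> String {
    let s = peer.segments();
    format!("acp_{:04x}{:04x}{:04x}{:04x}", s[4], s[5], s[6], s[7])
}

fn main() {
    let peer: Ipv6Addr = "fe80::e0d0:4eff:fee4:79d6"
        .parse()
        .expect("valid IPv6 literal");

    let short = acp_name_16(peer);
    let long = acp_name_64(peer);

    println!("{} ({} chars)", short, short.len());
    println!(
        "{} ({} chars, fits in an interface name: {})",
        long,
        long.len(),
        long.len() <= MAX_IFNAME_LEN
    );
}
```

With a 15-character limit, at most 11 hex digits (44 bits) of the IID fit after the "acp_" prefix, so some chance of conflict remains unless a denser encoding or a shorter prefix is used.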