Re: [v6ops] Primary/failover use-case for draft-fbnvv-v6ops-site-multihoming ?

Nick Buraglio <buraglio@forwardingplane.net> Wed, 26 July 2023 16:42 UTC

Date: Wed, 26 Jul 2023 16:41:48 +0000
To: Erik Nygren <erik+ietf@nygren.org>
From: Nick Buraglio <buraglio@forwardingplane.net>
Cc: Vasilenko Eduard <vasilenko.eduard=40huawei.com@dmarc.ietf.org>, "v6ops@ietf.org list" <v6ops@ietf.org>
Message-ID: <96F9D234-902F-4BB2-A4DA-0909500C80B9@forwardingplane.net>
In-Reply-To: <CAKC-DJhL8wr6pQhZT2kCqTwKvb2SghX_NX+0XzLR87sGjB+EhA@mail.gmail.com>
References: <168872027038.54873.9391913547328336551@ietfa.amsl.com> <eee131c5b7214a0eb2d9fa9aa7adbd17@huawei.com> <CAKC-DJhL8wr6pQhZT2kCqTwKvb2SghX_NX+0XzLR87sGjB+EhA@mail.gmail.com>
Archived-At: <https://mailarchive.ietf.org/arch/msg/v6ops/T5Aqp7qpdKypDKkQBDUzWoiuTpE>

> On Jul 25, 2023, at 6:23 PM, Erik Nygren <erik+ietf@nygren.org> wrote:
>
> Thanks for the updates to draft-fbnvv-v6ops-site-multihoming. While this has a long history and lacks good solutions today, I think this is a critical problem for us to come up with easily usable and functional solutions for, at least for the most important use-cases.

> One use-case which might be worth having specific focus on in the draft would be the primary/failover scenarios. For home and SMB users this may be one of the most important cases to make "just work" but explicitly having a primary/backup asymmetry may allow for compromises when on "backup" and may allow for simplifications that aren't possible when trying to operate meaningfully in active/active. It could be that a way to approach this might also be to have a draft very much focused just at the primary/failover case and giving some guidance on how CPE devices may be able to handle this today, along with the downsides and things that might break after a failover.

Primary/backup was the bulk of what I was looking at, and I definitely agree that this is the largest use case in the wild, at least at the level we are thinking about here. My experience has shown me that active/active is achievable using techniques similar to those for active/backup, but it is even clunkier.

> For example, some things that might come into play or be part of a solution here:
>
> * When on the primary, you'd expect full e2e IPv6 to work (ie, normally no NAT66/NPT66/etc)
> * On failover, losing e2e and introducing some form of NAT/NPT/etc for IPv6 traffic may be a reasonable trade-off. Connections will break regardless, but a goal should be to allow clients to be able to re-establish them quickly.

The above is the only way I was able to get any notable success in creating a multihomed IPv6 deployment with the services available to a “normal customer”. NAT66 is significantly easier for this solution and definitely works as expected, with the obvious caveats. While unsavory to some, it accomplishes the goal of a failover. In fact, my 1G fiber had been experiencing some interruptions due to an upstream issue, and I was happily failing over to my DOCSIS connection for quite a few days without even noticing, and 80% of my traffic is IPv6. This is a testament to “it may be ugly but it works”.

> * Trying to switch the network's IP addressing over to the backup network's addressing might be an option, but would need some guidance, especially since previously advertised prefixes will still be out there.

This is, at least today, inconsistent at best. I had to write some scripts to make it a bit easier, but it’s still a mess.

> * Doing some form of NAT66/NPT66/etc from either the GUA of the primary into the GUA space of the backup is one option. Perhaps this could happen during the transition. Depending on how long the backup may be in-use, it might be safer to just keep doing this for the lifetime and wait until near the end of the lifetime of the primary to start advertising the backup.
> * Having a ULA always present might also help to allow in-network/local connectivity (eg, to printers) but runs into the ULA-vs-rfc1918 priority issue.

Yup, this is exactly how I have done it. I have ULA as well for internal comms, and have adjusted source selection where possible, but that covers only a fraction of the devices.
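
For illustration, the ULA-vs-RFC 1918 issue follows from the default policy table in RFC 6724: ULA destinations (fc00::/7) get precedence 3, while IPv4 destinations (looked up as ::ffff:0:0/96) get precedence 35, so a dual-stack host with the default policy will usually pick IPv4 when a name resolves to both. Below is a minimal Python sketch of just the precedence lookup. The function name and sample addresses are made up for the example, and real hosts apply the other RFC 6724 rules (scope, label matching, and so on) as well.

```python
import ipaddress

# Default policy table from RFC 6724, section 2.1: (prefix, precedence, label).
DEFAULT_POLICY = [
    ("::1/128", 50, 0),
    ("::/0", 40, 1),
    ("::ffff:0:0/96", 35, 4),
    ("2002::/16", 30, 2),
    ("2001::/32", 5, 5),
    ("fc00::/7", 3, 13),
    ("::/96", 1, 3),
    ("fec0::/10", 1, 11),
    ("3ffe::/16", 1, 12),
]
POLICY = [(ipaddress.ip_network(p), prec, label) for p, prec, label in DEFAULT_POLICY]


def precedence(address: str) -> int:
    """Return the RFC 6724 default precedence for an IPv4 or IPv6 address."""
    addr = ipaddress.ip_address(address)
    if addr.version == 4:
        # IPv4 addresses are looked up as IPv4-mapped IPv6 addresses.
        addr = ipaddress.IPv6Address(f"::ffff:{addr}")
    # Longest matching prefix wins; ::/0 guarantees at least one match.
    matches = [(net.prefixlen, prec) for net, prec, _ in POLICY if addr in net]
    return max(matches)[1]


if __name__ == "__main__":
    # Hypothetical destinations: a GUA, an RFC 1918 address, and a ULA.
    for dest in ("2001:db8::1", "192.168.1.10", "fd12:3456:789a::1"):
        print(dest, precedence(dest))
```

Adjusting source selection amounts to overriding this table on each host (for example, raising the precedence of the site's ULA prefix), which is why it only helps on devices that expose such a setting.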

> * The prefix sizes from the primary/backup may be different (eg, T-Mobile US Home Internet that I use for backup only gives a single /64, so if normally using a /56 from a broadband provider with multiple active prefixes per use for different VLANs/SSIDs then some flavor of NAT66 is needed regardless).
> * Not everything will work in the failover mode, but keeping as much working as possible seems desirable.
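
To make the prefix-size point concrete: a stateless prefix swap can only be one-to-one when the backup prefix is at least as large as the internal one, so a /56 cannot be mapped statelessly into a single /64. Below is a minimal Python sketch using documentation prefixes. The function names are hypothetical, and this is a plain prefix rewrite for illustration, not the checksum-neutral NPTv6 algorithm of RFC 6296.

```python
import ipaddress


def can_map_statelessly(internal: str, external: str) -> bool:
    """True if every internal address can get its own address in the external prefix."""
    inside = ipaddress.IPv6Network(internal)
    outside = ipaddress.IPv6Network(external)
    # The external prefix must be the same size or larger (shorter prefix length).
    return outside.prefixlen <= inside.prefixlen


def swap_prefix(address: str, internal: str, external: str) -> str:
    """Rewrite the internal prefix of an address to the external prefix."""
    inside = ipaddress.IPv6Network(internal)
    outside = ipaddress.IPv6Network(external)
    addr = ipaddress.IPv6Address(address)
    if addr not in inside:
        raise ValueError(f"{address} is not inside {internal}")
    if outside.prefixlen > inside.prefixlen:
        raise ValueError("external prefix is too small for a one-to-one mapping")
    # Keep the bits below the internal prefix length, replace the bits above it.
    low_bits = int(addr) & int(inside.hostmask)
    return str(ipaddress.IPv6Address(int(outside.network_address) | low_bits))


if __name__ == "__main__":
    primary = "2001:db8:aa00::/56"       # hypothetical primary delegation
    backup_56 = "2001:db8:bb00::/56"     # hypothetical backup of equal size
    backup_64 = "2001:db8:cc00:1::/64"   # hypothetical single-/64 backup
    print(swap_prefix("2001:db8:aa00:1::10", primary, backup_56))
    print(can_map_statelessly(primary, backup_64))
```

With a single /64 on the backup, the check fails, which matches the point above that some stateful flavor of NAT66 is needed regardless.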

> I worry that if we try to be purists here people will just deploy solutions outside the IETF (eg, what happened with NAT44) or this will result in another reason for IPv6 to be declared unsuitable and necessary to turn off, which I assume none of us want.

I 100% agree with this. My very strong opinion is that we would be woefully remiss to ignore working mechanisms and fail to acknowledge them, *especially* when they are undesirable.
It is better to control the narrative, and acknowledgment is absolutely not an endorsement. Controlling the narrative for things that are counter to a desired outcome is a key element in reducing the resulting blast radius.
Ignoring them is a license for someone else to write the story's ending, and NAT44 is a great example of how that turns out.

> Best, Erik

Thanks for your insightful input. It is very much appreciated.

----
nb