Re: A common problem with SLAAC in "renumbering" scenarios

Nick Hilliard <nick@foobar.org> Thu, 14 February 2019 10:01 UTC

To: Mark Smith <markzzzsmith@gmail.com>
Cc: Brian E Carpenter <brian.e.carpenter@gmail.com>, 6man WG <ipv6@ietf.org>
Archived-At: <https://mailarchive.ietf.org/arch/msg/ipv6/CHMi6-y8kk3pXVjpFV3M1cjoUmA>

Mark Smith wrote on 14/02/2019 04:46:
> So to me, the cheapest and best solution is for an ISP to provide
> a static/stable PD prefix to the customer (even though I've written
> code to try to mitigate it, i.e. DeprecatePrefix and
> DecrementLifetimes in radvd in the past.)

Mark,

Largely, I agree with your rationale on this.  The problem is that ISPs 
will be loath to commit to a stable prefix for the lifetime of a 
customer account.  This approach doesn't scale for larger providers 
because it commits them to making routing compromises for the lifetime 
of customer accounts, e.g. injecting dynamically assigned prefixes into 
their routing tables.  Accretion of configuration fossils causes long 
term complexity problems.

Stepping back though, connectivity loss due to stale PDs is caused not 
by failure of a single component, but by a cascade of failures and 
miscommunications between multiple discrete components; the core
problem is how to deal with the transfer of state.  Most of the
discussion in this thread has been centered around three ideas: changing 
SLAAC, state survival in CPEs, and suggesting that SPs implement design 
changes.  My take is that we should make suggestions about fixing all 
three problem areas, i.e. add a section to slaac-renum about SP design. 
Whether this would belong in a standards track document is open to 
question, though.  Perhaps it would be useful to document the proposed
SLAAC changes in a standards track document and have a separate document describing
the stale PD problem, with guidelines for CPE vendors and SPs as to how 
the problem can be worked around.

Modifying SLAAC would be likely to take years to roll out to end hosts.

The other two suggestions - behavioural changes to CPEs and design / 
implementation changes on service provider networks - will require 
buy-in from CPE vendors and service providers.  Barbara's comments about
equipment implementation limitations are spot on, and I think we will
see some pushback from SPs on this.

Regarding the discussion on DUIDs, it's hard to imagine why a CPE which 
can provide a persistent DUID and which has an active lease on a 
particular prefix would be provided with a different PD reply if it 
issues a request or renew after a reboot.  There's an argument to be made
that SPs should refrain from punishing their customers with the
limitations of their provisioning platform, particularly in a situation
like this, which can kill v6 connectivity for extended periods of time.

Conversely, a different DUID means different lease parameters, so the
correct behaviour would be for the SP to send a different DHCP reply,
assuming the original lease was still active.  An SP can't generally
second-guess the intentions of the CPE if it requests different parameters.
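
To make the two DUID cases concrete, here is a rough sketch in Python of
a delegating server's lease table keyed on DUID.  It is a toy model, not
any real DHCPv6 implementation: the PrefixPool class, its request()
method, the /56 delegation size and the 2001:db8::/48 documentation pool
are all assumptions made up for illustration.

```python
# Hypothetical toy model of a PD lease table; not a real DHCPv6 server.
from dataclasses import dataclass
from ipaddress import IPv6Network


@dataclass
class Lease:
    prefix: IPv6Network
    expires: float  # absolute expiry time, in seconds


class PrefixPool:
    def __init__(self, pool: str, delegated_len: int = 56):
        # Iterator over the prefixes in the pool not yet delegated.
        self._free = IPv6Network(pool).subnets(new_prefix=delegated_len)
        self._leases: dict[str, Lease] = {}  # keyed by client DUID

    def request(self, duid: str, now: float, lifetime: float = 3600.0) -> IPv6Network:
        lease = self._leases.get(duid)
        if lease is not None and lease.expires > now:
            # Same DUID, lease still active: extend it, return the same prefix.
            lease.expires = now + lifetime
            return lease.prefix
        # Unknown DUID (or expired lease): the server has no way to tell that
        # this is the same CPE, so this sketch delegates the next free prefix.
        prefix = next(self._free)
        self._leases[duid] = Lease(prefix, now + lifetime)
        return prefix


pool = PrefixPool("2001:db8::/48")
before_reboot = pool.request("duid-A", now=0)
after_reboot = pool.request("duid-A", now=600)   # persistent DUID
new_duid = pool.request("duid-B", now=600)       # DUID regenerated on reboot

print(before_reboot == after_reboot)  # True: same prefix survives the reboot
print(new_duid == before_reboot)      # False: looks like a different client
```

The point of the sketch is only that the lookup key is the DUID: if the
CPE keeps it across a reboot, the server has everything it needs to hand
back the same prefix; if the CPE changes it, the old lease is still
active and a different prefix is the only consistent answer.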

tl;dr: complex problem; no single solution.

Nick