Re: [v6ops] SLAAC renum: Problem Statement & Operational workarounds

Fernando Gont <fgont@si6networks.com> Fri, 01 November 2019 09:45 UTC

To: Ted Lemon <mellon@fugue.com>
Cc: v6ops list <v6ops@ietf.org>
References: <CAO42Z2yQ_6PT3nQrXGD-mKO1bjsW6V3jZ_2kNGC2x586EMiNZg@mail.gmail.com> <B53CE471-C6E8-4DC1-8A72-C6E23154544F@fugue.com> <325e84aa-1703-e1ce-55a6-8790ceb7aff0@si6networks.com> <4C6471D4-0F5B-49EE-A38A-22AB2B87DA7E@fugue.com> <7007fd81-eae9-c165-c405-162b561f165a@si6networks.com> <69BD70A3-D9BF-48CB-9E68-D242333E9683@fugue.com>
From: Fernando Gont <fgont@si6networks.com>
Message-ID: <7870ab3d-9570-4f79-df5b-3653ab3a6c54@si6networks.com>
Date: Fri, 01 Nov 2019 06:45:04 -0300
Archived-At: <https://mailarchive.ietf.org/arch/msg/v6ops/zshkGPR3w0JV47twX1jEw8EldB4>

On 1/11/19 05:50, Ted Lemon wrote:
> On Oct 31, 2019, at 10:27 PM, Fernando Gont <fgont@si6networks.com> wrote:
>>>>
>>>> Did happy eyeballs encourage broken IPv6 connectivity, or did it
>>>> actually help IPv6 deployment?
>>>
>>> Don’t get me wrong—I’m not saying we shouldn’t do things to improve the
>>> situation.   I am saying that we should be strategic about it.
>>
>> For the home network case, the situation right now is that there are
>> deployments that break, and ISPs meaning to deploy IPv6 that can't do
>> stable prefixes. Certainly, that's a very bad strategy on our side.
>>
>> That said, and as noted, the home network case is just *one* scenario
>> where this problem may be faced. And a subset of the mitigations for
>> this scenario are useful for the general case.
> 
> You mention later that if it takes more than ten seconds for the
> connection to resume, the user will be on the phone to the ISP.  It
> sounds, then, like they already have an incentive to configure their
> networks so that this doesn’t happen. 

In the presence of Happy Eyeballs, the problem will be masked.


> So why aren’t they doing that?   What’s the obstacle?

Are you referring to the CPE case, or to the rest?


[....]
> 
>>>> That's not correct. Neither the Valid Lifetime nor the Preferred
>>>> Lifetime affect address selection.
>>>
>>> Okay, there’s something to fix.  Why would these not affect source
>>> address selection?
>>
>> For many reasons: one of them: two different routers might be using
>> different values for these timers, in an un-coordinated manner. And
>> there's no reason why smaller or larger lifetimes should
>> imply an address is preferred.
>>
>> OTOH, you might think about preferring "the last advertised prefix". But
>> that would mean that src address would flap, which is bad for
>> trouble-shooting.
> 
> On what /actual/ network would this scenario occur, though, in a way
> that would actually cause the problem you anticipate?  I know you can
> set this up in the lab.   But in practice, I never have two competing
> routers advertising different and incompatible lifetimes for the same
> prefix on my home network.  So if you want to break things in other ways
> in order to be able to adapt to this situation, it should be likely.
>  When and where is it likely?

If you assume all routers employ the same values for the valid/preferred
lifetimes, then what you propose is equivalent to "prefer the last
advertised prefix". In that case, in the presence of multiple RAs, the
src address of new sessions will deterministically flap. That doesn't
seem nice from a troubleshooting perspective.
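
To illustrate the flapping, here is a minimal sketch. It assumes two
routers that alternate RAs for two different prefixes, and a host that
always picks the most recently advertised prefix for each new session.
The function names, prefixes (from the 2001:db8::/32 documentation
range), and timings are hypothetical and only for illustration.

```python
def source_prefix_per_session(ra_events, session_times):
    """Return the prefix picked for each new session under a
    'prefer the last advertised prefix' policy.

    ra_events: list of (time, prefix) tuples, sorted by time.
    session_times: times at which new sessions start.
    """
    chosen = []
    for t in session_times:
        # Prefixes advertised at or before the session start time.
        seen = [prefix for (ra_time, prefix) in ra_events if ra_time <= t]
        chosen.append(seen[-1] if seen else None)
    return chosen


def count_flaps(chosen):
    """Count how often consecutive sessions use different prefixes."""
    return sum(1 for prev, cur in zip(chosen, chosen[1:]) if prev != cur)


# Hypothetical: two routers alternate RAs every 100 seconds.
PREFIX_A = "2001:db8:a::/64"
PREFIX_B = "2001:db8:b::/64"
ra_events = [(0, PREFIX_A), (100, PREFIX_B), (200, PREFIX_A), (300, PREFIX_B)]
session_times = [50, 150, 250, 350]

chosen = source_prefix_per_session(ra_events, session_times)
flaps = count_flaps(chosen)
```

With this input, every new session uses a different source prefix than
the previous one, which is the troubleshooting problem noted above.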



> As for previous comments about source address selection, I think that if
> you have two prefixes that are otherwise equivalent (a tie), and one has
> a preferred lifetime of zero, while the other has a preferred lifetime
> of not-zero, it would be dysfunctional to choose the one with the
> preferred lifetime of zero.  What on earth is the purpose of preferred
> and valid lifetimes if SAS isn’t taking them into account?

The timers are only supposed to be of use if they go off. Given the
default values they use (one month and one week), they never go off.

The valid lifetime would trigger garbage collection for stale prefixes.
The preferred lifetime would keep a fresh and working address as the
preferred address for new flows -- if only they went off in a timelier
manner.
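
As a rough sketch of why the timers never go off in practice: the code
below assumes the router defaults of 30 days for the valid lifetime and
7 days for the preferred lifetime (the "one month, and one week" above),
and that the stale prefix simply stops being advertised. The function
name and the state labels are hypothetical.

```python
DEFAULT_VALID_LIFETIME = 30 * 24 * 3600      # 2592000 s, "one month"
DEFAULT_PREFERRED_LIFETIME = 7 * 24 * 3600   # 604800 s, "one week"


def address_state(seconds_since_last_ra,
                  preferred=DEFAULT_PREFERRED_LIFETIME,
                  valid=DEFAULT_VALID_LIFETIME):
    """State of an address whose prefix was last advertised
    seconds_since_last_ra seconds ago (assumption: no later RA
    refreshed or deprecated it)."""
    if seconds_since_last_ra >= valid:
        return "invalid"      # garbage-collected
    if seconds_since_last_ra >= preferred:
        return "deprecated"   # still usable, but avoided for new flows
    return "preferred"


one_hour_later = address_state(3600)
eight_days_later = address_state(8 * 24 * 3600)
thirty_one_days_later = address_state(31 * 24 * 3600)
```

Under these assumptions, an address from a stale prefix is still
"preferred" an hour after the renumbering event; it only becomes
deprecated after a week and is only removed after a month.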

So, I'd rather ask the question: what on earth is the purpose of setting
a timer that will never go off in a meaningful situation and period of time?




>>>>> If you Really Really want to be able to have the routers send out RAs
>>>>> that deprecate the default route, and, as Mark is saying here, to
>>>>> upgrade millions or perhaps billions of hosts, why not ask for
>>>>> something that’s a real improvement?
>>>>
>>>> Every piece helps.
>>>
>>> Right, but if the effort involved in two different options differs by
>>> epsilon, you should always choose the option that produces the better
>>> outcome, shouldn’t you?
>>
>> Is there any of the proposed fixes that we shouldn't be doing, already,
>> anyway?
> 
>    o  CPE routers SHOULD NOT automatically send DHCPv6-PD RELEASE
>       messages upon reboot events.
> 
> 
> We should definitely be doing this, but it might be worth pointing out
> that a very simple fix for this problem would be to have the server
> acknowledge the release but ignore it.

What if the user does want to release the prefix, e.g. for privacy reasons?



>  The server is allowed to give
> out the same prefix again even when it’s received a release, so the
> release is not a “change my prefix” signal, and perhaps we should
> explicitly advise that it not be treated that way.   This would be an
> easy tweak in a DHCP server, much easier than updating a billion CPEs.
> 
>    o  A CPE router sending RAs that advertise dynamically-learned
>       prefixes (e.g. via DHCPv6-PD) on an interface MUST record, on
>       stable storage, the list of prefixes being advertised on each
>       network segment.
> 
> 
> Okay.   However, if an ISP makes the change I suggest above, this is no
> longer necessary, although I think it’s still good advice.
> 
> Your other proposed changes seem fine to me; my main focus here is
> really on whether there are other things we could be doing that would
> improve things even more.

I'm all for that.



>>>> What's the practical difference between that, and a network that supports
>>>> RA-Guard?
>>>
>>> On a network that supports RA guard, there probably is no difference,
>>> but RA guard on some networks, as we’ve discussed in the past, will
>>> actually make the network not work.   RA guard requires an active
>>> administrator.   So the CPE case we’re talking about isn’t relevant.
>>
>> The point is: in a network where you do not employ RA-Guard, you are
>> already trusting the router. Actually, it's worse: you trust all local
>> systems.
> 
> That’s why I’m suggesting that there might be a way to allow us to trust
> less and verify more, rather than having an entity on the wire that has
> no basis for knowing which devices are trustworthy and which are not,
> making that decision for us (NKB).

Not that I oppose that. But I just note that there's no reason for not
trusting a router that advertises a PIO with VL=0, because your trust model
is that you do trust the router (if not your whole local network).
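
For reference, here is a minimal sketch of how a host that follows the
valid-lifetime update rule of RFC 4862, Section 5.5.3 (e), would react
to a PIO with VL=0. This is one reading of that rule, not a definitive
implementation; the function and parameter names are hypothetical.

```python
TWO_HOURS = 2 * 3600  # seconds


def updated_valid_lifetime(remaining, advertised, authenticated=False):
    """New valid lifetime (seconds) of an autoconfigured address after
    receiving a PIO for its prefix.

    remaining: valid lifetime currently left on the address.
    advertised: Valid Lifetime carried in the received PIO.
    authenticated: True if the RA was authenticated (e.g. via SEND).
    """
    # 1. Lifetimes above two hours, or extensions, are accepted as-is.
    if advertised > TWO_HOURS or advertised > remaining:
        return advertised
    # 2. With two hours or less remaining, a shorter lifetime is
    #    ignored unless the RA was authenticated.
    if remaining <= TWO_HOURS:
        return advertised if authenticated else remaining
    # 3. Otherwise the valid lifetime is cut down to two hours.
    return TWO_HOURS


# An unauthenticated PIO with VL=0 for an address with a month left:
after_vl_zero = updated_valid_lifetime(30 * 24 * 3600, 0)
```

Under this reading, an unauthenticated PIO with VL=0 cannot invalidate
the address immediately; at best it shortens the remaining valid
lifetime to two hours.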


> 
>>>>> When another RA arrives, see if it was signed with the same key.   If
>>>>> so, it came from the same router, and can be trusted to update
>>>>> whatever information that router sent, including flash-deprecating a
>>>>> prefix.   If not, ignore it.
>>>>
>>>> In the non-SEND trust model, you do trust the local router. Why did you
>>>> trust the local router to configure your network, but not for
>>>> deprecating the prefix?
>>>
>>> In the non-SEND trust model, you trust the local network.   You
>>> /hope/ that the RA you get “from the router” is actually from the router.
>>
>> Exactly: that's the point: you already trust the local network. What's
>> the rationale for trusting one of the RAs, but not the others?
> 
> If I get an RA that’s not signed by the same key as a route in my
> routing table that’s working, I don’t let it override what is currently
> in my routing table and working.

But that's not the current deployed reality. In current deployments, RAs
are not signed. So why would you be skeptical of a PIO with VL=0?

Thanks,
-- 
Fernando Gont
SI6 Networks
e-mail: fgont@si6networks.com
PGP Fingerprint: 6666 31C6 D484 63B2 8FB1 E3C4 AE25 0D55 1D4E 7492