Re: [v6ops] SLAAC renum: Problem Statement & Operational workarounds

Ted Lemon <mellon@fugue.com> Fri, 01 November 2019 08:51 UTC

From: Ted Lemon <mellon@fugue.com>
Message-Id: <69BD70A3-D9BF-48CB-9E68-D242333E9683@fugue.com>
Date: Fri, 1 Nov 2019 04:50:55 -0400
In-Reply-To: <7007fd81-eae9-c165-c405-162b561f165a@si6networks.com>
Cc: v6ops list <v6ops@ietf.org>
To: Fernando Gont <fgont@si6networks.com>
References: <CAO42Z2yQ_6PT3nQrXGD-mKO1bjsW6V3jZ_2kNGC2x586EMiNZg@mail.gmail.com> <B53CE471-C6E8-4DC1-8A72-C6E23154544F@fugue.com> <325e84aa-1703-e1ce-55a6-8790ceb7aff0@si6networks.com> <4C6471D4-0F5B-49EE-A38A-22AB2B87DA7E@fugue.com> <7007fd81-eae9-c165-c405-162b561f165a@si6networks.com>
X-Mailer: Apple Mail (2.3600)
Archived-At: <https://mailarchive.ietf.org/arch/msg/v6ops/kHt3PNdpFowuO6C4QAS6atwiFq0>
Subject: Re: [v6ops] SLAAC renum: Problem Statement & Operational workarounds

On Oct 31, 2019, at 10:27 PM, Fernando Gont <fgont@si6networks.com> wrote:
>>> 
>>> Did happy eyeballs encourage broken IPv6 connectivity, or did it
>>> actually help IPv6 deployment?
>> 
>> Don’t get me wrong—I’m not saying we shouldn’t do things to improve the
>> situation.   I am saying that we should be strategic about it.
> 
> For the home network case, the situation right now is that there are
> deployments that break, and ISPs meaning to deploy IPv6 that can't do
> stable prefixes. Certainly, that's a very bad strategy on our side.
> 
> That said, and as noted, the home network case is just *one* scenario
> where this problem may be faced. And a subset of the mitigations for
> this scenario are useful for the general case.

You mention later that if it takes more than ten seconds for the connection to resume, the user will be on the phone to the ISP.  It sounds, then, like they already have an incentive to configure their networks so that this doesn’t happen.   So why aren’t they doing that?   What’s the obstacle?

>>>> What should be happening on the host with a prefix that’s deprecated
>>>> is that TCP connections should be timing out.   This doesn’t take
>>>> very long.  
>>> 
>>> ~9 minutes, IIRC.
>> 
>> Should be 90 seconds.
> 
> Nope. The default user timeout is 5 minutes as per RFC793. (The first
> para of the second page of RFC5482 provides a summary wrt the
> specification of the user timeout).

Hm, okay.

>>> That's not correct. Neither the Valid Lifetime nor the Preferred
>>> Lifetime affect address selection.
>> 
>> Okay, there’s something to fix.  Why would these not affect source
>> address selection?
> 
> For many reasons: one of them: two different routers might be using
> different values for these timers, in an un-coordinated manner. And
> there's no reason why the smaller or larger lifetimes should
> imply an address is preferred.
> 
> OTOH, you might think about preferring "the last advertised prefix". But
> that would mean that the src address would flap, which is bad for
> troubleshooting.

On what actual network would this scenario occur, though, in a way that would actually cause the problem you anticipate?  I know you can set this up in the lab.  But in practice, I never have two competing routers advertising different and incompatible lifetimes for the same prefix on my home network.  So if we are going to break things in other ways in order to adapt to this situation, the situation ought to be a likely one.  When and where is it likely?

As for previous comments about source address selection, I think that if you have two prefixes that are otherwise equivalent (a tie), and one has a preferred lifetime of zero, while the other has a preferred lifetime of not-zero, it would be dysfunctional to choose the one with the preferred lifetime of zero.  What on earth is the purpose of preferred and valid lifetimes if SAS isn’t taking them into account?
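A minimal sketch of the tie-break described above, assuming the candidates are already equivalent on every other criterion. The Candidate type, its field names, and pick_source are hypothetical names chosen for illustration, not taken from any host stack.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    address: str
    preferred_lifetime: int  # seconds remaining; 0 means deprecated
    valid_lifetime: int      # seconds remaining; 0 means no longer usable


def pick_source(candidates):
    """Among otherwise-tied candidates, prefer one that is not deprecated."""
    usable = [c for c in candidates if c.valid_lifetime > 0]
    if not usable:
        return None
    # False sorts before True, so non-deprecated candidates come first.
    # sorted() is stable, so remaining ties keep their original order.
    return sorted(usable, key=lambda c: c.preferred_lifetime == 0)[0]


old = Candidate("2001:db8:1::10", preferred_lifetime=0, valid_lifetime=3600)
new = Candidate("2001:db8:2::10", preferred_lifetime=1800, valid_lifetime=3600)
chosen = pick_source([old, new])
```

A deprecated address is still returned when it is the only usable candidate, so existing connectivity is not thrown away.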

>> What is “FHS”?
> 
> Sorry for that: First Hop Security, Cisco-speak for RA-Guard, ND
> inspection, and so on…

Ah.   Maybe we should call that NKB (network knows best).

>>>> If you Really Really want to be able to have the routers send out RAs
>>>> that deprecate the default route, and, as Mark is saying here, to
>>>> upgrade millions or perhaps billions of hosts, why not ask for
>>>> something that’s a real improvement?
>>> 
>>> Every piece helps.
>> 
>> Right, but if the effort involved in two different options differs by
>> epsilon, you should always choose the option that produces the better
>> outcome, shouldn’t you?
> 
> Is there any of the proposed fixes that we shouldn't be doing, already,
> anyway?

   o  CPE routers SHOULD NOT automatically send DHCPv6-PD RELEASE
      messages upon reboot events.

We should definitely be doing this, but it might be worth pointing out that a very simple fix for this problem would be to have the server acknowledge the release but ignore it.  The server is allowed to give out the same prefix again even when it’s received a release, so the release is not a “change my prefix” signal, and perhaps we should explicitly advise that it not be treated that way.   This would be an easy tweak in a DHCP server, much easier than updating a billion CPEs.
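To make the server-side tweak concrete, here is a toy sketch of a delegation table that acknowledges a release but keeps the binding. It is not a real DHCPv6 server: PrefixPool, its methods, and the string client identifiers are all assumptions made for illustration.

```python
class PrefixPool:
    """Toy prefix-delegation table keyed by a client identifier."""

    def __init__(self, prefixes):
        self.free = list(prefixes)
        self.bindings = {}  # client id -> delegated prefix

    def request(self, client_id):
        # A returning client always gets the prefix it had before.
        if client_id not in self.bindings:
            if not self.free:
                raise RuntimeError("no prefixes left to delegate")
            self.bindings[client_id] = self.free.pop(0)
        return self.bindings[client_id]

    def release(self, client_id):
        # Acknowledge the release, but deliberately keep the binding so
        # that a reboot-time release does not lead to a different prefix.
        return True


pool = PrefixPool(["2001:db8:a::/48", "2001:db8:b::/48"])
before = pool.request("cpe-1")
acknowledged = pool.release("cpe-1")
after = pool.request("cpe-1")
```

A real server would still need some policy for eventually reclaiming prefixes from clients that never come back; that is left out here.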

   o  A CPE router sending RAs that advertise dynamically-learned
      prefixes (e.g. via DHCPv6-PD) on an interface MUST record, on
      stable storage, the list of prefixes being advertised on each
      network segment.

Okay.   However, if an ISP makes the change I suggest above, this is no longer necessary, although I think it’s still good advice.
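For the stable-storage recommendation, a rough sketch of what the CPE side could look like. The JSON file format, the function names, and the idea of comparing the record against the current delegation after a reboot are assumptions for illustration; the quoted text only says that the list must be recorded.

```python
import json
import os
import tempfile


def save_advertised(path, prefixes_by_segment):
    """Write the per-segment prefix list atomically, surviving power loss."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as f:
        json.dump(prefixes_by_segment, f)
        f.flush()
        os.fsync(f.fileno())
    os.replace(tmp, path)  # atomic rename over the old record


def load_advertised(path):
    """Return the recorded prefixes, or an empty record on first boot."""
    try:
        with open(path) as f:
            return json.load(f)
    except FileNotFoundError:
        return {}


def stale_prefixes(recorded, current):
    """Prefixes advertised earlier that are no longer delegated, per segment."""
    stale = {}
    for segment, old in recorded.items():
        gone = sorted(set(old) - set(current.get(segment, [])))
        if gone:
            stale[segment] = gone
    return stale
```

After a reboot, a router could call load_advertised, compare the result with whatever it has just been delegated, and treat the output of stale_prefixes as the prefixes it should stop advertising as preferred.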

Your other proposed changes seem fine to me; my main focus here is really on whether there are other things we could be doing that would improve things even more.

>>> What's the practical difference between that and a network that supports
>>> RA-Guard?
>> 
>> On a network that supports RA guard, there probably is no difference,
>> but RA guard on some networks, as we’ve discussed in the past, will
>> actually make the network not work.   RA guard requires an active
>> administrator.   So the CPE case we’re talking about isn’t relevant.
> 
> The point is: in a network where you do not employ RA-Guard, you are
> already trusting the router. Actually, it's worse: you trust all local
> systems.

That’s why I’m suggesting that there might be a way to allow us to trust less and verify more, rather than having an entity on the wire that has no basis for knowing which devices are trustworthy and which are not, making that decision for us (NKB).

>>>> When another RA arrives, see if it was signed with the same key.   If
>>>> so, it came from the same router, and can be trusted to update
>>>> whatever information that router sent, including flash-deprecating a
>>>> prefix.   If not, ignore it.
>>> 
>>> In the non-SEND trust model, you do trust the local router. Why did you
>>> trust the local router to configure your network, but not for
>>> deprecating the prefix?
>> 
>> In the non-SEND trust model, you trust the local network.   You
>> /hope/ that the RA you get “from the router” is actually from the router.
> 
> Exactly: that's the point: you already trust the local network. What's
> the rationale for trusting one of the RAs, but not the others?

If I get an RA that’s not signed by the same key as a route in my routing table that’s working, I don’t let it override what is currently in my routing table and working.
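A minimal sketch of that rule, assuming signature verification has already happened elsewhere and each RA arrives with an identifier for the key that signed it. PinnedRoutes, apply_ra, and the key identifiers are hypothetical; this is not SEND or any existing host implementation.

```python
class PinnedRoutes:
    """Remember which key installed each prefix; only that key may update it."""

    def __init__(self):
        self.entries = {}  # prefix -> (key id, preferred lifetime in seconds)

    def apply_ra(self, key_id, prefix, preferred_lifetime):
        existing = self.entries.get(prefix)
        if existing is not None and existing[0] != key_id:
            # Signed by a different key than the working entry: ignore it.
            return False
        # First sighting, or same key as before: accept the update,
        # including a flash-deprecation (preferred lifetime of 0).
        self.entries[prefix] = (key_id, preferred_lifetime)
        return True


table = PinnedRoutes()
first = table.apply_ra("key-A", "2001:db8:1::/64", 1800)
spoofed = table.apply_ra("key-B", "2001:db8:1::/64", 0)
deprecated = table.apply_ra("key-A", "2001:db8:1::/64", 0)
```

The sketch accepts whichever key is seen first for a prefix; how an entry that has stopped working gets unpinned is a separate question it does not address.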