Re: Adoption Call for "Improving the Robustness of Stateless Address Autoconfiguration (SLAAC) to Flash Renumbering Events"

Fernando Gont <fernando@gont.com.ar> Mon, 29 June 2020 05:43 UTC

From: Fernando Gont <fernando@gont.com.ar>
Subject: Re: Adoption Call for "Improving the Robustness of Stateless Address Autoconfiguration (SLAAC) to Flash Renumbering Events"
To: Lorenzo Colitti <lorenzo=40google.com@dmarc.ietf.org>, Gyan Mishra <hayabusagsm@gmail.com>
Cc: Bob Hinden <bob.hinden@gmail.com>, IPv6 List <ipv6@ietf.org>
Archived-At: <https://mailarchive.ietf.org/arch/msg/ipv6/G1g-t2j1odSd7MiM3f7udiOMwdU>

On 29/6/20 00:54, Lorenzo Colitti wrote:
[...]
> 
> As I pointed out at 
> https://mailarchive.ietf.org/arch/msg/ipv6/1yimPGA9Eq8h0eS0LH22jy8ezwg/
>  , I don't think draft-ietf-v6ops-slaac-renum convincingly makes the
> case that this is a widespread problem. Many of the issues reported
> in that draft involve either standards violations (e.g., flash
> renumbering while there is a valid, unexpired DHCPv6 lease) or
> configuration errors (like an automated push system that pushes an
> invalid configuration).

I don't think shifting blame to e.g. the ISP will result in a better 
user experience.

That said,
    [UK-NOF]   Palet, J., "IPv6 Deployment Survey (Residential/Household
               Services) How IPv6 is being deployed?", UK NOF 39, January
               2018,
               <https://indico.uknof.org.uk/event/41/contributions/542/
               attachments/712/866/bcop-ipv6-prefix-v9.pdf>.

reports that 37% of responding ISPs assign dynamic prefixes. That seems 
pretty widespread to me (not to mention the other possible scenarios).



> But most importantly, I believe the fundamental change proposed by
> this document (the text in section 4.5) is harmful for several
> reasons.
> 
> 1. It complicates SLAAC in several ways. It requires hosts to keep
> track of a lot more state. It associates PIOs with a particular
> router not just for the purpose of routing but also for the purpose
> of lifetime processing. It seems to special-case ULA prefixes,
> treating them differently from non-ULA prefixes, and even tying them
> together ("Only RAs that advertise Global Unicast prefixes may
> deprecate Global Unicast Addresses (GUAs), while only RAs that
> advertise Unique Local prefixes may deprecate Unique Local Addresses
> (ULAs)").

The mitigation in Section 4.5 requires only one additional variable per
advertised prefix: LTA_LA (a timestamp of when the prefix was last
advertised). Is that the "a lot more state" you are referring to?
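
To make the amount of state concrete, here is a minimal sketch of the
per-prefix bookkeeping. Only the "last advertised" timestamp comes from
the description above; the class name, field names, and helper function
are hypothetical and chosen purely for illustration. They are not taken
from the draft, and normal lifetime processing is deliberately left out.

```python
from dataclasses import dataclass


@dataclass
class PrefixState:
    """Illustrative per-prefix state on a SLAAC host (names are hypothetical)."""
    prefix: str               # e.g. "2001:db8:1::/64"
    preferred_lifetime: int   # seconds, as learned from the PIO
    valid_lifetime: int       # seconds, as learned from the PIO
    last_advertised: float    # the one extra variable: time of the last RA
                              # that carried this prefix


def note_prefix_advertised(state: PrefixState, now: float) -> None:
    """Record that an RA carrying this prefix was just received."""
    state.last_advertised = now


state = PrefixState("2001:db8:1::/64", 604800, 2592000, last_advertised=0.0)
note_prefix_advertised(state, now=120.0)
```

In this sketch the extra cost is one timestamp per prefix, refreshed
whenever the prefix is seen in an RA.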



> 2. it attempts to detect network changes using heuristics which I
> think will be brittle in the field, in particular, in the presence of
> packet loss. We must bear in mind that many handheld devices
> intentionally drop significant percentages of multicast traffic
> (upwards of 50%), when on Wi-Fi networks because not listening to
> multicast traffic at every beacon interval provides very substantial
> battery savings.

Could you please elaborate on why you think this would make
implementations brittle?

If such devices can successfully employ SLAAC, there's no reason
why the proposed mitigation would make them more brittle. Simply pick
values for LTA_DEPRECATE and LTA_INVALID that suit you.
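
As a rough back-of-the-envelope check on the packet-loss concern, the
sketch below estimates how likely a host is to miss every RA carrying a
given prefix within a time window. It assumes (hypothetically) that each
RA is lost independently with a fixed probability and that RAs are sent
at a fixed interval; real Wi-Fi loss is burstier than that. The function
name and the sample numbers are illustrative, not values from the draft.

```python
def prob_all_ras_missed(loss_rate: float, window_s: float,
                        ra_interval_s: float) -> float:
    """Probability that every RA sent during the window is lost.

    Assumes independent loss per RA and a fixed RA interval
    (both simplifying assumptions).
    """
    ras_in_window = int(window_s // ra_interval_s)
    return loss_rate ** ras_in_window


# 50% multicast loss, a 1800-second window, one RA every 200 seconds:
# 9 RAs fall inside the window, so the result is 0.5 ** 9.
p = prob_all_ras_missed(0.5, 1800, 200)
print(p)  # 0.001953125
```

Under these assumptions, a longer window or a shorter RA interval lowers
the chance of acting on missing information, which is the knob that
picking the two thresholds gives an implementation.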



> 3. It only considers PIOs. But SLAAC can convey many parameters that
> are specific to the given network or given router. The most obvious
> example would be if a router advertises, say, a PIO of 2001:db8::/64
> and RDNSS servers of 2001:db8::cafe and 2001:db8::beef. (This is, for
> example, what Android does when acting as a router for hotspot
> purposes.) Even if the host correctly deprecates the PIOs, the host
> will still have a broken DNS configuration. Fixing this would require
> complicating the already brittle and complex heuristics in this
> document, and will require tying together options like RDNSS and PIO
> that are currently not tied together in any way. But there are many
> other options that would need to be treated in this way in order to
> solve the problem with this approach. For example, the PREF64 option
> is potentially dependent on the network attachment. How would the
> heuristics need to change for that option?

1) The point of the WG adopting a document is for the WG to work on it.
It is not necessarily an indication that the document in question is
already complete.

2) When it comes to the specific example you've cited, I'd say:
    * Quite normally, you have multiple configured RDNSS servers, for
redundancy purposes. So you presumably already have code to use a 
different RDNSS if the current one doesn't work. So, in that light, the 
existing code will take care of it.
    * That said, it would be sensible to set and cap the RDNSS lifetimes
a la Section 4.1.2, and, similarly, set the lifetime as a function of
the Router Lifetime. This will help with the associated garbage 
collection. -- i.e., one might want to incorporate this into the document.
    * If one wanted to further improve/fine tune this with the same logic
as in Section 4.5, the idea would be simple: if the same router
advertises a new RDNSS, but not the existing ones, simply reduce the old
RDNSS lifetimes. However, as per the previous bullets, hosts are already
expected to deal with a list of RDNSS, and use the ones that work.
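
The last two bullets can be sketched as follows. This is a minimal
illustration, not text from the draft: the function name, the use of the
Router Lifetime itself as the cap, and the reduced_lifetime parameter
are all assumptions made for the example.

```python
from typing import Dict


def update_rdnss(known: Dict[str, int], advertised: Dict[str, int],
                 router_lifetime: int, reduced_lifetime: int) -> Dict[str, int]:
    """Return the updated {server: lifetime} map for a single router.

    known      -- servers previously learned from this router
    advertised -- servers carried in the RA just received from it
    """
    updated = dict(known)

    # If the router advertises a server we have not seen before, shorten
    # the lifetime of the old servers it no longer advertises.
    new_servers = set(advertised) - set(known)
    if new_servers:
        for server, remaining in known.items():
            if server not in advertised:
                updated[server] = min(remaining, reduced_lifetime)

    # Cap advertised lifetimes (here, hypothetically, at the Router Lifetime).
    for server, lifetime in advertised.items():
        updated[server] = min(lifetime, router_lifetime)

    return updated


old = {"2001:db8::cafe": 3600, "2001:db8::beef": 3600}
new = update_rdnss(old, {"2001:db8:1::cafe": 7200},
                   router_lifetime=1800, reduced_lifetime=60)
```

In this example the two old servers end up with a 60-second lifetime and
the newly advertised one is capped at 1800 seconds. The old entries are
kept for a short while rather than dropped, so the host's existing
fallback across RDNSS servers still applies.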



> 4. A consequence of #3 above is that any *new* option we define also
>  needs to update the heuristics, and needs rules on when and how to 
> invalidate it, potentially by being tied to other options that are 
> already considered by the heuristics.

They need not. If nodes can gracefully deal with stale information
provided by such options, there's no need to invalidate them, and hence
no need for heuristics. OTOH, if hosts are not able to deal gracefully
with stale information provided by such options, and you don't devise a
mechanism to take care of such old information, then you have a broken
protocol.



> This means it will be more difficult to add new options to SLAAC in
> the future. Further, if we do not carefully consider these heuristics
> when adding other options, we'll end up with bugs in the standards.
> An example of such a bug is already present in this draft: the
> heuristics make assumptions on other parts of the standards, for
> example they depend on the value of MaxRtrAdvertisementInterval.  But
> 1800 seconds is incorrect, the correct number is in RFC 8319 and is
> 65535 seconds. This is such a big difference (a factor of 36!) as to
> call into question the correctness of the algorithm that depends on
> it, even in the draft as it currently stands.

Thanks for the note, we will correct that in the next rev.

That said, that doesn't affect the algorithm. If one wanted to be 
super-conservative, one could simply change LTA_INVALID_DEFAULT to 
65535, as you have noted (at the end of the day, the most important 
thing is to unprefer the addresses). In any case, with the recommendation 
in Section 4.4, using 1800 (the value that we are currently employing) 
instead of 65535 would still be safe.



> Instead of advancing this document I would suggest writing an 
> operational document suggesting best practices for configurations. If
> we think that some of those best practices (even if allowed by the 
> standards already) require tweaks to the standards, then I think we 
> could have a new, small document making those minor changes. We
> probably already need such a document for the "allow zero valid
> lifetimes in PIOs" tweak.

There are a few things to note here:

1) Currently, some specs have "default" values, and at times there are 
BCPs that have "recommended" values -- such as the default "Router 
Lifetime" specified in RFC4861, and the "recommended" values in RFC7772. 
As someone put it at the last RIPE IPv6 meeting, default values essentially 
turn out to be "these values any sane person would override to something 
else". So, for the values in Section 4.1.1, I'd rather have a Std Track 
document that specifies sensible default values, rather than having a 
Std Track document that specifies inappropriate default values, and an 
operational document that somehow overrides the default values with 
something sensible.

2) In that light, this document contains what we think are required 
tweaks to the standards to improve the reaction of SLAAC to renumbering 
events.

3) I would expect that the decision to adopt this document does not 
necessarily imply that the document is published "as is", but rather 
that we use this document as a starting point. As part of such work, we 
(the WG) might decide to change some things, drop some of the proposed 
mitigations, or split the document into smaller pieces.

Thanks,
-- 
Fernando Gont
e-mail: fernando@gont.com.ar || fgont@si6networks.com
PGP Fingerprint: 7809 84F5 322E 45C7 F1C9 3945 96EE A9EF D076 FFF1