Re: Adoption Call for "Improving the Robustness of Stateless Address Autoconfiguration (SLAAC) to Flash Renumbering Events"

Lorenzo Colitti <lorenzo@google.com> Mon, 29 June 2020 07:50 UTC

References: <CC295D49-5981-41C3-B4DB-E064D66616CE@gmail.com> <adddbd07-2262-b585-68a1-00fc28207a84@gmail.com> <CABNhwV0MFe-d6-DL2SuhuyPSq7Mn0-TS=poDn9ynAqn1ZWXOKA@mail.gmail.com> <CAKD1Yr3zEcZ5=1ttDbZGDtN86qy+wRbFXmOHXqngqu6NuYYJ5g@mail.gmail.com> <2759b55c-871f-dc41-c180-47c1ebd1135d@gont.com.ar>
In-Reply-To: <2759b55c-871f-dc41-c180-47c1ebd1135d@gont.com.ar>
From: Lorenzo Colitti <lorenzo@google.com>
Date: Mon, 29 Jun 2020 16:49:55 +0900
Message-ID: <CAKD1Yr2Uv=2PaoJschS_a6KSE_V8CgL=WkUxnUnBFqQ9Rkoe4Q@mail.gmail.com>
Subject: Re: Adoption Call for "Improving the Robustness of Stateless Address Autoconfiguration (SLAAC) to Flash Renumbering Events"
To: Fernando Gont <fernando@gont.com.ar>
Cc: Gyan Mishra <hayabusagsm@gmail.com>, Bob Hinden <bob.hinden@gmail.com>, IPv6 List <ipv6@ietf.org>
Archived-At: <https://mailarchive.ietf.org/arch/msg/ipv6/6rTgiawkk1BAKJVeCtuaO6fAxyU>
List-Id: "IPv6 Maintenance Working Group \(6man\)" <ipv6.ietf.org>

On Mon, Jun 29, 2020 at 2:43 PM Fernando Gont <fernando@gont.com.ar> wrote:

> reports that 37% of responding ISPs do dynamic prefixes. That seems
> pretty widespread to me. (not to mention the other possible scenarios).
>

Please read the substance of the email I linked earlier. For example: for
the problem to occur, it is not sufficient that there be a flash renumbering.
The problem only occurs if there is a flash renumbering AND a crash with
loss of state AND layer 2 remains up. I stand by my assertion that that
is a rare case.


> > 1. It complicates SLAAC in several ways. It requires hosts to keep
> > track of a lot more state. It associates PIOs with a particular
> > router not just for the purpose of routing but also for the purpose
> > of lifetime processing. It seems to special-case ULA prefixes,
> > treating them differently from non-ULA prefixes, and even tying them
> > together ("Only RAs that advertise Global Unicast prefixes may
> > deprecate Global Unicast Addresses (GUAs), while only RAs that
> > advertise Unique Local prefixes may deprecate Unique Local Addresses
> > (ULAs)").
>
> The mitigation in Section 4.5 requires only one additional variable per
> advertised prefix: LTA_LA (a timestamp of when the prefix was last
> advertised). Is that the "a lot more state" you are referring to?
>

It *is* a lot more state compared to what implementations keep now. Right
now, there are only the two lifetimes. In theory there's also the router that
advertised the prefix, but I believe most popular implementations don't
actually store that (it's only required for rule 5.5 of RFC 6724 source
address selection, and AFAIK only Windows implements rule 5.5).
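For a rough picture of the per-prefix state being compared, here is a minimal Python sketch. The class names, field names and the `on_pio` helper are hypothetical; only the two lifetimes, the optional advertising router, and the LTA_LA "last advertised" timestamp come from this thread. The router field reflects the reading, earlier in the thread, that the draft ties PIOs to a router for lifetime processing.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PrefixStateToday:
    """Per-prefix state most SLAAC implementations keep now (sketch)."""
    preferred_lifetime: float  # seconds remaining
    valid_lifetime: float      # seconds remaining
    # Only needed for RFC 6724 rule 5.5; many implementations don't store it.
    advertising_router: Optional[str] = None


@dataclass
class PrefixStateProposed(PrefixStateToday):
    """Same state plus the draft's LTA_LA timestamp (hypothetical layout)."""
    lta_la: float = 0.0  # time the prefix was last seen in an RA


def on_pio(state: PrefixStateProposed, now: float, router: str,
           preferred: float, valid: float) -> None:
    """Record a PIO for this prefix (hypothetical helper).

    Simplification: RFC 4862's two-hour rule for the valid lifetime is ignored.
    """
    state.preferred_lifetime = preferred
    state.valid_lifetime = valid
    # Assumption: the router is now needed for lifetime processing, not just
    # for rule 5.5, so it can no longer be left unset.
    state.advertising_router = router
    state.lta_la = now
```

The sketch only counts fields; it says nothing about the extra code paths that read them, which is where most of the added complexity would live.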


> > 2. It attempts to detect network changes using heuristics which I
> > think will be brittle in the field, in particular, in the presence of
> > packet loss. We must bear in mind that many handheld devices
> > intentionally drop significant percentages of multicast traffic
> > (upwards of 50%), when on Wi-Fi networks because not listening to
> > multicast traffic at every beacon interval provides very substantial
> > battery savings.
>
> Could you please elaborate on why you think this would make
> implementations brittle?
>
> If such devices can successfully employ SLAAC, there's no reason
> why the proposed mitigation would make them more brittle. Simply pick
> values for LTA_DEPRECATE and LTA_INVALID that suit you.
>

And how do I determine "what suits me"? Can an implementation pick the same
value and have it work well on all networks? It seems to me that it can't,
because there are lots of variables, such as packet loss and RA frequency,
that cannot be determined without accumulating state on previous network
behaviour. It seems pretty clear that 5 seconds is not a good value in most
scenarios: if RAs are sent infrequently, a single lost RA will cause the
device to conclude that some address is no longer preferred. (By the way,
deprecation isn't very useful anyway: as long as there is an active TCP
connection on a deprecated address, that connection will remain stuck for
potentially tens of seconds or even minutes, while new connections will
instead use some other prefix, or even IPv4.)
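To put numbers on the packet-loss concern, here is a back-of-the-envelope Python sketch under a deliberately simplified model (an assumption, not the draft's algorithm): each multicast RA is dropped independently with probability `loss_rate`, and a hypothetical heuristic wrongly deprecates a prefix once `misses_needed` consecutive RAs go unheard. It uses the standard expected waiting time for a run of k consecutive events.

```python
def expected_ras_until_false_trigger(loss_rate: float, misses_needed: int) -> float:
    """Expected number of RAs sent before `misses_needed` in a row are lost.

    Simplified model (assumption): independent losses with probability
    `loss_rate`; a hypothetical heuristic fires after that many misses.
    """
    if not 0.0 < loss_rate < 1.0:
        raise ValueError("loss_rate must be strictly between 0 and 1")
    if misses_needed < 1:
        raise ValueError("misses_needed must be at least 1")
    run = loss_rate ** misses_needed  # probability of one specific run of losses
    # Waiting time for k consecutive events: (1 - p^k) / ((1 - p) * p^k).
    return (1.0 - run) / ((1.0 - loss_rate) * run)


if __name__ == "__main__":
    # Assumption: RAs every 600 s, RFC 4861's default MaxRtrAdvInterval
    # (real routers randomize intervals below this).
    ra_interval = 600.0
    for k in (1, 2, 3):
        n = expected_ras_until_false_trigger(0.5, k)
        print(f"k={k}: ~{n:.0f} RAs, ~{n * ra_interval / 60:.0f} min")
```

With 50% multicast loss, a rule that fires on a single missed RA misfires after about 2 RAs on average; requiring 3 consecutive misses only stretches that to about 14 RAs, and the right k depends on the RA interval, which the host does not know in advance.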

I forgot to mention that this proposal also substantially complicates the
state machine by tying addresses to each other. Right now, from a SLAAC
point of view, each address is independent. Its lifetime can change if an
RA is received, but other than that, whether it is deprecated or not does
not depend on the state of other addresses on the interface. This document
would change that.
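The coupling point can be illustrated with a minimal Python sketch. `apply_pio_independent` approximates today's RFC 4862-style processing (address creation and the two-hour rule are omitted); `apply_ra_coupled` is a hypothetical rule in the spirit described in this thread, not the draft's exact text, and it ignores the GUA/ULA split.

```python
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass
class Address:
    prefix: str
    router: str             # router that last advertised the prefix
    preferred_until: float  # absolute time (seconds)
    valid_until: float


def apply_pio_independent(addrs: Dict[str, Address], router: str, prefix: str,
                          now: float, preferred: float, valid: float) -> None:
    """Today's model (simplified): a PIO only touches its own prefix's address."""
    addr = addrs.get(prefix)
    if addr is not None:
        addr.router = router
        addr.preferred_until = now + preferred
        addr.valid_until = now + valid


def apply_ra_coupled(addrs: Dict[str, Address], router: str,
                     advertised: Iterable[str], now: float) -> None:
    """Hypothetical coupled rule: an RA from `router` that omits a prefix the
    same router advertised earlier deprecates the corresponding address, so
    whether an address stays preferred depends on the RA's other PIOs.
    """
    seen = set(advertised)
    for addr in addrs.values():
        if addr.router == router and addr.prefix not in seen:
            addr.preferred_until = min(addr.preferred_until, now)
```

In the first function, no input other than the address's own PIO can change its state; in the second, every address from a router becomes a function of the full option set in that router's latest RA.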


> > 3. It only considers PIOs. But SLAAC can convey many parameters that
> > are specific to the given network or given router. The most obvious
> > example would be if a router advertises, say, a PIO of 2001:db8::/64
> > and RDNSS servers of 2001:db8::cafe and 2001:db8::beef. (This is, for
> > example, what Android does when acting as a router for hotspot
> > purposes.) Even if the host correctly deprecates the PIOs, the host
> > will still have a broken DNS configuration. Fixing this would require
> > complicating the already brittle and complex heuristics in this
> > document, and will require tying together options like RDNSS and PIO
> > that are currently not tied together in any way. But there are many
> > other options that would need to be treated in this way in order to
> > solve the problem with this approach. For example, the PREF64 option
> > is potentially dependent on the network attachment. How would the
> > heuristics need to change for that option?
>
> 1) The point of the WG adopting a document is for the WG to work on it.
> It is not necessarily an indication that the document in question is
> already complete.
>

Yup. But I don't think the approach taken by this document is a promising
one. I think it adds too much complexity compared to the advantages that it
brings, and most importantly, it places a burden on future design work in
this area as well.


> 2) When it comes to the specific example you've cited, I'd say:
>     * Quite normally, you have multiple configured RDNSS servers, for
> redundancy purposes. So you presumably already have code to use a
> different RDNSS if the current one doesn't work. So, in that light, the
> existing code will take care of it.
>     * That said, it would be sensible to set and cap the RDNSS lifetimes
> a la Section 4.1.2, and, similarly, set the lifetime as a function of
> the Router Lifetime. This will help with the associated garbage
> collection. -- i.e., one might want to incorporate this into the document.
>     * If one wanted to further improve/fine tune this with the same logic
> as in Section 4.5, the idea would be simple: if the same router
> advertises a new RDNSS, but not the existing ones, simply reduce the old
> RDNSS lifetimes. However, as per the previous bullets, hosts are already
> expected to deal with a list of RDNSS, and use the ones that work.
>

Sure, we can fix that problem with more complexity. Like I said, if we
apply the approach taken in this document for PIOs to DNS, then we need
more rules, and more logic, and more dependencies between options. My main
concern with this approach is that we have to deal with this complexity for
all current options, and likely future options as well.
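As a concrete illustration of the extra dependencies, here is a hypothetical Python sketch of the RDNSS handling suggested in the quoted bullets: cap each server's lifetime at the RA's Router Lifetime, and shrink the lifetimes of servers a router no longer advertises. The names and the `reduced_lifetime` parameter are assumptions for illustration only.

```python
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass
class RdnssEntry:
    router: str
    expires_at: float  # absolute time (seconds)


def apply_rdnss(servers: Dict[str, RdnssEntry], router: str,
                advertised: Iterable[str], lifetime: float,
                router_lifetime: float, now: float,
                reduced_lifetime: float = 0.0) -> None:
    """Hypothetical RDNSS handling along the lines of the quoted bullets.

    1. Cap each advertised server's lifetime at the RA's Router Lifetime.
    2. If the same router now advertises servers that omit ones it advertised
       before, shrink the old entries' lifetimes to `reduced_lifetime`.
    """
    current = set(advertised)
    capped = min(lifetime, router_lifetime)
    for server in current:
        servers[server] = RdnssEntry(router, now + capped)
    for server, entry in servers.items():
        if entry.router == router and server not in current:
            entry.expires_at = min(entry.expires_at, now + reduced_lifetime)
```

Even this small version ties RDNSS state to both the Router Lifetime and the advertising router's identity, neither of which it depends on today, and a similar rule would be needed for each other option (e.g. PREF64).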


> > 4. A consequence of #3 above is that any *new* option we define also
> >  needs to update the heuristics, and needs rules on when and how to
> > invalidate it, potentially by being tied to other options that are
> > already considered by the heuristics.
>
> They need not. If nodes can gracefully deal with stale information
> provided by such options, there's no need to invalidate them, and hence
> no need for heuristics. OTOH, if hosts are not able to deal gracefully
> with stale information provided by such options, and you don't devise a
> mechanism to take care of such old information, then you have a broken
> protocol.
>

But that's exactly my point. Who decides the answer to the "if" in your
sentence above? And who writes the documents that inform implementations of
how to deal gracefully with stale information? The WG when working on those
future options, right? So we're adding more work to the WG whenever we
define new options.


> 1) Currently, some specs have "default" values, and at times there are
> BCPs that have "recommended" values -- such as the default "Router
> Lifetime" specified in RFC4861, and the "recommended" values in RFC7772.
> As someone put it at the last RIPE IPv6 meeting, default values essentially
> turn out to be "these values any sane person would override to something
> else". So, for the values in Section 4.1.1, I'd rather have a Std Track
> document that specifies sensible default values than a Std Track
> document that specifies inappropriate default values plus an
> operational document that somehow overrides the default values with
> something sensible.
>

Setting defaults seems much more appropriate for an operational document
than for a standards-track one, particularly because a standards-track
document applies to all implementations, whereas operational documents can
change the defaults based on the scenario being deployed. But if there is
consensus in this WG to change the defaults of the existing standards, then
that's fine.


> 2) In that light, this document contains what we think are required
> tweaks to the standards to improve the reaction of SLAAC to renumbering
> events.
>

Right, but apart from the tweaks, the document also contains pretty
fundamental changes to how SLAAC works. This is the work that I don't think
we should take on.


> 3) I would expect that the decision to adopt this document does not
> necessarily imply that the document is published "as is", but rather
> that we use this document as a starting point. As part of such work, we
> (wg) might decide to change some things, drop some of the proposed
> mitigations, or split the document into smaller pieces.
>

Yup. Like I said, most of the document consists of simple tweaks that in
many cases are already allowed by existing standards. That definitely seems
publishable.

Another much simpler approach that could be taken to solve this problem is
to recommend that if a host receives an RA where previous prefix(es) (or,
more generally, previous options) have disappeared, then it should
attempt to re-check that information's validity in some way (e.g., by
attempting off-link connectivity).
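A minimal Python sketch of that simpler approach, assuming options are represented as opaque strings and that `recheck` is a hypothetical callback (for example, one that attempts off-link connectivity) which decides what, if anything, to invalidate:

```python
from typing import Callable, Dict, Iterable, Set


class RaChangeDetector:
    """Notice options that disappeared from a router's RA, then re-check."""

    def __init__(self, recheck: Callable[[str, Set[str]], None]) -> None:
        self._last_seen: Dict[str, Set[str]] = {}  # router -> options in last RA
        self._recheck = recheck

    def on_ra(self, router: str, options: Iterable[str]) -> None:
        current = set(options)
        previous = self._last_seen.get(router)
        self._last_seen[router] = current
        if previous is None:
            return  # first RA from this router: nothing to compare against
        missing = previous - current
        if missing:
            # No lifetimes are touched here; validation is delegated.
            self._recheck(router, missing)
```

The comparison is option-agnostic, so a new option type needs no rules of its own, and a false positive costs only a probe rather than a deprecated address. That matters because RFC 4861 allows a router to spread its options across several RAs when they do not fit in one.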

Cheers,
Lorenzo