Re: Adoption Call for "Improving the Robustness of Stateless Address Autoconfiguration (SLAAC) to Flash Renumbering Events"

Brian E Carpenter <brian.e.carpenter@gmail.com> Mon, 29 June 2020 21:00 UTC

To: Lorenzo Colitti <lorenzo=40google.com@dmarc.ietf.org>, Fernando Gont <fernando@gont.com.ar>
Cc: IPv6 List <ipv6@ietf.org>, Bob Hinden <bob.hinden@gmail.com>
References: <CC295D49-5981-41C3-B4DB-E064D66616CE@gmail.com> <adddbd07-2262-b585-68a1-00fc28207a84@gmail.com> <CABNhwV0MFe-d6-DL2SuhuyPSq7Mn0-TS=poDn9ynAqn1ZWXOKA@mail.gmail.com> <CAKD1Yr3zEcZ5=1ttDbZGDtN86qy+wRbFXmOHXqngqu6NuYYJ5g@mail.gmail.com> <2759b55c-871f-dc41-c180-47c1ebd1135d@gont.com.ar> <CAKD1Yr2Uv=2PaoJschS_a6KSE_V8CgL=WkUxnUnBFqQ9Rkoe4Q@mail.gmail.com>
From: Brian E Carpenter <brian.e.carpenter@gmail.com>
Organization: University of Auckland
Message-ID: <fb4ada37-c654-b881-0321-dd82a093411c@gmail.com>
Date: Tue, 30 Jun 2020 09:00:37 +1200
Archived-At: <https://mailarchive.ietf.org/arch/msg/ipv6/RrTfTKWJ9jkxBvQvmbB6GZn-eW0>

Hi Lorenzo,
On 29-Jun-20 19:49, Lorenzo Colitti wrote:
> On Mon, Jun 29, 2020 at 2:43 PM Fernando Gont <fernando@gont.com.ar <mailto:fernando@gont.com.ar>> wrote:
> 
>     reports that 37% of responding ISPs do dynamic prefixes. That seems
>     pretty widespread to me. (not to mention the other possible scenarios).
> 
> 
> Please read the substance of the email I linked earlier. For example: for the problem to occur it is not sufficient that there be flash renumbering. The problem only occurs if there is a flash renumbering AND a crash with loss of state AND layer 2 remains up. 

Assuming you mean loss of state in the CE router, my experience was that an ADSL disconnect/reconnect did apparently cause loss of state. And my Windows host didn't seem to treat a short break in WiFi connectivity without a change of SSID as a layer 2 failure.

Past tense because I am no longer on ADSL and have a brand new CE router, so I cannot reproduce the effect. Anyone who is on ADSL can try it though, by briefly unplugging the relevant RJ11.

    Brian

> I stand by my assertion that that is a rare case.
>  
> 
>     > 1. It complicates SLAAC in several ways. It requires hosts to keep
>     > track of a lot more state. It associates PIOs with a particular
>     > router not just for the purpose of routing but also for the purpose
>     > of lifetime processing. It seems to special-case ULA prefixes,
>     > treating them differently from non-ULA prefixes, and even tying them
>     > together ("Only RAs that advertise Global Unicast prefixes may
>     > deprecate Global Unicast Addresses (GUAs), while only RAs that
>     > advertise Unique Local prefixes may deprecate Unique Local Addresses
>     > (ULAs)").
> 
>     The mitigation in Section 4.5 requires only one additional variable per
>     advertised prefix: LTA_LA (a timestamp of when the prefix was last
>     advertised). Is that the "a lot more state" you are referring to?
> 
> 
> It *is* a lot more state compared to what implementations keep now. Right now, there are only the two lifetimes. In theory there's also the router that advertised the prefix, but I believe most popular implementations don't actually store that (it's only required for rule 5.5 of the RFC 6724 source address selection rules, and AFAIK only Windows implements rule 5.5).
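To make the state comparison concrete, here is a minimal Python sketch of the per-prefix record a host keeps today (the two PIO lifetimes, plus an optional advertising router) next to the same record extended with the LTA_LA timestamp discussed above. Only LTA_LA comes from the thread; the class and field names are hypothetical and are not taken from the draft. Note that because the proposed heuristic is per router, an implementation that does not already store the advertising router would need that field populated as well as LTA_LA.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class PrefixStateToday:
    """Per-prefix SLAAC state as many hosts keep it (hypothetical names)."""
    prefix: str
    valid_lifetime: int       # seconds, from the PIO
    preferred_lifetime: int   # seconds, from the PIO
    # Optional today: only needed for source address selection rule 5.5.
    advertising_router: Optional[str] = None


@dataclass
class PrefixStateWithLtaLa(PrefixStateToday):
    """Same record plus the LTA_LA timestamp (hypothetical field name)."""
    # Host-local time at which this prefix was last seen in an RA from
    # advertising_router; the heuristic needs both fields to be filled in.
    lta_la: float = 0.0


def field_names(cls) -> list[str]:
    """List the fields an implementation would have to store per prefix."""
    return [f.name for f in fields(cls)]


print(field_names(PrefixStateToday))
print(field_names(PrefixStateWithLtaLa))
```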
>  
> 
>     > 2. it attempts to detect network changes using heuristics which I
>     > think will be brittle in the field, in particular, in the presence of
>     > packet loss. We must bear in mind that many handheld devices
>     > intentionally drop significant percentages of multicast traffic
>     > (upwards of 50%), when on Wi-Fi networks because not listening to
>     > multicast traffic at every beacon interval provides very substantial
>     > battery savings.
> 
>     Could you please elaborate on why you think this would make
>     implementations brittle?
> 
>     If such devices can successfully employ SLAAC, there's no reason
>     why the proposed mitigation would make them more brittle. Simply pick
>     values of LTA_DEPRECATE and LTA_INVALID that suit you.
> 
> 
> And how do I determine "what suits me"? Can an implementation pick the same value and have it work well on all networks? It seems to me that it can't, because there are lots of variables that cannot be determined without accumulating state on previous network behaviour such as packet loss and RA frequency. It seems pretty clear that 5 seconds is not great in most scenarios because if RAs are sent infrequently, then a single lost RA will cause the device to conclude that some address is no longer preferred (which by the way isn't really very useful because as long as there is an active TCP connection on a deprecated address, that connection will remain stuck for potentially tens of seconds or even minutes; but new connections will instead use some other prefix, or even use IPv4).
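The single-lost-RA concern can be illustrated with a toy simulation. It assumes one possible reading of the heuristic (when an RA arrives without a known prefix, and that prefix has not been seen for at least LTA_DEPRECATE seconds, deprecate it) and a hypothetical router that splits its two PIOs across two back-to-back RAs every `interval` seconds. This is a model for reasoning about timer choice, not the draft's algorithm; the prefixes and intervals are illustrative.

```python
from dataclasses import dataclass, field

PREFIX_A = "2001:db8:a::/64"  # hypothetical prefix carried in the first RA
PREFIX_B = "2001:db8:b::/64"  # hypothetical prefix carried in the second RA


@dataclass
class Host:
    lta_deprecate: float                          # seconds (assumed setting)
    lta_la: dict = field(default_factory=dict)    # prefix -> last time seen
    deprecated: set = field(default_factory=set)

    def receive_ra(self, now: float, prefixes: set[str]) -> None:
        # Prefixes present in this RA are refreshed (and un-deprecated).
        for p in prefixes:
            self.lta_la[p] = now
            self.deprecated.discard(p)
        # Assumed heuristic: a known prefix absent from this RA and unseen
        # for at least lta_deprecate seconds is deprecated.
        for p, last in self.lta_la.items():
            if p not in prefixes and now - last >= self.lta_deprecate:
                self.deprecated.add(p)


def simulate(interval: float, lta_deprecate: float, lost: set[int],
             cycles: int) -> list[tuple[int, bool]]:
    """Return (cycle, is PREFIX_A deprecated?) after each advertisement cycle.

    `lost` holds the cycle numbers in which the RA carrying PREFIX_A is dropped.
    """
    host = Host(lta_deprecate=lta_deprecate)
    history = []
    for cycle in range(cycles):
        t = cycle * interval
        if cycle not in lost:
            host.receive_ra(t, {PREFIX_A})
        host.receive_ra(t + 0.01, {PREFIX_B})  # sent right after the first RA
        history.append((cycle, PREFIX_A in host.deprecated))
    return history


print(simulate(200, 5, {1}, 3))    # -> [(0, False), (1, True), (2, False)]
print(simulate(200, 600, {1}, 3))  # -> [(0, False), (1, False), (2, False)]
```

With a 5-second threshold, one dropped RA deprecates PREFIX_A for a whole advertisement interval; a threshold above the RA interval avoids that, but the interval differs per network (RFC 4861 allows MaxRtrAdvInterval anywhere from 4 to 1800 seconds), which is the crux of the "what suits me" question.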
> 
> I forgot to mention that this proposal also substantially complicates the state machine by tying addresses to each other. Right now, from a SLAAC point of view, each address is independent. Its lifetime can change if an RA is received, but other than that, whether it is deprecated or not does not depend on the state of other addresses on the interface. This document would change that.
>  
> 
>     > 3. It only considers PIOs. But SLAAC can convey many parameters that
>     > are specific to the given network or given router. The most obvious
>     > example would be if a router advertises, say, a PIO of 2001:db8::/64
>     > and RDNSS servers of 2001:db8::cafe and 2001:db8::beef. (This is, for
>     > example, what Android does when acting as a router for hotspot
>     > purposes.) Even if the host correctly deprecates the PIOs, the host
>     > will still have a broken DNS configuration. Fixing this would require
>     > complicating the already brittle and complex heuristics in this
>     > document, and will require tying together options like RDNSS and PIO
>     > that are currently not tied together in any way. But there are many
>     > other options that would need to be treated in this way in order to
>     > solve the problem with this approach. For example, the PREF64 option
>     > is potentially dependent on the network attachment. How would the
>     > heuristics need to change for that option?
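The RDNSS point can be sketched as follows, using the hotspot-style addresses from the example above. The model assumes, as SLAAC does today, that each option carries its own lifetime and nothing links an RDNSS entry to the PIO it was advertised with. The new prefix 2001:db8:1::/64, the new resolver, and all lifetime values are hypothetical.

```python
import ipaddress
from dataclasses import dataclass, field


@dataclass
class SlaacConfig:
    # Each option keeps its own remaining lifetime in seconds; there is no
    # link between an RDNSS entry and the PIO that arrived in the same RA.
    prefixes: dict = field(default_factory=dict)  # prefix -> valid lifetime
    rdnss: dict = field(default_factory=dict)     # server -> lifetime

    def apply_ra(self, pios: dict, rdnss: dict) -> None:
        self.prefixes.update(pios)
        self.rdnss.update(rdnss)

    def invalidate_prefix(self, prefix: str) -> None:
        # What a PIO-only mitigation does once it detects renumbering.
        self.prefixes[prefix] = 0


cfg = SlaacConfig()
# Before renumbering: resolvers are numbered from the advertised prefix.
cfg.apply_ra({"2001:db8::/64": 86400},
             {"2001:db8::cafe": 1800, "2001:db8::beef": 1800})
# After a flash renumbering: new prefix and resolver; old PIO invalidated.
cfg.apply_ra({"2001:db8:1::/64": 86400}, {"2001:db8:1::cafe": 1800})
cfg.invalidate_prefix("2001:db8::/64")

live_servers = [s for s, life in cfg.rdnss.items() if life > 0]
dead_nets = [ipaddress.ip_network(p) for p, life in cfg.prefixes.items() if life == 0]
# Resolvers still in use even though their prefix is gone.
stale = [s for s in live_servers
         if any(ipaddress.ip_address(s) in n for n in dead_nets)]
print(stale)
```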
> 
>     1) The point of the WG adopting a document is for the WG to work on it.
>     It is not necessarily an indication that the document in question is
>     already complete.
> 
> 
> Yup. But I don't think the approach taken by this document is a promising one. I think it adds too much complexity compared to the advantages that it brings, and most importantly, it places a burden on future design work in this area as well.
>  
> 
>     2) When it comes to the specific example you've cited, I'd say:
>         * Quite normally, you have multiple configured RDNSS servers, for
>     redundancy purposes. So you presumably already have code to use a
>     different RDNSS if the current one doesn't work. So, in that light, the
>     existing code will take care of it.
>         * That said, it would be sensible to set and cap the RDNSS lifetimes
>     a la Section 4.1.2, and, similarly, set the lifetime as a function of
>     the Router Lifetime. This will help with the associated garbage
>     collection. -- i.e., one might want to incorporate this into the document.
>         * If one wanted to further improve/fine tune this with the same logic
>     as in Section 4.5, the idea would be simple: if the same router
>     advertises a new RDNSS, but not the existing ones, simply reduce the old
>     RDNSS lifetimes. However, as per the previous bullets, hosts are already
>     expected to deal with a list of RDNSS, and use the ones that work.
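The two RDNSS refinements in the bullets above could look roughly like this. The exact cap of Section 4.1.2 is not given in this thread, so `ceiling` is a hypothetical stand-in, and capping at the Router Lifetime is one assumed way of "setting the lifetime as a function of the Router Lifetime". The `reduced` value used when a router replaces its resolvers is likewise hypothetical.

```python
from typing import Dict, Tuple

# (router, server) -> remaining lifetime in seconds
RdnssTable = Dict[Tuple[str, str], int]


def cap_lifetime(advertised: int, router_lifetime: int, ceiling: int) -> int:
    """Hypothetical cap: never exceed a fixed ceiling or the Router Lifetime."""
    return min(advertised, ceiling, router_lifetime)


def process_rdnss(table: RdnssTable, router: str, servers: Dict[str, int],
                  router_lifetime: int, ceiling: int, reduced: int) -> None:
    """Apply the RDNSS servers from one RA sent by `router`, in place.

    If the same router advertises at least one new server and omits servers
    it advertised before, the omitted ones are cut down to `reduced` seconds
    instead of being kept for their full original lifetime.
    """
    known = {s for (r, s) in table if r == router}
    new_servers = set(servers) - known
    for s, life in servers.items():
        table[(router, s)] = cap_lifetime(life, router_lifetime, ceiling)
    if new_servers:
        for s in known - set(servers):
            table[(router, s)] = min(table[(router, s)], reduced)


table: RdnssTable = {}
process_rdnss(table, "fe80::1", {"2001:db8::cafe": 3600, "2001:db8::beef": 3600},
              router_lifetime=1800, ceiling=7200, reduced=5)
process_rdnss(table, "fe80::1", {"2001:db8:1::cafe": 3600},
              router_lifetime=1800, ceiling=7200, reduced=5)
print(table)
```

As Fernando notes, hosts already have to cope with a list of resolvers, so the shortened lifetimes mainly speed up garbage collection rather than being required for correctness.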
> 
> 
> Sure, we can fix that problem with more complexity. Like I said, if we apply the approach taken in this document for PIOs to DNS, then we need more rules, and more logic, and more dependencies between options. My main concern with this approach is that we have to deal with this complexity for all current options, and likely future options as well.
>  
> 
>     > 4. A consequence of #3 above is that any *new* option we define also
>     >  needs to update the heuristics, and needs rules on when and how to
>     > invalidate it, potentially by being tied to other options that are
>     > already considered by the heuristics.
> 
>     They need not. If nodes can gracefully deal with stale information
>     provided by such options, there's no need to invalidate them, and hence
>     no need for heuristics. OTOH, if hosts are not able to deal gracefully
>     with stale information provided by such options, and you don't devise a
>     mechanism to take care of such old information, then you have a broken
>     protocol.
> 
> 
> But that's exactly my point. Who decides the answer to the "if" in your sentence above? And who writes the documents that inform implementations of how to deal gracefully with stale information? The WG when working on those future options, right? So we're adding more work to the WG whenever we define new options.
>  
> 
>     1) Currently, some specs have "default" values, and at times there are
>     BCPs that have "recommended" values, such as the default "Router
>     Lifetime" specified in RFC4861, and the "recommended" values in RFC7772.
>     As someone put it at the last RIPE IPv6 meeting, default values essentially
>     turn out to be "these values any sane person would override to something
>     else". So, for the values in Section 4.1.1, I'd rather have a Std Track
>     document that specifies sensible default values than a
>     Std Track document that specifies inappropriate default values plus an
>     operational document that somehow overrides the default values with
>     something sensible.
> 
> 
> Setting defaults seems much more appropriate for an operational document than for a standards-track one, particularly because a standards-track document applies to all implementations, whereas operational documents can change the defaults based on the scenario being deployed. But if there is consensus in this WG to change the defaults of the existing standards, then that's fine.
>  
> 
>     2) In that light, this document contains what we think are required
>     tweaks to the standards to improve the reaction of SLAAC to renumbering
>     events.
> 
> 
> Right, but apart from the tweaks, the document also contains pretty fundamental changes to how SLAAC works. This is the work that I don't think we should take on.
>  
> 
>     3) I would expect that the decision to adopt this document does not
>     necessarily imply that the document is published "as is", but rather
>     that we use this document as a starting point. As part of such work, we
>     (wg) might decide to change some things, drop some of the proposed
>     mitigations, or split the document into smaller pieces.
> 
> 
> Yup. Like I said, most of the document consists of simple tweaks that in many cases are already allowed by existing standards. That definitely seems publishable.
> 
> Another much simpler approach that could be taken to solve this problem is to recommend that if a host receives an RA in which previous prefix(es) (or, more generally, previous options) have disappeared, then it should attempt to re-check that information's validity in some way (e.g., by attempting off-link connectivity).
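A minimal sketch of that simpler approach, assuming the host remembers the option set of the last RA from each router and hands anything that vanished to a re-check hook. The class name, the option encoding, and the `revalidate` callback are hypothetical; how the re-check is done (for example an off-link connectivity probe) is left out.

```python
from typing import Callable, Dict, FrozenSet, Set, Tuple

Option = Tuple[str, str]  # e.g. ("PIO", "2001:db8::/64") or ("RDNSS", "2001:db8::cafe")


class RaWatcher:
    """Track the last option set seen from each router (hypothetical design)."""

    def __init__(self, revalidate: Callable[[str, FrozenSet[Option]], None]):
        self._last: Dict[str, FrozenSet[Option]] = {}
        self._revalidate = revalidate

    def on_ra(self, router: str, options: Set[Option]) -> None:
        current = frozenset(options)
        previous = self._last.get(router, frozenset())
        missing = previous - current
        self._last[router] = current
        if missing:
            # Nothing is deprecated here; the host is only asked to re-check
            # whether the vanished information is still valid.
            self._revalidate(router, missing)


calls = []
watcher = RaWatcher(lambda router, missing: calls.append((router, missing)))
old = {("PIO", "2001:db8::/64"), ("RDNSS", "2001:db8::cafe")}
watcher.on_ra("fe80::1", old)
watcher.on_ra("fe80::1", old)                         # unchanged: no re-check
watcher.on_ra("fe80::1", {("PIO", "2001:db8:1::/64")})  # old options vanished
print(calls)
```

One appeal of this shape is that the rule is generic over option types, so a new option would not need its own invalidation heuristics; a spurious trigger (for example, from a router that splits options across several RAs) would presumably cost only an extra check rather than a deprecated address.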
> 
> Cheers,
> Lorenzo