Re: [v6ops] Opsdir last call review of draft-ietf-v6ops-slaac-renum-03

Juergen Schoenwaelder <j.schoenwaelder@jacobs-university.de> Thu, 10 September 2020 10:13 UTC

Date: Thu, 10 Sep 2020 12:13:20 +0200
From: Juergen Schoenwaelder <j.schoenwaelder@jacobs-university.de>
To: Fernando Gont <fgont@si6networks.com>
Cc: ops-dir@ietf.org, draft-ietf-v6ops-slaac-renum.all@ietf.org, last-call@ietf.org, v6ops@ietf.org
Message-ID: <20200910101320.izuzgihtdkaxyov3@anna.jacobs.jacobs-university.de>
Reply-To: Juergen Schoenwaelder <j.schoenwaelder@jacobs-university.de>
Mail-Followup-To: Fernando Gont <fgont@si6networks.com>, ops-dir@ietf.org, draft-ietf-v6ops-slaac-renum.all@ietf.org, last-call@ietf.org, v6ops@ietf.org
References: <159968910157.15345.3077847299653382902@ietfa.amsl.com> <03acb49d-9c05-521a-9bf8-40da16c5f7a7@si6networks.com>
Archived-At: <https://mailarchive.ietf.org/arch/msg/v6ops/nQIc18__k50SamRHBcFdlj1QaX0>

On Thu, Sep 10, 2020 at 06:00:02AM -0300, Fernando Gont wrote:
> Hi, Jürgen,
> 
> Thanks a lot for your comments! In-line....
> 
> On 9/9/20 19:05, Jürgen Schönwälder via Datatracker wrote:
> [....]
> > 
> > Perhaps indicate a bit earlier what unacceptably long means, i.e. we
> > are talking about days and weeks.
> 
> This is a bit subjective. If I'm sitting at my computer doing, e.g.,
> video-conferencing (i.e., anything interactive), probably anything over a
> few minutes would be unacceptable. In a more general case, what's acceptable
> is a function of how often the problem happens and whether there's any
> ongoing interactive usage -- and that's still subjective.

My point was that at the beginning there is just 'unacceptably long'
and I had no clue whether I should think in terms of seconds,
minutes, hours, days, or weeks. It is only later that I am told
we are talking about days. Writing something like "unacceptably long
(up to multiple days)" earlier would have helped me appreciate the problem.
 
> > The scenarios described read a bit
> > like somewhat rare events and hence it is useful for the reader to
> > have an idea what unacceptably long means in such events.
> 
> I'm wondering whether adding something like:
> " Any definition of what is considered 'acceptable' here would be
> subjective, and would probably also depend on how often these
> flash-renumbering events occur, whether the affected hosts are employing any
> interactive applications, and other parameters. However, one rough estimate
> would be that hosts should be able to deal with flash-renumbering events
> as quickly as they can deal with failing default routers."
> 
> would help?

Does not hurt. I think my main issue was that I had no clue about the
dimension of 'unacceptably long', which can be fixed easily by making
it clear we are not talking about minutes.

> > (BTW, I find
> > the scenario that is not described at the beginning, where a router
> > announces SLAAC lifetimes that are not synchronized with the obtained
> > prefix lifetimes, operationally the trickier problem, since it can lead
> > to regular failures.)
> 
> Fair enough. How about adding this to the bulleted-list:
> 
> " o A router (e.g. Customer Edge router) may advertise autoconfiguration
> prefixes corresponding to prefixes learned via DHCPv6-PD with constant PIO
> lifetimes that are not synchronized with the DHCPv6-PD lease time (as
> required in Section 6.3 of [RFC8415]). While this behavior violates the
> aforementioned requirement from [RFC8415], it is not an unusual behavior,
> particularly when e.g. DHCPv6-PD is implemented in a different software
> module than the SLAAC router component.".
> 

Thanks.
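A minimal sketch of the synchronization that the proposed bullet says is often missing, assuming the rule is simply that the advertised PIO lifetimes never exceed what remains of the DHCPv6-PD lease. The names DelegatedPrefix and pio_lifetimes are hypothetical, not taken from RFC 8415 or any implementation; the configured values stand in for whatever static lifetimes the SLAAC router component would otherwise advertise.

```python
from dataclasses import dataclass


@dataclass
class DelegatedPrefix:
    # Hypothetical record of a prefix learned via DHCPv6-PD.
    prefix: str
    preferred_lifetime: int  # seconds, as received with the delegation
    valid_lifetime: int      # seconds, as received with the delegation
    obtained_at: float       # time the lease was (re)obtained, in seconds


def pio_lifetimes(pd: DelegatedPrefix, now: float,
                  configured_preferred: int,
                  configured_valid: int) -> tuple[int, int]:
    """Return (preferred, valid) lifetimes for a PIO derived from `pd`.

    The configured values are capped by what is left of the delegation,
    so the advertised lifetimes shrink as the lease runs out instead of
    staying constant.
    """
    elapsed = max(0, int(now - pd.obtained_at))
    remaining_preferred = max(0, pd.preferred_lifetime - elapsed)
    remaining_valid = max(0, pd.valid_lifetime - elapsed)
    valid = min(configured_valid, remaining_valid)
    # The preferred lifetime must never exceed the valid lifetime.
    preferred = min(configured_preferred, remaining_preferred, valid)
    return preferred, valid
```

For example, with a 3600/7200 s delegation obtained at t=0, at t=1800 the router would advertise (1800, 5400) rather than constant 7-day/30-day values, and once the lease has run out it advertises (0, 0).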
 
> > Section 2.2 seems to confuse soft-state (this is what a learned IPv6
> > prefix is for me) with certain protocol timers. There are many places
> > where protocols use soft-state and implementations use timers to purge
> > or refresh soft-state. The claim that timers generally do not go off
> > under normal conditions is not really correct in this context: DHCP
> > leases are renewed when their lifetime expires, which is normal operation.
> 
> Normally, you renew the lease before the lease expires.

But also this is triggered by a timer, no?
 
> > IP address
> > mappings to Ethernet addresses expire when their lifetime timer goes
> > off.
> 
> This one is not necessarily the best example ;-) (while RFC1122 requires
> that, IIRC in many implementations the entry is refreshed when referenced,
> and it only expires when not referenced/refreshed frequently enough).
> 
> But I do see where you are going and I realize that the text is a bit sloppy
> in this respect. How about tweaking the text as follows:
> 
> ---- cut here ----
>    Many protocols, from different layers, normally employ timers for
>    fault isolation/recovery.  The general logic is as follows:
> 
>    o  A timer is set with a value such that, under normal conditions,
>       the timer does *not* go off.
> 
>    o  Whenever a fault condition arises, the timer goes off, and the
>       protocol can perform fault recovery.
> 
>    For example, when implementing reliability mechanisms, a timer is
>    normally set when a packet is transmitted and, unless a response is
>    received before the timer goes off, a fault recovery action (such as
>    packet re-transmission) is triggered.
> ---- cut here ----
> 
> ?
> 
> One might also look at this same issue as the timer implying a sensible
> period of time where information should be refreshed, as you correctly point
> out, though.
> 
> (I guess the only difference is that when looking at this from the
> soft-state angle, you're mostly considering the case where information
> changes, whereas when looking at this from the fault-recovery POV, you're
> mostly thinking about failures rather than updates.)
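The fault-recovery pattern in the proposed text can be illustrated with a small simulation, assuming a stop-and-wait sender with a fixed timeout. The function name and the reply_delays input are hypothetical, and no real network I/O is involved.

```python
from typing import Optional, Sequence


def send_with_retransmission(reply_delays: Sequence[Optional[float]],
                             timeout: float, max_attempts: int) -> int:
    """Simulated stop-and-wait sender.

    reply_delays[i] is the delay of the reply to attempt i (None means the
    packet or its reply was lost). Returns the 1-based attempt that
    succeeded, or 0 if all attempts timed out.
    """
    for attempt in range(max_attempts):
        delay = reply_delays[attempt] if attempt < len(reply_delays) else None
        # A timer is armed at transmission; under normal conditions the
        # reply arrives first and the timer never fires.
        if delay is not None and delay < timeout:
            return attempt + 1
        # Otherwise the timer fires: treat it as a fault and retransmit.
    return 0
```

With a 1 s timeout, a reply after 0.05 s succeeds on the first attempt, whereas a lost packet followed by a late reply (2.0 s) only succeeds on the third transmission.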

I would probably simply remove text that talks about the general use
of timers. It is not needed to understand the problem, I think it
actually confuses the problem. For me, the prefix is soft-state that a
box learned (like it learns a lot of other soft-state) and so you
either go for 'reasonably short' lifetimes or you design mechanisms to
detect and handle stale information gracefully.
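To illustrate the soft-state view, here is a sketch of a host-side prefix table in which each Router Advertisement refreshes an entry and entries are purged once their valid lifetime runs out without a refresh. The class and method names are hypothetical. The sketch deliberately ignores the RFC 4862 rule (Section 5.5.3) that limits how far an unauthenticated RA can shorten a remaining valid lifetime, which matters when a router tries to invalidate a prefix early.

```python
class PrefixTable:
    """Hypothetical host-side table treating learned prefixes as soft state."""

    def __init__(self) -> None:
        # prefix -> absolute time (seconds) at which the entry expires
        self._expiry: dict[str, float] = {}

    def refresh(self, prefix: str, valid_lifetime: int, now: float) -> None:
        # Each RA carrying the prefix resets its deadline (simplified:
        # no two-hour rule from RFC 4862).
        self._expiry[prefix] = now + valid_lifetime

    def purge(self, now: float) -> list[str]:
        # Drop entries whose lifetime ran out without a refresh.
        stale = [p for p, t in self._expiry.items() if t <= now]
        for p in stale:
            del self._expiry[p]
        return stale

    def prefixes(self) -> list[str]:
        return sorted(self._expiry)
```

The lifetime is the only bound on how long stale information survives here, which is why a 30-day valid lifetime is the crux of the problem unless hosts also detect staleness by other means.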
 
> > Switches purge forwarding state regularly when forwarding entries
> > expire. Cached DNS name to IP resolutions expire. The only problem
> > here seems to be that a lifetime of 7 days / 30 days is a bit
> > ridiculous.
> 
> Agreed.
> 
> 
> > Is anyone shipping the RFC 4861 defaults?
> 
> Yes, unfortunately. Some implementations override the RFC4861 defaults.
> Still, RFC4861 defaults are extremely common and widespread.
> 
> 
> 
> > The few
> > implementations I have seen do use somewhat more reasonable defaults.  I
> > think this section should be rewritten to replace the "timer going off
> > is associated with a failure" text with a discussion of soft-state in
> > other protocols. (Section 2.2 is why I ticked 'has issues'.)
> 
> As a second alternative to what I've suggested above:
> 
> ---- cut here ----
>    Many protocols, from different layers, normally employ timers for a
>    variety of purposes, such as in fault isolation/recovery mechanisms,
>    and in the maintenance of data structures that contain bindings of
>    some sort (e.g., the IPv6 Neighbor Cache [RFC4861]).
> 
>    In the case of fault recovery/isolation, the general logic is as
>    follows:
> 
>    o  A timer is set with a value such that, under normal conditions,
>       the timer does *not* go off.
> 
>    o  Whenever a fault condition arises, the timer goes off, and the
>       protocol can perform fault recovery.
> 
>    For example, when implementing reliability mechanisms, a timer is
>    normally set when a packet is transmitted and, unless a response is
>    received before the timer goes off, a fault recovery action (such as
>    packet re-transmission) is triggered.
> 
>    On the other hand, when maintaining bindings in data structures,
>    timers are usually selected such that any bindings that become stale
>    are updated in a timely manner.
> ---- cut here ----
>

As said above, I do not think a generic discussion of timers is needed
to make the point this document is trying to make, namely (i) the
suggested default lifetimes are ridiculously long and (ii) there should
be robust ways to recover quickly from situations where a prefix
becomes unusable within its lifetime.

In fact, if one were to remove the redundancy in the document, it
could be much shorter, but that may be a stylistic issue. ;-)
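The RFC 4861 defaults discussed above (AdvPreferredLifetime of 604800 seconds and AdvValidLifetime of 2592000 seconds, per Section 6.2.1) translate directly into the 7-day / 30-day worst case. The sketch assumes a host that received the prefix just before a flash-renumbering event and learns nothing further about it; the helper name is hypothetical.

```python
# RFC 4861 (Section 6.2.1) default prefix lifetimes advertised in RAs.
ADV_PREFERRED_LIFETIME_DEFAULT = 604_800   # seconds
ADV_VALID_LIFETIME_DEFAULT = 2_592_000     # seconds
SECONDS_PER_DAY = 86_400


def worst_case_stale_days(preferred: int, valid: int) -> tuple[float, float]:
    """Days a stale prefix may remain preferred / valid on a host that
    received it just before the renumbering event and never hears
    about it again."""
    return preferred / SECONDS_PER_DAY, valid / SECONDS_PER_DAY


print(worst_case_stale_days(ADV_PREFERRED_LIFETIME_DEFAULT,
                            ADV_VALID_LIFETIME_DEFAULT))  # (7.0, 30.0)
```

For comparison, purely illustrative lifetimes of 2700/5400 seconds would bound the same window to 0.03125 and 0.0625 days (45 and 90 minutes).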

> > Isn't a part of the solution (other than moving to less ridiculous
> > defaults) that SLAAC hosts experiencing connectivity problems should
> > try to validate the prefix that they have learned (and if the
> > validation fails move to a newly learned prefix)?
> 
> Yes, indeed. That's what we are pursuing in draft-ietf-6man-slaac-renum
> (see Section 4 of this document, draft-ietf-v6ops-slaac-renum-03).
> 
> draft-ietf-v6ops-slaac-renum-03 contains the problem statement and
> *operational* mitigations only.
> 
> > Involving the hosts
> > in a resolution of the problem may be more robust than expecting that
> > something in the network takes care of invalidating stale soft-state.
> 
> I agree 100%. That is and has been, indeed, the motivation for pursuing
> draft-ietf-6man-slaac-renum.
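A hypothetical sketch of the host-side idea raised above: when a host sees connectivity problems, it re-checks each learned prefix and stops preferring those that fail the check. This is not the mechanism specified in draft-ietf-6man-slaac-renum; the function name and the probe callable, which stands in for whatever validation a host would actually perform, are assumptions.

```python
from typing import Callable


def revalidate(prefixes: list[str],
               probe: Callable[[str], bool]) -> tuple[list[str], list[str]]:
    """Split learned prefixes into (usable, deprecated).

    `prefixes` is assumed to be ordered newest first, so the first usable
    entry is the natural choice for new connections. `probe` is a
    hypothetical check that returns True if the prefix still works.
    """
    usable: list[str] = []
    deprecated: list[str] = []
    for prefix in prefixes:
        (usable if probe(prefix) else deprecated).append(prefix)
    return usable, deprecated
```

For instance, after a flash-renumbering event where 2001:db8:1::/64 silently stopped working and 2001:db8:2::/64 was newly advertised, the host would keep only the new prefix as usable instead of waiting for the old one's lifetime to expire.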

/js

-- 
Juergen Schoenwaelder           Jacobs University Bremen gGmbH
Phone: +49 421 200 3587         Campus Ring 1 | 28759 Bremen | Germany
Fax:   +49 421 200 3103         <https://www.jacobs-university.de/>