Re: [v6ops] Opsdir last call review of draft-ietf-v6ops-slaac-renum-03

Fernando Gont <fgont@si6networks.com> Thu, 10 September 2020 09:15 UTC

To: Jürgen Schönwälder <j.schoenwaelder@jacobs-university.de>, ops-dir@ietf.org
Cc: draft-ietf-v6ops-slaac-renum.all@ietf.org, last-call@ietf.org, v6ops@ietf.org
References: <159968910157.15345.3077847299653382902@ietfa.amsl.com>
From: Fernando Gont <fgont@si6networks.com>
Message-ID: <03acb49d-9c05-521a-9bf8-40da16c5f7a7@si6networks.com>
Date: Thu, 10 Sep 2020 06:00:02 -0300
Archived-At: <https://mailarchive.ietf.org/arch/msg/v6ops/8b6CMUEJHYQVHGyjOxnEwnZwnAY>
Precedence: list

Hi, Jürgen,

Thanks a lot for your comments! In-line....

On 9/9/20 19:05, Jürgen Schönwälder via Datatracker wrote:
[....]
> 
> Perhaps indicate a bit earlier what unacceptably long means, i.e. we
> are talking about days and weeks.

This is a bit subjective. If I'm sitting at my computer doing e.g. 
video-conferencing (i.e., anything interactive), probably anything over 
a few minutes would be unacceptable. In a more general case, what's 
acceptable is a function of how often the problem happens and whether 
there's any ongoing interactive usage -- and that's still subjective.

> The scenarios described read a bit
> like somewhat rare events and hence it is useful for the reader to
> have an idea what unacceptably long means in such events.

I'm wondering if adding something like:
" Any definition of what is considered 'acceptable' here would be 
subjective, and would probably also depend on how often these 
flash-renumbering events occur, whether the affected hosts are employing 
any interactive applications, and other parameters. However, one rough 
estimate would be that hosts should be able to deal with 
flash-renumbering events with a similar timeliness with which they can 
deal with failing default routers."

would help?

> (BTW, I find
> the scenario not described at the beginning where a router announces
> SLAAC lifetimes that are not synchronized with obtained prefix
> lifetimes operationally the more tricky problem since this can lead to
> regular failures.)

Fair enough. How about adding this to the bulleted-list:

" o A router (e.g. Customer Edge router) may advertise autoconfiguration 
prefixes corresponding to prefixes learned via DHCPv6-PD with constant 
PIO lifetimes that are not synchronized with the DHCPv6-PD lease time 
(as required in Section 6.3 of [RFC8415]). While this behavior violates 
the aforementioned requirement from [RFC8415], it is not an unusual 
behavior, particularly when e.g. DHCPv6-PD is implemented in a different 
software module than the SLAAC router component.".

?
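
(For illustration only, and not part of the proposed text: a minimal Python sketch of how a CE router might keep the advertised PIO lifetimes in sync with the DHCPv6-PD lease, by capping them at the time remaining on the lease. The function name `pio_lifetimes` and its parameters are hypothetical and made up for this example. The two constants are the RFC 4861 defaults for AdvPreferredLifetime and AdvValidLifetime, i.e. 7 and 30 days.)

```python
# Illustrative sketch only; names and structure are assumptions.
RFC4861_PREFERRED_LIFETIME = 604800    # 7 days, in seconds
RFC4861_VALID_LIFETIME = 2592000       # 30 days, in seconds


def pio_lifetimes(pd_preferred_remaining, pd_valid_remaining,
                  adv_preferred=RFC4861_PREFERRED_LIFETIME,
                  adv_valid=RFC4861_VALID_LIFETIME):
    """Return (preferred, valid) lifetimes, in seconds, to put in a PIO.

    Each advertised lifetime is capped at the time remaining on the
    DHCPv6-PD lease, so hosts are never told to keep the prefix for
    longer than the router itself may use it.
    """
    valid = max(0, min(adv_valid, pd_valid_remaining))
    preferred = max(0, min(adv_preferred, pd_preferred_remaining, valid))
    return preferred, valid
```

With one hour of preferred and two hours of valid lifetime left on the lease, `pio_lifetimes(3600, 7200)` returns `(3600, 7200)` instead of the 7-day/30-day defaults. A router that advertises constant lifetimes effectively skips the `min()` step.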

> Section 2.2 seems to confuse soft-state (this is what a learned IPv6
> prefix is for me) with certain protocol timers. There are many places
> where protocols use soft-state and implementations use timers to purge
> or refresh soft-state. That timers generally do not go off in normal
> conditions is not really correct in this context, DHCP leases are
> renewed when their lifetime expires, a normal operation. 

Normally, you renew the lease before the lease expires.

> IP address
> mappings to Ethernet addresses expire when their lifetime timer goes
> off. 

This one is not necessarily the best example ;-) (while RFC1122 
requires that, IIRC in many implementations the entry is refreshed when 
referenced, and it only expires when not referenced/refreshed frequently 
enough).

But I do see where you are going and I realize that the text is a bit 
sloppy in this respect. How about tweaking the text as follows:

---- cut here ----
    Many protocols, from different layers, normally employ timers for
    fault isolation/recovery.  The general logic is as follows:

    o  A timer is set with a value such that, under normal conditions,
       the timer does *not* go off.

    o  Whenever a fault condition arises, the timer goes off, and the
       protocol can perform fault recovery.

    For example, when implementing reliability mechanisms, a timer is
    normally set when a packet is transmitted and, unless a response is
    received before the timer goes off, a fault recovery action (such as
    packet re-transmission) is triggered.
---- cut here ----

?
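
(For illustration only, and not part of the proposed text: the fault-recovery use of a timer could be sketched in Python roughly as follows. `send` and `wait_for_reply` are hypothetical callables supplied by the caller, and the doubling of the timeout after each failure is an assumption of this sketch, not something the draft specifies.)

```python
def send_with_retransmission(send, wait_for_reply, timeout=1.0, max_attempts=3):
    """Send a packet and retransmit it if the timer goes off first.

    send() transmits the packet. wait_for_reply(timeout) returns the
    reply, or None if no reply arrived before the timer expired.
    Returns (reply, number_of_attempts).
    """
    for attempt in range(1, max_attempts + 1):
        send()
        reply = wait_for_reply(timeout)
        if reply is not None:
            # Normal conditions: the reply beat the timer.
            return reply, attempt
        # Fault condition: the timer went off, so back off and retransmit.
        timeout *= 2
    raise TimeoutError("no reply after %d attempts" % max_attempts)
```

Under normal conditions the loop body runs once and the timer never fires; the retransmission path is only taken when something has gone wrong.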

One might also look at this same issue as the timer implying a sensible 
period of time where information should be refreshed, as you correctly 
point out, though.

(I guess the only difference is that when looking at this from the 
soft-state angle, you're mostly considering the case where information 
changes, whereas when looking at this from the fault-recovery pov, 
you're mostly thinking about failures, rather than updates).

> Switches purge forwarding state regularly when forwarding entries
> expire. Cached DNS name to IP resolutions expire. The only problem
> here seems to be that a lifetime of 7 days / 30 days is a bit
> ridiculous.

Agreed.

> Is anyone shipping the RFC 4861 defaults? 

Yes, unfortunately. Some implementations override the RFC4861 defaults. 
Still, RFC4861 defaults are extremely common and widespread.

> The few
> implementations I have seen do use a bit more reasonable defaults.  I
> think this section should be rewritten to replace the "timer going off
> is associated with a failure" text with a discussion of soft-state in
> other protocols. (Section 2.2 is why I ticked 'has issues'.)

As a second alternative to what I've suggested above:

---- cut here ----
    Many protocols, from different layers, normally employ timers for a
    variety of purposes, such as in fault isolation/recovery mechanisms,
    and in the maintenance of data structures that contain bindings of
    some sort (e.g., the IPv6 Neighbor Cache [RFC4861]).

    In the case of fault recovery/isolation, the general logic is as
    follows:

    o  A timer is set with a value such that, under normal conditions,
       the timer does *not* go off.

    o  Whenever a fault condition arises, the timer goes off, and the
       protocol can perform fault recovery.

    For example, when implementing reliability mechanisms, a timer is
    normally set when a packet is transmitted and, unless a response is
    received before the timer goes off, a fault recovery action (such as
    packet re-transmission) is triggered.

    On the other hand, when maintaining bindings in data structures,
    timers are usually selected in a way that any bindings that become
    stale are updated in a timely manner.
---- cut here ----

?
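
(For illustration only, and not part of the proposed text: the "maintenance of bindings" use of timers could look roughly like the Python sketch below. The class `BindingTable` and its methods are hypothetical names made up for this example, and purging stale entries lazily at lookup time is just one possible design.)

```python
import time


class BindingTable:
    """Soft-state table: a binding is kept only until its lifetime runs out."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._entries = {}   # key -> (value, expiry time)

    def refresh(self, key, value, lifetime):
        """Create or refresh a binding; a lifetime of 0 removes it at once."""
        if lifetime <= 0:
            self._entries.pop(key, None)
        else:
            self._entries[key] = (value, self._clock() + lifetime)

    def lookup(self, key):
        """Return the bound value, or None if the binding is missing or stale."""
        entry = self._entries.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            del self._entries[key]   # purge the stale binding
            return None
        return value
```

Here the timer going off is not a fault as such: it simply bounds how long stale information can survive without being refreshed. With lifetimes of 7 or 30 days, that bound is too loose to be of much help after a flash-renumbering event.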

> Isn't a part of the solution (other than moving to less ridiculous
> default) that SLAAC hosts experiencing connectivity problems should
> try to validate the prefix that they have learned (and if the
> validation fails move to a newly learned prefix)?

Yes, indeed. That's what we are pursuing in draft-ietf-6man-slaac-renum 
(see Section 4 of this document, draft-ietf-v6ops-slaac-renum-03).

draft-ietf-v6ops-slaac-renum-03 contains the problem statement and 
*operational* mitigations only.

> Involving the hosts
> in a resolution of the problem may be more robust than expecting that
> something in the network takes care of invalidating stale soft-state.

I agree 100%. That is and has been, indeed, the motivation for pursuing 
draft-ietf-6man-slaac-renum.

Thanks!

Regards,
-- 
Fernando Gont
SI6 Networks
e-mail: fgont@si6networks.com
PGP Fingerprint: 6666 31C6 D484 63B2 8FB1 E3C4 AE25 0D55 1D4E 7492
