Re: [Gen-art] Genart last call review of draft-ietf-v6ops-slaac-renum-03

Fernando Gont <fgont@si6networks.com> Sun, 06 September 2020 12:41 UTC

To: "Dale R. Worley" <worley@ariadne.com>
Cc: gen-art@ietf.org, last-call@ietf.org, v6ops@ietf.org, draft-ietf-v6ops-slaac-renum.all@ietf.org
References: <87pn72cpmk.fsf@hobgoblin.ariadne.com>
From: Fernando Gont <fgont@si6networks.com>
Message-ID: <d4ee7239-6d84-add7-f1c1-7f94633658fa@si6networks.com>
Date: Sun, 6 Sep 2020 09:40:33 -0300
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:60.0) Gecko/20100101 Thunderbird/60.9.1
MIME-Version: 1.0
In-Reply-To: <87pn72cpmk.fsf@hobgoblin.ariadne.com>
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Language: en-US
Content-Transfer-Encoding: 7bit
Archived-At: <https://mailarchive.ietf.org/arch/msg/gen-art/kOJMQJvuRc_1z36S3UBAw8omqEo>
Subject: Re: [Gen-art] Genart last call review of draft-ietf-v6ops-slaac-renum-03

Hello, Dale,

On 3/9/20 22:11, Dale R. Worley wrote:
> Fernando Gont <fgont@si6networks.com> writes:
>> Thanks so much for your feedback! Please find my responses in-line....
> 
> You're welcome!
> 
> To be clear, I'm not disagreeing with you about any of this as questions
> of *fact*, but rather saying that these points are obscure to people who
> aren't familiar with the practicalities, and it may help the work to
> explicate them.

Thanks for raising them! -- I'm all for improving the document where 
possible...



>>> There is one technical aspect that doesn't seem to be addressed
>>> explicitly:  Given that a router should never advertise a prefix via
>>> SLAAC that extends beyond the lease time it has received via DHCP,
>>> even if the router is restarted and it receives a new prefix via DHCP,
>>> the old prefix remains delegated to the network for the remainder of
>>> its lease time, and in a way, it remains *valid* for the hosts to use
>>> addresses derived from the old prefix for the remainder of the
>>> lifetime they received from SLAAC.  The existence of this document
>>> shows that such usage does not work well, but perhaps there is some
>>> place early in the discussion where it can be made clear that validity
>>> is not sufficient in practice.
>>
>> Flash renumbering essentially breaks that premise.
>>
>> In all of the scenarios discussed in Section 1, a prefix that was
>> originally valid has become invalid.
>>
>> In principle, when a prefix is advertised via SLAAC, it means "you can
>> use this prefix for 'Valid Lifetime', unless you hear otherwise from
>> me". For a number of reasons, hosts may not hear otherwise from the
>> router -- whether because the router fails to signal that condition, or
>> because hosts fail to receive that notification. Besides, as noted in
>> the document, the current spec prevents hosts from honoring Valid
>> Lifetimes smaller than two hours.
> 
> OK, but it would seem to be a violation for a router to advertise a
> Valid Lifetime of X unless it had obtained from its upstream provider a
> guarantee that it owned that prefix for X.

In the home router scenario, for example, the home router does obtain
a promise from the upstream provider. However, at some point some event
happens (see the bulleted list in Section 1), and the promise is broken.

It is not that the systems involved were meant to break the promise, but 
rather that the operational reality caused the promise to be broken.

Should we add something like:

"We note that while IPv6 network renumbering is expected to take place 
in a planned manner, flash-renumbering scenarios such as the ones 
described above are nevertheless an operational reality."

?


>  Since there's never a
> guarantee that a host will be able to "hear otherwise" in the future,
> it's never safe to give a host an assignment that extends beyond the
> time that you know it's valid.  ...  I suspect that in fact routers
> don't adhere to this rule, but if so, it seems like something you should
> underline as being one root cause, and perhaps list it as an item to be
> investigated in future work.

Please see above.



>>> Nits/editorial comments:
>>>
>>> 1.  Introduction
>>>
>>>         [...] but will normally retain and actively employ the addresses
>>>         configured for the previously-advertised prefix, since their
>>>         associated Preferred Lifetime and Valid Lifetime allow them to do
>>>         so.
>>>
>>> Naively, it seems like the new prefix will almost always have longer
>>> lifetime values than the old prefix, and given that this seems to be
>>> how orderly renumbering causes hosts to transition from using the old
>>> prefix to the new prefix, it's not clear how hosts "will normally
>>> ... actively employ the addresses configured for the
>>> previously-advertised prefix".  Naively, hosts only seem to be
>>> permitted to employ the old prefix, but the preferred behavior would
>>> be to use the new prefix whenever possible.
>>
>> Not really. Different routers may employ different lifetimes for the
>> prefixes they advertise. And a given router may also employ different
>> lifetimes for the different prefixes it advertises. So comparing the
>> advertised lifetimes of two different prefixes is not meaningful.
>>
>> Please see:
>> https://tools.ietf.org/html/draft-ietf-6man-slaac-renum-01#appendix-A.2
>> for further details.
> 
> I see the point.  But it seems to me that the sentence should be
> modified to "may", unless the current practices cause continuing use of
> the previous prefix to be *more likely*.  In the latter case, you
> probably want to insert the reference you give.

Yep, current practice makes it more likely. Note: I wouldn't mind 
including the reference. However, as noted in the referenced Section, 
such text is expected to be removed before publication.

That aside, I wonder which part you think would need clarification:
* Why nodes don't pick the source address based on the lifetime value?
* Why they keep using the first address?
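To make the first question concrete, here is a minimal Python sketch (a deliberate simplification, not any particular stack's code) of source address selection in the spirit of RFC 6724. Only Rule 3 ("avoid deprecated addresses") is modelled; the remaining rules are replaced by a hypothetical tie-break that keeps the first candidate, assumed to be listed in configuration order. The names Candidate and pick_source are illustrative. What it shows is that remaining lifetimes are never compared, so an address from the old prefix keeps being selected until it is actually deprecated.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Candidate:
    address: str
    preferred_remaining: int  # seconds left; 0 means the address is deprecated
    valid_remaining: int      # seconds left until the address becomes invalid


def pick_source(candidates: List[Candidate]) -> Candidate:
    # Only valid (not yet expired) addresses can be used at all.
    usable = [c for c in candidates if c.valid_remaining > 0]
    if not usable:
        raise ValueError("no valid source address available")
    # RFC 6724 Rule 3: avoid deprecated addresses when possible.
    preferred = [c for c in usable if c.preferred_remaining > 0]
    pool = preferred or usable
    # Hypothetical tie-break standing in for the remaining rules: keep the
    # first candidate (assumed to be the earliest configured). The remaining
    # lifetimes themselves are never compared.
    return pool[0]


# The old address has *shorter* remaining lifetimes, yet it is still chosen.
old = Candidate("2001:db8:1::10", preferred_remaining=300, valid_remaining=900)
new = Candidate("2001:db8:2::10", preferred_remaining=900, valid_remaining=1800)
print(pick_source([old, new]).address)  # 2001:db8:1::10
```

Only once the old address is deprecated (preferred_remaining of 0) does the sketch switch to the new one.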





>>>      o  During the planned network renumbering, a router may be configured
>>>         to send an RA with the Preferred Lifetime for the "old" Prefix
>>>         Information Option (PIO) set to zero and the new PIO having non-
>>>         zero Preferred Lifetime.
>>>
>>> This sentence reminds me:  There are a number of places where PIO is
>>> mentioned.  My understanding from here is that PIO is only used in RA,
>>> so for simplicity/clarity, mentions of PIO should always include that
>>> they are within RAs, so that the reader remember that all prefix
>>> announcements are either RAs or SLAAC.  Conversely, if PIOs are used
>>> both by SLAAC and RAs, that should be emphasized early on, so the
>>> reader knows that later mentions of PIOs apply to both protocols
>>> equally.  Then again, perhaps RAs are messages within SLAAC, in which
>>> case that should be made clear.  In any case, the document should
>>> state the relationship between SLAAC, RA, and PIO.
>>
>> FWIW, I would expect that part to be somehow addressed by including
>> RFC4861 and RFC4862 as normative references.
>>
>> That said, I guess one could add something like this to the Intro:
>>
>> "IPv6 Stateless address autoconfiguration (SLAAC) [RFC4862] conveys
>> information about prefixes to be employed for address configuration via
>> Prefix Information Options (PIOs) sent in Router Advertisement (RA)
>> messages".
>>
>> Would that do?
> 
> Yes, I was starting to suspect that SLAAC and RA were actually the same
> thing.  Though I'd un-passive-ize it by writing:
> 
> "In IPv6 Stateless address autoconfiguration (SLAAC) [RFC4862], routers
> convey to hosts on their networks information about prefixes to be
> employed for address configuration via Prefix Information Options (PIOs)
> sent in Router Advertisement (RA) messages".

May I simplify the text to:
"In IPv6 Stateless address autoconfiguration (SLAAC) [RFC4862], routers
convey information about prefixes to be employed for address 
configuration via Prefix Information Options (PIOs) sent in Router 
Advertisement (RA) messages".

?
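For readers who want to see the relationship concretely, a minimal Python sketch of the Prefix Information Option layout from RFC 4861, Section 4.6.2 (Type 3, Length 4 in units of 8 octets, Prefix Length, L and A flags, Valid Lifetime, Preferred Lifetime, Reserved2, Prefix). The helper names build_pio and parse_pio are hypothetical, and the example lifetimes (1800/900 seconds) are the home-router values quoted later in this thread; the sketch only encodes the option, not the surrounding RA.

```python
import ipaddress
import struct

PIO_TYPE = 3        # Prefix Information option type (RFC 4861, Section 4.6.2)
PIO_LEN_UNITS = 4   # option length in units of 8 octets (32 bytes in total)
FLAG_L = 0x80       # on-link flag
FLAG_A = 0x40       # autonomous address-configuration flag (used by SLAAC)
PIO_FORMAT = "!BBBBIII16s"  # network byte order, no padding: 32 bytes


def build_pio(prefix: str, valid: int, preferred: int,
              on_link: bool = True, autonomous: bool = True) -> bytes:
    """Encode a PIO as it would be carried inside a Router Advertisement."""
    net = ipaddress.IPv6Network(prefix)
    flags = (FLAG_L if on_link else 0) | (FLAG_A if autonomous else 0)
    return struct.pack(PIO_FORMAT, PIO_TYPE, PIO_LEN_UNITS, net.prefixlen,
                       flags, valid, preferred, 0,  # 0 = Reserved2
                       net.network_address.packed)


def parse_pio(data: bytes) -> dict:
    """Decode a PIO back into its fields."""
    (opt_type, length, plen, flags, valid, preferred, _reserved,
     raw_prefix) = struct.unpack(PIO_FORMAT, data)
    if opt_type != PIO_TYPE or length != PIO_LEN_UNITS:
        raise ValueError("not a Prefix Information option")
    return {
        "prefix": f"{ipaddress.IPv6Address(raw_prefix)}/{plen}",
        "on_link": bool(flags & FLAG_L),
        "autonomous": bool(flags & FLAG_A),
        "valid": valid,
        "preferred": preferred,
    }


pio = build_pio("2001:db8:1::/64", valid=1800, preferred=900)
print(len(pio))  # 32
fields = parse_pio(pio)
print(fields["valid"], fields["preferred"])  # 1800 900
```

The A flag is what makes a host use the prefix for SLAAC; the two lifetimes are the values the rest of this thread is about.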


> 
>>> In various places, "timely" and derived words are used.  In some
>>> places, the text asserts that certain intervals are not timely (in an
>>> absolute sense) without any discussion about what the standard of
>>> timeliness is.  It seems that some such discussion needs to be made,
>>> or a statement made that such a discussion needs to be undertaken as
>>> part of the work.
>>
>> There's not really a standard for timeliness. Throughout the document,
>> whenever we refer to something as "timely" we mean: "a period of time
>> that seems sensible to the user".
>>
>> For example, at home I use this:
>>    Prefix                   : xxxxxxxxxxxxxxxx::/64
>>     On-link                 :          Yes
>>     Autonomous address conf.:          Yes
>>     Valid time              :         1800 (0x00000708) seconds
>>     Pref. time              :          900 (0x00000384) seconds
>>
>> That is, 30 minutes for the valid lifetime, and 15 minutes for the
>> preferred lifetime.
>>
>> I have experienced the problem described in our document recently.
>> However, it only happens (if it does) when I reboot my router. Since
>> that (whether explicitly, or implicitly as a result of a blackout) does
>> not happen frequently, 15 minutes (at most) to prefer a newer address,
>> or 30 minutes (at most) to completely get rid of old addresses, is good
>> enough.
>>
>> If the problem happened more frequently, I would probably reduce such
>> values to half of the current values.
> 
> Of course, the question is intrinsically vague.  But section 3.2 lists
> specific values, giving no particular rationale, and then states "they
> will not lead to a timely recovery from the problem discussed in this
> document." 

How about changing that sentence to:
   "However, while the aforementioned values represent an improvement
   over the default values specified in [RFC4861], they represent a
   trade-off among a number of factors, including responsiveness and
   the possible impact on the battery life of connected devices
   [RFC7772]. Thus, they may or may not provide sufficient mitigation
   for the problem discussed in this document."

?



> It seems like a remarkably concrete assertion; literally,
> the statement is absolute.  And also, if those values aren't
> satisfactory, why does the document mention them?  It would seem more
> useful to give values that one wanted to propose as giving timely
> action, and then note that there are tradeoffs and other values may be
> better in various situations.

Please let me know if the above would work better...



>>>      Some devices have implemented ad-hoc mechanisms to address this
>>>      problem, such as sending RAs to invalidate apparently-stale
>>>      prefixes [...]
>>>
>>> This seems to contradict the statement in section 2.3 that an RA
>>> cannot effectively reduce the Valid Lifetime of a prefix (as
>>> maintained by a host) to zero:  "Item e) [...] specifies that an RA
>>> may never reduce the "RemainingLifetime" to less than two hours."
>>> Indeed, a crux of this document is that there is no way for a router
>>> to immediately invalidate the use of a prefix on a network whose
>>> addressing it configures.  So the described mechanism needs to be
>>> clarified.
>>
>> How about s/invalidate/deprecate/?
> 
> What I'm trying to get at is that the routers might send such RAs, but
> since the hosts are required to not act on them, it should have no effect
> and it isn't really a mechanism.  In particular, I suspect that some
> routers send messages like this (which they shouldn't), and some hosts
> violate the standard (or at least extend it) by acting on those messages
> (either by abandoning the prefix or by deprecating it relative to other
> prefixes).  Whatever it is should be explicated, because it's not in the
> RFCs, but it is part of current practice.

In retrospect, the text was perhaps a bit imprecise in this respect.

It should probably have said:

"Some devices have implemented ad-hoc mechanisms to try to mitigate this
problem, such as sending RAs with a Preferred Lifetime of zero and a 
small Valid Lifetime to try to deprecate and phase out the 
apparently-stale prefixes"


Indeed, that's what the box being referenced is doing: 
https://www.si6networks.com/2016/02/16/quiz-weird-ipv6-traffic-on-the-local-network-updated-with-solution/
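As a rough illustration of the router side of that ad-hoc mechanism, a minimal Python sketch. It assumes (hypothetically) that the router still knows which prefixes it advertised before, for example because it kept them across a reboot; the function name pio_lifetimes and the 60-second "stale" Valid Lifetime are illustrative choices, not taken from any particular device.

```python
from typing import Dict, Iterable, Tuple


def pio_lifetimes(current: Iterable[str], previously_advertised: Iterable[str],
                  valid: int = 1800, preferred: int = 900,
                  stale_valid: int = 60) -> Dict[str, Tuple[int, int]]:
    """Return {prefix: (valid_lifetime, preferred_lifetime)} for the next RA.

    Current prefixes get the normal lifetimes. Prefixes that were advertised
    before but are no longer current are announced with a Preferred Lifetime
    of zero and a small Valid Lifetime, to try to deprecate and phase them out.
    """
    result = {p: (valid, preferred) for p in current}
    for p in previously_advertised:
        if p not in result:
            result[p] = (stale_valid, 0)
    return result


print(pio_lifetimes(["2001:db8:2::/64"], ["2001:db8:1::/64", "2001:db8:2::/64"]))
# {'2001:db8:2::/64': (1800, 900), '2001:db8:1::/64': (60, 0)}
```

Note that hosts compliant with RFC 4862 would not simply adopt the 60-second Valid Lifetime (reductions below two hours are clamped or ignored), whereas the Preferred Lifetime of zero does take effect; hence "try to deprecate and phase out".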

Thoughts?



>>>      Under normal network conditions, these timers will be reset/refreshed
>>>      to the default values.  However, under problematic circumstances,
>>>
>>> It seems like the first sentence is not to the point of this
>>> paragraph, and it would be clearer just starting with "Under problematic
>>> circumstances ...".
>>
>> Not sure what you mean. Could you clarify?
> 
> I was thinking of updating the text to read:
> 
>     Under problematic circumstances,
>     such as where the corresponding network information has become ...
> 
> The previous sentence is "Under normal network conditions, these timers
> will be reset/refreshed to the default values." and to include it would
> require some rewording, and I'm not sure it would add enough value to be
> worth it.  In particular, the previous sentences talk about the default
> values that are put into PIOs, which of course are not themselves reset
> or refreshed.  So you'd have to say "Under normal network conditions,
> the Pref. Life. and Val. Life. timers in hosts are periodically
> reset/refreshed to the values in the PIOs, which are usually the default
> values."

This seems to convey a different meaning... What we mean to say, in line 
with the earlier discussion on the use of timers, is: "when everything 
is okay, these timers will be refreshed. Now, if there's any problem, 
these timers will cause addresses to be deprecated at [X], and be 
invalidated at [Y]".

Am I missing something?
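For concreteness, a minimal Python sketch of that timer behaviour, assuming X and Y are simply the Preferred and Valid Lifetimes measured from the last RA that refreshed the prefix. The names AddrState and address_state are hypothetical; the example values are the 900/1800 seconds from my home router.

```python
from enum import Enum


class AddrState(Enum):
    PREFERRED = "preferred"
    DEPRECATED = "deprecated"
    INVALID = "invalid"


def address_state(seconds_since_last_refresh: int,
                  preferred_lifetime: int, valid_lifetime: int) -> AddrState:
    """State of a SLAAC address whose prefix has not been re-advertised.

    While RAs keep arriving, the timers are reset and the address stays
    preferred. If they stop (or stop mentioning the prefix), the address is
    deprecated at X = preferred_lifetime and invalidated at Y = valid_lifetime.
    """
    if seconds_since_last_refresh >= valid_lifetime:
        return AddrState.INVALID
    if seconds_since_last_refresh >= preferred_lifetime:
        return AddrState.DEPRECATED
    return AddrState.PREFERRED


for t in (600, 1200, 1800):
    print(t, address_state(t, preferred_lifetime=900, valid_lifetime=1800).value)
# 600 preferred
# 1200 deprecated
# 1800 invalid
```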



>>> 2.4.  Lack of Explicit Signaling about Stale Information
>>>
>>>      [...] such signaling would be mostly ignored.
>>>
>>> This needs clarification why it is "mostly" rather than "always".
>>
>> Here we mean that, in practice, the signal is ignored. (the packet *is*
>> received and processed, but as per the current specs, hosts would reduce
> assuming that there is "not" here:                                 ^
>> the Valid Lifetime further than two hours). Nowadays, there are
>> implementations that are not compliant with the specs in this respect,
>> so some implementations would honor the signal, while others would not.
> 
> IMO it would be quite useful to mention here that there are noncompliant
> implementations, and if possible, describe how they are noncompliant.
> Because that's part of the landscape within which this work is needed.

Since the currently-non-compliant behavior is being proposed in a 6man 
wg document (draft-ietf-6man-slaac-renum), how about adding this 
parenthetical note:

"  NOTE: Some implementations, such as [Linux-kernel], do not comply 
with this requirement from [RFC4862], but rather honor any value 
conveyed in a PIO, as proposed in [draft-ietf-6man-slaac-renum]."
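To show the difference the NOTE refers to, a minimal Python sketch of the valid-lifetime update in RFC 4862, Section 5.5.3 e), applied when an RA carries a PIO for the prefix of an existing address: (1) accept the advertised value if it is greater than two hours or greater than the RemainingLifetime; (2) otherwise, if RemainingLifetime is at most two hours, ignore it unless the RA was authenticated (e.g., via SEND); (3) otherwise, reset to two hours. The function name is hypothetical, and the honor_any_value switch is an illustrative stand-in for implementations that honor any value conveyed in a PIO.

```python
TWO_HOURS = 2 * 60 * 60  # 7200 seconds


def updated_valid_lifetime(remaining: int, advertised: int,
                           authenticated: bool = False,
                           honor_any_value: bool = False) -> int:
    """New valid lifetime (seconds) for an existing SLAAC address."""
    if honor_any_value:
        # Hypothetical non-RFC 4862 behaviour: take the PIO value as-is.
        return advertised
    # e) 1: accept values above two hours, or above what is left anyway.
    if advertised > TWO_HOURS or advertised > remaining:
        return advertised
    # e) 2: two hours or less left: ignore the PIO unless the RA is authenticated.
    if remaining <= TWO_HOURS:
        return advertised if authenticated else remaining
    # e) 3: otherwise, the address keeps at most two hours.
    return TWO_HOURS


print(updated_valid_lifetime(remaining=86400, advertised=60))  # 7200
print(updated_valid_lifetime(remaining=3600, advertised=60))   # 3600
print(updated_valid_lifetime(remaining=86400, advertised=60, honor_any_value=True))  # 60
```

In other words, under RFC 4862 a PIO with a small Valid Lifetime can at best cut a stale address down to two hours.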

Thoughts?

Thanks!

Regards,
-- 
Fernando Gont
SI6 Networks
e-mail: fgont@si6networks.com
PGP Fingerprint: 6666 31C6 D484 63B2 8FB1 E3C4 AE25 0D55 1D4E 7492