Re: Stability and Resilience (was Re: [v6ops] A common...)

On 2/22/19 4:40 PM, David Farmer wrote:
> Yes, at a very high level, it boils down to the same thing. However,
> saying "as much as possible" sets a very wide standard; providing a few
> parameters to set expectations seems useful and helpful in developing
> a consensus. For example, if someone says it is impossible for their
> solution to ever issue the same prefix, they need to be told they
> should probably find a different solution.

Hmm. Has anyone said it's impossible to ever issue the same prefix?

And even if they did, is it in charter for us to say, "You bought the 
wrong gear, buy something else"?

We're kind of arguing the definition of "possible," I think.

> On the other hand, if an ISP could always provide the same prefix,
> "as much as possible" basically says that is what they should do,
> whereas once they exceed some minimum I would want to give the ISP
> flexibility to implement what makes sense for their business model.
> Also, it needs to be clear this is "in normal circumstances"; it
> should be acknowledged there will be situations where this is not a
> reasonable expectation.

I was trying to figure out how to add this as a #5, but I kept getting 
stuck on #1. Is it possible to ensure that the same DHCPv6 server always 
responds to queries, and therefore always Renews the lease? Or is this 
specifically for Requests, not Renewals? The synchronization on #1 is 
still not something I'm sure about, and if that's not fixed (and maybe 
it is), I don't see any way to satisfy this proposal.
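
To make the worry about #1 concrete, here is a minimal sketch, assuming
the redundant servers read and write one shared lease table keyed by
client DUID. The names (LeaseStore, DhcpServer, handle_request,
handle_renew) and the 2001:db8::/32 pool are made up for illustration and
are not any vendor's API; real lease synchronization is far more involved.

```python
import ipaddress


class LeaseStore:
    """Hypothetical lease table: client DUID -> delegated prefix."""

    def __init__(self, pool, delegated_len):
        # Lazy iterator over every delegated-size prefix in the pool.
        self._free = ipaddress.ip_network(pool).subnets(new_prefix=delegated_len)
        self._leases = {}

    def lookup(self, duid):
        return self._leases.get(duid)

    def allocate(self, duid):
        # Reuse an existing binding; otherwise hand out the next free prefix.
        if duid not in self._leases:
            self._leases[duid] = next(self._free)
        return self._leases[duid]


class DhcpServer:
    """Hypothetical server; only the binding logic is modelled."""

    def __init__(self, store):
        self.store = store

    def handle_request(self, duid):
        return self.store.allocate(duid)

    def handle_renew(self, duid, prefix):
        # The lease can only be extended if this server knows the binding.
        return self.store.lookup(duid) == prefix
```

If two DhcpServer objects share one LeaseStore, a Renew answered by either
one confirms the prefix the other handed out. If each has its own store,
the second server has no record of the binding and the Renew fails, which
is the case I am unsure how to avoid at scale.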

Lee

>
> On Fri, Feb 22, 2019 at 12:43 PM Lee Howard <lee@asgard.org> wrote:
>
>
>     On 2/22/19 12:53 PM, David Farmer wrote:
>>     Generally, I agree with what you are saying, but I'd like to see
>>     something like the following added as well;
>>
>>     Even if an ISP intends to change the IPv6 prefix regularly in
>>     the longer-term, say every few months or even each month at an
>>     extreme, in the shorter-term IPv6 prefixes SHOULD be stable, for
>>     time periods of hours, days, and maybe even weeks at a time.  Or,
>>     put another way, CPE devices SHOULD NOT get a new IPv6 prefix
>>     every time they are rebooted. Note: even in locations where
>>     utility power is generally stable, power outages frequently occur
>>     in clusters over a few hours or days.  This occurs when an
>>     emergency repair is made to restore power and then more permanent
>>     repairs cause short outages in the following hours or days. In
>>     this scenario, each of these events in the cluster SHOULD NOT
>>     result in the CPE receiving a different IPv6 prefix.
>>
>>     Conversely, when widespread power events occur, affecting
>>     thousands or even tens of thousands of customers, it may not be
>>     practical or even possible for an ISP to guarantee all CPE will
>>     receive the same IPv6 prefix they had before. Therefore to the
>>     extent possible, CPE and local networks SHOULD be resilient to
>>     their ISP provided IPv6 prefix changing, sometimes even
>>     unexpectedly changing.
>
>     What's the difference between what you said and "ISPs should, as
>     much as possible, reissue the same prefix to customers."?
>
>     Lee
>
>>
>>     Thanks.
>>
>>     On Fri, Feb 22, 2019 at 10:36 AM Lee Howard <lee@asgard.org> wrote:
>>
>>         I think I have heard the following suggestions in this
>>         conversation. I hope that taken all together, rather than as
>>         individual spot solutions, they can be a consensus
>>         recommendation.
>>
>>
>>         ISPs should, as much as possible, reissue the same prefix to
>>         customers. Some things ISPs can do to increase the chances of
>>         this:
>>
>>         1.  Share lease information between redundant DHCPv6 servers.
>>             Most ISPs probably have redundant servers, since this is
>>             critical provisioning infrastructure. It may be difficult
>>             to sync information between servers for millions of
>>             leases over tens of milliseconds of latency; see RFC 6853,
>>             "DHCPv6 Redundancy Deployment Considerations." Maybe DHCP
>>             vendors can report.
>>
>>         2.  Aggregate above the provider edge device, so that
>>             grooming customers between Provider Edge boxes (PEs)
>>             doesn't force a renumbering. It's been a few years since
>>             I worked on CMTSs, but when I did they did not support
>>             MP-BGP well (if at all), so routes had to be aggregated
>>             on the PE, or leaked into the IGP, which is bad for
>>             convergence time. Maybe PE vendors can report.
>>
>>         3.  Set DHCPv6 lease timers very low prior to grooming
>>             events. A short interval during the maintenance window
>>             will increase load on the DHCPv6 server until timers have
>>             been returned to normal values.
>>
>>         4.  In the case of a PE reboot, use DHCPv6 Bulk Leasequery to
>>             rebuild the routing table. I think all of the necessary
>>             information is in those responses. Again, last time I was
>>             working on CMTSs, this feature was not supported. Maybe
>>             PE vendors can report.
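
A rough back-of-the-envelope for the load in #3, assuming clients renew
at T1 and renewals are spread evenly over time, and ignoring Reconfigure
and retransmissions. The function names and the example numbers below are
assumptions for illustration, not measurements from any network.

```python
def renew_rate(clients, t1_seconds):
    """Average Renew messages per second if renewals are evenly spread."""
    return clients / t1_seconds


def lead_time(old_t1_seconds):
    """Shortest lead time before the window for lowering the timers.

    A client only learns the new timers when it next renews, which can be
    up to one old T1 after the change is made on the server.
    """
    return old_t1_seconds
```

With one million leases, renew_rate(1_000_000, 12 * 3600) is about 23
Renews per second, while renew_rate(1_000_000, 300) is about 3,333 per
second, so dropping T1 from 12 hours to 5 minutes multiplies the steady
load by 144 for as long as the short timers stay in place.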
>>
>>
>>         Networks should, as much as possible, be resilient to prefix
>>         changes. Some things networks can do to improve resilience:
>>
>>         1.  Write a learned prefix to non-volatile memory and issue a
>>             DHCPv6 Renew for that prefix on reboot.
>>
>>         2.  Use dynamic DNS and shorter TTLs.
>>
>>         3.  Implement something like NETCONF to distribute prefix
>>             information to policy devices like firewalls or SD-WAN
>>             controllers. I think a separate document describing this
>>             application of NETCONF would make sense.
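
Item 1 of the list above could look roughly like this on a CPE, assuming
a JSON file stands in for non-volatile memory. The file layout, function
names, and the send_renew / send_solicit callbacks are hypothetical
stand-ins, not a real DHCPv6 client API.

```python
import json
import os
import tempfile


def save_prefix(path, prefix, iaid):
    """Store the learned prefix; write to a temp file, then rename over the
    old record so an interrupted write does not corrupt it."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as f:
        json.dump({"prefix": prefix, "iaid": iaid}, f)
        f.flush()
        os.fsync(f.fileno())
    os.replace(tmp, path)


def load_prefix(path):
    """Return the saved record, or None if nothing usable was stored."""
    try:
        with open(path) as f:
            return json.load(f)
    except (FileNotFoundError, json.JSONDecodeError):
        return None


def on_boot(path, send_renew, send_solicit):
    """Ask for the remembered prefix first; fall back to a fresh Solicit."""
    saved = load_prefix(path)
    if saved is not None:
        return send_renew(saved["prefix"], saved["iaid"])
    return send_solicit()
```

The CPE would call save_prefix whenever a delegation is received and
on_boot once at startup. Whether the server honors the request still
depends on the ISP-side items in the first list.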
>>
>>
>>         In the case of failures, it cannot be assumed that sessions
>>         will stay active. We try to build in redundancy and
>>         resilience where we can, but where there's a single point of
>>         failure (such as CE or PE), and it fails (such as an
>>         unplanned reboot), our expectations should be appropriate.
>>
>>         Is this a reasonable summary?
>>
>>         Lee
>>
>>         --------------------------------------------------------------------
>>         IETF IPv6 working group mailing list
>>         ipv6@ietf.org
>>         Administrative Requests:
>>         https://www.ietf.org/mailman/listinfo/ipv6
>>         --------------------------------------------------------------------
>>
>>     -- 
>>     ===============================================
>>     David Farmer                  Email: farmer@umn.edu
>>     Networking & Telecommunication Services
>>     Office of Information Technology
>>     University of Minnesota
>>     2218 University Ave SE        Phone: 612-626-0815
>>     Minneapolis, MN 55414-3029   Cell: 612-812-9952
>>     ===============================================