Re: [OSPF] OSPFv3 Autoconfiguration - draft-ietf-ospf-ospfv3-autoconfig-05.txt

"Les Ginsberg (ginsberg)" <ginsberg@cisco.com> Tue, 11 February 2014 06:57 UTC

From: "Les Ginsberg (ginsberg)" <ginsberg@cisco.com>
To: "curtis@ipv6.occnc.com" <curtis@ipv6.occnc.com>
Thread-Topic: [OSPF] OSPFv3 Autoconfiguration - draft-ietf-ospf-ospfv3-autoconfig-05.txt
Date: Tue, 11 Feb 2014 06:57:05 +0000
Message-ID: <F3ADE4747C9E124B89F0ED2180CC814F23C64496@xmb-aln-x02.cisco.com>
References: Your message of "Mon, 10 Feb 2014 08:28:10 +0000." <F3ADE4747C9E124B89F0ED2180CC814F23C6349B@xmb-aln-x02.cisco.com> <201402102003.s1AK3PQZ034418@maildrop2.v6ds.occnc.com>
In-Reply-To: <201402102003.s1AK3PQZ034418@maildrop2.v6ds.occnc.com>
Cc: OSPF List <ospf@ietf.org>
Subject: Re: [OSPF] OSPFv3 Autoconfiguration - draft-ietf-ospf-ospfv3-autoconfig-05.txt

Curtis -

Inline...

> -----Original Message-----
> From: Curtis Villamizar [mailto:curtis@ipv6.occnc.com]
> Sent: Monday, February 10, 2014 12:03 PM
> To: Les Ginsberg (ginsberg)
> Cc: curtis@ipv6.occnc.com; Acee Lindem; OSPF List
> Subject: Re: [OSPF] OSPFv3 Autoconfiguration - draft-ietf-ospf-ospfv3-autoconfig-05.txt
> 
> 
> In message <F3ADE4747C9E124B89F0ED2180CC814F23C6349B@xmb-aln-x02.cisco.com>
> "Les Ginsberg (ginsberg)" writes:
> 
> > Curtis -
> >
> > I think we are converging.
> >
> > Some context from my side...
> >
> > I am fully aware that this draft is about Homenet environments, but
> > there is a suspicion in the back of my mind that once the duplicate-id
> > resolution mechanism is defined and deployed that folks may want to
> > use it in other environments e.g. TRILL has targeted auto-config as a
> > goal (just an example). You may remember a few years ago a proposal to
> > automatically resolve system-id conflicts was discussed in IS-IS
> > WG. The proposal had a lot of flaws and we shot it down - but it does
> > suggest that some folks may want to use such a mechanism in other
> > types of deployments someday. So I would like to define things such
> > that it is robust enough to be used elsewhere. And since what I am
> > proposing is quite simple I don't think it unduly burdens the Homenet
> > environments.
> 
> Providers tend to configure anything that is expected to be static.
> At least the smart ones do.
> 
> But then maybe smart people don't run trill.  :-)
> 
> I think the prior IS-IS autoconfig may have been motivated by IEEE use
> of IS-IS for bridging.  Again, not something providers use.
> 
> > As regards preserving router-id across reboots - sure - that is a good
> > idea also. And what I am proposing is supportive of that because it
> > guarantees that so long as an existing router's LSAs are in the LSDB
> > (even if it is currently undergoing maintenance) any new router that
> > comes up (or even another old router that reboots and is not so well
> > behaved as to remember the router-id it previously used) will not take
> > the router-id of any router seen in the LSDB (reachable or not). This
> > is better than the existing logic which leaves the decision to chance.
> 
> The router today that can't remember anything between reboots is
> likely to be a very low end home networking device with no
> non-volatile storage at all.  Anything running OSPF or ISIS is likely
> to be at least keeping a code image in flash and could keep a
> router-id there too.

There are various reasons the router-id may not be remembered across reboot.

The router may be low end w/o NVRAM.
The router might have a bug in its implementation.
The NVRAM might have gotten corrupted.

But we need not care. The point is we can still minimize disruption to the network by using the old/new paradigm - and that is a good thing.

It is, of course, also a good thing for a router to retain its old identity following reboot - I fully support that - and the old/new paradigm is complementary to the retention of router-id across reboots.
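As a rough illustration (not from the draft), the startup choice discussed here can be sketched in Python: reuse the router-id remembered from the previous incarnation, and if nothing usable was stored (no NVRAM, a bug, corruption) draw a random one that no router in the LSDB is using. The storage file, the function names and the exclusion of 0 are assumptions. At a cold start the LSDB is normally still empty, so the exclusion set mainly matters when re-picking after a detected conflict.

```python
import secrets
from typing import Optional, Set

# Hypothetical stand-in for NVRAM/flash storage of the last good router-id.
ROUTER_ID_FILE = "last_router_id"


def load_saved_router_id(path: str = ROUTER_ID_FILE) -> Optional[int]:
    """Return the last successfully used router-id, or None if unavailable."""
    try:
        with open(path) as f:
            value = int(f.read().strip())
    except (OSError, ValueError):
        # No storage, file missing, or contents corrupted.
        return None
    return value if 0 < value < 2**32 else None


def save_router_id(router_id: int, path: str = ROUTER_ID_FILE) -> None:
    """Persist a router-id once it has been used with no conflict detected."""
    with open(path, "w") as f:
        f.write(str(router_id))


def choose_router_id(lsdb_router_ids: Set[int], saved: Optional[int]) -> int:
    """Prefer the remembered router-id (any conflict on it is left to the
    duplicate-detection rules); otherwise draw a random 32-bit value that no
    router present in the LSDB, reachable or not, is using."""
    if saved is not None:
        return saved
    while True:
        candidate = secrets.randbits(32)
        # Excluding 0 is a precaution assumed here, not a rule from the draft.
        if candidate != 0 and candidate not in lsdb_router_ids:
            return candidate
```

The old/new tie-breaker still matters on top of this: persistence only helps routers whose storage survived, which is exactly the complementary relationship described above.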

> 
> > More inline.
> >
> > > -----Original Message-----
> > > From: Curtis Villamizar [mailto:curtis@ipv6.occnc.com]
> > > Sent: Sunday, February 09, 2014 12:39 PM
> > > To: Les Ginsberg (ginsberg)
> > > Cc: curtis@ipv6.occnc.com; Acee Lindem; OSPF List
> > > Subject: Re: [OSPF] OSPFv3 Autoconfiguration - draft-ietf-ospf-ospfv3-autoconfig-05.txt
> > >
> > >
> > > Les,
> > >
> > > Perhaps you should read the abstract of the document you are
> > > commenting about:
> > >
> > >    OSPFv3 is a candidate for deployments in environments where
> > >    auto-configuration is a requirement.  One such environment is the IPv6
> > >    home network where users expect to simply plug in a router and have
> > >    it automatically use OSPFv3 for intra-domain routing.  This
> > >    document describes the necessary mechanisms for OSPFv3 to be
> > >    self-configuring.
> > >
> > > Home network!
> > >
> > > Or the introduction:
> > >
> > >    OSPFv3 [OSPFV3] is a candidate for deployments in environments
> > >    where auto-configuration is a requirement.
> > >
> > >    [...]
> > >
> > >  1.2. Acknowledgments
> > >
> > >       This specification was inspired by the work presented in the
> > >       Homenet working group meeting in October 2011 in Philadelphia,
> > >       Pennsylvania.
> > >
> > > The Homenet WG works on what?  Home networks!
> > >
> > > So please keep that in mind when commenting.
> > >
> > > Unless a provider were to be so stupid or lazy as to use this on an SP
> > > network, most of the comments from both of us don't apply,
> > > *except* the few comments below about "in a home network".
> > >
> > > Perhaps the draft should add text explicitly stating that the last
> > > router-id used successfully should be used on a reboot rather than a
> > > new random number.  I notice that only the Router-Hardware-Fingerprint
> > > TLV is persistent across reboots.  This is insufficient if we want to
> > > minimize disruption.
> > >
> > > The only case then (if router-id is remembered across reboots) would
> > > be a new router.
> >
> > Also a router which fails to remember its old router-id across a reboot.
> 
> See above.  Seems very broken to not be able to remember a router-id
> across reboot.  What today doesn't have any flash at all?  If
> anything, what in an SP net?  (nothing).
> 
> > > In that case your uptime rule would help.  So
> > > perhaps two things could be recommended:
> > >
> > >   1.  In section 4, include a "SHOULD remember the most recent
> > >       successfully used router-id across reboots and reuse that".
> > >       Reword the rest so if that information is not available, then
> > >       pick a random number.
> >
> > Fine with me.
> >
> > >
> > >   2.  a.  In section 6, mention the uptime rule.  Modify the Router
> > >           Uptime TLV as suggested.
> > >
> > >       b.  Alternatively add a flag to the Router-Hardware-Fingerprint
> > >           TLV that indicates that since last reboot this router-id has
> > >           been used and achieved a "full state".  A router just
> > >           rebooting would not have ever reached the full state before
> > >           noticing a conflict as long as the conflict check is run
> > >           before considering itself in the full state.
> >
> > Yes - this is what I had in mind. Also note we need a flag in hellos
> > as well - for which I had proposed using an option bit (or LLS if
> > folks don't want to consume an options bit).
> 
> I'm not sure for OSPF at least why you need a flag in hello.

You need a flag in the hello because you cannot form an adjacency to "yourself". So if duplicate router-id occurs with a neighbor it needs to be resolvable w/o depending on having the LSDB. This is also why Sections 6.1 and 6.2 exist in the draft.

> 
> > But what is your definition of "full state"? It cannot be just having
> > reached "full state" with a single neighbor as it is possible the
> > first neighbor that comes up might also be in the process of coming up
> > itself and does not yet have the full LSDB. What I had in mind was a
> > short but sufficient time that if we had been up for that long we
> > could be comfortable that our existence was known network-wide. I had
> > mentioned 20 minutes - but that was quite a conservative number - I
> > think we could safely be more aggressive (5 minutes??). Once that
> > period had passed we set the flag and leave it set. And if we are
> > smart enough to reuse the same router-id following reboot when we get
> > our own Fingerprint from our old incarnation we will see that the flag
> > is set and can therefore set it immediately following reboot without
> > waiting for 5 minutes.
> 
> The full state is a state in OSPF when the LSDB download between a
> pair of routers has completed.
> 
> The assumption is that we have a "full state" bit in the
> Router-Hardware-Fingerprint TLV defined above.  Taking into account your
> concern over becoming full with another router that doesn't have a full
> LSDB, the procedure when the router is not full is then:
> 
>   1.  An adjacency becomes "full" (formal state in OSPF).
> 
>   2.  Check to see if the neighbor router is "full" in its
>       Router-Hardware-Fingerprint TLV:
> 
>       a.  If not, and other neighbor adjacencies are not in the full
>           state, wait for the other neighbor adjacencies to become full.
> 
>       b.  If all neighbor adjacencies are in the full state, and none
>           of the neighbor routers is in the full state in its
>           Router-Hardware-Fingerprint TLV, then continue.
> 
>       c.  If the neighbor's Router-Hardware-Fingerprint TLV indicates
>           the full state, continue.
> 
>   3.  Check for router-id collisions.
> 
>       a.  If no collision, set full and done.
> 
>       b.  If a collision, continue.
> 
>   4.  Check the flags in the Router-Hardware-Fingerprint TLV of the
>       router for which a collision has occurred.
> 
>       a.  If the collision is with a full router, then pick new
>           router-id and start over.
> 
>       b.  If the collision is with a router that is marked as having
>           been used successfully in the past and the router-id being
>           used by this router was a random pick, then pick new
>           router-id and start over.
> 
>       c.  Otherwise it is a tie and continue.
> 
>   5.  In the event of a tie, use the tie breaker rules as defined in
>       the existing draft (lowest fingerprint wins).
> 
>   Don't readvertise any LSDB entry from a neighbor router until it
>   resends its Router-Hardware-Fingerprint TLV with the full state
>   indicated.  This will avoid a collision prior to the newly rebooted
>   router reaching the full state and possibly giving up on its first
>   pick of router-id.
> 
>   If a neighbor for which the adjacency is in the full state indicates
>   that it has become "full", go to step 3 if not already in the full
>   state.
> 
> Note that 2b covers the case where no router in the entire network has
> declared itself in the full state (the network epoch).  2a covers most
> cases where the neighbor has a partial LSDB.
> 
> This could have been a bit simpler if we didn't want to cover most
> cases where the neighbor didn't have a full LSDB.
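Curtis's step list above can be read as a single decision function. The Python sketch below is one interpretation, not text from the draft: the "full" and "used before" bits are the hypothetical Router-Hardware-Fingerprint TLV flags under discussion, fingerprints are compared as equal-length big-endian byte strings, and the rule about not re-advertising a neighbor's LSDB entries until it reports full is not modelled.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional


class Action(Enum):
    WAIT = "wait for more adjacencies to become full"
    DECLARE_FULL = "no collision: set our own full bit"
    REPICK = "pick a new router-id and start over"
    KEEP = "collision resolved in our favor"


@dataclass
class Neighbor:
    adjacency_full: bool   # OSPF neighbor state machine has reached Full
    advertises_full: bool  # hypothetical "full" bit in its fingerprint TLV


@dataclass
class CollidingRouter:
    fingerprint: bytes     # compared as equal-length big-endian strings
    full: bool             # hypothetical "full" bit
    used_before: bool      # hypothetical "used successfully in the past" bit


def evaluate(neighbors: List[Neighbor], collision: Optional[CollidingRouter],
             my_fingerprint: bytes, my_pick_was_random: bool) -> Action:
    """One pass of steps 2-5 for a router that is not yet full."""
    full_adjs = [n for n in neighbors if n.adjacency_full]
    # Step 2c: some fully adjacent neighbor already advertises the full state.
    neighbor_is_full = any(n.advertises_full for n in full_adjs)
    # Step 2b: every adjacency is full; if none advertises full, this is the
    # network epoch and we continue anyway.
    all_adjs_full = bool(neighbors) and len(full_adjs) == len(neighbors)
    if not (neighbor_is_full or all_adjs_full):
        return Action.WAIT                     # step 2a
    if collision is None:
        return Action.DECLARE_FULL             # step 3a
    if collision.full:
        return Action.REPICK                   # step 4a
    if collision.used_before and my_pick_was_random:
        return Action.REPICK                   # step 4b
    # Step 5: tie, lowest fingerprint wins (equal fingerprints are not handled).
    if my_fingerprint < collision.fingerprint:
        return Action.KEEP
    return Action.REPICK
```

Whenever a fully adjacent neighbor later flips its bit to full, the caller simply re-runs `evaluate`, which corresponds to the "go to step 3" rule.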

This is unnecessarily complex - and introduces some requirements that Acee has quite understandably expressed concern about.

Old/new state should be based quite simply on a minimum uptime - say 5 minutes. If I can't acquire the full LSDB in 5 minutes the network is already compromised in some way - having to perform an unnecessary router-id change is the least of our problems.
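A minimal Python sketch of old/new state as a latching boolean rather than an advertised interval, assuming the 5 minute threshold floated in this thread (the class and method names are hypothetical). The flag becomes true once the current router-id has been in use for the minimum time and then stays set; a router that reuses its previous router-id and finds the flag already set in its old incarnation's Fingerprint TLV can set it immediately.

```python
import time
from typing import Callable, Optional

MIN_UPTIME_SECONDS = 5 * 60  # figure floated in this thread, not from the draft


class OldNewState:
    """Latching "old router" boolean; no interval is ever advertised."""

    def __init__(self, clock: Callable[[], float] = time.monotonic,
                 min_uptime: float = MIN_UPTIME_SECONDS) -> None:
        self._clock = clock
        self._min_uptime = min_uptime
        self._chosen_at: Optional[float] = None  # when the current router-id was picked
        self._old = False

    def router_id_chosen(self) -> None:
        """Call whenever a router-id is (re)selected; the router is new again."""
        self._chosen_at = self._clock()
        self._old = False

    def inherit_old_flag(self) -> None:
        """Reused our previous router-id and our old incarnation's
        Fingerprint TLV already carried the flag: become old at once."""
        self._old = True

    def is_old(self) -> bool:
        """True once the router-id has been in use for the minimum time;
        after that it stays True until a new router-id is chosen."""
        if not self._old and self._chosen_at is not None:
            if self._clock() - self._chosen_at >= self._min_uptime:
                self._old = True
        return self._old
```

Reoriginating the AC-LSA and updating the hello bit when the flag flips are left out; because the value is a single latched bit, that reorigination happens once rather than periodically.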

Also note that the persistence of a router-id for a router that had been operational but is currently undergoing maintenance lasts for as long as its LSAs persist in the LSDB (one hour) i.e., even if a new router detects a duplicate router-id based on an LSA originated by a router that is currently down it follows the old/new rules in determining whether it should change its router-id.

If a router is down for longer than one hour it no longer has a "reservation" on the router-id it previously used - so it is possible (though of course still unlikely) that a new router could come along and usurp the router-id of the router undergoing maintenance after its LSAs have been purged. This is fine and even preferable as whenever the old router finally comes back to life it will be more disruptive to force the other router which "has become old" to change its router-id.

> 
> > The significance of the time interval is only to define the period
> > during which if we are unlucky enough to have two new routers come up
> > within that interval and happen to pick the same router-id that we
> > will defer to the fingerprint/link local address tie breaker i.e. both
> > routers are considered "new" and so neither one has staked a claim
> > yet.
> 
> The above can do this without a time interval.  With just flag bits,
> there is no need to readvertise any LSA to update a time interval,
> only a need to update the "full state" bit after getting to the full
> state.


I have stated repeatedly that I do not want to advertise an "interval" that needs periodic updating. I simply want a boolean that indicates greater than or less than the minimum interval has passed since a router came up and successfully chose a router-id (i.e. no duplicates detected).

Please take note of my position on this. :-)

   Les

> 
> > If you and I are now in consensus (as I think we are), it is time for
> > the authors of the draft to weigh in and if they agree update the
> > draft with the specifics.
> >
> >    Les
> 
> Very rough consensus, but only that the problem is easily solvable,
> not on the specific solution.
> 
> Curtis
> 
> > >           Note: A second flag bit indicating that this router-id had
> > >           been used successfully in a past reboot might also help but
> > >           would only matter among two routers both rebooting and
> > >           neither having reached the full state.
> > >
> > > I think #1 above is sufficient and does more to prevent surprises.  I
> > > think #2 above helps only in the new router case but #2a requires
> > > adding a TLV and so isn't worth it IMHO.  Case #2b accomplishes the
> > > same thing with only a flag.  I would not object to #2b above if #1
> > > above is also added.
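To make the flag discussion concrete, the two bits could be carried in a one-octet flags field, as in the Python sketch below. Both the field and the bit positions are invented for illustration; neither the draft nor this thread defines an encoding, and hellos would need the old/new bit too (an Options bit or LLS, as Les suggests).

```python
from enum import IntFlag


class FingerprintFlags(IntFlag):
    """Hypothetical flag bits discussed in the thread (positions invented)."""
    FULL = 0x80         # #2b: this router-id reached the "full" state since reboot
    USED_BEFORE = 0x40  # note: router-id was used successfully before a reboot


KNOWN_FLAGS = FingerprintFlags.FULL | FingerprintFlags.USED_BEFORE


def encode_flags(flags: FingerprintFlags) -> bytes:
    """Serialize the flags into a single octet."""
    return bytes([int(flags)])


def decode_flags(octet: bytes) -> FingerprintFlags:
    """Parse one octet, masking unknown bits so later additions are ignored."""
    return FingerprintFlags(octet[0] & int(KNOWN_FLAGS))
```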
> > >
> > > See inline anyway.
> > >
> > > In message <F3ADE4747C9E124B89F0ED2180CC814F23C62C22@xmb-aln-x02.cisco.com>
> > > "Les Ginsberg (ginsberg)" writes:
> > > >
> > > > Curtis -
> > > >
> > > > > -----Original Message-----
> > > > > From: Curtis Villamizar [mailto:curtis@ipv6.occnc.com]
> > > > > Sent: Saturday, February 08, 2014 7:30 AM
> > > > > To: Les Ginsberg (ginsberg)
> > > > > Cc: curtis@ipv6.occnc.com; Acee Lindem; OSPF List
> > > > > Subject: Re: [OSPF] OSPFv3 Autoconfiguration - draft-ietf-ospf-ospfv3-autoconfig-05.txt
> > > > >
> > > > >
> > > > > In message <F3ADE4747C9E124B89F0ED2180CC814F23C621A3@xmb-aln-x02.cisco.com>
> > > > > "Les Ginsberg (ginsberg)" writes:
> > > > >
> > > > > > Curtis -
> > > > > >
> > > > > > Your reply below is talking about things which I think do not directly
> > > > > > bear on the value add of what I have proposed.
> > > > > >
> > > > > > You mention various ways to ensure that a given device assigns the
> > > > > > same router-id each time it starts up and ways to ensure it picks the
> > > > > > same sequence of second/third... choices in the event it has to change
> > > > > > its router-id. All good suggestions, but what I am talking about is
> > > > > > what we do in the event a conflict occurs despite our best efforts to
> > > > > > avoid it. With the current draft content, preference is based solely on
> > > > > > a fixed identifier (fingerprint) without regard to which choice would
> > > > > > minimize disruption to the network. When preference is given to the
> > > > > > "old router" to retain its existing router-id this shortcoming is
> > > > > > addressed.
> > > > >
> > > > > In the lifetime of a router it only gets added once.  In the lifetime
> > > > > of a router we would hope it only reboots zero times but experience so
> > > > > far has been that reboots over a router's lifetime tend to be > 0 and
> > > > > in some cases >> 0.
> > > > >
> > > > > So you are optimizing for a 1 in 4 billion occurrence that can happen
> > > > > only once in the lifetime of a router.
> > > >
> > > > The entire duplicate router-id resolution logic is addressing the
> > > >  improbable case. My proposal adds - literally - one line of code to
> > > >  the logic used to decide whether "I" should change my router-id or
> > > >  whether "you" should change your router-id.
> > > >
> > > > >
> > > > > We also need to look at the consequences of this very improbable
> > > > > occurrence.  Today's routers accomplish IGP convergence in large
> > > > > networks in subsecond times, in some cases << 1 second.
> > > > >
> > > > > Note that if flooding is completed (both withdraw old and install new)
> > > > > in less than the SPF delay which is commonly implemented (some delay
> > > > > after receiving the first flooded IGP change), then there is no impact
> > > > > on routing.
> > > >
> > > > Your analysis does not apply to this scenario. The router which
> > > >  changes its router-id is effectively doing a cold start. All
> > > >  adjacencies will go down. All LSAs originated by this router become
> > > >  invalid. All routes will be removed from the forwarding plane. If
> > > >  you are running BGP all the BGP nexthops will be gone on the router
> > > >  which is changing its identity. Restoration of the adjacencies and
> > > >  reacquisition of the LSDB will take multiple seconds. The best you
> > > >  can hope for is several seconds of disruption - it could easily be
> > > >  much longer.
> > > >
> > > > For the new node which has usurped the old node's identity it will
> > > >  have to purge/replace all of the LSAs generated by the old
> > > >  node. While normal operation of the update process will ensure that
> > > >  this happens in a reliable way, the amount of flooding network-wide
> > > >  required to bring up a new node has now been roughly doubled
> > > >  i.e. the old node must reissue all of its LSAs using a new identity
> > > >  and the new node must purge/replace the old node's LSAs with its
> > > >  own versions. This will result in multiple SPFs on all nodes in the
> > > >  network and likely cause loops/blackholes during the transition
> > > >  since some of the SPFs will be run on versions of the LSDB which
> > > >  are inaccurate (part old node's old LSAs and part new node's
> > > >  LSAs). Suggesting that this could be handled in the same way/time
> > > >  as we typically handle a single link failure isn't credible.
> > >
> > > All routers are supposed to keep a fixed router-id across reboots.  If
> > > interfaces are changed when down, the last used router-id should be on
> > > flash.  If flash is removed and replaced (rather than a new image
> > > installed), then with the same set of interfaces, the same decision
> > > should be made.  We are down to a very special case where both flash
> > > and interfaces are removed and replaced yielding no history and a
> > > different set of MACs to pick from.
> > >
> > > > > > Your statement that what I propose is only relevant when two routers
> > > > > > go down does not match the scenarios I envision. If I want to add a
> > > > > > new device to my network or if I need to replace an existing device in
> > > > > > my network I am only affecting one device - but as I am introducing a
> > > > > > device with a new fingerprint it is possible that it will introduce a
> > > > > > conflict with an existing router-id.
> > > > >
> > > > > In provider networks routers are generally added during maintenance
> > > > > windows so should anything unexpected happen, impact is minimized.
> > > > >
> > > > > In home nets, the home user isn't going to notice the convergence time
> > > > > if there is any.  A 10 msec SPF delay is likely to be plenty.
> > > >
> > > > As I stated above, disruption will be orders of magnitude longer than 10
> > > > ms.
> > >
> > > In a home net?  With perhaps a half dozen routers and a default route?
> > > Someone has a very bad OSPF implementation.  :-)  Or did you miss the
> > > "In home nets" at the front of the paragraph?
> > >
> > > For example, in a 10 node network with average degree 4, perhaps 40
> > > links in 10 router LSAs exist.  A few RTT (less than 1 msec for a
> > > homenet) for each neighbor adjacency (which happen in parallel) and
> > > ten packets from 4 sources is needed to reach the full state followed
> > > by one SPF to be fully up and running.  Other routers get one
> > > additional router LSA plus four new links in existing router LSA and
> > > have to run an SPF.  Even on a software based homenet router using an
> > > ARM, 10 msec is likely to be enough time and if it is "orders of
> > > magnitude" longer, something is wrong with one of the implementations.
> > > This would be a more complicated than usual home net or even soho,
> > > more likely a small business.
> > >
> > > > > > In a subsequent reply you liked the idea of the new device delaying
> > > > > > advertising reachability until it has determined that its router-id
> > > > > > choice is not in conflict. The old/new router paradigm supports this
> > > > > > strategy by assuring that the old router will not consider changing
> > > > > > its router-id until enough time has elapsed for the new router to
> > > > > > transition to being an old router.
> > > > >
> > > > > If it wins the coin toss, the router would advertise at least one LSA
> > > > > to indicate its existence and could hold back on any additional
> > > > > advertisements until the other router has withdrawn routes.
> > > > >
> > > >
> > > > This suggestion does not alter the fact that if the old node changes
> > > > its router-id the network has to respond to three events:
> > > >
> > > > 1) Loss of the old node
> > > > 2) Introduction of the old-node with a new identity
> > > > 3) Introduction of the new node with the identity of the old-node
> > >
> > > Again, the old node should remember the last router-id used and try to
> > > reuse it.
> > >
> > > > If however we ensure that the old-node does not change its identity
> > > >  then the network only has to respond to a single event - the
> > > >  introduction of the new-node.
> > >
> > > Yes and if it were up and won the resolution last time, it will have
> > > saved that router-id and will reuse it.  If it came up previously and
> > > lost the resolution, then it will remember the router-id it used,
> > > whether second or third pick, and use that.
> > >
> > > > > > Finally, what I propose is extremely simple to implement. I think it
> > > > > > isn't much of an exaggeration to say that any one of us could have
> > > > > > implemented the enhancement in the time it has taken to discuss its
> > > > > > merits. So we aren't overengineering for a case which is admittedly
> > > > > > very unlikely to occur - we are adding a modest extension to make our
> > > > > > solution less disruptive.
> > > > >
> > > > > Yes but it is *bad* for the more common case where routers go down
> > > > > occasionally.
> > > >
> > > > You are going to have to clarify exactly what "bad side effects" you
> > > > see for what I propose - because I don't see any - whereas I do
> > > > see benefits as described above.
> > >
> > > If router-id is not remembered between reboots, then the chance of a
> > > collision is one in 4 billion times the number of routers (less than
> > > 10 for a home net today, but maybe more in the future).
> > >
> > > If router-id is remembered between reboots, then no matter how long a
> > > router has been down, if nothing else in the network changed, there is
> > > zero chance of having a collision.
> > >
> > > With either method, if router-id is remembered between reboots, then
> > > there is zero chance of collision.
> > >
> > > IMO should this ever be used on a managed network (including a home
> > > net / soho / small business net that happens to be managed) then
> > > having routers come back from a reboot with the same router-ids would
> > > be a big plus.  For example, after a power outage NMS discovery would
> > > not have to be repeated.
> > >
> > > >    Les
> > > >
> > > >
> > > > >
> > > > > >    Les
> > > > >
> > > > > Curtis
> > > > >
> > > > >
> > > > > > > -----Original Message-----
> > > > > > > From: Curtis Villamizar [mailto:curtis@ipv6.occnc.com]
> > > > > > > Sent: Friday, February 07, 2014 9:22 AM
> > > > > > > To: Les Ginsberg (ginsberg)
> > > > > > > Cc: Acee Lindem; Curtis Villamizar; OSPF List
> > > > > > > Subject: Re: [OSPF] OSPFv3 Autoconfiguration - draft-ietf-ospf-ospfv3-autoconfig-05.txt
> > > > > > >
> > > > > > >
> > > > > > > In message <F3ADE4747C9E124B89F0ED2180CC814F23C619A9@xmb-aln-x02.cisco.com>
> > > > > > > "Les Ginsberg (ginsberg)" writes:
> > > > > > > >
> > > > > > > > So, I am one person who raised this concern to Acee - but the proposal
> > > > > > > > outlined by Acee is not what I had in mind. There is no need to use
> > > > > > > > "uptime" or to invent some unusual exchange of LSAs prior to Exchange
> > > > > > > > state.
> > > > > > > >
> > > > > > > > Also, in regards to Curtis's comment - it is not DOS attacks that I am
> > > > > > > > trying to mitigate here. As he says if an attacker is in your network
> > > > > > > > and able to originate credible packets no strategy is safe.
> > > > > > > >
> > > > > > > > The motivating use case is to minimize disruption of a stable network
> > > > > > > > when a new router is added or an existing router is
> > > > > > > > replaced/rebooted. In other words non-disruptive handling of the
> > > > > > > > common maintenance/upgrade scenarios.
> > > > > > > >
> > > > > > > > What I have in mind is this:
> > > > > > > >
> > > > > > > > 1) A router needs a way to advertise that it has been up and running
> > > > > > > >    for a minimum length of time - for the sake of discussion let's say
> > > > > > > >    20 minutes. Routers then fall into two categories:
> > > > > > > >
> > > > > > > >   o Old routers (up >= minimum time)
> > > > > > > >   o New routers (up < minimum time)
> > > > > > > >
> > > > > > > > 2) When a duplicate router-id is detected, the first tie breaker is
> > > > > > > >    between old routers and new routers. The old router gets to keep
> > > > > > > >    its router-id and the new router picks a new router-id.  If both
> > > > > > > >    routers are "new" or both routers are "old" then we revert to the
> > > > > > > >    existing tie breakers defined in the document (link local address
> > > > > > > >    for directly connected routers and fingerprint info for
> > > > > > > >    non-neighbors).
> > > > > > > >
> > > > > > > > 3) Advertisement of the "old/new" state requires a single bit - but it
> > > > > > > >    has to be available both in hellos and the new AC-LSA. Adding it to
> > > > > > > >    the AC-LSA is easy to do. For hellos, there are two possibilities:
> > > > > > > >
> > > > > > > >    o Use one of the Options Bits
> > > > > > > >    o Use LLS
> > > > > > > >
> > > > > > > > Be interested in how folks feel about this.
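Rule 2 of this proposal can be sketched as a comparison function in Python. This is an illustration under stated assumptions: the old/new bit is taken as already learned from hellos (Options bit or LLS) or the AC-LSA, and the direction of the fallback comparisons (the lower link-local address or fingerprint keeps the router-id) is assumed, following the "lowest fingerprint wins" summary elsewhere in this thread; the draft itself is authoritative on both tie-breakers.

```python
from dataclasses import dataclass
from ipaddress import IPv6Address


@dataclass
class Claimant:
    is_old: bool              # the single old/new bit (hello or AC-LSA)
    link_local: IPv6Address   # used only between directly connected routers
    fingerprint: bytes        # Router-Hardware-Fingerprint, equal-length bytes


def must_change_router_id(me: Claimant, other: Claimant, are_neighbors: bool) -> bool:
    """Return True if `me` should give up the duplicated router-id."""
    # Rule 2: between an old and a new router, the old router keeps its id.
    if me.is_old != other.is_old:
        return not me.is_old
    # Both old or both new: fall back to the draft's existing tie-breakers.
    # ASSUMPTION: the lower value keeps the router-id in both comparisons.
    if are_neighbors:
        return int(me.link_local) > int(other.link_local)
    return me.fingerprint > other.fingerprint
```

Because the function is symmetric, exactly one of the two routers concludes it must change (given distinct link-local addresses or fingerprints), which is the property the "one line of code" argument relies on.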
> > > > > > > >
> > > > > > > >    Les
> > > > > > >
> > > > > > >
> > > > > > > Les,
> > > > > > >
> > > > > > > Excluding DoS attack, we are talking about a one in 4 billion case
> > > > > > > (for any two routers, so with 400 routers, still well under one in 1M)
> > > > > > > where two routers hash a MAC address or pick a one time random number
> > > > > > > from out of nowhere and end up with the same number.
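The arithmetic behind these figures, assuming router-ids are drawn uniformly from the 32-bit space (hashing a MAC address only approximates this): a single new router joining n existing routers collides with probability of about n/2^32, which for n = 400 is roughly 1 in 10.7 million. If all n routers picked at the same time, the birthday bound of about n(n-1)/2^33 applies instead, roughly 1 in 54,000 for n = 400. A small Python sketch:

```python
ID_SPACE = 2 ** 32  # OSPF router-ids are 32-bit values


def p_new_router_collides(existing: int) -> float:
    """Probability that one new router's uniform random pick matches the
    router-id of any of `existing` routers already in the network."""
    return 1 - (1 - 1 / ID_SPACE) ** existing


def p_any_collision(routers: int) -> float:
    """Birthday-style probability that at least two of `routers`
    independently drawn router-ids are equal."""
    p_all_distinct = 1.0
    for k in range(routers):
        p_all_distinct *= (ID_SPACE - k) / ID_SPACE
    return 1 - p_all_distinct


for n in (10, 400):
    print(f"new router joining {n} routers: 1 in {1 / p_new_router_collides(n):,.0f}")
    print(f"any pair among {n} routers:     1 in {1 / p_any_collision(n):,.0f}")
```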
> > > > > > >
> > > > > > > If that does happen (and one in 1M is certainly possible), then it
> > > > > > > would be nice if the routers always ended up with the same router-id.
> > > > > > > This could be accomplished by some fixed method such as hashing a
> > > > > > > constant with the first choice of router-id or using the router-id as
> > > > > > > a seed for the random number generator (which will pick the same
> > > > > > > sequence of random numbers each time).  If this is done, then a
> > > > > > > conflict would always produce the same set of next picks.  The set of
> > > > > > > routers in a given network would always end up with the same
> > > > > > > router-ids once they all came up and if only one went down at a time
> > > > > > > then it would always end up with the same router-id when it came up.
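One way to realize this "fixed method", shown in Python with the hashing variant (SHA-256 over a constant, the first choice and an attempt counter) rather than a seeded PRNG, since the output then does not depend on a particular PRNG implementation. The constant, the function name and the skipping of 0 are hypothetical; any scheme works as long as a given router always derives the same sequence from the same first choice.

```python
import hashlib
from itertools import islice
from typing import Iterator

# Hypothetical constant; any fixed value works if a router always uses the same one.
REPICK_CONSTANT = b"ospfv3-autoconfig-repick"


def router_id_candidates(first_choice: int) -> Iterator[int]:
    """Yield the first choice, then a reproducible sequence of fallback
    router-ids derived by hashing a constant, the first choice and a counter."""
    yield first_choice
    attempt = 1
    while True:
        digest = hashlib.sha256(
            REPICK_CONSTANT
            + first_choice.to_bytes(4, "big")
            + attempt.to_bytes(4, "big")
        ).digest()
        candidate = int.from_bytes(digest[:4], "big")
        if candidate != 0:  # skipping 0 is an assumed precaution
            yield candidate
        attempt += 1


# Skip any candidate already claimed by another router (example values).
taken = {0x0A000001}
chosen = next(c for c in router_id_candidates(0x0A000001) if c not in taken)
print(hex(chosen), [hex(c) for c in islice(router_id_candidates(0x0A000001), 3)])
```

Running this twice yields the same `chosen` value, which is the property Curtis is after: the same conflict always produces the same next pick.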
> > > > > > >
> > > > > > > Zero conf was mainly intended for unmanaged networks (motivated by
> > > > > > > work in the homenet WG).  In these small unmanaged networks it doesn't
> > > > > > > matter which router gets what router-id as long as they end up unique
> > > > > > > and convergence is in a reasonable time relative to keeping eyeballs
> > > > > > > happy.  It could be applied to enterprise or providers but in either
> > > > > > > case having the routers end up with the same router-ids would make for
> > > > > > > easier management.
> > > > > > >
> > > > > > > For your scenario to matter at all with current rules, both routers in
> > > > > > > the conflict would have to go down.  If only the one that is preferred
> > > > > > > goes down, the other is not going to change its router-id as a result,
> > > > > > > so when it comes up it gets its first pick with no conflict.  If the
> > > > > > > one that was not preferred goes down, it comes up, sees a conflict and
> > > > > > > takes its second pick (loses the conflict every time).  It is only if
> > > > > > > they both go down and the one that normally loses the conflict comes
> > > > > > > up first that there is a change in router-id.  That too can be solved
> > > > > > > with a rule that you always come up with the last router-id used.
> > > > > > >
> > > > > > > Curtis
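The closing rule above, always coming back up with the last Router ID used, can be sketched as a toy model in which each router keeps its last Router ID in something like non-volatile storage. The class, its fields, and the fixed fallback list are hypothetical, and which router loses a conflict is simply assumed in the example rather than computed from any preference rule.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AutoconfRouter:
    """Toy router that remembers the last Router ID it used (hypothetical)."""
    name: str
    first_choice: int
    fallback_ids: list[int]
    last_used: Optional[int] = None  # stands in for non-volatile storage
    current: Optional[int] = None
    _next_fallback: int = 0

    def boot(self) -> int:
        # Come up with the last Router ID used; use the first choice only
        # when nothing has been remembered yet.
        self.current = self.last_used if self.last_used is not None else self.first_choice
        self.last_used = self.current
        return self.current

    def lose_conflict(self) -> int:
        # The non-preferred router moves to its next fallback pick and
        # remembers it for the next restart.
        self.current = self.fallback_ids[self._next_fallback]
        self._next_fallback += 1
        self.last_used = self.current
        return self.current
```

In the scenario from the thread, where both routers go down and B (the usual loser) restarts first, B comes back on its remembered second pick, so A can take the contested first choice without a new conflict.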
> > > > > > >
> > > > > > >
> > > > > > > > > -----Original Message-----
> > > > > > > > > From: OSPF [mailto:ospf-bounces@ietf.org] On Behalf Of Acee Lindem
> > > > > > > > > Sent: Thursday, February 06, 2014 5:12 PM
> > > > > > > > > To: Curtis Villamizar
> > > > > > > > > Cc: OSPF List
> > > > > > > > > Subject: Re: [OSPF] OSPFv3 Autoconfiguration - draft-ietf-ospf-ospfv3-autoconfig-05.txt
> > > > > > > > >
> > > > > > > > > Hi Curtis,
> > > > > > > > > I agree and believe the significance of this use case, where a new
> > > > > > > > > router is inserted into an auto-configured domain, has been greatly
> > > > > > > > > exaggerated.
> > > > > > > > > Thanks,
> > > > > > > > > Acee
> > > > > > > > > On Feb 5, 2014, at 3:58 PM, Curtis Villamizar <curtis@ipv6.occnc.com> wrote:
> > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > In message <CF17DD4E.2696B%acee.lindem@ericsson.com>
> > > > > > > > > > Acee Lindem writes:
> > > > > > > > > >
> > > > > > > > > >> The OSPFv3 autoconfiguration draft was cloned and presented in the
> > > > > > > > > >> ISIS WG (http://www.ietf.org/id/draft-liu-isis-auto-conf-00.txt). In
> > > > > > > > > >> the ISIS WG, there was a concern that the resolution of a duplicate
> > > > > > > > > >> system ID did not include the amount of time the router was
> > > > > > > > > >> operational when determining which router would need to choose a new
> > > > > > > > > >> router ID. With additional complexity, we could incorporate router
> > > > > > > > > >> uptime into the resolution process. One way to do this would be to:
> > > > > > > > > >>
> > > > > > > > > >>     1. Add a Router Uptime TLV to the OSPFv3 AC-LSA. It would include
> > > > > > > > > >>        the uptime in seconds.
> > > > > > > > > >>
> > > > > > > > > >>     2. Use the Router Uptime TLV as the primary determinant in
> > > > > > > > > >>        deciding which router must choose a new OSPFv3 Router
> > > > > > > > > >>        ID. Router uptimes less than 3600 (MaxAge) seconds apart are
> > > > > > > > > >>        considered equal.
> > > > > > > > > >>
> > > > > > > > > >>     3. When an OSPFv3 Hello is received with a different link-local
> > > > > > > > > >>        source address but the same router-id, unicast the OSPFv3
> > > > > > > > > >>        AC-LSA to the neighbor so that OSPFv3 duplicate router
> > > > > > > > > >>        resolution can proceed as in the case where it is received
> > > > > > > > > >>        through the normal flooding process. This is somewhat of a
> > > > > > > > > >>        hack, as we'd also need to accept OSPF Link State Updates
> > > > > > > > > >>        from a neighbor that is not in Exchange State or greater.
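Steps 1 and 2 amount to a comparison each router can run on its own and its neighbor's advertised values. A minimal sketch, assuming the router with the shorter uptime is the one that yields (the concern raised is protecting the long-running router) and assuming, purely for illustration, that the numerically smaller Router Hardware Fingerprint yields when uptimes are within MaxAge of each other:

```python
MAX_AGE = 3600  # seconds; uptimes closer than this are considered equal


def must_choose_new_id(my_uptime: int, my_fingerprint: bytes,
                       peer_uptime: int, peer_fingerprint: bytes) -> bool:
    """Return True if the local router should pick a new OSPFv3 Router ID.

    Assumptions: the router with the shorter uptime yields; when uptimes are
    within MAX_AGE of each other, the smaller Router Hardware Fingerprint
    yields (equal-length fingerprints compare numerically as bytes).
    """
    if abs(my_uptime - peer_uptime) >= MAX_AGE:
        return my_uptime < peer_uptime  # uptime is the primary determinant
    if my_fingerprint == peer_fingerprint:
        raise ValueError("identical fingerprints; tie cannot be broken")
    return my_fingerprint < peer_fingerprint  # assumed tie-break direction
```

Because both sides evaluate the same comparison with the arguments swapped, exactly one of them concludes it must change, provided the fingerprints differ. Nothing in the comparison lets a router verify that a neighbor's advertised uptime is truthful.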
> > > > > > > > > >>
> > > > > > > > > >> An alternative to #3 would be to use Link-Local Signaling (LLS) for
> > > > > > > > > >> signaling the contents of the OSPFv3 AC-LSA. However, you'd only want
> > > > > > > > > >> to send the Router-Uptime and Router Hardware Fingerprint when a
> > > > > > > > > >> duplicate Router-ID is detected. This requires implementing the
> > > > > > > > > >> resolution two ways but may be preferable since it doesn't require
> > > > > > > > > >> violating the flooding rules.
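The duplicate check that triggers step 3, and the "only when a duplicate is detected" condition for sending the extra values over LLS, can be sketched as two small Python functions. The Hello representation, the neighbor table keyed by Router ID, and the dictionary standing in for LLS TLVs are hypothetical; this is not the LLS wire format.

```python
from dataclasses import dataclass
from ipaddress import IPv6Address


@dataclass(frozen=True)
class Hello:
    """Only the Hello fields this check needs (hypothetical representation)."""
    router_id: int
    source: IPv6Address  # link-local source address of the packet


def is_duplicate_router_id(hello: Hello, my_router_id: int,
                           my_link_local: IPv6Address,
                           neighbors: dict[int, IPv6Address]) -> bool:
    """Same Router ID seen from a different link-local source address."""
    if hello.source == my_link_local:
        return False  # our own Hello; not a conflict
    if hello.router_id == my_router_id:
        return True   # a neighbor is using our Router ID
    known_source = neighbors.get(hello.router_id)
    # Two neighbors claiming one Router ID. This toy check cannot tell that
    # apart from a single neighbor whose link-local address changed.
    return known_source is not None and known_source != hello.source


def lls_extras_for_hello(duplicate_detected: bool, uptime: int,
                         fingerprint: bytes) -> dict[str, object]:
    """Attach uptime and fingerprint only once a duplicate has been seen."""
    if not duplicate_detected:
        return {}
    return {"router_uptime": uptime, "hardware_fingerprint": fingerprint}
```

Keeping the extras out of ordinary Hellos means the common, duplicate-free case pays nothing for the mechanism.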
> > > > > > > > > >>
> > > > > > > > > >> In any case, I'd like to get other opinions as to whether this problem
> > > > > > > > > >> is worth solving.
> > > > > > > > > >>
> > > > > > > > > >> Thanks,
> > > > > > > > > >> Acee
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > Acee,
> > > > > > > > > >
> > > > > > > > > > If the basis for router-id on boot up results in a fixed value, and if
> > > > > > > > > > a duplicate will occur on a given network, then which of two duplicate
> > > > > > > > > > routers gets that value may change after one of them reboots. If
> > > > > > > > > > uptime is not considered, it will never change as long as one router
> > > > > > > > > > stays up at any given time.
> > > > > > > > > >
> > > > > > > > > > We are talking about a very low probability event (a duplicate) except
> > > > > > > > > > if this is a DoS attack, and then either using or not using uptime
> > > > > > > > > > won't matter since the attacker will claim an impossibly long uptime.
> > > > > > > > > >
> > > > > > > > > > Curtis
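The DoS point can be made concrete. Assuming, hypothetically, that the Router Uptime TLV carries an unsigned 32-bit count of seconds, an attacker can advertise the maximum value, and under an uptime-first rule with the 3600-second equality window that claim beats any real router:

```python
MAX_AGE = 3600                       # seconds; closer uptimes are treated as equal
UPTIME_FIELD_MAX = 2**32 - 1         # assumption: 32-bit unsigned seconds field
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def peer_wins_on_uptime(my_uptime: int, claimed_peer_uptime: int) -> bool:
    """True if the peer keeps the contested Router ID on uptime alone."""
    return claimed_peer_uptime - my_uptime >= MAX_AGE


def max_claimable_years() -> float:
    """How long an uptime the largest field value represents, in years."""
    return UPTIME_FIELD_MAX / SECONDS_PER_YEAR
```

`max_claimable_years()` is about 136, so `peer_wins_on_uptime(my_uptime, UPTIME_FIELD_MAX)` is True for any router with less than roughly 136 years of uptime.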