Re: Zone name changing via SNMP

John Norstad <j-norstad@nwu.edu> Wed, 31 March 1993 21:29 UTC

Message-Id: <9303311939.AA08994@merle.acns.nwu.edu>
Date: Wed, 31 Mar 1993 13:42:29 -0600
To: APPLE-IP@cayman.com
Sender: ietf-archive-request@IETF.CNRI.Reston.VA.US
From: John Norstad <j-norstad@nwu.edu>
X-Sender: jln@merle.acns.nwu.edu (Unverified)
Subject: Re: Zone name changing via SNMP

I was happy to see Fidelia Kuang's note that some of the Apple folks are at
least going to take a stab at zone changing via SNMP. I wish you the best
of luck! 

I have some more comments.

My main concern is to register my strongest possible vote for some kind of
reasonably quick solution to the zone changing problem, even if it is
imperfect. Today, even a simple operation like adding a new zone to the
existing zone list for a network is often a herculean task. Thank God that
at least our backbone Ciscos here at NU permit remote out-of-band
management. We also have FastPath 4 and 5 boxes, Gatorboxes, Compat. Sys.
boxes, Novell, AIR, Liaison, and even Banyan Vines routers. It gets
complicated. Anything which would make this easier would be much
appreciated.

Tom Evans did his usual excellent job of exploring all the horrible
complexities of trying to do this "perfectly". Yes, I did read his long
note from many months ago, and I had it in mind when I wrote my note. I
also realize that Tom was not seriously proposing such an implementation,
but was rather trying to debunk the whole idea of zone changing via SNMP.
I'm also well aware that my "idea" is not original. (I call it an "idea"
because I hesitate to dignify it with the status of "proposal".)

Tom addresses a number of big problems in that old note. Permit me to
briefly summarize them in light of my note.

1. Router discovery. This is a management software issue, not a MIB or
router software issue. If I had a dumb console which couldn't even find all
the routers on a net, I'd set the necessary MIB variables in each router by
hand (or more likely, write a UNIX script to do it for me). As the internet
manager, I do know where all the routers are. (If I don't, I have bigger
problems than trying to change zone lists!) A smart console would of course
be nice, but it's not required.

2. Non-SNMP routers on the net being reconfigured. Yes, of course these
still must be done by hand. But again, this is not a MIB or router software
issue. Sure, it would be nice if my console could discover these for me and
alert me that they must be done by hand. But again, I'm willing to take
care of this myself if my console is not smart. I'd much rather have a
solution which did not address this problem than no solution at all.

3. Real-time reconfiguration. Much of the complexity in Tom's note is a
direct consequence of the fact that he is trying to reconfigure the boxes in
real time. He has to deal with all the ugly problems of retaining
connectivity with the boxes in question during the reconfiguration. Timed
suicide avoids all of these issues. That's the reason I used timed suicide
in my note. The ability to schedule odd-hour reconfigurations is just an
extra benefit.

4. End-node recovery on the net being reconfigured. My idea does not
accomplish this worthwhile goal. As Tom outlines, achieving this requires a
significantly more sophisticated algorithm in the router code. Again, if we
could get this, I'd be happy, but I'm more than willing to require restarts
of all AppleTalk nodes on the net if that's all I can get. Also note that
adding a new non-default zone name to a zone list does not require
restarts. This is the kind of reconfiguration I most often need to do
anyway. (Once a zone name has been assigned and starts being used by users,
I find it impossible in practice to change it anyway, because so many user
Macs have aliases and other kinds of "net object pointers" using the old
zone name. I can't imagine any reasonable proposal which would address this
issue. This is not a problem when adding a new zone name or changing
network numbers.)

5. Minimizing the reconfiguration period. Tom's note outlines an algorithm
where the console monitors the internet to discover when the network being
reconfigured has aged out of all the routing tables in all the routers on
the internet. In my idea, I have the person doing the reconfiguring supply
the reconfiguration delay time. Again, in my idea I have sacrificed
functionality for the sake of implementation simplicity. 

Taking all this together, we end up with something which I feel is
imperfect but adequate, much better than what we have today (nothing), and
above all something which could be implemented, tested, and formally
specified as part of a MIB in a reasonable amount of time.

More comments:

Yes, I guess we would have to make use of a timer which counts down to the
reconfiguration date/time, rather than an absolute date/time. Too bad, but
still workable. Kind of ugly without the help of software in the management
console to do the date/time subtraction, however. If I had to, I'd probably
write a simple UNIX script to do this work for me.
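
To illustrate the date/time subtraction mentioned above, here is a minimal
sketch in Python (rather than a UNIX shell script). The function name
countdown_seconds and the example times are hypothetical, and the sketch
assumes the timer variable would be expressed in whole seconds, which the
note does not specify.

```python
from datetime import datetime, timezone


def countdown_seconds(scheduled: datetime, now: datetime) -> int:
    """Return whole seconds from `now` until `scheduled`.

    Both arguments must be timezone-aware so the subtraction is unambiguous.
    """
    remaining = (scheduled - now).total_seconds()
    if remaining < 0:
        raise ValueError("scheduled time is already in the past")
    return int(remaining)


# Hypothetical example: schedule a change for 03:00 UTC the next morning.
now = datetime(1993, 3, 31, 21, 29, tzinfo=timezone.utc)
scheduled = datetime(1993, 4, 1, 3, 0, tzinfo=timezone.utc)
timer_value = countdown_seconds(scheduled, now)  # 19860 seconds (5 h 31 min)
```

The resulting value is what would be written into each router's countdown
variable, recomputed just before each set so that the timers stay close
together.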

I think we should keep the pending network number range start and range end
per-port variables. As Tom pointed out in his note, changing network
numbers in real time presents some of the same problems as changing zone
lists (although not as many), and that's why I included network number
changing in my note. Again, using timed suicide to change network numbers
avoids the problems.

The seed vs. non-seed variable can be nuked. Indeed, according to Inside
AT, configuring a port with a net range of 0-0 indicates non-seed anyway.
(Do routers actually implement it this way?)

To summarize the very simplest version of my idea once again, from the
router's point of view: At the reconfiguration time (when the counter
reaches zero), the router deletes the network from its routing table and
marks the port "on-hold". In this simplest version, "on-hold" means
"disabled". The only change in the routing algorithm is that all routing
through the port is disabled. After the reconfiguration delay, the port is
reconfigured and the port initialization process is restarted.
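
The router-side behavior summarized above can be sketched as a small
per-port state machine. This is only an illustration under stated
assumptions: the names Port, PendingConfig, and tick are hypothetical, time
advances in one-second ticks, and re-adding the network to the routing
table stands in for restarting the real port initialization process.

```python
from dataclasses import dataclass
from typing import List, Optional, Set, Tuple

NetRange = Tuple[int, int]


@dataclass
class PendingConfig:
    net_range: NetRange   # pending network number range (start, end)
    zones: List[str]      # pending zone list
    countdown: int        # seconds until the port goes on hold
    hold_delay: int       # reconfiguration delay, in seconds


@dataclass
class Port:
    net_range: NetRange
    zones: List[str]
    state: str = "up"     # "up" or "on-hold" (on-hold means disabled)
    pending: Optional[PendingConfig] = None
    hold_remaining: int = 0


def tick(port: Port, routing_table: Set[NetRange]) -> None:
    """Advance one port by one second."""
    if port.pending is None:
        return
    if port.state == "up":
        port.pending.countdown -= 1
        if port.pending.countdown <= 0:
            # Counter reached zero: delete the network, put the port on hold.
            routing_table.discard(port.net_range)
            port.state = "on-hold"
            port.hold_remaining = port.pending.hold_delay
    elif port.state == "on-hold":
        port.hold_remaining -= 1
        if port.hold_remaining <= 0:
            # Delay elapsed: apply the pending settings and restart the port.
            port.net_range = port.pending.net_range
            port.zones = list(port.pending.zones)
            port.pending = None
            port.state = "up"  # simplification of port initialization
            routing_table.add(port.net_range)


# Hypothetical example: add a zone after a 2 s countdown and a 3 s hold.
table = {(100, 100)}
port = Port((100, 100), ["Library"])
port.pending = PendingConfig((100, 100), ["Library", "Annex"],
                             countdown=2, hold_delay=3)
for _ in range(5):
    tick(port, table)
```

In practice the hold delay would be long enough for the network to age out
of every routing table on the internet; the short values here only keep the
example small.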

This simplest version does nothing more than mimic what we have to do now
by hand: First shut down all the routers on the net (or shut down ports, if
the routers permit this), then wait for the network to age out across the
entire internet, then reconfigure and restart the routers (or ports) one at
a time.

After rereading the relevant part of Inside AT, I see that there's no need
to be concerned about seed vs. non-seed ports. If a non-seed port on the
net happens to come back up before any seed port, it will just wait for a
seed port to start broadcasting RTMP packets.

This scheme is absolutely minimal. It's the very simplest possible solution
to the problem. Tom's note outlines a very complex solution. There is a
whole range of possible solutions between these two extremes. For example,
using some of Tom's ideas we could perhaps avoid confusing end nodes on the
net being reconfigured. I also suggested a possible modification (and
increase in complexity) which would permit routing to continue "through"
the network during the reconfiguration period. (This isn't as simple as I
originally thought, though.)

As an internet manager, I would much rather see the minimal solution or a
weak solution implemented soon than have no solution at all, or have to
wait a long time for a strong solution. As a customer of several router
vendors, I'd be willing to see a delay of a few months in getting my hands
on the new MIB if it meant I could get some sort of solution to the zone
changing problem as part of the new MIB.

In any case, it seems reasonable to attack a problem this complex by
beginning with as many simplifying assumptions as possible, then adding
refinements based on experience. That's how I'd start, anyway.

Tom Evans told me that he worries that this scheme is just too complicated
and error-prone for dumb customers to understand and use, too hard for the
vendors to document, and too hard for the vendors to support. Perhaps. I do
understand his concern. I'd just hate to be denied access to a powerful and
very useful tool just because other people have trouble using it properly.

Even with my approach, which avoids all the really tough problems, I think
decent management software could go a long way to minimize user complexity.
I can easily imagine a relatively simple Mac program which would do router
discovery, warn the user about non-SNMP routers on the net, and present a
very simple interface for making changes to the net's configuration. The
net configuration window would display the current network number range and
zone list and let the user edit them. It would also let the user specify
the date and time at which the changes should occur and the reconfiguration
delay time. That's about it in terms of the human interface. This could be
a standalone program or part of a more sophisticated SNMP management
console software package.

Tom also mentioned the problem of a router going down and being restarted
between the time the user configures the timed change and the time when the
change is scheduled to take place. In this case, the router has lost all
the reconfiguration information, and big problems result. The timed suicide
will work properly only if all the routers on the net participate on
schedule.

One possibility Tom explored in his note would be to make one router a seed
(the most reliable one) and the others non-seed just for the duration of
the reconfiguration.

Another possibility would be to have the management software monitor the
configured routers and make certain they retain their settings right up to
the scheduled reconfiguration time. This approach would also help keep the
timers synchronized.

Indeed, the entire collection of reconfiguration information could be kept
in the management software, and not actually sent to the routers until 10
minutes or so before the scheduled reconfiguration time. If any of the
routers are down at this point, the management software could easily back
out of the whole operation. I like this solution to the problem the best.
It doesn't eliminate the "catastrophe window", but it makes it much
smaller.
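
A minimal sketch of this last approach, as management software might do it
shortly before the scheduled time: check that every router answers, and
only then send the held settings. The function commit_or_back_out and its
is_reachable and push_config parameters are hypothetical stand-ins for
whatever SNMP get and set operations a real console would use; no actual
MIB variables are assumed.

```python
from typing import Callable, Dict, Iterable


def commit_or_back_out(
    routers: Iterable[str],
    pending: Dict[str, dict],
    is_reachable: Callable[[str], bool],
    push_config: Callable[[str, dict], None],
) -> bool:
    """Send the held configuration only if every router is reachable.

    Returns True if the settings were sent to all routers, or False if the
    operation was backed out before anything was sent.
    """
    router_list = list(routers)
    down = [r for r in router_list if not is_reachable(r)]
    if down:
        # Back out: no router has received any part of the change.
        return False
    for r in router_list:
        push_config(r, pending[r])
    return True


# Hypothetical example with in-memory stand-ins for the SNMP operations.
sent = []
ok = commit_or_back_out(
    ["router-a", "router-b"],
    {"router-a": {"zones": ["Library", "Annex"]},
     "router-b": {"zones": ["Library", "Annex"]}},
    is_reachable=lambda r: True,
    push_config=lambda r, cfg: sent.append(r),
)
```

A router can still fail between the reachability check and the scheduled
time, which is the remaining "catastrophe window"; the check only narrows
it.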

I attended a presentation by Gary Hornbuckle where he talked about the "law
of the conservation of complexity" (he was quoting someone else). This is a
good example. Changing the zone list for a network on an active internet is
very complex. Different solutions distribute the complexity differently
between the routing protocol, the MIB, the router software, the management
software, and the user. Currently, the user is burdened with almost all of
the complexity. Under my minimal approach, we accomplish a great deal (if
not everything we'd like) by adding no complexity at all to the routing
protocol and very little complexity to the MIB and the router software.
Much of the remaining complexity can be handled by good management
software.

One way of looking at Tom's note is that it attempts to remove the greatest
possible amount of complexity from the user, in the grand tradition of the
Mac and AppleTalk. As Tom demonstrated quite convincingly, however, this is
only possible at the expense of enormously increased complexity in the other
components of the system. My much more modest scheme distributes the
complexity differently, with a bit more complexity for the user, but I
think it's a much more balanced approach.

And remember that "user" in this context means "internet manager", not "end
user" or even "LAN manager". With decent management software as outlined
above, our "user" needs to be aware of the kind of connectivity loss which
his internet will experience during the reconfiguration period, he needs
guidance in specifying the reconfiguration delay time, he needs to know
under what circumstances end nodes will need to be restarted after the
reconfiguration, and he needs to be aware of what kinds of things can go
wrong. Is this really too much to expect, or too difficult to document and
support? Is it even all that much better under Tom's scenario? It's
certainly much better than the current situation, where he has to do all the
reconfiguration by hand in addition to dealing with the problems mentioned
above.

I apologize for the length of this note. I think I've made the points I
wanted to make, and I thank you for listening. I look forward to hearing
about what happens with the Apple experiment.

John Norstad
Academic Computing and Network Services
Northwestern University
j-norstad@nwu.edu