Re: Re:The MIB+ and Zone changes

Robert_Jeckell@3mail.3com.com Tue, 30 March 1993 18:30 UTC

Date: Tue, 30 Mar 1993 09:11:00 -0800
Sender: ietf-archive-request@IETF.CNRI.Reston.VA.US
From: Robert_Jeckell@3mail.3com.com
Subject: Re: Re:The MIB+ and Zone changes
To: apple-ip@cayman.com
Message-Id: <930330.090732@3Mail.3Com.COM>
Forwarded: Message from {tom@wcc.oz.au}:ugate:3Com of 3-30-93

FYI... 

------------------------ Forwarded Message -----------------------
Date: 30 Mar 93 18:29:26 +1000 (Tue)
From: tom@wcc.oz.au (Tom Evans)
Subject: Re: Re:The MIB+ and Zone changes
-----------------------------------------------------------------
This is mail. If you find anything of interest in this, please feel
free to mail back to the group.
    
         Is there a proposal for doing zone name changing using SNMP
     which is under consideration now?  If not, does anyone have one? 
     I know that Tom Evans did a bunch of work on this back around
     December when the discussion was hot and heavy.  It seemed to me
     that the conclusion was it's not so easy, but I may be remembering
     incorrectly.

    Below is the relevant zone change text from the 12/03/92 MIB
    that was removed from the latest version:

Having done extensive "what-if" work on this and mailed it to the group,
I think the main thing that needs to be in the MIB is a FAR MORE COMPLEX
AND CAPABLE "onHold" variable. If it is left "as-is" in the MIB it would
be useless.

Here's my proposal from last December - it gives details.

Another Zone-Name-Change Proposal
Version 0.2, dated 3-DEC-92, 11:17.

I propose, from a Router-code-writer's perspective that the "onHold
state" is a port-related function, and is best indicated by setting
the atPortState to "onHold". Most of the rest of the Router's "required
onHold behaviour" flows naturally from this point by the existing
code behaviour in Routers (so there's less we have to write/test).

I also wish to address another recently-discovered problem - that the
Macintoshes themselves can't cope with zone-name-changes unless a
particular "low memory global" is changed (and I haven't seen a
"POKE(address, value)" SNMP function in the Mac MIB yet :-).

There are a number of advantages to this method:

    1. Retains Internetwork connectivity.

    2. Easier for Router vendors to implement (the "port-restart"
       code is already written).

    3. No messing around with NBP tables required (doing this
       would be HARD and liable to cause lots of bugs and problems).

    4. Re-establishes correct zone information in the devices on
       the network being changed (gets around Apple's ZIP notify
       behaviour).

    5. Allows for network-range changing.

    6. Allows for "optimum" speed - no need for a long fixed
       hold-down period.

    7. Robust - The procedure can be backed-out (to the original
       network state) when errors are found that prevent completion.

    8. The procedure will not fail in the presence of Routers that
       can't follow the procedure (either for not implementing the
       function, or for having bugs), but reports them so the network
       manager can reconfigure them off-line.

The procedure is listed below as actions taken by the MS (Management
Station), designated "Mx", interleaved with what the Routers are doing
as a result of these actions, designated "Ry".

THREE states are added to the atPortStatus variable, designated here
as "onHoldNoAge", "onHoldNoAdvertise" and "onHoldReinit".
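
As a rough illustration only, the three proposed states could sit
alongside the existing ones in a Router's port code roughly as below.
The enum names, the two "existing" states shown and the helper
functions are assumptions made for this sketch; they are not taken from
the MIB, and no numeric MIB values are implied.

```c
#include <stdbool.h>

/*
 * Port states. PORT_ROUTING and PORT_UNCONFIGURED stand in for the
 * existing states; the three ON_HOLD states are the proposed additions.
 * All identifiers here are hypothetical.
 */
enum port_state {
    PORT_ROUTING,
    PORT_UNCONFIGURED,
    PORT_ON_HOLD_NO_AGE,
    PORT_ON_HOLD_NO_ADVERTISE,
    PORT_ON_HOLD_REINIT
};

/* True for the states in which RTMP entries using this port are not aged. */
static bool port_holds_routes(enum port_state s)
{
    return s == PORT_ON_HOLD_NO_AGE || s == PORT_ON_HOLD_NO_ADVERTISE;
}

/* True when no RTMP packets may be sent out this port. */
static bool port_is_silent(enum port_state s)
{
    return s == PORT_ON_HOLD_NO_ADVERTISE;
}
```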

M1. Network manager commands a zone-list change of network "N".

M2. MS searches for all Routers connected to network "N" (note 10).

M3a.    If the MS is on network "N", it records the port addresses
    for all Routers on their "N" ports, as the rest of the
    Internet is going to disappear soon (note 1).

M3b.    If the MS isn't on network "N", for every Router it finds a
    valid port-address for that router on a network OTHER than
    network "N" (as network "N" is going to disappear from the
    Internet soon) (note 11).

M4. The MS determines if all discovered Routers are SNMP-capable
    (note 12).  If any of them aren't, this is reported and the
    procedure is aborted. These Routers must be manually disabled for
    the zone-list change procedure to work.

M5. The MS attempts to record the current atportState of the ports 
    on network "N" on all Routers, and then set it to "onHoldNoAge".
    If this fails at any point (note 13), the failing Routers are
    reported, all ports are restored to their original (as recorded by
    the MS) states (usually "Routing") and the procedure is aborted.
    The purpose of state "onHoldNoAge" is so that the Routers can
    survive other Routers' ports being set to "onHoldNoAdvertise"
    without problems.

R6. Routers do not age RTMP entries that have the entry's
    outgoing-port set to onHoldNoAge (or onHoldNoAdvertise). This
    means ALL entries - not just for network "N".

M7. After all Routers' "N" ports are set to onHoldNoAge, the MS
    repeats the procedure, but sets the same ports to
    "onHoldNoAdvertise". If any problems are encountered they are
    reported, all ports are set back to "onHoldNoAge", and then
    back to their recorded initial state (usually "Routing").
    The MS sets all the determined "one-porters" to
    "onHoldNoAdvertise" first (note 11).

R8. Routers with a port set to "onHoldNoAdvertise" do the following
    things differently to "normal".

    a. They now DON'T send RTMP packets out the "onHoldNoAdvertise"
       port. This allows all devices on network "N" to time-out the
       Routers, and go back to "non-router" mode. When they
       reacquire the network number (when the Routers are enabled
       again) they will acquire the new zone list. This gets
       around the Mac's current behaviour.

    b. As in R6, Routers do not age RTMP entries that have the
       entry's outgoing-port set to onHoldNoAge (or
       onHoldNoAdvertise). This enables step "a" above to NOT cause
       internetwork connectivity to be lost.

    c. However, Routers DO age RTMP entries that are zero-hops
       and have the outgoing-port set to onHoldNoAdvertise. This
       causes the RTMP entries corresponding to the "onHold" ports
       to age out and thus "Notify-Neighbour" tuples are
       automatically generated. It also causes the tuples for this
       network (network "N") to be left out of all outgoing RTMP
       packets.

    d. They don't respond to any RTMP Request or ZIP Request
       packets for network "N" either. It may be sufficient for
       them not to respond when network "N" has aged out of the
       RTMP table (so much of this may happen with existing
       code).

M9. The MS now monitors all/some/? Routers, issuing SNMP GETs for
    network "N" (note 2), (note 4).

R10. If the "Notify Neighbour" tuples worked, network "N" will age
    out FAST, so the entries in the Routers will age-out and die. If
    they didn't, then "echoes" of "N" will come back in from other
    Routers with high (and increasing over time) hop-counts.

M11. The MS can see these "echoes". When they're all gone (plus
    enough time for another worst-case echo - 10 seconds * 15
    hops) the MS changes the port Zone configuration in all the
    Seed Routers. If network "N" won't die, then there's a problem 
    with the network and this is reported and the MS keeps going but
    without changing the zone-list information (note 5).

M12. When all SEED Routers have been changed, the MS changes all
    the non-Seed Routers ports to "onHoldReinit" (this ordering isn't
    strictly necessary - it copes better with the network-number
    change option though). Note that the non-seed Routers MUST NOT
    be the type that go into "guess-a-number" mode. When hit with
    "onHoldReinit" they MUST stay non-seed.

R14. Non-seed Routers when set to "onHoldReinit" (note 8) change
    this state to whatever they do on "normal" port initialisation
    (note 6), perform initialisation and wait for an RTMP packet
    from a Seed Router. Note that the non-seed Routers MUST NOT
    go into "guess-a-number" mode. When hit with "onHoldReinit"
    they MUST stay non-seed.

M15. The MS now changes the Seed Routers' ports to "onHoldReinit". It
    changes the Router that the MS is using to gain access to network
    "N" last. This latter step avoids having to wait for that
    Router to reinitialise its port so the MS can get "through" it
    to the other Routers. This is essential though if changing the
    network range for network "N".

R16. Seed Routers when they get their Port status set to
    "onHoldReinit" change this state to whatever they do on port
    initialisation (note 6) and initialise the port as on normal
    port startup.

Optionally:
M15a.   The MS configures ALL Routers except for its "nearest" one
    to "non-seed" (removes network configuration from atPort),
    sets them to "onHoldReinit" and configures its nearest one
    with a NEW Network Number. Then it sets that one to
    "onHoldReinit" (note 7).

M15b.   The MS waits till that port is up and sending RTMP packets,
    and then reconfigures all required "new" Seed Routers over
    the new-and-working network.
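
The "record, set, and back out on failure" pattern of step M5 (reused
in step M7) might look roughly like the following on the MS side. This
is a minimal sketch under stated assumptions: struct router_port,
snmp_set_state() and hold_all_ports() are hypothetical stand-ins, the
SNMP exchange is simulated by a flag, and only the first failing Router
is reported, whereas the procedure above reports all of them.

```c
#include <stdbool.h>
#include <stddef.h>

enum port_state { ROUTING, UNCONFIGURED, ON_HOLD_NO_AGE };

/* Hypothetical stand-in for one Router's port on network "N". */
struct router_port {
    enum port_state state;  /* current state as seen via SNMP */
    bool accepts_set;       /* false simulates a failing SNMP SET */
};

/* Stand-in for an SNMP SET of the port state; returns false on error. */
static bool snmp_set_state(struct router_port *p, enum port_state s)
{
    if (!p->accepts_set)
        return false;
    p->state = s;
    return true;
}

/*
 * Record each port's state in saved[], then set it to ON_HOLD_NO_AGE.
 * On the first failure, restore every port already changed and return
 * the index of the failing Router. Returns -1 if all ports were set.
 */
static int hold_all_ports(struct router_port *ports,
                          enum port_state *saved, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        saved[i] = ports[i].state;
        if (!snmp_set_state(&ports[i], ON_HOLD_NO_AGE)) {
            for (size_t j = 0; j < i; j++)
                snmp_set_state(&ports[j], saved[j]);
            return (int)i;
        }
    }
    return -1;
}
```

Keeping the recorded states in the MS, rather than in the Routers,
matches the text: the back-out uses the states "as recorded by the MS".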

Note 1.
There's a non-obvious problem here. After step R8, an MS on network "N"
now has a network-address that ISN'T in the RTMP tables of any of the
Routers on network "N" (or hopefully anywhere). Thus, in order for the
Routers to be able to return SNMP Responses to the MS, they must have
"special-case" code that doesn't rely on the RTMP table - in fact it
must specifically ignore the RTMP table. This should be "easy", as
Routers currently have to handle a similar situation when answering
GetNetInfo packets from "confused" Macintoshes on local networks. These
packets have to be "tagged" when received so they can be
special-case-directed back to the correct port. The same mechanism (read
necessary-kluge) will have to be pressed into service for this case too
(note 3).
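
One way to picture the "tagging" idea is sketched below: the receiving
port is stored with the request, and replies to management requests go
back out that port without consulting the routing table. All of the
names (struct request, struct route, reply_port and so on) are invented
for this illustration, and whether the special case should apply to
every SNMP request or only while a port is on hold is left open here.

```c
#include <stdbool.h>
#include <stddef.h>

#define NO_PORT (-1)

/* A received request, tagged with the port it arrived on. */
struct request {
    unsigned src_net;  /* source network number from the packet */
    int      in_port;  /* tag: index of the receiving port */
    bool     is_mgmt;  /* true for SNMP requests */
};

/* Simplified routing-table entry: one network, one outgoing port. */
struct route {
    unsigned net;
    int      out_port;
};

/* Normal path: find the outgoing port for a network, or NO_PORT. */
static int lookup_route(const struct route *tbl, size_t n, unsigned net)
{
    for (size_t i = 0; i < n; i++)
        if (tbl[i].net == net)
            return tbl[i].out_port;
    return NO_PORT;
}

/*
 * Choose the port for a reply. Management replies ignore the table and
 * use the tagged port; everything else is routed normally.
 */
static int reply_port(const struct request *rq,
                      const struct route *tbl, size_t n)
{
    if (rq->is_mgmt)
        return rq->in_port;
    return lookup_route(tbl, n, rq->src_net);
}
```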

Note 2.
An MS on network "N" will not be able to communicate with any Routers
apart from those on network "N". This is because the network that the
MS is on will be ageing-out of all Routers, and thus packets will not
be able to be returned to network "N".

Thus an MS on network "N" has limited information as to when network "N"
has aged out internet-wide. It can examine the Routers on network "N" to
see if they are receiving "echoes" of network "N" from other Routers,
but there's a problem. A pair of "self-deceiving" Routers "X" and "Y"
that for topological (or normal buggy code) reasons haven't obeyed the
"Notify Neighbour" tuples will only be "visible" to an MS on network "N"
while the hop-count being advertised by "X" PLUS the hops from a Router
on network "N" to Router "X" is less than 15. When this ages out to more
than 15 in the "N" Routers, it may still be "alive" in "X" and "Y". It
may be possible for the MS to observe the "speed" of the ageing, and
then "predict" when "X" and "Y" will have given up, but this is getting
risky (and hard). See note 3. It COULD bring the network up and then
inspect the ZIP tables in ALL Routers (not all - only the SNMP-capable
ones, so we can't guarantee this at all) to see if it is right, and then
try and take corrective action after the event, but (nasty, right?).
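
The visibility condition in Note 2 is simple arithmetic, sketched here
with hypothetical names. It follows the "less than 15" wording above;
the text leaves the behaviour at exactly 15 open, so treating 15 as
"not visible" is an assumption of this sketch.

```c
#include <stdbool.h>

enum { HOP_LIMIT = 15 };  /* the limit quoted in the text */

/*
 * advertised_hops: hop count for network "N" advertised by Router "X".
 * hops_to_x:       hops from a Router on network "N" to Router "X".
 * Returns true while the echo can still be seen from network "N".
 */
static bool echo_visible_from_n(int advertised_hops, int hops_to_x)
{
    return advertised_hops + hops_to_x < HOP_LIMIT;
}
```

For example, with "X" four hops away, an echo advertised at 10 hops is
still visible (10 + 4 = 14), but one advertised at 11 hops is not
(11 + 4 = 15), even though "X" and "Y" may still hold the entry.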

Note 3.
The problems pointed out in Notes 1 and 2 may be considered so difficult
that we arbitrarily state that the MS can't be on network "N". Comments?

Note 4.
Performing an SNMP GET operation requesting the state of network "N" is
not a particularly intensive operation, especially if it is during an
"expected" 10-minutes-or-faster-if-we-can wait for ageing. The MS
can inspect the "local" Routers, and then inspect the furthest-hop
Routers (that support SNMP). It can even inspect the non-SNMP Routers
with RTMP non-split-routing-info requests.
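
The end-of-wait test from step M11 (no echoes seen, plus 10 seconds *
15 hops) could be applied on each polling pass roughly as follows. The
constant and function names are made up for this sketch, and the caller
is assumed to track when any polled Router last reported network "N".

```c
#include <stdbool.h>

enum {
    RTMP_INTERVAL_SECS = 10,  /* per-hop interval used in step M11 */
    MAX_HOPS = 15,            /* hop count used in step M11 */
    ECHO_SETTLE_SECS = RTMP_INTERVAL_SECS * MAX_HOPS  /* 150 seconds */
};

/*
 * now_secs:       current time in seconds.
 * last_echo_secs: time at which any polled Router last reported "N".
 * Returns true once a full worst-case echo period has passed with no
 * sighting of network "N".
 */
static bool network_has_died(long now_secs, long last_echo_secs)
{
    return now_secs - last_echo_secs >= ECHO_SETTLE_SECS;
}
```

This check does nothing about the "sum-of-hops-over-15" problem in
Note 2; it only covers the echoes the MS can actually see.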

Note 5.
I've got this one wrong haven't I? See the "sum-of-hops-over-15"
problem in Note 2.

Note 6.
I've got the "onHoldReinit" state acting as a COMMAND to go through
"normal" reinitialisation, rather than setting the state to one of the
current ones. This is to allow the Router code to reset the port "in its
own good time" (i.e. scheduled to happen "later") rather than forcing it
to start on receipt of the SNMP packet. One interpretation of the
"correct" response to a state-change would be not to answer the SNMP SET
until the state change has been completed - this may take 10 seconds on
a LocalTalk port - it takes that long to send out the ENQs - and the
MS would have timed out by then. Using "onHoldReinit" dodges all these
problems.
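
A minimal sketch of "onHoldReinit" acting as a deferred command is
given below, assuming a Router with some periodic port-polling routine.
The SET handler only records the state and returns, so the SNMP
response is not delayed; the restart happens on a later poll. The names
set_port_state, port_poll and the restarts counter are hypothetical.

```c
#include <stdbool.h>

enum port_state { ROUTING, UNCONFIGURED, ON_HOLD_REINIT };

struct port {
    enum port_state state;
    bool seeding;   /* true if the port is configured as a seed port */
    int  restarts;  /* counts restarts, for illustration only */
};

/* SNMP SET handler: record the requested state and return at once. */
static void set_port_state(struct port *p, enum port_state s)
{
    p->state = s;
}

/* Called later, from the Router's own scheduler. */
static void port_poll(struct port *p)
{
    if (p->state != ON_HOLD_REINIT)
        return;
    p->restarts++;  /* stands in for taking the port down and up */
    p->state = p->seeding ? ROUTING : UNCONFIGURED;
}
```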

Note 7.
Setting all non-local-to-the-MS-Routers to non-seed before changing
the network number on the local one artfully avoids all the problems
related to the subtle distinctions (and bugs) existing between the
"configuration view" and the "running view" of the port.

Note 8.
Yes, this looks like a good place to hook-in the old-favourite
"timed suicide" command (count down this value then reinitialise the
port) code, but you'll notice this isn't necessary with this
proposal. If anyone wants "timed suicide", then propose it separately
(note 9).

Note 9.
Let's generalise the "timed suicide" concept to a MIB structure
(a table) with three columns. The first column is WHEN (the counter).
The second column is ANOTHER MIB variable (that is, a pointer to
another MIB variable). The THIRD is what to set that variable to
when the counter goes off. You can now do anything, anytime,
including self-nuking. Add plenty of ":-)" to taste.
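
The three-column table might be modelled in Router code along these
lines. This is a sketch only: a plain C pointer stands in for "a
pointer to another MIB variable", a zero counter is assumed to mean
"row idle", and struct timed_action and timed_actions_tick are invented
names.

```c
#include <stddef.h>

struct timed_action {
    unsigned ticks_left;  /* column 1: WHEN (0 means the row is idle) */
    int     *target;      /* column 2: the variable to set */
    int      value;       /* column 3: what to set it to */
};

/*
 * Advance every armed row by one tick and fire the rows that reach
 * zero. Returns the number of rows fired on this tick.
 */
static size_t timed_actions_tick(struct timed_action *tbl, size_t n)
{
    size_t fired = 0;
    for (size_t i = 0; i < n; i++) {
        if (tbl[i].ticks_left == 0)
            continue;
        if (--tbl[i].ticks_left == 0) {
            *tbl[i].target = tbl[i].value;
            fired++;
        }
    }
    return fired;
}
```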

Note 10.
MS finds all Routers on network "N" by sending "directed broadcast"
RTMP and/or ZIP packets. This will need some work so as to cope with
various "bugs" in existing Routers that may cause problems with this
procedure.

Note 11.
There exists a class of Routers that may have only one addressable port.
Half-Routers and some "tunnel" Routers specifically. In the case where
the MS isn't on network "N" it will be unable to obtain an "alternate"
address for these "one-porters". The MS may advise of this (requiring
these devices to be manually disabled/reconfigured). The MS may also
be able to force all the one-porters to "onHoldNoAdvertise" in step M7
to get around this, and then reconfigure and set them to
"onHoldReinit" after step M15 (it is only after step M15 that the MS
will be able to access these devices).

Note 12.
MS determines SNMP-capability by sending an SNMP GET to the Router. If
it isn't SNMP-capable, it won't answer.

Note 13.
If the Router doesn't support the required SNMP SET operations, it
will return an SNMP packet with an error indication.
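
Notes 12 and 13 together give the MS a simple way to sort Routers into
classes, sketched below. The enum and function names are hypothetical,
and treating any non-successful SET (including a timeout) as "no SET
support" is an assumption the notes do not spell out.

```c
/* Outcome of one SNMP exchange as seen by the MS. */
enum snmp_result { SNMP_OK, SNMP_ERROR, SNMP_TIMEOUT };

enum router_class {
    ROUTER_USABLE,   /* answers GET and accepts the required SET */
    ROUTER_NO_SNMP,  /* never answered the GET (note 12) */
    ROUTER_NO_SET    /* answered, but the SET failed (note 13) */
};

/* Classify a Router from the results of a probe GET and a state SET. */
static enum router_class classify_router(enum snmp_result get_result,
                                         enum snmp_result set_result)
{
    if (get_result == SNMP_TIMEOUT)
        return ROUTER_NO_SNMP;
    if (set_result != SNMP_OK)
        return ROUTER_NO_SET;
    return ROUTER_USABLE;
}
```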

----------------------------------------------------------------

Here are some code fragments indicating the modifications necessary
for implementing this proposal - for those of us who understand code
better than specifications :-).

In the following, "ifp" is an "interface pointer" (a pointer to the
port data structure) and "r" is an RTMP table-entry pointer. The rest
should be self explanatory.

1.  Add the states "onHoldNoAge", "onHoldNoAdvertise" and
    "onHoldReinit" to atportState (feel free to suggest other names).

2.  Make the following changes in the Router's port-handling code:

    2a. if (ifp->state == onHoldReinit) {
            take the port down and then bring it up;
            if (seeding) ifp->state = routing;
            else ifp->state = unconfigured;
        }

3.  Make the following change to the RTMP send-timer function:

    for (ifp in all ports) {
        /*
         * Don't send RTMP tuples out an "onHoldNoAdvertise" port.
         */
        if (ifp->state != onHoldNoAdvertise) {
            send_rtmp_tuples(ifp);
        }
    }

4.  Make the following change to the RTMP validity-timer function:

    Change from the normal ageing code of:

        for (r in rtmpTable entries) {
            if (r->distance != 0) /* check for direct-connect */
                age_entry(r);
        }

    to the following:

        for (r in rtmpTable entries) {
            if (r->distance != 0) { /* check for direct-connect */
                /*
                 * Not direct-connect. Only age if the
                 * outgoing-port ISN'T in an onHold
                 * state.
                 */
                if (r->ifp->state != onHoldNoAge &&
                    r->ifp->state != onHoldNoAdvertise)
                    age_entry(r);
            } else {
                /*
                 * It is direct-connect. Age out if it
                 * IS in onHoldNoAdvertise state
                 * (reverse of above).
                 */
                if (r->ifp->state == onHoldNoAdvertise)
                    age_entry(r);
            }
        }


That's about it for the code-changes - all the rest should "fall out in
the wash".
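
For anyone wanting to try the timer rules in isolation, the two
decisions in fragments 3 and 4 can be pulled out into small predicates
that compile on their own. This is a sketch with assumed type and
function names (struct port, struct rtmp_entry, should_send_rtmp,
should_age); the real send and ageing routines are left out.

```c
#include <stdbool.h>

enum port_state { ROUTING, ON_HOLD_NO_AGE, ON_HOLD_NO_ADVERTISE };

struct port {
    enum port_state state;
};

struct rtmp_entry {
    int distance;            /* hop count; 0 means direct-connect */
    const struct port *ifp;  /* outgoing port for this entry */
};

/* Fragment 3: no RTMP tuples go out an onHoldNoAdvertise port. */
static bool should_send_rtmp(const struct port *ifp)
{
    return ifp->state != ON_HOLD_NO_ADVERTISE;
}

/* Fragment 4: decide whether the validity timer ages this entry. */
static bool should_age(const struct rtmp_entry *r)
{
    if (r->distance != 0) {
        /* Not direct-connect: age unless the port is on hold. */
        return r->ifp->state != ON_HOLD_NO_AGE &&
               r->ifp->state != ON_HOLD_NO_ADVERTISE;
    }
    /* Direct-connect: age only in onHoldNoAdvertise. */
    return r->ifp->state == ON_HOLD_NO_ADVERTISE;
}
```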

----------------------------------------------------------------

Thanks to Karen Frisa for reviewing this document and suggesting
improvements.

========================
Tom Evans  tom@wcc.oz.au
Webster Computer Corp P/L, 1270 Ferntree Gully Rd Scoresby, Melbourne 3179
Victoria, Australia 61-3-764-1100  FAX ...764-1179  A.C.N. 004 818 455