EARP proposal

bagnall_d@apollo.com Fri, 30 November 1990 15:51 UTC

From: bagnall_d@apollo.com
Date: Fri, 30 Nov 90 03:54:23 EST
Subject: EARP proposal
To: fddi@merit.edu

    I am submitting the following document for review before the meeting in Boulder.  Two
 of the authors, Caralyn Brown and I, will be at the meeting to answer any questions.

                                  --Doug Bagnall

 ----------------------------------- Cut here -------------------------------------------

FDDI Working Group                                           D. Bagnall
Preliminary Document                                           C. Brown
                                                                D. Hunt
                                                           M. J. Strohl
                                                          November 1990

                 Extended Address Resolution Protocol


   The purpose of this document is to offer for review a new elective
   standard.  The distribution of this document is unlimited.


   The following memo proposes a new form of the address resolution
   protocol for use over Local Area Networks (LANs) where a one-to-many
   mapping of Internet Protocol (IP) [4] to link layer addresses is
   desirable.  The one-to-many mapping may be implicit in the
   relationship between IP and an underlying, multi-rail link layer, or
   may be useful in providing more reliable delivery in the face of
   congestion or for improving error recovery procedures.  The new
   protocol is not meant to supersede that specified in RFC 826 [3] but
   to supplement it.


   The April 1988 memo of J. Lekashman [2] first explored the
   advantages of providing multiple link layer paths between pairs of
   IP addresses.  The seminal idea of writing a new protocol, however,
   came about in discussions in which Carol Iturralde, now at Digital
   Equipment Corporation, played a major role.  Comments from many
   others have also been incorporated within the document including
   especially those from Vernon Schryver of Silicon Graphics, Dave
   Katz of Merit, and J. Noel Chiappa.


   Network services are no longer just another feature tacked on to an
   already complete operating system.  With the advent of distributed
   file systems and network computational servers, the network has
   moved from the periphery of the operating system to become one of
   its core services.  In the traditional IP networking model, a host
   computer has one IP address assigned to every physical device.
   When a device fails, its associated IP address becomes unreachable,
   and all reliable transport connections channeled through it die.

                                                               [Page 1]

Preliminary                     EARP                      November 1990

   This one-to-one mapping of IP to link layer or hardware addresses is
   simple to understand and hosts using it are easy to build, but, as
   reliable network services have become more critical, the limitations
   of the mapping have become increasingly intolerable.

   With a one-to-many mapping of IP to hardware addresses, the offered
   resources of a network server are no longer held hostage to the
   state of a single network device.  Instead, several network devices
   attached to a single LAN can function as a single, logical link
   layer service for IP and its attendant upper layer protocols.
   Transmission Control Protocol (TCP) connections, in particular, are not
   tied to either a single transmitting device on the local host or to
   a single receiving device on the remote host.  As long as there is
   at least one functioning link path between sender and receiver, the
   TCP connection can continue to transfer data.


   Since not every Internet host will support the Extended Address
   Resolution Protocol (EARP), the new protocol will be used in
   conjunction with standard ARP.  (The term host here refers to either
   an Internet host or to a gateway when the gateway uses that portion
   of a host's functionality devoted to the establishment of link layer
   paths between stations on a common LAN.)  Hosts which implement EARP
   will prefer to use that protocol when possible but will be able to
   both send and receive standard ARP packets when their messages would
   not otherwise be understood.  The purpose of using either ARP or
   EARP is to yield a mapping between a remote IP address and one or
   more link layer addresses.  EARP will simply provide more complete
   information than ARP, thus allowing a host to make a better decision
   as to how to direct a frame to its remote peer.

   When a host has determined that a data packet is to be transmitted
   to another host on the same LAN, it looks in a table to find a
   mapping between the packet's destination IP address and a link layer
   or hardware address.  With standard ARP, the table will list one IP
   address and one hardware address.  With EARP, the table will list
   one IP address and possibly several hardware addresses.  When it
   finds a single address, the transmitting host uses that address as
   the destination address of the unicast frame which encapsulates the
   packet to be sent to its IP peer at the remote host.  If, however,
   the sending host finds several hardware addresses in its table, it
   must choose one to specify for the current frame.  If the host, for
   instance, were to want to distribute the packet load among the
   several network devices at the receiving host, it might choose
   destination hardware addresses from its table in round-robin order,
   always selecting the least recently used address for the current
   frame.
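
   The round-robin policy described above could be sketched as follows.
   This is only an illustration in Python, not part of the protocol;
   the class name RoundRobinSelector and the sample hardware addresses
   are assumptions made for the example.

```python
from collections import deque


class RoundRobinSelector:
    """Cycle through a remote host's hardware addresses, least recently used first."""

    def __init__(self, hardware_addresses):
        if not hardware_addresses:
            raise ValueError("at least one hardware address is required")
        # The left end of the deque is always the least recently used address.
        self._addresses = deque(hardware_addresses)

    def next_destination(self):
        address = self._addresses.popleft()
        self._addresses.append(address)  # it is now the most recently used
        return address


# Hypothetical addresses for a remote host with two network devices.
selector = RoundRobinSelector(["08:00:1e:00:00:01", "08:00:1e:00:00:02"])
first_three = [selector.next_destination() for _ in range(3)]
```

   With two addresses in the table, successive frames simply alternate
   between them.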

   A busy Internet server may become congested with received packets
   on one network device while another device is idle.  So that a host
   can assume some control over the distribution of incoming traffic,
   EARP includes a special ranking field with each source hardware
   address in a request or response packet.  The ranking field allows
   the host at a minimum to designate a primary interface to its remote
   peer;  at its most complex, the field allows the host to specify a
   hierarchy among its interfaces.  The intention is that a host with
   a busy server can balance the input packet load from many clients
   by assigning each a different primary interface.  The other
   hardware addresses at the server host would of course still be
   available for backup.
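
   One way a client might honor the ranking while keeping the other
   addresses as backups is sketched below in Python.  The rank values
   follow the packet description in this memo (0 is the highest rank,
   254 the lowest, 255 means no rank); the function name pick_by_rank
   and the notion of a "usable" set of addresses are assumptions of the
   sketch, not something the protocol defines.

```python
NO_RANK = 255  # value the memo reserves for "no rank assigned"


def pick_by_rank(entries, usable):
    """Return the best-ranked usable hardware address, or None.

    entries: list of (hardware_address, rank) in table order.
    usable:  set of addresses the sender currently believes reachable.
    """
    candidates = [
        (rank, index, address)
        for index, (address, rank) in enumerate(entries)
        if address in usable
    ]
    if not candidates:
        return None
    # Lower rank values win; unranked addresses (255) sort last, and
    # table order breaks ties.
    return min(candidates)[2]


entries = [("B0", NO_RANK), ("B1", 0), ("B2", 3)]
primary = pick_by_rank(entries, {"B0", "B1", "B2"})
backup = pick_by_rank(entries, {"B0", "B2"})
```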

   Some LANs can be conveniently divided into several distinct rails or
   link layer paths with each rail a separate physical conduit for
   transmitting frames.  Individual hosts at network initialization
   attach separately to each of the rails with a distinct link layer
   address assigned to each attachment.  Frames transmitted on any
   given rail must use the correct destination address for the target
   host on that rail.  EARP supports the concept of a link layer path
   by associating with each source link layer address in an EARP packet
   a path number.  Path numbers range from 0 to one less than the total
   number of paths, and they are stored in the address resolution
   table along with their associated addresses.

   To give a concrete example, the Fiber Distributed Data Interface
   (FDDI) uses two separate rings in normal operation, each of which
   can be viewed as a separate rail.  On each of these rings, a given
   Internet host will have one and only one hardware address.  If both
   of the rings are included in one IP subnet, an EARP host A will
   probably have an entry in its ARP table for any other host B which
   includes that host's IP address and both of its link layer addresses
   (assuming, of course, that host B has two addresses).  The table
   entry on host A for host B will thus logically be,

   <IP address of B><Link address B0><Ring 0><Link address B1><Ring 1>

   When sending a packet to the remote host, the local host will choose
   a ring or rail and use the hardware address for that rail as the
   destination address in the frame.

   As another example, say Ethernet host B has two network boards on
   the same Ethernet segment which share one IP address.  Here the
   concept of separate rails does not hold.  Each of host B's addresses
   is just as valid a destination address in any frame as any other of
   its addresses.  This host can thus be represented in the ARP table
   of host A, as,

   <IP address of B> <Link address B0> <Link address B1>

   When host A wishes to send a packet to host B, it can choose either
   link address B0 or address B1 as the destination address in the
   Ethernet frame it transmits on the segment.
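
   The two table entries above could be represented with a structure
   along the following lines.  This is a sketch in Python under stated
   assumptions: the names LinkAddress, TableEntry, and address_on_path
   are invented for the illustration, and the value 255 for "no path"
   and "no rank" is taken from the packet description in this memo.

```python
from dataclasses import dataclass, field
from typing import List, Optional

NO_PATH = 255
NO_RANK = 255


@dataclass
class LinkAddress:
    hardware: str
    path: int = NO_PATH   # rail number, or 255 on single-rail LANs
    rank: int = NO_RANK   # 0 (highest) to 254, or 255 for unranked


@dataclass
class TableEntry:
    ip: str
    is_earp: bool
    links: List[LinkAddress] = field(default_factory=list)

    def address_on_path(self, path: int) -> Optional[str]:
        """Return the hardware address recorded for a given rail, if any."""
        for link in self.links:
            if link.path == path:
                return link.hardware
        return None


# FDDI host B: one address per ring.  Ethernet host B: no rails at all.
fddi_b = TableEntry("192.0.2.2", True,
                    [LinkAddress("B0", path=0), LinkAddress("B1", path=1)])
ether_b = TableEntry("192.0.2.3", True,
                     [LinkAddress("B0"), LinkAddress("B1")])
```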


   To establish the one-to-many mapping of IP to link addresses, an
   EARP host must send EARP request packets demonstrating its desire
   to receive EARP response packets in return.  If the underlying
   LAN is based on a structure of multiple rails, then one EARP message
   is sent on each rail with the local host's link layer address for
   that rail as the single source address.  The host will in turn
   expect a single response on each rail with the target host's link
   layer address on that rail as the source address in the response.

   On LANs which do not support the concept of multiple rails, a
   single EARP request packet is sent with a listing of the host's m
   link layer addresses as the source hardware addresses.  The
   requesting host will expect in return a response packet with a list
   of the n link layer addresses at the target host as the source
   hardware addresses.  Otherwise, the request and response packets
   will look very similar to those in standard ARP.

   To return to the above examples, on an FDDI LAN made up of two
   rings using a single IP subnet, an EARP host will send a request
   message on ring 0 with its ring 0 address as the source address and
   with a path encoding of 0.  The single, empty target address will be
   sent without a path indication.  On ring 1, the host will send a
   packet with its ring 1 address and a path of 1.  On ring 0, the
   sending host will expect to receive a response packet with its own
   ring 0 address as the target and with the ring 0 address of the
   remote host as the source address.  The response on ring 1 will be
   analogous.

   For an Ethernet EARP host, the request packet will include m source
   addresses and a single, empty target address.  The response will
   include the remote host's n Ethernet addresses as the source and the
   first of the m addresses of the original host as the target.  One
   Ethernet address in either the request or the response packet may be
   designated as the primary link layer address for the sending host.
   Thus, host A when it sends a request EARP message to host B may
   designate one Ethernet address as the one B should use to send it
   data packets.  B, in its turn, may in the response EARP message
   specify one of its own Ethernet addresses as the primary address for
   host A.  This mechanism will allow busy servers to suggest how their
   clients could receive better and faster service.


   An EARP request is essentially a standard ARP request with some
   additional information specifying the number of link layer addresses
   associated with the source protocol address and, optionally, the
   path number and ranking associated with each link address.  The path
   number is specified if the underlying link layer is made up of
   several rails.  On such a LAN, the path number ensures that the
   packet has been received on the appropriate rail by the target host.
   On LANs which support only a single rail, the path number is decimal
   255.  In addition to the path indication, there is a field for each
   source address specifying the ranking of hardware addresses.  The
   field is probably most useful if only one of the source hardware
   addresses is given a rank, but the designation of a ranking
   hierarchy is permissible.  Ranks are assigned from a high of 0 to a
   low of 254.  A value of 255 indicates that no rank has been
   assigned.

   EARP is essentially a simple request/response protocol just as is
   standard ARP;  the opcode field of the packet indicates whether
   the sender is requesting an IP to link layer address mapping or is
   returning one.  The same field in an EARP packet, however, also
   indicates the mode of the request or response.  Two different modes
   are possible.  In normal mode, a request or response packet
   contains a complete and valid address mapping for the sender.  In
   advisory mode, the packet contains a subset of the normally
   complete address mapping of the sender.  The best method for
   demonstrating the difference is again with two examples.

   On an FDDI LAN of two rings sharing a single IP subnet, a host
   wishing to discover the IP address mapping for a remote host would
   want to send a separate EARP request packet on each of its two rings
   or logical rails.  If both of its ring attachments were operational,
   both request packets would be sent in normal mode.  If, however, one
   of the two ring attachments of the sender were disabled, then it
   would send a single EARP request packet in advisory mode out over
   the still functioning ring attachment.  The receiving host would
   then understand both that the sender is normally capable of sending
   on both rings and that it cannot do so now.  An obvious corollary
   is that hosts which normally can send out on only a single ring
   will never send an EARP packet in advisory mode.  Since these hosts
   never have more than one path on which to send a packet, they can
   never send a warning on one path that their other path is
   non-functional.

   On Ethernet, a host wishing to establish an address mapping for a
   remote host would send a single EARP request packet in advisory
   mode when one of its several network devices was down.  Only the
   addresses of those devices currently capable of transmitting and
   receiving would be included.  If all of its network devices were
   functional, the host would send out a single packet in normal mode.
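
   For the single-rail (Ethernet) case, the choice between the two
   modes might be sketched as follows in Python.  The opcode values are
   those assigned in this memo; the function name request_for and the
   dictionary of device states are assumptions made for illustration.

```python
REQUEST_NORMAL = 1
REQUEST_ADVISORY = 3


def request_for(devices):
    """Choose the request opcode and the source addresses to list.

    devices: dict mapping hardware address -> True if operational.
    Returns (opcode, addresses), or None when nothing can transmit.
    """
    up = [address for address, ok in devices.items() if ok]
    if not up:
        return None
    # Normal mode carries the complete mapping; advisory mode carries
    # only the addresses that still work.
    opcode = REQUEST_NORMAL if len(up) == len(devices) else REQUEST_ADVISORY
    return opcode, up


all_up = request_for({"A0": True, "A1": True})
one_down = request_for({"A0": True, "A1": False})
```

   A host with a single device never reaches the advisory branch, which
   matches the corollary stated for single-attachment hosts.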

   The general packet format is thus,

   16 bits   Protocol version number
   16 bits   Hardware type code
   16 bits   Protocol type code
    8 bits   Number of octets in each hardware address (j)
    8 bits   Number of octets in each protocol address (k)
   16 bits   Opcode
    k octets Protocol address of sender
   16 bits   Count of the sender's link layer addresses which follow

      For each link layer address associated with the given source IP
      address on a given path,
             j octets  Hardware address of sender
             8 bits    Corresponding path number or 255 decimal
             8 bits    A rank from 0 to 254 or 255 for no ranking.

   k octets  Protocol address of target
   j octets  Hardware address of target.  For request packets, all
             zeros.  For response packets, the first of the source
             hardware addresses in the original request packet.
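
   The layout above can be exercised with a small encoder and decoder.
   The following Python sketch assumes network byte order for the
   multi-octet fields, as in standard ARP; this memo does not state the
   byte order explicitly.  The function names pack_earp and unpack_earp
   and the sample addresses are hypothetical, and the decoder performs
   no length validation.

```python
import struct

# Assumption: fields are in network byte order, as in standard ARP.
HEADER = struct.Struct("!HHHBBH")  # version, hw type, proto type, j, k, opcode


def pack_earp(version, hw_type, proto_type, opcode,
              sender_ip, sender_links, target_ip, target_hw):
    """Build an EARP packet; sender_links is a list of (hw_bytes, path, rank)."""
    j, k = len(target_hw), len(target_ip)
    if len(sender_ip) != k or any(len(hw) != j for hw, _, _ in sender_links):
        raise ValueError("address lengths are inconsistent")
    parts = [HEADER.pack(version, hw_type, proto_type, j, k, opcode),
             sender_ip,
             struct.pack("!H", len(sender_links))]
    for hw, path, rank in sender_links:
        parts.append(hw + bytes([path, rank]))
    parts += [target_ip, target_hw]
    return b"".join(parts)


def unpack_earp(data):
    """Parse a packet built by pack_earp; no bounds checking is attempted."""
    version, hw_type, proto_type, j, k, opcode = HEADER.unpack_from(data, 0)
    offset = HEADER.size
    sender_ip = data[offset:offset + k]
    offset += k
    (count,) = struct.unpack_from("!H", data, offset)
    offset += 2
    links = []
    for _ in range(count):
        hw = data[offset:offset + j]
        links.append((hw, data[offset + j], data[offset + j + 1]))
        offset += j + 2
    target_ip = data[offset:offset + k]
    target_hw = data[offset + k:offset + k + j]
    return {"version": version, "hw_type": hw_type, "proto_type": proto_type,
            "opcode": opcode, "sender_ip": sender_ip, "links": links,
            "target_ip": target_ip, "target_hw": target_hw}


# An Ethernet request with two source addresses, the first ranked primary.
packet = pack_earp(1, 1, 2048, 1, bytes([192, 0, 2, 1]),
                   [(bytes.fromhex("080020000001"), 255, 0),
                    (bytes.fromhex("080020000002"), 255, 255)],
                   bytes([192, 0, 2, 2]), bytes(6))
parsed = unpack_earp(packet)
```

   For j = 6 and k = 4, a request listing two source addresses occupies
   10 + 4 + 2 + 2 * 8 + 4 + 6 = 42 octets.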

   The protocol version number allows for later enhancements of the
   protocol.  Its current value is 1.

   The hardware and protocol type fields are encoded exactly as they
   are in standard ARP with the single exception that single subnet IP
   FDDI LANs use hardware type code 256.  See RFC-1010 [6] for the
   other hardware type codes.

   The number of octets in a hardware address depends on the type of
   device.  For FDDI or Ethernet, the value is 6.  The number of octets
   in an IP address is 4.

   The opcodes are
             1   Request in normal mode
             2   Response in normal mode
             3   Request in advisory mode
             4   Response in advisory mode
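
   Since a single opcode carries both the direction and the mode, an
   implementation might decode it as in the following Python sketch.
   The numeric values are those listed above; the enumeration and
   property names are assumptions made for the example.

```python
from enum import IntEnum


class Opcode(IntEnum):
    REQUEST_NORMAL = 1
    RESPONSE_NORMAL = 2
    REQUEST_ADVISORY = 3
    RESPONSE_ADVISORY = 4

    @property
    def is_request(self):
        return self in (Opcode.REQUEST_NORMAL, Opcode.REQUEST_ADVISORY)

    @property
    def is_advisory(self):
        return self in (Opcode.REQUEST_ADVISORY, Opcode.RESPONSE_ADVISORY)


received = Opcode(3)  # an advisory-mode request
```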

   For IP, the protocol address of the sender is its four octet
   Internet address.

   The count field is used to indicate how many <link layer address>
   <path number><ranking> triplets are included for the sender.  The
   first triplet will correspond to the interface that is sending this
   message.  If a path is specified, then the count should be 1.
   Otherwise, the count can be any positive integer value.
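
   A receiver could check the count rule with a test like the one
   below.  This is a minimal Python sketch; the function name
   count_is_consistent is hypothetical, and treating 255 as "no path
   specified" follows the packet description in this memo.

```python
NO_PATH = 255


def count_is_consistent(links):
    """links: list of (hardware_address, path, rank) triplets for the sender."""
    if not links:
        return False  # the count must be a positive integer
    if any(path != NO_PATH for _, path, _ in links):
        return len(links) == 1  # one triplet per packet on multi-rail LANs
    return True


fddi_ok = count_is_consistent([("A0", 0, 255)])
ether_ok = count_is_consistent([("A0", 255, 0), ("A1", 255, 255)])
```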

   To return to the examples, the EARP host on the FDDI LAN would send
   the following packet on ring 0 to specify normal mode,

          1   Protocol version number
        256   Hardware type code
       2048   Protocol type code
          6   Number of octets in each hardware address
          4   Number of octets in each protocol address
          1   Opcode
   4 octets   IP address for the single subnet including rings 0 and 1
          1   Count of the sender's link layer addresses which follow
   6 octets   Hardware address of the host on ring 0
          0   Path number
        255   No ranking specified
   4 octets   IP address of target
   6 octets   6 zeroed octets for the target hardware address

   The EARP request packet on ring 1 would differ only in the single
   hardware address included and in the path number.  Neither ring
   address would be given a rank if both paths were to be considered
   as of equal weight.

   For the above Ethernet example, the request packet might look as
   follows,

          1   Protocol version number
          1   Hardware type code
       2048   Protocol type code
          6   Number of octets in each hardware address
          4   Number of octets in each protocol address
          1   Opcode
   4 octets   IP address for this Ethernet interface
          2   Count of the sender's link layer addresses which follow
   6 octets   First hardware address
        255   No path number specified
          0   This is the primary hardware address
   6 octets   Second hardware address
        255   No path number specified
        255   No ranking designated
   4 octets   IP address of target
   6 octets   6 zeroed octets for the target hardware address

   Three assumptions about the underlying link layer services are made
   in the final format of an EARP packet.  First, it is assumed that
   the underlying link or physical layer includes some sort of data
   integrity check.  For FDDI and Ethernet, the frame check sequence
   guarantees that the EARP frame has not been corrupted in transit.
   Second, the underlying medium must support broadcast frames.  All
   request EARP packets are, in fact, sent to the broadcast address.
   Response packets are encapsulated in unicast frames.

   The third assumption is that the structure of the physical frame
   or link layer encapsulation includes a two-octet type field which
   can be used to de-multiplex received frames.  On Ethernet, the
   type field is part of the frame.  For FDDI and IEEE 802.X LANs,
   the type is part of the SNAP header included in the link layer
   header.  See RFC-1103 [1] and RFC-1042 [5] for more information.
   The two-octet Ethertype value for EARP is TBD.


   If an EARP host receives an ARP request packet in which its IP
   address is the target, it should return a standard ARP response with
   the address of one of its network devices as the single hardware
   source address.  If it receives an EARP request directed to it,
   then an EARP response should be returned.

   There are two exceptions to this rule.  EARP hosts sometimes
   broadcast EARP request packets in order to warn other EARP hosts
   that the status of one or more of their network devices has changed;
   a device may have either just failed or just come back on-line.
   The target IP address in such a request is the same as the source,
   and the mode is advisory in the case of a failing device and normal
   in the case of a recovering device.  The reason for sending a request
   with a host's own IP address as the target is that no other host
   will then try to respond.  The sending host will, of course, just
   drop the packet when it is received.

   In the original examples, when one of an FDDI station's two ring
   attachments is no longer available, it sends an advisory request
   packet out of its still functioning attachment with that
   attachment's ring address to inform other EARP hosts that their
   table mappings are no longer valid.  When one or more of the
   Ethernet host's network devices fails, it should send out an EARP
   advisory request listing its still available hardware addresses.
   This action alerts other clients for which the now unavailable
   devices were designated as preferred that the devices are now
   inoperative.  When either the FDDI ring attachment or the failed
   Ethernet device comes back on-line, normal request packets should
   be sent, one on each ring for FDDI and one including all active
   devices for Ethernet.  Again, the source and target IP addresses
   should be that of the local interface and the frame destination
   address should be the broadcast address.

   Extreme caution should be used in sending out packets to inform
   other hosts of a change in network interface status.  If a device
   is cycling between on and off-line states, the effect on the LAN
   can be disastrous.  It is always best to be conservative when
   transmitting broadcast frames, but since EARP packets tend to be
   more complex to parse than ordinary ARP packets, a conservative
   transmission policy is even more than usually warranted.  One
   method of guaranteeing a conservative approach is to use a deadman
   timer.  When one of its network devices fails, a host should set the
   timer rather than sending out an advisory request packet.  If, when
   the timer expires, the device is still down, then the host has no
   choice but to send the packet.
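
   The deadman timer could be modeled as below.  This Python sketch
   passes the current time in explicitly so that it can be followed
   without real delays.  The class name, the hold-down interval, and
   the decision to cancel the pending advisory when a device recovers
   before expiry are all assumptions; the memo specifies none of them.

```python
class DeadmanTimer:
    """Delay advisory requests so that a flapping device does not flood the LAN."""

    def __init__(self, hold_down_seconds):
        self.hold_down = hold_down_seconds
        self.failed_at = {}  # device name -> time the failure was first seen

    def device_failed(self, device, now):
        # Set the timer once; repeated failure reports do not restart it.
        self.failed_at.setdefault(device, now)

    def device_recovered(self, device):
        # Assumed policy: recovery before expiry cancels the advisory.
        self.failed_at.pop(device, None)

    def due_advisories(self, now):
        """Return devices still down after the hold-down interval."""
        due = [d for d, t in self.failed_at.items() if now - t >= self.hold_down]
        for device in due:
            del self.failed_at[device]
        return due


timer = DeadmanTimer(hold_down_seconds=5)
timer.device_failed("fddi0", now=0)
too_early = timer.due_advisories(now=3)
expired = timer.due_advisories(now=5)
```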

   One other remark before the exposition of the input processing
   algorithm.  Although RFC 826 specifies that the response
   message should include the original source IP address as the target
   protocol address in the response message, it would be better to
   use instead the IP address of the receiving interface.  In this
   way the receiving host can guarantee that the original requestor
   receives a correct IP mapping in return.

   The algorithm is then,

   Merge_flag = false.
   If an entry for this IP address already exists, then
      If this is an EARP request, then
         If the host is not marked in the table as an EARP host, then
            Change the entry to show that this is an EARP host.
         If this is an advisory request message, then
            Invalidate all previous hardware addresses, paths, and
            address ranking indications.
         Update the table with the new hardware address(es).
         If a path number has been included, then
            Record the path number with the address in the table.
         Merge_flag = true
      Else
         If the host is not already marked as understanding EARP, then
            Update the table entry with the new hardware address.
         Merge_flag = true
   If the target IP address is mine, then
      If the source IP address is also mine, then
         Stop.  My own packets should be ignored.
      Else
         If this is an EARP request, then
            If Merge_flag is not set to true, then
               Add an EARP entry to the table.
            If a path number has been included, then
               Record the path number with the address in the table.
            If a ranking has been specified, then
               Record the rank with the address in the table.
            Format and send an EARP response.
         Else
            If Merge_flag is not set to true, then
               Add a standard ARP entry.
            Format and send a standard ARP response.
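
   One possible reading of the algorithm in executable form is given
   below.  This Python sketch reads the indentation of the pseudocode
   as implying the alternative (else) branches, keeps the table as a
   plain dictionary, and returns the kind of response to send rather
   than transmitting anything.  As in RFC 826, it answers only
   requests.  The function name, the packet keys, and the table layout
   are assumptions made for the illustration.

```python
NO_PATH = NO_RANK = 255


def process_input(table, my_ip, pkt):
    """Apply one received ARP or EARP packet to `table` (a dict keyed by IP).

    pkt keys (all hypothetical): is_earp, is_request, advisory, source_ip,
    target_ip, links (list of (hardware, path, rank) triplets).
    Returns "earp-response", "arp-response", or None.
    """
    merged = False
    src = pkt["source_ip"]
    entry = table.get(src)
    if entry is not None:
        if pkt["is_earp"]:
            entry["is_earp"] = True
            if pkt["advisory"]:
                entry["links"] = {}  # invalidate addresses, paths, and ranks
            for hw, path, _rank in pkt["links"]:
                old_rank = entry["links"].get(hw, (NO_PATH, NO_RANK))[1]
                entry["links"][hw] = (path, old_rank)
        elif not entry["is_earp"]:
            entry["links"] = {pkt["links"][0][0]: (NO_PATH, NO_RANK)}
        merged = True
    if pkt["target_ip"] != my_ip or src == my_ip:
        return None  # not addressed to me, or one of my own packets
    if pkt["is_earp"]:
        if not merged:
            entry = table[src] = {"is_earp": True, "links": {}}
        for hw, path, rank in pkt["links"]:
            entry["links"][hw] = (path, rank)  # rank kept only when addressed to me
        reply = "earp-response"
    else:
        if not merged:
            table[src] = {"is_earp": False,
                          "links": {pkt["links"][0][0]: (NO_PATH, NO_RANK)}}
        reply = "arp-response"
    return reply if pkt["is_request"] else None


table = {}
request = {"is_earp": True, "is_request": True, "advisory": False,
           "source_ip": "B", "target_ip": "A",
           "links": [("B0", 255, 0), ("B1", 255, 255)]}
answer = process_input(table, "A", request)
```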

   It should be noted in the algorithm that the indication of an
   address ranking is recorded only when taken from a packet addressed
   explicitly to the local host.  In this way, a large server host can
   specify different preferred addresses to different hosts.


   A host which understands only standard ARP needs to distinguish
   between only two types of remote host.  One type will never respond
   to an IP address mapping request because the host is either off-line
   or non-existent.  The other type of host can respond to an ARP
   request but did not respond to the last request because it never
   received the packet, because it had to drop the request due to
   internal congestion, or because the response was lost in the network.
   EARP hosts, however, also have to distinguish a third type of remote
   host, one which drops EARP request packets because it does not
   understand the new protocol.

   The most important distinction to be made is that between hosts
   which can participate in an EARP dialogue but did not respond to
   the last EARP request packet and those hosts which cannot
   participate at all.  Both types of host will respond to an EARP
   request in the same way, with silence, but with non-participatory
   hosts that silence will be persistent.  In most situations an EARP
   host could feel reasonably certain that it would never receive an
   EARP response from a remote host if after sending two EARP request
   packets and after waiting twice for a response it still had not
   received a packet.  At that point, the host could try initiating a
   standard ARP dialogue.

   EARP hosts, however, will have to work in an environment in which
   many other hosts will never learn to understand the new protocol,
   and any time spent in trying to initiate an EARP dialogue with one of
   them will be wasted.  Since the time is wasted, it is better kept to
   a minimum.  EARP hosts should therefore send an EARP request only
   once, and when the response timer expires, they should immediately
   switch to an ARP dialogue.

   Assuming then that either the IP layer has requested that a packet
   be sent to a remote host for which no entry exists in the address
   mapping table, or that the response timer for an incomplete entry
   in the table has expired, an EARP host should act as follows.

   If no request packets have yet been sent to the remote host, then
      Send the EARP request to the remote host.
      Set a flag in the entry that an EARP request has been sent.
      Start the response timer.
   Else
      If a standard ARP request packet has not yet been sent, then
         Send an ARP request to the remote host.
         Set a flag in the entry that an ARP request has been sent.
         Re-start the response timer.
      Else
         Delete address resolution table entry for the remote host.
         Delete the timer for the entry.
         If the trigger was an IP request and not a timer event, then
            Return an error to IP.
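
   The same fallback sequence, EARP first, then ARP, then failure, can
   be written as a small state step.  This Python sketch reads the
   pseudocode's indentation as implying the alternative branches and
   returns a list of action names instead of sending packets; the
   function name, the flag names, and the action strings are all
   assumptions made for the illustration.

```python
def resolve_step(table, remote_ip, triggered_by_ip_layer):
    """Advance address resolution for remote_ip by one step.

    Called when IP wants to send to an unresolved host, or when the
    response timer for an incomplete entry expires.
    """
    entry = table.setdefault(remote_ip, {"earp_sent": False, "arp_sent": False})
    if not entry["earp_sent"] and not entry["arp_sent"]:
        entry["earp_sent"] = True
        return ["send-earp-request", "start-timer"]
    if not entry["arp_sent"]:
        entry["arp_sent"] = True
        return ["send-arp-request", "restart-timer"]
    # Both protocols have been tried without an answer.
    del table[remote_ip]
    actions = ["delete-entry", "delete-timer"]
    if triggered_by_ip_layer:
        actions.append("error-to-ip")
    return actions


table = {}
first = resolve_step(table, "192.0.2.9", triggered_by_ip_layer=True)
second = resolve_step(table, "192.0.2.9", triggered_by_ip_layer=False)
third = resolve_step(table, "192.0.2.9", triggered_by_ip_layer=False)
```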

   The use of a new timer for timing out EARP and ARP responses is not
   absolutely necessary.  Transport layer re-transmission timers may
   serve just as well if the IP and link layer interfaces are modified
   to allow the transport layer protocol to flag whenever it is
   re-transmitting.  A request for a packet to be re-transmitted
   would signal the need to request an address mapping again, while a
   standard request to send a packet would not.

   The design of the output algorithm for EARP includes the assumption
   that sometimes an EARP host will be incorrectly designated as a
   standard ARP host in a requesting host's address resolution table.
   The assumption must be made because no network is 100% reliable:
   some frames will inevitably be lost, and transmission delays due to
   congestion can temporarily exceed the limits of even the most
   scrupulously designed response timers.  When an EARP host fails to
   respond to an EARP request but does respond to a standard ARP
   request, then the advantages of link layer path multiplexing and
   demultiplexing will be lost in any communication between the
   requesting and the responding host.  Communication, however, will
   still be possible, and it is quite probable that the error will be
   detected since every EARP host hears every other EARP host's address
   mapping requests.


   Communication between hosts which view the IP to link layer address
   mapping as one to many is more reliable although more complex than
   that between hosts which are limited to a simple one to one
   mapping.  In the future, Internet hosts may be known on all their
   several link layer interfaces by a single IP address, but until
   that time, EARP provides support for a better interconnection model
   than that supported by standard ARP.  Since the concepts of path
   indication and address ranking are general, new conventions can be
   instituted for the interpretation of the fields on new LANs and
   older conventions can be dropped as current LANs evolve.  The
   re-interpretation of these two fields will allow for the
   accommodation within EARP of new LANs with very different structures.


   [1]  Katz, D., "A Proposed Standard for the Transmission of IP
        Datagrams over FDDI Networks", RFC-1103, Merit/NSFNET,
        April, 1990.

   [2]  Lekashman, J., "Multi-Homed Hosts in an IP Network", NASA Ames
        GE, April, 1988.

   [3]  Plummer, David C., "An Ethernet Address Resolution Protocol",
        RFC-826, MIT, November, 1982.

   [4]  Postel, J., "Internet Protocol", RFC-791, USC/Information
        Sciences Institute, September 1981.

   [5]  Postel, J., and Reynolds, J., "A Standard for the Transmission
        of IP Datagrams over IEEE 802 Networks", RFC-1042,
        USC/Information Sciences Institute, February, 1988.

   [6]  Reynolds, J., and Postel, J., "Assigned Numbers", RFC-1010,
        USC/Information Sciences Institute, May 1987.


   Douglas Bagnall                                       Caralyn Brown
   HP/Apollo Computer                                   Prime Computer
   300 Apollo Drive                           500 Old Connecticut Path
   Chelmsford, MA  01824                          Framingham, MA 01701
   Phone:  (508) 256-6600 x4414           Phone:  (508) 620-2800 x4237
   Email:  bagnall_d@apollo.hp.com        Email:  cbrown@enr.prime.com

   Mary Jane Strohl                                       Douglas Hunt
   HP/Apollo Computer                                   Prime Computer
   300 Apollo Drive                           500 Old Connecticut Path
   Chelmsford, MA  01824                          Framingham, MA 01701
   Phone:  (508) 256-6600 x4421           Phone:  (508) 620-2800 x????
   Email:  strohl@apollo.hp.com           Email:   dhunt@enr.prime.com