Re: Another proposal

mogul (Jeffrey Mogul) Sat, 02 December 1989 00:26 UTC

Received: by acetes.pa.dec.com (5.54.5/4.7.34) id AA09939; Fri, 1 Dec 89 16:26:51 PST
From: mogul (Jeffrey Mogul)
Message-Id: <8912020026.AA09939@acetes.pa.dec.com>
Date: 1 Dec 1989 1626-PST (Friday)
To: smb@research.att.com
Cc: mtudwg
Subject: Re: Another proposal
In-Reply-To: Msg from smb@research.att.com dated Thu, 30 Nov 89 22:22:55 EST. <8912010322.AA24044@hector.homer.nj.att.com>

    On another tack, Craig Partridge remarked to me that the IAB is
    assuming that while hosts do indeed change slowly, routers change
    fairly rapidly.  This would imply that solutions that rely on
    proper behavior by the end-host -- the IP bit scheme, or my variant
    that uses an IP option instead -- will be much less effective for
    a long time.  A 1063-like approach, while a bit slower to ramp up,
    will provide an excellent approximation to the desired universality
    much more quickly.

This observation (routers change faster than hosts) crystallizes something
I have been thinking about for a few days.  Let me launch another proposal
(just before I leave town for a week) and maybe the rest of you can fight
it out while I'm gone.

Let's also assume that we cannot use the reserved bit in the IP header.
That leaves us with an IP option (sorry, Rich Fox: I do not think
a transport-level solution is right).  The gateways are then stuck having
to parse the options anyway (but let's make sure that this happens
infrequently), so there isn't much more cost in making the gateways
actually process the options.

Other observations: it would be nice to be able to get useful MTU 
information back to the sender even if the receiving host doesn't understand
the new option.  It would also be nice, as I have stated before, if
the information could be exchanged before any large packets need to
be sent.

One last observation: if we assume that the "multi-MTU" subnet problem
created by translucent FDDI-Ethernet bridges can be solved at the
data-link level, then the last-hop router has all the information necessary
to compute the path MTU.  The participation of the receiving host is
not strictly necessary.

Here's the proposal: As in RFC 1063, the sender attaches an MTU Request
option to some of its outgoing datagrams.  The option is updated by
routers along the way.  When the last-hop router is reached (i.e.,
a router which can deliver the packet directly to the destination
host), that router now knows the path MTU.
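
To make the accumulation step concrete, here is a minimal Python sketch
of how the option could be updated hop by hop.  The names
MtuRequestOption, forward, and path_mtu are hypothetical, as is the
single "mtu" field; nothing here fixes a wire format.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class MtuRequestOption:
    """Hypothetical stand-in for the MTU Request option (illustrative field)."""
    mtu: int  # smallest link MTU seen so far along the path


def forward(option: MtuRequestOption, next_link_mtu: int) -> MtuRequestOption:
    """What each router does: lower the recorded MTU if its next link is smaller."""
    return MtuRequestOption(mtu=min(option.mtu, next_link_mtu))


def path_mtu(first_link_mtu: int, later_link_mtus: Sequence[int]) -> int:
    """Value the last-hop router would see after every router has updated the option."""
    option = MtuRequestOption(mtu=first_link_mtu)
    for link_mtu in later_link_mtus:
        option = forward(option, link_mtu)
    return option.mtu
```

The sender seeds the option with its own first-link MTU, so the value
held at the last hop is simply the minimum over every link traversed.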

At this point, my new proposal (1063a) diverges from RFC 1063.  The
last-hop router now sends an ICMP "MTU Reply" (new type) message back
to the source host as it forwards the packet on to the destination.
(If the gateway is congested, the ICMP packet should of course
be dropped instead of the original packet.)  The source host receives
the ICMP message and updates its path MTU entry (the ICMP message
provides enough of the original IP header to do this).
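
A rough Python sketch of the source-host side, under assumptions: the
MtuReply layout, the PathMtuCache class, and the 576-byte fallback for
unknown paths are all illustrative choices, not part of the proposal.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class MtuReply:
    """Hypothetical ICMP "MTU Reply"; the field layout is an assumption."""
    orig_dst: str  # destination address from the echoed original IP header
    path_mtu: int  # MTU accumulated in the option by the routers


class PathMtuCache:
    """Per-destination path MTU entries kept by the source host."""

    def __init__(self, default_mtu: int = 576) -> None:
        # Assumed conservative fallback used when no reply has been seen.
        self.default_mtu = default_mtu
        self._entries: Dict[str, int] = {}

    def on_mtu_reply(self, reply: MtuReply) -> None:
        # The echoed header tells us which destination the reply is about.
        self._entries[reply.orig_dst] = reply.path_mtu

    def lookup(self, dst: str) -> int:
        return self._entries.get(dst, self.default_mtu)
```

Keying on the echoed destination address is what lets the host match
the reply to a path without keeping any per-request state.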

Obvious objection: we are now injecting extra packets into the
internet.  True, but if we keep the load relatively low we will
save more from avoiding fragmentation than we lose from these
ICMP messages.  So the problem is "how does the sender decide when
to send the MTU Request Option if the ICMP replies are being dropped?"

[Here my proposal starts getting a little fuzzy.  Suggestions welcome.]

Presumably, we don't want to send it more often than once per RTT.
That means that we should not send a second MTU Request Option until
(1) we are asked to retransmit that datagram by the transport level
(which presumably is tracking the RTT) or (2) we receive a packet
from the destination host which is in reply to our original packet.
I.e., we should use our existing RTT information in order to meter
out the MTU Request Options.

Next, how often should we send the MTU Request Option if the last-hop
gateway doesn't understand it and simply isn't replying?  Easy
approach: once per RTT.  It's mostly our own ox being gored by the
cost of the excess option-processing.  Harder approach:  the last-hop
gateway should normally change the option to indicate to the
destination host that the ICMP has been sent.  If the option is not so
marked, the destination host could send the ICMP itself.  (I think this
is too finicky.  We can afford a few extra option transmissions.)
Mostly, we should give up trying to elicit a response after a few
RTTs have gone by (if we have a connection running with the destination
and we aren't getting the ICMPs, the gateway is probably not sending
them!)
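
One possible reading of the metering and give-up rules, as a small
Python state machine.  MtuProber, its method names, and the default of
three attempts are assumptions made for illustration; "a few RTTs" is
not pinned down above.

```python
class MtuProber:
    """Decides whether the next outgoing datagram should carry the option."""

    def __init__(self, max_attempts: int = 3) -> None:
        self.max_attempts = max_attempts  # assumed value for "a few RTTs"
        self.attempts = 0
        self.outstanding = False  # option sent, and no RTT has elapsed since
        self.answered = False     # an MTU Reply has arrived

    def should_attach_option(self) -> bool:
        if self.answered or self.attempts >= self.max_attempts:
            return False  # either done, or the gateway is probably not replying
        return not self.outstanding  # at most one request per RTT

    def on_send_with_option(self) -> None:
        self.attempts += 1
        self.outstanding = True

    def on_rtt_elapsed(self) -> None:
        """Call on a transport-level retransmission or a reply from the peer."""
        self.outstanding = False

    def on_mtu_reply(self) -> None:
        self.answered = True
        self.outstanding = False
```

The transport layer drives on_rtt_elapsed, so the IP layer reuses the
existing RTT tracking rather than running a timer of its own.  This
sketch does not cover re-probing after a path change.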

Next, how do we detect changes in the path?  One thing to do is
to retransmit the MTU Request Option (if we think we will get an
answer) once every N RTTs or M minutes (i.e., something approaching
the natural frequency of the routing protocols).  This should not
be a serious load.

Another trick is to modify the IP receivers so that if a destination host
receives an MTU Request Option, it interprets this as "Report Fragmentation".
As long as there is good reason to believe that the source host software
has not been downgraded (say, for the next day or so) it should be
safe to send ICMP Fragment Received messages back to the source host,
at which time the source should (1) update its MTU estimate from the
ICMP and (2) retransmit the MTU Request Option.

One more possibility: although we really want to trace the path
in both directions (since the routes may be asymmetric), when the
destination host receives an MTU Request Option it could use the
incoming-path MTU as an estimate until it manages to get a reply
to its own MTU Request.  (It certainly shouldn't send anything
larger than this estimate.)
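
As a sketch of that rule in Python (the function name, its parameters,
and the explicit default_mtu fallback are hypothetical): the
destination host's own MTU Reply, once it arrives, takes precedence
over the borrowed incoming-path estimate.

```python
from typing import Optional


def effective_send_mtu(
    first_link_mtu: int,
    default_mtu: int,
    incoming_estimate: Optional[int] = None,
    own_reply: Optional[int] = None,
) -> int:
    """Largest datagram this host should send toward the peer."""
    if own_reply is not None:
        # A reply to our own MTU Request describes the outgoing path.
        limit = own_reply
    elif incoming_estimate is not None:
        # Until then, borrow the MTU observed on the incoming path.
        limit = incoming_estimate
    else:
        # Nothing known yet: fall back to a caller-supplied default.
        limit = default_mtu
    return min(first_link_mtu, limit)
```

For example, with a 1500-byte first link and an incoming estimate of
1006, the host sends at most 1006 bytes until its own reply arrives.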

Note that this new proposal, even with the destination hosts interpreting
the MTU Request Options, is an awful lot simpler than RFC 1063 because
(once we have conceded the need to send a few extra ICMP packets) there
is none of the nasty "where to put the reply option" mechanism.
Also, note again that unlike RFC 1063, this mechanism does not require
ubiquitous implementation in end hosts before it begins to be useful.
(Note also that Steve Deering's proposal could increase the incidence
of fragmentation, rather than decrease it, before it is ubiquitously
implemented, since it involves "tempting fate".)

Enough for now.  See you in 10 days.

-Jeff