some further musings

smb@research.att.com Tue, 05 December 1989 16:21 UTC

Received: from decwrl.dec.com by acetes.pa.dec.com (5.54.5/4.7.34) id AA20534; Tue, 5 Dec 89 08:21:33 PST
Received: by decwrl.dec.com; id AA18151; Tue, 5 Dec 89 08:21:29 -0800
From: smb@research.att.com
Message-Id: <8912051621.AA06557@hector.homer.nj.att.com>
Received: by hector.homer.nj.att.com id AA06557; Tue, 5 Dec 89 11:21:07 EST
To: mtudwg
Subject: some further musings
Date: Tue, 05 Dec 1989 11:21:05 -0500
>From: hector!smb

A few more thoughts occurred to me this morning:  maybe we should
re-examine our assumptions.  I'm not sure these all make sense, and
they're a bit contradictory; think of them as monkey wrenches cast upon
the waters...

First:  is it really necessary to reprobe for MTU changes, once the
initial negotiation is complete?  Route changes are comparatively
infrequent, and many -- most? -- routing changes will not affect the
MTU.  For example, if an internal router fails, any new route to my
external gateway will likely be via the same medium, i.e., Ethernet.
For that matter, for the next few years the vast majority of hosts will
be restricted by Ethernet's MTU; most other choices are rather uncommon
or too expensive for workstations (e.g., FDDI).  As long as our
long-haul links have an MTU greater than 1500, few changes in route
will affect the path MTU.

This has several implications.  First, it favors any of the types of
report-fragmentation.  Minimum-MTU requires an active discovery
process; if my hypothesis is correct, we want something that responds
to decreases in MTU, but ignores increases.  Second

A second assumption we have been operating under is that the sender
should initiate the MTU discovery process.  That's true if the 1063
option is used; it's not true with report-fragmentation.  Suppose that
receivers always generated a fragmentation ICMP message, though only
for the first few offenses per connection.  If the sender understands
the ICMP message, it will adjust its behavior accordingly; if not, the
message is mostly harmless except for the minor amount of extra
traffic.  I would define ``first few'' as some small integer (on the
order of 10), or twice the round-trip time -- the receiver should have
a decent idea of it, if only from the initial SYN-ACK handshake if TCP
is used.  If we make the report an IP option instead of a separate
message -- tagged onto an unsolicited ICMP ECHO REPLY or something
else equally harmless -- we could also resend the report any time we
retransmitted an ACK.  (That violates layering a bit; the concept may
need clearer definition to make it work for NFS and the like.)
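
The receiver-side limit described above can be sketched roughly as
follows.  This is only an illustration: the names ConnectionState,
MAX_REPORTS, and should_report are hypothetical, and stopping at
whichever of the two limits is reached first is an assumption (the text
offers the count and the round-trip window as alternatives).

```python
from dataclasses import dataclass
from typing import Optional

# Assumed value: "some small integer (on the order of 10)".
MAX_REPORTS = 10


@dataclass
class ConnectionState:
    """Hypothetical per-connection state kept by the receiver."""
    rtt: float                                 # round-trip estimate, in seconds
    first_fragment_at: Optional[float] = None  # when fragments were first seen
    reports_sent: int = 0


def should_report(conn: ConnectionState, now: float) -> bool:
    """Decide whether to send a fragmentation report for a fragmented arrival.

    Reports are sent only for the "first few" offenses: at most MAX_REPORTS
    of them, and only within twice the round-trip time of the first one.
    """
    if conn.first_fragment_at is None:
        conn.first_fragment_at = now
    within_count = conn.reports_sent < MAX_REPORTS
    within_window = (now - conn.first_fragment_at) <= 2 * conn.rtt
    if within_count and within_window:
        conn.reports_sent += 1
        return True
    return False
```

A sender that ignores the reports costs the receiver at most
MAX_REPORTS extra messages per connection under this policy.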

Finally, if Jeff Mogul's recent proposal is considered useful despite
my nitpicking objections, we can apply his observation to report-
fragmentation:  the last hop router knows the final MTU, so any router
that fragments a packet should send back the ICMP message to the host.
That gives us the advantage of putting the mechanism into the
faster-evolving router population, and getting feedback to the host
more quickly (i.e., from closer to the source).

The obvious objection here is that the router has no knowledge of
connections, and hence wouldn't know when to stop sending these
messages to an uncooperative host.  I'm not convinced that that holds
up.  Conforming hosts today shouldn't be sending packets that need
fragmentation as long as every hop accepts 576 or larger.  This implies
that most jumbograms are from hosts that are trying to do MTU
discovery, in which case they'd understand the ICMP message.  A
comparatively small cache could be kept of source-destination pairs
that had been sent such messages recently.  Since this cache is soft
state, it doesn't matter too much if it's lost on a reboot, unless the
volume of extra ICMP messages gets to be very large.  Does anyone have
any current statistics on the frequency of non-local fragmentation?
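
A minimal sketch of such a cache, to make the idea concrete.  The
class name RecentReportCache, the capacity, and the timeout are all
assumptions chosen for illustration; evicting the oldest entry when
the cache is full is likewise just one plausible policy.

```python
from collections import OrderedDict


class RecentReportCache:
    """Soft-state record of source-destination pairs recently sent a report."""

    def __init__(self, capacity=1024, ttl=60.0):
        self.capacity = capacity    # assumed bound on the number of pairs
        self.ttl = ttl              # assumed meaning of "recently", in seconds
        self._sent = OrderedDict()  # (src, dst) -> time the last report was sent

    def should_send(self, src, dst, now):
        """Return True if the router should send a report for this pair."""
        key = (src, dst)
        last = self._sent.get(key)
        if last is not None and now - last < self.ttl:
            return False            # this pair was advised recently
        self._sent[key] = now
        self._sent.move_to_end(key)
        while len(self._sent) > self.capacity:
            self._sent.popitem(last=False)  # drop the oldest entry
        return True
```

Losing an entry (by eviction or a reboot) only causes one extra report
to be sent, which matches the soft-state argument above.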

Routers that connect to tinygram networks might need a variant on this
strategy, of course; their cache might be too large.

We could further embellish this scheme by using the cache when routes
change.  If the new route to a destination lowers the MTU, flush the
cache for that destination -- you want to send new options.  If it
raises the MTU, flag the cache entries; when any packet flows through
for a source-destination pair that's in the cache -- and hence for
which the sender has previously been advised of the proper MTU -- send
it a new report to raise the MTU.  This last may be overkill, of
course...
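
The route-change handling might look something like the sketch below.
The class and method names are hypothetical, and it assumes the router
can compare the old and new MTU toward a destination when a route
changes.

```python
class RouteAwareCache:
    """Cache of advised source-destination pairs that reacts to route changes."""

    def __init__(self):
        # (src, dst) -> True if the pair is flagged for a raise-MTU report
        self._entries = {}

    def record_report(self, src, dst):
        """Note that src has been advised of the MTU toward dst."""
        self._entries[(src, dst)] = False

    def on_route_change(self, dst, old_mtu, new_mtu):
        """Apply the flush-or-flag rule to every cached pair for dst."""
        pairs = [key for key in self._entries if key[1] == dst]
        if new_mtu < old_mtu:
            # Lower MTU: forget the pairs so fresh reports go out.
            for key in pairs:
                del self._entries[key]
        elif new_mtu > old_mtu:
            # Higher MTU: flag the pairs for a report on their next packet.
            for key in pairs:
                self._entries[key] = True

    def on_packet(self, src, dst):
        """Return True if this packet should trigger a raise-MTU report."""
        key = (src, dst)
        if self._entries.get(key):
            self._entries[key] = False  # report once per route change
            return True
        return False
```

Pairs that were never advised are not in the cache, so an MTU increase
generates no traffic toward hosts that never showed interest.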


		--Steve Bellovin