What's your favorite MTU?

fab%saturn.ACC.COM@salt.acc.com (Fred Bohle acc_gnsc) Fri, 13 April 1990 22:16 UTC

Received: from decpa.pa.dec.com by acetes.pa.dec.com (5.54.5/4.7.34) id AA08176; Fri, 13 Apr 90 15:16:21 PDT
Received: by decpa.pa.dec.com; id AA09940; Fri, 13 Apr 90 15:16:14 -0700
Received: from SATURN.ACC.COM by salt.acc.com (5.61/1.34) id AA17408; Fri, 13 Apr 90 15:15:08 -0700
Received: by saturn.acc.com (5.51/1.28) id AA02188; Fri, 13 Apr 90 18:15:54 EST
Date: Fri, 13 Apr 90 18:15:54 EST
From: fab%saturn.ACC.COM@salt.acc.com (Fred Bohle acc_gnsc)
Message-Id: <9004132315.AA02188@saturn.acc.com>
To: mtudwg
Subject: What's your favorite MTU?

After reviewing your notes I have the following comments:

1. How about a simple binary search to determine the MTU size?
Looking at the list of real MTU sizes, they fit quite closely
to a simple halving sequence, which arrives at an MTU size that
will not fragment, e.g.:
	64000 -> 32000 -> 16000 -> 8000 -> 4000 -> 2000 -> 1000
	-> 500 -> 250 -> 125 -> 62

This converges in 10 steps in the worst case shown above,
Hyperchannel to a network with an undefined minimum MTU.  The
starting point would not be 64000 every time, but the largest
value in the sequence that is less than the LOCAL network MTU.
The values along the way are not too bad a fit except for FDDI
and Ethernet.  That is addressed in the following points.
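
A minimal sketch of the halving search in point 1, in Python.  The
function name and its parameters are made up for illustration; the
64000 start and the 62 floor are just the ends of the sequence above.

```python
def halving_sequence(local_mtu, start=64000, floor=62):
    """Return the halving values below local_mtu, largest first."""
    values = []
    size = start
    while size >= floor:
        if size < local_mtu:
            values.append(size)
        size //= 2  # integer halving: 125 -> 62, as in the list above
    return values


# A Hyperchannel sender (local MTU 65535) sees the whole sequence,
# an Ethernet sender (local MTU 1500) starts at 1000.
hyperchannel = halving_sequence(65535)
ethernet = halving_sequence(1500)
```

The sender would try these sizes in order with DF set and stop at the
first one that gets through unfragmented.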

2. One objection to a simple convergence on a value for minimum
MTU was the lack of notification when a larger MTU became
available (due to a gateway coming back on line somewhere).
The discussion on this list has lost track of this line of thinking.
I suggest "probing" for a larger MTU after some amount of time/data
has passed.  In TCP a measure based on round trip times or
some number of windows seems reasonable.  Probably multiples
of those, like maybe 100 RTT's or 10 windows. (Suggestions

3. Probing would continue the binary search, only in an upward direction,
remembering the last value which failed, and the last value which
worked without fragmentation.  Average them to get the size for
the probe packet.  We could do something tricky here
to avoid holding up the data transfer for too long.  Send the
probe packet with "Don't Fragment" set, and if we get an ICMP
Can't Fragment message, retransmit with DF turned off. 
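
A rough sketch of one probe as described in point 3, in Python.  The
send(size, df) callable is hypothetical and stands in for whatever
the IP layer offers; fake_send is only a toy model of a path whose
smallest MTU is 1492.

```python
def probe_once(last_ok, last_failed, send):
    """Send one probe at the midpoint of the two remembered values.

    send(size, df) is a hypothetical callable: it returns True if the
    packet got through, False if an ICMP Can't Fragment came back.
    Returns the updated (last_ok, last_failed) pair.
    """
    size = (last_ok + last_failed) // 2
    if send(size, True):          # DF set: did it fit unfragmented?
        return size, last_failed  # new largest size known to work
    send(size, False)             # retransmit at once with DF off
    return last_ok, size          # new smallest size known to fail


log = []  # records every (size, df) the toy network was asked to send

def fake_send(size, df, path_mtu=1492):
    log.append((size, df))
    return (not df) or size <= path_mtu  # DF off always gets through


first = probe_once(1000, 2000, fake_send)
second = probe_once(first[0], first[1], fake_send)
```

The data in a failed probe is delayed by one retransmission only,
since the second copy goes out with DF off and is fragmented as usual.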

4. Continuing the binary search would converge on Ethernet numbers:
	1000 -> 1500	(which works for IP-E)
	1500 -> 1250	(if IP-IEEE 802.3, until the next probe)

Converging on FDDI takes longer:
	4000 -> 6000	(which fragments)
	6000 -> 5000	(which also fragments)
	5000 -> 4500	(which also fragments)
	4500 -> 4250	(which works until the next probe)
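
The two traces in point 4 can be reproduced with a short Python
sketch.  It assumes the "last value which failed" starts out as the
previous step of the halving sequence (2000 and 8000), and uses path
MTUs of 1500 for IP-E, 1492 for IP-IEEE 802.3 and 4352 for FDDI.  The
function name is made up for illustration.

```python
def probe_trace(last_ok, last_failed, path_mtu, probes):
    """Return [(probe_size, fits), ...] for a run of upward probes."""
    trace = []
    for _ in range(probes):
        size = (last_ok + last_failed) // 2  # average the two bounds
        fits = size <= path_mtu
        trace.append((size, fits))
        if fits:
            last_ok = size      # worked without fragmentation
        else:
            last_failed = size  # fragmented, search below it next
    return trace


ip_e = probe_trace(1000, 2000, 1500, 1)
ieee_802_3 = probe_trace(1000, 2000, 1492, 2)
fddi = probe_trace(4000, 8000, 4352, 4)
```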

5. Doing all this injects one extra packet every 10 (or whatever)
windows.  With the suggested numbers, each plateau is visited in short
order.  Some bandwidth is unused until a probe sequence finds it.
With binary searching, a value which does not fragment is found
in typically 3 RTT's, maximum 10 RTT's.

6. When a gateway comes back up, the probe sequence will discover
some of the unused bandwidth.  To recover it all, we might try
the last number which fragmented again, and resume our binary search
if it still fragments.  If it does not fragment, increase it some
more (suggest percentages here), maybe MTU * 1.25?, MTU * 1.50,
or even MTU * 2?
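
One way point 6 might look in Python, as a sketch only.  The growth
factor is one of the candidate values above, the cap at the local MTU
is my own assumption, and fits(size) is a hypothetical stand-in for
sending a DF probe and seeing whether it gets through.

```python
def recovery_probe(last_ok, last_failed, fits, local_mtu, growth=1.25):
    """Retry the last size that fragmented and pick the next probe.

    Returns (largest size known to work, next size to probe).
    """
    if not fits(last_failed):
        # Still fragments: resume the binary search between the bounds.
        return last_ok, (last_ok + last_failed) // 2
    # The old ceiling now works, so reach beyond it by the growth
    # factor, but never past the local network MTU (assumed cap).
    return last_failed, min(int(last_failed * growth), local_mtu)


# Path limit raised to 4352 after a gateway returns, then unchanged.
recovered = recovery_probe(1250, 1500, lambda s: s <= 4352, 4352)
unchanged = recovery_probe(1250, 1500, lambda s: s <= 1492, 4352)
```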

Well, it has been a long day, and I can't think any more.  Let
me know where we take this idea from here.  I still think we are
on the right track with the DF bit, since it does not need a new
bit in the IP header.  Just having the new format of the ICMP
message seems to do it.


Fred Bohle			EMAIL: fab@saturn.acc.com
Interlink Computer Sciences	AT&T : 301-290-8100 
10220 Old Columbia Road
Columbia, MD 21046