Modifications to Steve Deering's "DF" scheme

mogul (Jeffrey Mogul) Mon, 05 March 1990 21:30 UTC

Received: by acetes.pa.dec.com (5.54.5/4.7.34) id AA16912; Mon, 5 Mar 90 13:30:22 PST
From: mogul (Jeffrey Mogul)
Message-Id: <9003052130.AA16912@acetes.pa.dec.com>
Date: 5 Mar 1990 1330-PST (Monday)
To: mtudwg
Cc:
Subject: Modifications to Steve Deering's "DF" scheme

I'm becoming more and more attracted to this approach, since
(besides the technical points in its favor) it avoids most of
the issues in the "ACTION ITEMS" part of the minutes of the last
meeting.

One point against the "Pure DF" scheme (i.e., the one that
Steve has described) is that because the too-big datagrams are
dropped, they represent "wasted effort" and also result in
anomalously high round-trip times.  This is especially unfortunate
when, during a long-lived connection, the sender chooses to
retry sending a large datagram.  Since it is most likely to still
be too big, it will probably be dropped, and the connection will
stall for a while.

I propose to fix this by using a "spare" IP header bit.  (If the
Proteon gateways don't drop packets with unrecognized TOS bits, the
TOS field is probably where to put it.  If they do drop these packets,
this proposal is moot for the time being.)

If this bit is clear, then the DF bit carries its normal meaning.
Current gateways return a Can't Fragment message with a 0 in the
"MTU field"; upgraded gateways return Can't Fragment with the MTU
in that field.

If the bit is set, it changes the meaning of the DF bit from "Don't
Fragment" to "Denote Fragment" (well, that's the best I can do
without opening my thesaurus).  In this case, the upgraded gateways
DO fragment and forward the packet, but also return an ICMP Destination
Unreachable/Fragment Report message, which looks exactly like Steve's
modified Can't Fragment message except that it carries a different
code.  The purpose of the new code is to allow the sending host to
realize that it does NOT have to retransmit the segment in question
(at least, not until the normal timeout) but it should not send any
further segments bigger than the reported MTU.
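
As a rough sketch of how a sending host might react to the two ICMP
codes, here is a small Python function.  The names (on_icmp, the
string codes) are made up for illustration, and the handling of a
0 MTU is only a placeholder, since picking a smaller size in that case
is outside what this note covers.

```python
def on_icmp(code, reported_mtu, path_mtu):
    """Return (new path MTU estimate, retransmit_now).

    code is "cant_fragment" or "fragment_report" (hypothetical labels
    for the two ICMP Destination Unreachable codes discussed above).
    """
    if code == "fragment_report":
        # The datagram was fragmented and forwarded, so there is no
        # need to retransmit before the normal timeout; just stop
        # sending segments bigger than the reported MTU.
        return (min(path_mtu, reported_mtu), False)
    if code == "cant_fragment":
        if reported_mtu > 0:
            # Upgraded gateway: the datagram was dropped, the MTU is known.
            return (min(path_mtu, reported_mtu), True)
        # Current gateway: dropped, MTU unknown.  The estimate is left
        # unchanged here; the caller has to choose a smaller size itself.
        return (path_mtu, True)
    raise ValueError("unexpected ICMP code: %r" % (code,))
```

For example, on_icmp("fragment_report", 1006, 1500) lowers the estimate
to 1006 without asking for an immediate retransmission.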

Unupgraded gateways would ignore the "Change DF meaning" bit, so
the sending host would receive the Can't Fragment/0-MTU message
and would handle it just as it would have without the bit set.
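
The gateway cases above can be summarized in a short Python sketch.
The Datagram class, the field name change_meaning, and the returned
labels are all hypothetical; this is only meant to lay out the decision
described here, not to specify an implementation.

```python
from dataclasses import dataclass

@dataclass
class Datagram:
    length: int            # total length in bytes
    df: bool               # the existing DF bit
    change_meaning: bool   # the proposed "spare" bit (hypothetical name)

def gateway_action(d, next_hop_mtu, upgraded):
    """Return (action, icmp) where icmp is None or (code, mtu_field)."""
    if d.length <= next_hop_mtu:
        return ("forward", None)
    if not d.df:
        # Ordinary fragmentation, no report.
        return ("fragment", None)
    if not upgraded:
        # Current gateway: ignores the spare bit, drops the datagram,
        # and returns Can't Fragment with 0 in the MTU field.
        return ("drop", ("cant_fragment", 0))
    if d.change_meaning:
        # "Denote Fragment": fragment and forward, but also report.
        return ("fragment", ("fragment_report", next_hop_mtu))
    # Bit clear: DF keeps its normal meaning, but the MTU is reported.
    return ("drop", ("cant_fragment", next_hop_mtu))
```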

One of the main benefits of Steve's scheme is that it prevents
fragments from ever reaching a host that cannot reassemble them.
This is not entirely true of the modification that I propose;
however, since the condition cannot persist longer than an RTT
or so, the effect is minimal.  (If the receiver drops the fragments,
the sender will time out and retransmit, but should already have
learned the proper path MTU.)

An advantage of the modified scheme is that if the path involves
more than one drop in the MTU, the sender will still discover the
true MTU within one RTT.  That is because the fragment 0 packet
still carries the DF bit and the Change Meaning bit, and (assuming
that all the fragmenting gateways are upgraded) one ICMP will
be returned from each fragmenting stage.
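
To illustrate the multiple-drop case, here is a toy Python model.  It
assumes every gateway on the path is upgraded, and it simplifies by
treating fragment 0 as exactly MTU-sized after each fragmentation
(real fragments carry a multiple of 8 data bytes, so they can be a few
bytes smaller).  The function name and the MTU values are illustrative
only.

```python
def reports_along_path(length, link_mtus):
    """Return the MTUs reported back to the sender, in path order."""
    reports = []
    frag0 = length  # size of the piece still carrying DF + Change Meaning
    for mtu in link_mtus:
        if frag0 > mtu:
            # This gateway fragments, forwards, and sends a Fragment Report.
            reports.append(mtu)
            frag0 = mtu  # simplification: fragment 0 fills the link MTU
    return reports

reports = reports_along_path(4000, [4352, 1500, 1006, 1500, 576])
path_mtu = min(reports)
```

Here the sender gets one report per fragmenting stage (1500, 1006, and
576) from a single datagram, and the smallest one is the path MTU.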

Since the sending host can tell if the current path supports "Denote
Fragment" (because it received a Fragment Report rather than a
Can't Fragment) it could, in theory, reprobe more often for an
increased MTU ... because the major risk (losing the probing segment)
is eliminated as long as the route doesn't change.

Steve raised the question of ID wrap-around.  I think if the sending
host expects that this might be a problem (presumably, by noticing
that it is about to send an ID that might still exist in the Internet)
then it should turn on DF but NOT turn on the Change Meaning bit.  This
would ensure that subsequent datagrams are not fragmented, and so
there could be no confusion about fragment IDs.
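
One way a host might make that choice is sketched below in Python.
How a host decides that an ID "might still exist" is not settled
here; this sketch simply assumes a fixed maximum datagram lifetime
(the 120-second value and all the names are assumptions for
illustration) and checks whether the ID was used more recently than
that.

```python
ASSUMED_MAX_LIFETIME = 120.0  # seconds; an illustrative assumption

class IdTracker:
    """Remember when each IP ID was last sent and pick header bits."""

    def __init__(self):
        self.last_used = {}  # IP ID -> time it was last sent

    def bits_for(self, ip_id, now):
        prev = self.last_used.get(ip_id)
        self.last_used[ip_id] = now
        # Risky if an older datagram with this ID could still be in flight.
        risky = prev is not None and (now - prev) < ASSUMED_MAX_LIFETIME
        # When risky, keep DF on but leave Change Meaning off, so the
        # datagram is never fragmented and fragment IDs cannot collide.
        return {"df": True, "change_meaning": not risky}
```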

-Jeff