Modifications to Steve Deering's "DF" scheme
mogul (Jeffrey Mogul) Mon, 05 March 1990 21:30 UTC
Received: by acetes.pa.dec.com (5.54.5/4.7.34)
id AA16912; Mon, 5 Mar 90 13:30:22 PST
From: mogul (Jeffrey Mogul)
Message-Id: <9003052130.AA16912@acetes.pa.dec.com>
Date: 5 Mar 1990 1330-PST (Monday)
To: mtudwg
Cc:
Subject: Modifications to Steve Deering's "DF" scheme
I'm becoming more and more attracted to this approach, since (besides the technical points in its favor) it avoids most of the issues in the "ACTION ITEMS" part of the minutes of the last meeting.

One point against the "Pure DF" scheme (i.e., the one that Steve has described) is that because the too-big datagrams are dropped, they represent "wasted effort" and also result in anomalously high round-trip times. This is especially unfortunate when, during a long-lived connection, the sender chooses to retry sending a large datagram. Since it is most likely to still be too big, it will probably be dropped, and the connection will stall for a while.

I propose to fix this by using a "spare" IP header bit. (If the Proteon gateways don't drop packets with unrecognized TOS bits, that's probably where to put it. If they do drop these packets, this proposal is moot for the time being.)

If this bit is clear, then the DF bit carries its normal meaning. Current gateways return a Can't Fragment message with a 0 in the "MTU field"; upgraded gateways return Can't Fragment with the MTU in that field.

If the bit is set, it changes the meaning of the DF bit from "Don't Fragment" to "Denote Fragment" (well, that's the best I can do without opening my thesaurus). In this case, the upgraded gateways DO fragment and forward the packet, but also return an ICMP Destination Unreachable/Fragment Report message, which looks exactly like Steve's modified Can't Fragment message except that it carries a different code. The purpose of the new code is to allow the sending host to realize that it does NOT have to retransmit the segment in question (at least, not until the normal timeout), but that it should not send any further segments bigger than the reported MTU.

Unupgraded gateways would ignore the "Change DF meaning" bit, so the sending host would receive the Can't Fragment/0-MTU message and would treat it as it would otherwise.
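The gateway and sender rules above can be sketched roughly in Python. This is only an illustration under stated assumptions: the names (Datagram, Icmp, gateway_forward, sender_handle_icmp) and the string labels for the two ICMP codes are made up here, header overhead and 8-byte fragment alignment are ignored, and only fragment 0 is assumed to keep the DF and Change Meaning bits.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Hypothetical labels; the proposal does not assign numeric ICMP codes.
CANT_FRAGMENT = "cant-fragment"
FRAGMENT_REPORT = "fragment-report"


@dataclass
class Datagram:
    length: int                   # size in bytes (headers ignored for simplicity)
    df: bool = False              # the existing DF bit
    change_meaning: bool = False  # the proposed "spare" bit


@dataclass
class Icmp:
    code: str
    mtu: int  # 0 when sent by a gateway that has not been upgraded


def fragment(dgram: Datagram, mtu: int) -> List[Datagram]:
    """Split into pieces of at most `mtu` bytes.

    Assumption: only fragment 0 keeps the DF and Change Meaning bits.
    """
    frags = []
    remaining = dgram.length
    first = True
    while remaining > 0:
        size = min(mtu, remaining)
        frags.append(Datagram(size,
                              df=dgram.df and first,
                              change_meaning=dgram.change_meaning and first))
        remaining -= size
        first = False
    return frags


def gateway_forward(dgram: Datagram, next_hop_mtu: int,
                    upgraded: bool) -> Tuple[List[Datagram], Optional[Icmp]]:
    """Return (datagrams forwarded, ICMP sent back to the source or None)."""
    if dgram.length <= next_hop_mtu:
        return [dgram], None                      # fits; nothing special
    if not dgram.df:
        return fragment(dgram, next_hop_mtu), None  # ordinary fragmentation
    if not upgraded:
        # Current gateways ignore the new bit: drop, MTU field is 0.
        return [], Icmp(CANT_FRAGMENT, 0)
    if not dgram.change_meaning:
        # DF keeps its normal meaning: drop, but report the MTU.
        return [], Icmp(CANT_FRAGMENT, next_hop_mtu)
    # "Denote Fragment": fragment, forward, and report the MTU.
    return fragment(dgram, next_hop_mtu), Icmp(FRAGMENT_REPORT, next_hop_mtu)


def sender_handle_icmp(icmp: Icmp,
                       current_mtu: int) -> Tuple[Optional[int], bool]:
    """Return (new MTU estimate or None if not reported, datagram_was_dropped)."""
    if icmp.code == FRAGMENT_REPORT:
        # Delivered as fragments: no immediate retransmission needed.
        return min(current_mtu, icmp.mtu), False
    if icmp.mtu == 0:
        # Unupgraded gateway: the host falls back to its existing behaviour.
        return None, True
    return min(current_mtu, icmp.mtu), True
```

For example, a 1500-byte datagram with both bits set, crossing an upgraded gateway onto a 576-byte link, would be forwarded as three fragments (576, 576 and 348 bytes in this simplified model) while the sender receives a Fragment Report carrying 576.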
One of the main benefits of Steve's scheme is that it prevents fragments from ever reaching a host that cannot reassemble them. This is not entirely true of the modification that I propose; however, since the condition cannot persist longer than an RTT or so, the effect is minimal. (If the receiver drops the fragments, the sender will time out and retransmit, but should already have learned the proper path MTU.)

An advantage of the modified scheme is that if the path involves more than one drop in the MTU, the sender will still discover the true MTU within one RTT. That is because the fragment 0 packet still carries the DF bit and the Change Meaning bit, and (assuming that all the fragmenting gateways are upgraded) one ICMP will be returned from each fragmenting stage.

Since the sending host can tell if the current path supports "Denote Fragment" (because it received a Fragment Report rather than a Can't Fragment), it could, in theory, reprobe more often for an increased MTU ... because the major risk (losing the probing segment) is eliminated as long as the route doesn't change.

Steve raised the question of ID wrap-around. I think if the sending host expects that this might be a problem (presumably, by noticing that it is about to send an ID that might still exist in the Internet) then it should turn on DF but NOT turn on the Change Meaning bit. This would ensure that subsequent datagrams are not fragmented, and so there could be no confusion about fragment IDs.

-Jeff
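The multi-stage discovery argument above can be sketched as follows. This is an illustrative model only: the function names and the example MTU values are assumptions, every gateway on the path is assumed to be upgraded, and only the size of fragment 0 is tracked.

```python
from typing import List


def fragment_reports(datagram_len: int, hop_mtus: List[int]) -> List[int]:
    """MTUs reported to the sender as fragment 0 travels along the path.

    Assumes every gateway is upgraded and that fragment 0 keeps the
    DF and Change Meaning bits at each fragmenting stage.
    """
    reports = []
    frag0_len = datagram_len
    for mtu in hop_mtus:
        if frag0_len > mtu:
            reports.append(mtu)  # this stage fragments and sends a Fragment Report
            frag0_len = mtu      # fragment 0 now fills the smaller MTU
    return reports


def sender_mtu_estimate(current_estimate: int, reports: List[int]) -> int:
    """The sender keeps the smallest MTU it has been told about."""
    return min([current_estimate] + reports)


# Hypothetical path with two successive drops in MTU.
path = [4352, 1500, 1500, 576]
reports = fragment_reports(4352, path)
estimate = sender_mtu_estimate(4352, reports)
```

With this example path, a single 4352-byte datagram triggers one report per fragmenting stage (1500, then 576), so the sender's estimate reaches 576 after one datagram rather than after one dropped datagram per stage.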