MTU discovery considered harmful?
smb@ulysses.att.com Fri, 20 April 1990 19:53 UTC
Received: from decwrl.dec.com by acetes.pa.dec.com (5.54.5/4.7.34) id AA16629; Fri, 20 Apr 90 12:53:36 PDT
Received: by decwrl.dec.com; id AA21298; Fri, 20 Apr 90 12:53:26 -0700
Message-Id: <9004201953.AA21298@decwrl.dec.com>
Received: by inet; Fri Apr 20 15:53 EDT 1990
From: smb@ulysses.att.com
To: mtudwg
Subject: MTU discovery considered harmful?
Date: Fri, 20 Apr 1990 15:53:12 -0400
I know it's late in the game, but I'm becoming very concerned that MTU discovery may be fundamentally a Bad Idea. In particular, I haven't seen any discussion of the relationship between MTU, window size, and router hop counts; the latter aspect would again tend to pull us towards tying in to OSPFIGP. And without some changes, I think we're going to be opening a can of worms.

Let's look at a not-so-absurd limiting case: FDDI rings on both LANs, and point-to-point links across a regional net. FDDI uses a 4K MTU; serial lines, being HDLC, have more or less arbitrary MTUs, and will likely be set to 4K once FDDI becomes common. Current TCPs (at least, many of them) have default window sizes of 4K. This means that we've reduced sliding window to send-and-wait. Even with 8K windows, we haven't helped much here -- the sender will transmit two 4K packets right away, and then have to wait for the first ACK; if delayed ACKs are used, we'll quite likely see just one ACK for both packets.

There's another issue as well: serialization time on the links. When a packet is being sent over a wire, there's a non-negligible transmission time due to the clock speed of the link. For example, a DS0 link -- 56K bps -- has a serialization speed of 1/7 msec/byte. For 4K packets, that's 585 msecs just to clock the bits onto the wire. Since we're routing packets at the IP level, a gateway has to accumulate the entire packet before it can retransmit it; thus, we pay a 585 msec delay penalty for each DS0 hop. (For DS1 speeds -- 1.544M bps -- which are used on today's backbone, the cost is of course less, about 21 msec for each transmission of a 4K packet.)

Note what happens if that 4K packet is broken up into 4 1K chunks. We still pay the serialization price the first time for sending the 4K bytes; however, each gateway can now hand off the packet as soon as 1K has arrived.
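As a back-of-the-envelope check on the serialization arithmetic above, here is a small Python sketch (purely illustrative -- the function names are mine, not part of the original note; propagation delay and per-packet router overhead are ignored):

```python
def serialization_ms(nbytes, link_bps):
    """Time to clock nbytes onto a link of the given bit rate, in msec."""
    return nbytes * 8 / link_bps * 1000

def path_delay_ms(total_bytes, chunk_bytes, hops, link_bps):
    """Store-and-forward delay over `hops` identical links when the data
    is sent as chunks of chunk_bytes.  The first link must carry all the
    bytes; after that, only the last chunk is still in flight on each
    remaining hop, since the earlier chunks move ahead in parallel."""
    first_link = serialization_ms(total_bytes, link_bps)
    per_hop = serialization_ms(chunk_bytes, link_bps)
    return first_link + (hops - 1) * per_hop

DS0 = 56_000  # bits per second

# One 4K packet over three DS0 hops pays the full 585 msec at every hop:
whole = path_delay_ms(4096, 4096, 3, DS0)   # about 1755 msec
# The same 4K split into four 1K chunks pays only a 1K penalty per hop:
split = path_delay_ms(4096, 1024, 3, DS0)   # about 878 msec
```

The gap widens with every additional slow hop, which is the heart of the argument.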
We thus get overlapped transmissions -- while the host (or rather, the first long-haul gateway) is still sending the last packet, the first three are simultaneously being sent over three other links. The per-hop cost is therefore only for a 1K packet.

To grossly oversimplify things, to a (poor) first approximation the optimum MTU size is the window size divided by the number of hops, or at least the number of ``slow'' (a term I'll leave undefined) hops. That way, each router can be busy sending a packet simultaneously. (I say that this is a poor approximation because of the considerable overhead per packet. But don't overestimate that overhead; for a router using slow lines, the serialization time dominates. For example, according to some measurements I've done recently, on a Cisco router the fixed overhead is on the order of 2 ms, plus the cost per byte -- and for a 40-byte minimum TCP packet, that's 5.7 ms.)

The proposal in the draft RFC gives a good mechanism for calculating the PMTU, but yields no information on the hop count. Informal looks at some non-random traceroutes suggest typical connections are traveling at least 10 hops. I'd say as a guess, without looking at maps of the NSFNET backbone or any of the regional nets, that we can assume 3 hops within a regional net to reach NSFNET, 2 or 3 hops on the backbone, and another 3 hops via the destination regional net. This would suggest that maximum MTU be approximately 1/8 of the window size -- a number that's remarkably close to what we're now using.

That said, has anyone done any throughput measurements using a TCP that's been hacked to use, say, 1500 byte MTUs? Our discussions over the last few months make it fairly obvious that we can't rely on munging the routers to give us hopcount information via a new path discovery mechanism. But hosts can adjust their window sizes. Let me suggest, off the top of my head, two strategies.
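The first-order rule of thumb above -- optimum MTU roughly equals window size over slow-hop count -- can be stated in a couple of lines (a sketch of my own, not code from the note; the hop estimate is the guess in the text):

```python
def rough_mtu(window_bytes, slow_hops):
    """First approximation from the text: divide the window evenly
    across the slow hops so every router stays busy sending."""
    return window_bytes // slow_hops

# 3 regional hops + 2-3 backbone hops + 3 more regional hops ~= 8 slow hops,
# so with an 8K window the rule yields an MTU of about 1K:
suggested = rough_mtu(8192, 8)
```

With those numbers the rule lands close to the MTUs already in common use, which is the coincidence the text remarks on.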
First, a host can more-or-less reliably detect use of Path MTU by noting the arrival of TCP packets with Don't Fragment set. If a host notices that PMTU is in use, it should increase the window size for that connection by some factor, perhaps (if it knows) using its own PMTU information as a guess about the other end's PMTU. By the same token, a host using PMTU should nevertheless restrict its maximum effective PMTU to some fraction of the largest receive window ever advertised to it. (I realize I'm being TCP-specific here.)

What fraction should we use? I suspect that a factor of 4 will work, though it wouldn't hurt to try some experiments. In today's world, that means that on an all-Ethernet LAN (InterLAN? CateLAN?), the typical local situation, we'll see MTUs of 1K rather than 1500 -- a reduction that isn't serious. All-FDDI locales will not be common for a while; first penetration will be in the campus backbone market. People who really want the 4K MTU in such situations can always specify SUBNETSARELOCAL, thereby bypassing the whole process.

Comments?

		--Steve Bellovin
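The second strategy -- clamping the effective PMTU to a fraction of the peer's largest advertised receive window -- amounts to something like this (again my own illustrative sketch, with the suggested factor of 4 as the default):

```python
def effective_pmtu(discovered_pmtu, max_advertised_window, fraction=4):
    """Cap the PMTU a sender actually uses at 1/fraction of the largest
    receive window the peer has ever advertised, so a big MTU cannot
    collapse the sliding window into send-and-wait."""
    return min(discovered_pmtu, max_advertised_window // fraction)

# A 4K FDDI PMTU against a peer with only a 4K window gets clamped to 1K:
clamped = effective_pmtu(4096, 4096)
# A small discovered PMTU is left alone:
untouched = effective_pmtu(576, 16384)
```

The clamp only bites when the discovered MTU is large relative to the window, so ordinary Ethernet-sized MTUs are barely affected.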