Re: Another proposal to think about

mogul (Jeffrey Mogul) Fri, 01 December 1989 23:32 UTC

Received: by acetes.pa.dec.com (5.54.5/4.7.34) id AA09728; Fri, 1 Dec 89 15:32:37 PST
From: mogul (Jeffrey Mogul)
Message-Id: <8912012332.AA09728@acetes.pa.dec.com>
Date: 1 Dec 1989 1532-PST (Friday)
To: William Westfield <BILLW@MATHOM.CISCO.COM>
Cc: mtudwg
Subject: Re: Another proposal to think about
In-Reply-To: William Westfield <BILLW@MATHOM.CISCO.COM> / Fri 1 Dec 89 05:54:06-PST. <12546625828.8.BILLW@MATHOM.CISCO.COM>

	NFS (+ Sun RPC) provide a textbook example of both the possibility of
	doing this right, and the dangers of doing this wrong.  Sun RPC loves
	to send 8kb UDP packets over an Ethernet (with a 1.5kb MTU).  This is
	often a disaster when a gateway (or slow receiver interface) is
	involved.
    
    This common example raises some interesting questions.  A SUN knows very
    well that the MTU is at most 1500 bytes, and then decides to use 8k packets
    anyway.  Since using a smaller packet size allows "slow hosts and routers"
    to work better, it is fairly clear that it is slower this way (unless it
    doesn't work at all with big packets).

Not at all clear.  It is entirely possible to use 1500-byte datagrams
[aside for terminology: IP "datagrams" are fragmented into Ethernet
"packets"] to move 8kb NFS buffers without using any more packets than
are used with Sun's "fragment immediately" scheme.  The reason that Sun
chose to use fragmentation was that they didn't want to spend the
time (programmer time and CPU time) for a real transport protocol.
We should all be able to design a layer that fits between UDP and
Sun's RPC that allows you to move 8k buffers efficiently without doing
fragmentation.

Another reason that NFS does this is that it works.  As long
as you don't lose fragments, of course.
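
To make the packet-count claim concrete, here is a rough sketch in
Python.  It is not Sun's code or any real protocol: the 12-byte chunk
header (request id, chunk index, chunk count) is a hypothetical layout
for the layer between UDP and RPC, the IP header is assumed to carry no
options, and the RPC/NFS headers that ride along with a real 8kb
buffer are ignored.

```python
import math

ETHERNET_MTU = 1500  # bytes, as in the message
IP_HEADER = 20       # IPv4 header, assuming no options
UDP_HEADER = 8
NFS_BUFFER = 8192    # one "8kb" buffer; RPC/NFS headers ignored here
CHUNK_HEADER = 12    # hypothetical: request id, chunk index, chunk count


def ip_fragments(udp_payload):
    """Packets on the wire when one large UDP datagram is fragmented by IP."""
    ip_payload = udp_payload + UDP_HEADER  # the UDP header is sent only once
    # Fragment offsets are counted in 8-byte units, so round down to a multiple of 8.
    per_fragment = (ETHERNET_MTU - IP_HEADER) // 8 * 8
    return math.ceil(ip_payload / per_fragment)


def unfragmented_datagrams(buffer_size):
    """Packets when the buffer is sent as datagrams that each fit in the MTU."""
    per_datagram = ETHERNET_MTU - IP_HEADER - UDP_HEADER - CHUNK_HEADER
    return math.ceil(buffer_size / per_datagram)


def split_buffer(request_id, data):
    """Split data into (request_id, index, count, chunk) tuples, one per datagram."""
    per_datagram = ETHERNET_MTU - IP_HEADER - UDP_HEADER - CHUNK_HEADER
    chunks = [data[i:i + per_datagram] for i in range(0, len(data), per_datagram)]
    return [(request_id, index, len(chunks), chunk)
            for index, chunk in enumerate(chunks)]


if __name__ == "__main__":
    print(ip_fragments(NFS_BUFFER), unfragmented_datagrams(NFS_BUFFER))
```

Under these assumptions both counts come out to 6 packets for an 8192-byte
buffer.  The difference is that each of the 6 unfragmented datagrams can be
acknowledged and retransmitted on its own.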

    Is fragmentation that occurs at
    the originating host that much less "harmful" than fragmentation that
    occurs at routers?  (I think perhaps so.) 
    
Marginally so, since (if the routers have limited input buffering
abilities) it increases the number of places where a fragment could
be dropped.

    Why doesn't Sun use 8k TCP packets too, anyway?

Recall that the fundamental reliability/performance difference
between IP fragmentation and TCP segmentation is that if a TCP
segment in the middle of a burst gets dropped, the TCP ACK mechanism
allows you to make use of the segments that did manage to arrive.
Even if all but the first segment of a window are continually dropped,
you will still make progress.

With fragmentation, on the other hand, you cannot make any progress
until you have received all the fragments of a datagram.  Even if
the sender is able to use the same IP ID for retransmissions, if
there is "deterministic fragmentation loss" (e.g., you always lose
the last fragment of a datagram) there is no way to make any progress.

Using 8k TCP packets would move you from the "can make progress"
to the "can't make progress" domain, without actually reducing
the number of packets that are sent in the best case.
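
The two domains can be illustrated with a toy simulation.  This is only
a sketch under simplifying assumptions: the function names and loss
patterns are made up for illustration, the receiver is assumed to keep
fragments across retransmissions (the "same IP ID" case), and the TCP
side is modelled with a plain cumulative ACK and no out-of-order
buffering, which understates what a real TCP can do.

```python
def lose_last(index, total):
    """Deterministic loss: the last packet of every burst is dropped."""
    return index == total - 1


def lose_all_but_first(index, total):
    """Deterministic loss: only the first packet of every burst arrives."""
    return index != 0


def fragmented_progress(total_fragments, is_lost, tries):
    """Return True if the datagram is ever reassembled within `tries` sends.

    Fragments accumulate across retransmissions, modelling a sender that
    reuses the same IP ID.
    """
    received = set()
    for _ in range(tries):
        for index in range(total_fragments):
            if not is_lost(index, total_fragments):
                received.add(index)
        if len(received) == total_fragments:
            return True
    return False


def segmented_progress(total_segments, window, is_lost, rounds):
    """Return how many segments have been acknowledged after `rounds` bursts.

    Each round sends up to `window` segments starting at the first
    unacknowledged one; the cumulative ACK covers the leading run of
    segments that arrived.
    """
    acked = 0
    for _ in range(rounds):
        if acked == total_segments:
            break
        burst = min(window, total_segments - acked)
        advance = 0
        while advance < burst and not is_lost(advance, burst):
            advance += 1
        acked += advance
    return acked


if __name__ == "__main__":
    print(fragmented_progress(6, lose_last, tries=100))
    print(segmented_progress(6, 6, lose_all_but_first, rounds=6))
```

With 6 fragments and the last one always lost, the datagram is never
reassembled no matter how many times it is resent.  With 6 segments and
only the first of each burst arriving, the segmented transfer still
advances by one segment per round and finishes after 6 rounds.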

-Jeff