Re: Where we stand and where we are going

Rob Austein <> Fri, 28 June 2002 02:15 UTC

Date: Thu, 27 Jun 2002 19:31:43 -0400
From: Rob Austein <>
Subject: Re: Where we stand and where we are going
MIME-version: 1.0 (generated by SEMI 1.13.7 - "Awazu")
Content-type: text/plain; charset=US-ASCII
User-Agent: Wanderlust/2.4.1 (Stand By Me) SEMI/1.13.7 (Awazu) FLIM/1.13.2 (Kasanui) Emacs/20.7 (i386--freebsd) MULE/4.0 (HANANOEN)

At Thu, 27 Jun 2002 09:19:17 -0400, Michael Mealling wrote:
> On Wed, Jun 26, 2002 at 04:42:15PM -0400, Rob Austein wrote:
> Ok, I've been doing a lot of thinking in this realm and talking to all
> sorts of people and I keep hearing different things. Here's the FUD I 
> keep hearing:
> 1) IP fragmentation on the Internet is awful so if you do UDP packets
> larger than 1500 bytes you will probably lose one of the fragments
> about 10 percent of the time

Which, for many protocols, would be a Bad Thing (tm), but for an
idempotent protocol in which it is cheaper to re-run the query than to
stash replies in case something bad happens, this might be the best
one can do.

> 2) Attempts to do UDP fragmentation at the application layer and not the
> IP layer just mean you're going to reinvent all of TCP over again, so
> why bother trying? Just use TCP.

Yeah, I've had that conversation too.  A few years ago I suggested
reimplementing portions of NETBLT over UDP, and was roundly trounced
for failing to understand the behavior of a tail-drop universe.  I
have not yet figured out whether I really believe that argument, but I
think it would be safe to say that we understand single-datagram UDP,
we understand TCP, and we may understand a few other protocols, but
the more general issues of timing, retransmission (i.e., timing), and
buffering (i.e., timing) for transport protocols are significantly more
"interesting" than one might wish (in the sense of the Chinese curse).

> 3) "Try with UDP and if it fails retry with TCP" is "good enough". <insert
> all of your "good enough is the enemy of good" verbiage here>

If you're willing to live with approximately an order of magnitude
increase in load as the cost of failover, this might indeed be the
least bad option.  Transaction TCP shaves a bit off of that, but in
practice one gets to a point of diminishing returns very quickly in
the transport protocol space, if only because transport protocols
usually live in OS kernels.
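
As an illustration of the "try UDP, then retry with TCP" pattern discussed
above, here is a minimal sketch. The function names and the idea of passing
the two transports in as callables are assumptions made for the example,
not anything from the thread. The truncation check assumes a DNS-style
header, where the TC bit is 0x02 in the third octet.

```python
from typing import Callable, Optional, Tuple


def dns_truncated(reply: bytes) -> bool:
    """True if the TC (truncation) bit is set in a DNS-style header."""
    return len(reply) > 2 and bool(reply[2] & 0x02)


def query_with_fallback(
    query: bytes,
    try_udp: Callable[[bytes], Optional[bytes]],  # hypothetical: returns None on timeout
    try_tcp: Callable[[bytes], bytes],            # hypothetical: full TCP exchange
    is_truncated: Callable[[bytes], bool] = dns_truncated,
) -> Tuple[str, bytes]:
    """Try the cheap transport first; pay for TCP only when UDP fails."""
    reply = try_udp(query)
    if reply is not None and not is_truncated(reply):
        return "udp", reply
    # Lost or truncated reply: the query is idempotent, so re-running it
    # over TCP is safe, just more expensive.
    return "tcp", try_tcp(query)
```

Because the query is idempotent, the fallback needs no state beyond the
query itself; the cost is the extra round trips and connection setup on
the TCP path.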

> I have had two instances where the usage profile of a protocol suggests
> that 99% of the responses will be less than 2K and the interaction is
> stateless and connection-less. Inheriting the full session semantics of TCP
> isn't required. But neither is the sad state of UDP packet size limitations.
> My proposed solution is to limit UDP packet sizes to 512 bytes and put
> packet sequence numbers on them. You still have a connectionless interaction
> but it a) puts the packet size into a realm with a higher probability of
> success and b) allows for a handful of those packets to get through. I'm
> not sure if you need more than that. You can still do the "well if 
> that didn't work I can always do TCP"...
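
The proposal quoted above (datagrams capped at 512 bytes, each carrying a
sequence number) could be sketched roughly as follows. The 4-byte header
layout (message id, sequence number, total count) is purely hypothetical;
the original message does not specify a wire format.

```python
import struct

MAX_DATAGRAM = 512
# Hypothetical header: 16-bit message id, 8-bit sequence number, 8-bit total.
HEADER = struct.Struct("!HBB")
MAX_PAYLOAD = MAX_DATAGRAM - HEADER.size


def split_response(msg_id: int, data: bytes) -> list:
    """Split data into datagrams of at most MAX_DATAGRAM bytes each."""
    chunks = [data[i:i + MAX_PAYLOAD] for i in range(0, len(data), MAX_PAYLOAD)]
    if not chunks:
        chunks = [b""]
    if len(chunks) > 255:
        raise ValueError("response too large for an 8-bit sequence number")
    return [HEADER.pack(msg_id, seq, len(chunks)) + chunk
            for seq, chunk in enumerate(chunks)]


def reassemble(datagrams) -> "bytes | None":
    """Rebuild the response, or return None if any piece is missing.

    There is no retransmission here: on None the caller re-runs the
    query or falls back to TCP.
    """
    parts = {}
    msg_id = total = None
    for dgram in datagrams:
        mid, seq, tot = HEADER.unpack_from(dgram)
        if msg_id is None:
            msg_id, total = mid, tot
        if (mid, tot) != (msg_id, total) or seq >= tot:
            continue  # ignore pieces that belong to some other response
        parts[seq] = dgram[HEADER.size:]
    if total is None or any(i not in parts for i in range(total)):
        return None
    return b"".join(parts[i] for i in range(total))
```

Note that this deliberately stops short of timers and retransmission,
which is exactly where the "reinventing TCP" objection starts to apply.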

I did a think piece years ago on transport requirements for DNS, as
input to a BOF that Vern held in, um, Orlando.  No doubt I've still
got it somewhere.  The 5 cent version is that transport requirements
for a lightweight idempotent query protocol with many orders of
magnitude more clients than servers are, um, kind of whacky.  So we
have stayed with UDP for DNS, but have added the aforementioned EDNS0
size "negotiation"[*] extension.

[*] It's not a negotiation, just a statement by the client that the
server should please feel free to generate a response up to N bytes
long.  Life is weird when a conversation only lasts for two packets.
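
For readers unfamiliar with how the EDNS0 size statement is carried: the
client appends an OPT pseudo-record to the additional section of its
query, and the record's CLASS field holds the UDP payload size it is
willing to receive. A minimal sketch of building and reading such a
record with no options follows; the helper names are made up for the
example, and the field layout is the one given in the EDNS0
specification (root owner name, TYPE 41).

```python
import struct

OPT_TYPE = 41  # RR type code assigned to OPT
# Owner name (root, one zero octet), TYPE, CLASS, TTL, RDLENGTH.
OPT_FIXED = struct.Struct("!BHHIH")


def build_opt_rr(udp_payload_size: int) -> bytes:
    """Build an OPT pseudo-RR advertising the given UDP payload size.

    The TTL field (extended RCODE, version, flags) is left at zero and
    no options are attached, so RDLENGTH is zero as well.
    """
    return OPT_FIXED.pack(0, OPT_TYPE, udp_payload_size, 0, 0)


def advertised_size(opt_rr: bytes) -> int:
    """Read back the payload size a client stated in its OPT pseudo-RR."""
    name, rrtype, size, _ttl, _rdlen = OPT_FIXED.unpack_from(opt_rr)
    if name != 0 or rrtype != OPT_TYPE:
        raise ValueError("not an OPT pseudo-RR")
    return size
```

The server never answers with an agreed value; it simply reads the
stated size and keeps its UDP response within it, which is why this is a
one-way statement rather than a negotiation.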