Re: Transport requirements for DNS-like protocols

Michael Mealling <michael@neonym.net> Sun, 30 June 2002 03:06 UTC

Date: Sat, 29 Jun 2002 23:04:26 -0400
From: Michael Mealling <michael@neonym.net>
Subject: Re: Transport requirements for DNS-like protocols
In-reply-to: <5.1.1.2.2.20020629080646.02aa6730@jay.songbird.com>
To: Dave Crocker <dhc2@dcrocker.net>
Cc: Rob Austein <sra@hactrn.net>, ietf-irnss@lists.elistx.com
Reply-to: Michael Mealling <michael@neonym.net>
Message-id: <20020629230426.J24592@bailey.dscga.com>
References: <199812050411.UAA00462@daffy.ee.lbl.gov> <vern@ee.lbl.gov> <5.1.1.2.2.20020629080646.02aa6730@jay.songbird.com>

On Sat, Jun 29, 2002 at 12:47:53PM -0700, Dave Crocker wrote:
> At 03:34 PM 6/28/2002 -0400, Rob Austein wrote:
> >> Ok, this and the rest of the paragraphs assume IP fragmentation. The
> >> question I have is this: ... If you
> >> just don't _do_ packet level retransmission then congestion control
> >> becomes a non-issue.
> >
> >Multiple IP packets sent all at once == congestion issues, even if
> >those multiple packets are really just fragments of a single packet.
> 
> On the other hand, the natural clumping of packets into packet trains 
> suggests that quickly sending a *small* number of IP datagrams together -- 
> such as for an extended transaction unit -- might not be all that bad.

The proposal I've been toying with is to limit the number of packets
in the train to some number equating to *small*. Anything above that
gets a response of "I'm sorry, but this response is on the order of
a file transfer rather than a simple answer, so please requery via TCP."
But without some hard network testing across variously connected parts
of the network, I have no idea how to come up with that number.
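A minimal sketch of that policy, assuming the server simply cuts the response into 512-byte chunks and refuses (signalling "requery via TCP") once the train would exceed a cap. MAX_DATAGRAM, MAX_TRAIN, split_into_train and RequeryViaTCP are all hypothetical names; in particular MAX_TRAIN = 4 is a placeholder for the still-unmeasured *small* number, and per-datagram header overhead is ignored.

```python
from typing import List

MAX_DATAGRAM = 512  # payload bytes per UDP datagram (classic DNS limit)
MAX_TRAIN = 4       # hypothetical placeholder for the unmeasured "*small*"


class RequeryViaTCP(Exception):
    """Hypothetical signal: response is file-transfer sized, refetch over TCP."""


def split_into_train(response: bytes,
                     max_datagram: int = MAX_DATAGRAM,
                     max_train: int = MAX_TRAIN) -> List[bytes]:
    """Cut a response into datagram-sized chunks, refusing oversized trains."""
    chunks = [response[i:i + max_datagram]
              for i in range(0, len(response), max_datagram)] or [b""]
    if len(chunks) > max_train:
        raise RequeryViaTCP(
            f"{len(chunks)} datagrams exceeds train limit of {max_train}")
    return chunks
```

With these placeholder values a 1500-byte response goes out as a train of three datagrams, while anything over 2048 bytes is bounced to TCP.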

> >I think the theory behind preferring IP level fragmentation over
> >having the application do the same thing is that the latter guarantees
> >fragmentation while the former only risks it.
> 
> The non-determinacy is the problem.  Also the fact that IP fragmentation is 
> basically a problem-recovery mechanism.

Yes. Fragmentation is the exception rather than the rule, and handling
it requires extra context switching in the router, most likely at
exactly the moment the router is 'stressed'. That suggests the large
UDP packet solution relies on the network feature most likely to fail
instead of the one most likely to succeed. We have good evidence that
UDP packets smaller than 512 bytes succeed at a high rate. IMHO, we
should optimize for what will succeed rather than for an error-recovery
path.
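To make the 512-byte figure concrete, the sketch below counts how many IPv4 packets one UDP response becomes on a link of a given MTU. It assumes a 20-byte IPv4 header with no options and uses the IPv4 rule that every fragment except the last carries a multiple of 8 data bytes; the function name is hypothetical.

```python
import math

IPV4_HEADER = 20  # bytes, assuming no IP options
UDP_HEADER = 8    # bytes


def ipv4_fragment_count(dns_payload: int, mtu: int) -> int:
    """IPv4 packets needed to carry one UDP datagram over a link with this MTU."""
    ip_payload = UDP_HEADER + dns_payload
    if ip_payload <= mtu - IPV4_HEADER:
        return 1  # fits unfragmented
    # Non-final fragments must carry a multiple of 8 data bytes.
    per_fragment = (mtu - IPV4_HEADER) // 8 * 8
    return math.ceil(ip_payload / per_fragment)
```

A 512-byte answer fits in a single packet even on a 576-byte-MTU path (the datagram size every IPv4 host must accept), whereas a 4000-byte answer becomes three fragments on a 1500-byte Ethernet link.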

> >That is, even in the
> >absence of PMTU discovery, there is still a chance that a larger than
> >minimum IP packet might still make it through the net unfragmented.
> 
> But the fact that it might not is the problem.

Correct. It's the recovery from dropped fragments that causes additional
network load and, for the user, increased delay due to timeouts.
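A rough way to see why, assuming (as a simplification) that each fragment is lost independently with the same probability: losing any one fragment discards the whole datagram, so a response of n fragments arrives intact with probability p^n, and the expected number of full transmissions (each one costing a timeout when it fails) is 1/p^n. The function names are illustrative.

```python
def intact_probability(per_fragment_delivery: float, fragments: int) -> float:
    """Chance every fragment arrives, assuming independent, identical loss."""
    return per_fragment_delivery ** fragments


def expected_sends(per_fragment_delivery: float, fragments: int) -> float:
    """Expected full (re)transmissions until one copy arrives intact."""
    return 1.0 / intact_probability(per_fragment_delivery, fragments)
```

With an illustrative (not measured) 99% per-fragment delivery rate, a four-fragment response arrives intact only about 96% of the time, and every failure is paid for with a timeout plus a resend of all four fragments.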

> >Also note that (in IPv4) fragmentation can happen anywhere along the
> >path.  Thus, if (count by packets) congestion is a problem near the
> >server but one can keep the local MTU high on the links nearest the
> >server, one can defer fragmentation until the response packet is
> >closer to its destination.
> 
> However nothing in Internet technology makes it comfortable to require or 
> detect particular client/server topologies.

Nor does it account for the fact that the hops closest to the destination
may in fact be the least capable, especially firewalls and NAT boxes in
the vicinity of the client.

> >Having just recently had to explain to a bunch of nontechnical folks
> >that the magic number thirteen in the sentence "the thirteen root name
> >servers" derives, ultimately, from the hardwired 512 byte message size
> >specified in RFC 1035, you will understand that I would prefer not to
> >repeat this particular mistake (I'd rather make new ones...).
> 
> Perhaps the way to avoid this mistake is to make DNS use a transport 
> protocol that does not rely on the size of the underlying packets.  The 
> easiest way to do this is a thin layer ON TOP of UDP, that strings them 
> together.

Beyond a simple sequence number in the first octet and possibly a
checksum, would anything else be needed? The only thing I can think of
is an additional octet in the first packet indicating the total number
of packets sent.
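One possible wire layout for that thin layer, sketched under assumptions: octet 0 is the sequence number, the first datagram (sequence 0) alone carries one extra octet with the total count, and each datagram then carries a CRC-32 of its payload. The field order, the choice of CRC-32, and the names encode_train and decode_datagram are hypothetical, not a proposed specification.

```python
import struct
import zlib
from typing import List, Optional, Tuple


def encode_train(chunks: List[bytes]) -> List[bytes]:
    """Prefix each chunk with a hypothetical seq/total/CRC-32 header."""
    if not 1 <= len(chunks) <= 255:
        raise ValueError("a train must contain 1..255 datagrams")
    datagrams = []
    for seq, payload in enumerate(chunks):
        header = bytes([seq])
        if seq == 0:
            header += bytes([len(chunks)])  # total count, first datagram only
        header += struct.pack("!I", zlib.crc32(payload))
        datagrams.append(header + payload)
    return datagrams


def decode_datagram(datagram: bytes) -> Tuple[int, Optional[int], bytes]:
    """Return (seq, total or None, payload); raise ValueError if damaged."""
    if not datagram:
        raise ValueError("empty datagram")
    seq = datagram[0]
    offset = 2 if seq == 0 else 1
    if len(datagram) < offset + 4:
        raise ValueError("datagram shorter than its header")
    total = datagram[1] if seq == 0 else None
    (crc,) = struct.unpack_from("!I", datagram, offset)
    payload = datagram[offset + 4:]
    if zlib.crc32(payload) != crc:
        raise ValueError(f"checksum mismatch in datagram {seq}")
    return seq, total, payload
```

The per-datagram CRC may be redundant with UDP's own checksum; it appears here only because a checksum was floated above. Dropping it leaves one octet of overhead per datagram plus one more in the first.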

> Whether selective retransmission of individual UDP datagrams is a 
> requirement becomes the question.  Without it, we remain reliant on a 
> mostly-reliable network.  That's pretty fragile, in spite of how well it 
> has worked for so long.

But when requerying is this cheap for the server, it actually
becomes the best optimization for performance on both ends.
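A sketch of the client side without selective retransmission, assuming datagrams have already been decoded into (seq, total-or-None, payload) tuples with the total present only on sequence 0. Any gap makes the client just ask again, and the server resends the whole response with no per-query state. send_query and receive_parts are hypothetical callables standing in for real socket I/O.

```python
from typing import Callable, Iterable, Optional, Tuple

Part = Tuple[int, Optional[int], bytes]  # (seq, total or None, payload)


def reassemble(parts: Iterable[Part]) -> Optional[bytes]:
    """Join one response's datagrams, or return None if any are missing."""
    total = None
    received = {}
    for seq, count, payload in parts:
        if count is not None:
            total = count
        received[seq] = payload  # duplicates simply overwrite
    if total is None or set(received) != set(range(total)):
        return None
    return b"".join(received[i] for i in range(total))


def query_with_requery(send_query: Callable[[], None],
                       receive_parts: Callable[[], Iterable[Part]],
                       max_attempts: int = 3) -> Optional[bytes]:
    """Requery the whole response on any gap instead of asking for pieces."""
    for _ in range(max_attempts):
        send_query()
        response = reassemble(receive_parts())
        if response is not None:
            return response
    return None
```

The server never tracks which pieces a client holds; the cost of a lost datagram is one extra query, which is the trade being argued for here.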

> However, adding selective retransmission requires that the server 'assemble'
> the DNS query and acknowledge its parts selectively.  Hence the server
> becomes stateful.

And it increases server demands to the point where paying the cost of a
truly reliable connection looks like the optimal solution, when in truth
it is overkill.

-MM

-- 
--------------------------------------------------------------------------------
Michael Mealling	|      Vote Libertarian!       | urn:pin:1
michael@neonym.net      |                              | http://www.neonym.net