[DNSOP] Application level DNS message fragmentation

Mukund Sivaraman <muks@isc.org> Mon, 08 December 2014 08:32 UTC

Date: Mon, 08 Dec 2014 14:02:12 +0530
From: Mukund Sivaraman <muks@isc.org>
To: dnsop@ietf.org
Message-ID: <20141208083212.GA13206@totoro.home.mukund.org>
Archived-At: http://mailarchive.ietf.org/arch/msg/dnsop/Ign9iHcDCsxlfLU66JTHXafyGWQ

Hi all

[I am reluctant to send this, as it could very well be a stupid
 idea. But at least one person has suggested I discuss it on DNSOP,
 so here it is.]

When a server (plain or EDNS capable) is queried via UDP, and determines
that the response won't fit into 512 octets (plain) or the client's UDP
message size (EDNS), it sets TC=1, forcing the client to retry via TCP.

Fragmentation at the IP layer causes issues. Fragmentation can occur
when the PMTU is lower than the advertised EDNS message size. IP
fragments may be dropped by devices on the path, so that the UDP datagram
never arrives at the user application. Packets with DF=1 are not
fragmented at all: a router that cannot forward such a packet whole
drops it.

With EDNS, when the client message size is small, a response may still
not fit in a single datagram, causing the client to retry using TCP.

----

Can we have the following scheme so that fragmentation is supported at
the application level?

When a server determines that the response doesn't fit into a single
datagram (512 or the client's message size), the server splits the reply
into multiple fragment datagrams (512 or some discovered PMTU that
works) such that:

1. Each datagram is a DNS reply message with identical header field
values (except for section counts) and TC=1 in each of them. The ID
field has the same value among all reply fragments.

2. Each datagram contains part of the RRs that form the complete reply,
split on RR boundaries. The DNS header contains the appropriate section
counts for that datagram. The datagrams need not be equal in size.

3. An additional RR (plain DNS) or pseudo RR (inside OPT) called
FRAGMENT is present in every datagram, with two 16-bit fields containing
the count of fragments and the index of the current fragment. (Though a
DNS message is limited to 1<<16 octets and a DNS datagram can be at
least 512 octets long, 16-bit fields are better for fragment count as
the datagrams can be of different sizes.)

4. A client that doesn't know about this scheme notices TC=1 and retries
with TCP. Datagrams other than the first one should be ignored as they
are duplicate replies with the same message ID.

5. A client that is aware of this scheme finds TC=1 and the FRAGMENT RR
and does reassembly (similar to IP fragment reassembly such as RFC 815),
DNS messages being limited to 1<<16 octets too.
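
The server side of points 1 to 3 can be sketched roughly as below. This
is only an illustration under stated assumptions, not a worked-out
design: the function name split_into_fragments is hypothetical, RRs are
taken as already-encoded wire-format byte strings, the question section
is ignored, fragment indexes start at 0, and the FRAGMENT RR is assumed
to cost 15 octets (root owner name, fixed RR fields, two 16-bit values).

```python
from typing import List, Tuple

DNS_HEADER_LEN = 12   # fixed DNS header size in octets
FRAGMENT_RR_LEN = 15  # assumed: owner (1) + type/class/ttl/rdlength (10) + rdata (4)


def split_into_fragments(
    rrs: List[bytes], max_datagram: int = 512
) -> List[Tuple[int, int, List[bytes]]]:
    """Greedily pack wire-format RRs into datagrams, splitting only on RR boundaries.

    Returns one (fragment_count, fragment_index, rrs_in_fragment) tuple per
    datagram. Raises ValueError when a single RR cannot fit, which is the
    case where TCP is still needed.
    """
    budget = max_datagram - DNS_HEADER_LEN - FRAGMENT_RR_LEN
    groups: List[List[bytes]] = [[]]
    used = 0
    for rr in rrs:
        if len(rr) > budget:
            raise ValueError("RR larger than one datagram; fall back to TCP")
        # Start a new datagram when this RR would overflow the current one.
        if used + len(rr) > budget and groups[-1]:
            groups.append([])
            used = 0
        groups[-1].append(rr)
        used += len(rr)
    count = len(groups)
    return [(count, index, group) for index, group in enumerate(groups)]
```

The datagrams come out unequal in size, which is why the count has to be
carried explicitly rather than derived from the message length.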

This scheme still restricts the size of a single RR to the datagram
size. Reassembly (unlike IP fragments) doesn't require offsets such as
used in RFC 815 as RRs are wholly contained inside one datagram.
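
A minimal sketch of client-side reassembly without offsets follows,
again only as an illustration. The class name FragmentReassembler is
hypothetical, fragment indexes are assumed to start at 0, one instance
is assumed per outstanding message ID, and the split into answer,
authority and additional sections is left out for brevity.

```python
from typing import Dict, List, Optional


class FragmentReassembler:
    """Collects the fragments of one reply until all of them have arrived."""

    def __init__(self) -> None:
        self.count: Optional[int] = None
        self.parts: Dict[int, List[bytes]] = {}

    def add(self, count: int, index: int, rrs: List[bytes]) -> Optional[List[bytes]]:
        """Store one fragment; return the full RR list once complete, else None."""
        if self.count is None:
            self.count = count
        if count != self.count or not 0 <= index < count:
            raise ValueError("inconsistent FRAGMENT fields")
        # Keep the first copy of each fragment; later duplicates are ignored.
        self.parts.setdefault(index, rrs)
        if len(self.parts) < self.count:
            return None
        # RRs are whole within a fragment, so ordering by index is enough.
        return [rr for i in range(self.count) for rr in self.parts[i]]
```

Fragments may arrive in any order; the reply is handed back only when
every index from 0 to count-1 has been seen.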

TSIG can also be made to work with such a scheme on a
fragment-by-fragment basis.

----

This scheme is not for replacing TCP. As mentioned above, if a single RR
(for example, a TXT RR containing multiple character-strings) doesn't
fit in a single datagram and truncation happens, it'll require TCP. Nor
is it for replacing EDNS's large datagram sizes. But it is possible for
EDNS replies to exceed the path MTU, causing loss of replies, and when
loss is noted, on the second attempt, truncation could occur as the
message no longer fits in the reduced datagram size.

Some things can still be served by UDP where possible, without involving
all the baggage of TCP (round trips for the SYN/ACK handshake, the
connection remaining in the slow-start phase for most DNS requests,
etc.). As an example, with a fragment datagram max size of 512, replies
could traverse a firewall that blocked large replies.

This scheme should be backwards compatible with (ignored by) existing
implementations. Client implementations of this scheme can also signal
support with FRAGMENT 0 0.
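
For illustration, the two 16-bit FRAGMENT fields could be encoded as
below. The layout is an assumption on my part: count first, then the
current fragment, both unsigned and in network byte order as elsewhere
in DNS. The helper names are hypothetical.

```python
import struct
from typing import Tuple


def pack_fragment_fields(count: int, index: int) -> bytes:
    """Encode (fragment count, current fragment) as two unsigned 16-bit integers."""
    return struct.pack("!HH", count, index)


def unpack_fragment_fields(data: bytes) -> Tuple[int, int]:
    """Decode the 4-octet FRAGMENT payload back into (count, index)."""
    count, index = struct.unpack("!HH", data)
    return count, index


# A client signalling support would send FRAGMENT 0 0 in its query.
CLIENT_SUPPORT_SIGNAL = pack_fragment_fields(0, 0)
```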

		Mukund