Re: [DNSOP] Application level DNS message fragmentation

Mukund Sivaraman <muks@isc.org> Thu, 11 December 2014 09:26 UTC

Date: Thu, 11 Dec 2014 14:56:12 +0530
From: Mukund Sivaraman <muks@isc.org>
To: Paul Vixie <paul@redbarn.org>
Message-ID: <20141211092612.GA23177@totoro.home.mukund.org>
References: <20141208083212.GA13206@totoro.home.mukund.org> <5486D5DC.1050902@redbarn.org>
In-Reply-To: <5486D5DC.1050902@redbarn.org>
User-Agent: Mutt/1.5.23 (2014-03-12)
Archived-At: http://mailarchive.ietf.org/arch/msg/dnsop/U9Ddf-QiYQACPFs1BzWd2MJPyag
Cc: dnsop@ietf.org
Subject: Re: [DNSOP] Application level DNS message fragmentation

Hi Paul

I've started preparing a draft on Stephen Morris's suggestion.

On Tue, Dec 09, 2014 at 02:58:36AM -0800, Paul Vixie wrote:
> there is no reason to support this in non-EDNS. if someone won't upgrade
> to EDNS, then (1) we have no responsibility toward improving their DNS
> experience, and (2) they probably will not upgrade to this multi-message
> proposal either. (arguments of the form, "we want this to work when both
> endpoints can do EDNS but the middlebox forbids EDNS", are answered by
> noting that middleboxes probably would not permit multi-message DNS,
> either.)

Nod. FRAGMENT would be an EDNS option. Also note that in BIND we drop
down to a limit of 512 even with EDNS (I don't remember off the top of
my head whether that is the UDP payload or includes all headers), which
is perfectly fine for the case where the PMTU is small and DO=1 is
required (though it won't be very successful).

> i think there is no PMTU that works. marka has fought this battle for a
> long time, and he's currently suggesting 1280-(headersize) for IPv6 and
> 1500-(headersize) for IPv4, period. my hope is that any recommendation
> for application-level fragmentation for DNS on UDP/53 would say
> "MAX(reliably determined PMTU, MIN(client's offered buffer size, 1500 or
> 1280 depending on transport protocol))."

I hadn't planned to suggest a fragment size in the draft beyond what
the RFCs up to EDNS(0) say about DNS message sizes, but I'll keep this
in mind. Once you see the draft, you can suggest changes on this point.
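
For reference while drafting, here is a literal transcription of the
size rule quoted above, as a rough Python sketch. The function and
constant names are mine, the "(headersize)" subtraction mentioned in the
quoted text is left out, and treating an unknown PMTU as "use the MIN
term only" is my assumption, not something stated in the thread.

```python
from typing import Optional

# Caps taken from the quoted text; header overhead is NOT subtracted here.
IPV4_CAP = 1500
IPV6_CAP = 1280


def fragment_size_limit(offered_bufsize: int, ipv6: bool,
                        known_pmtu: Optional[int] = None) -> int:
    """MAX(reliably determined PMTU, MIN(offered buffer size, cap))."""
    cap = IPV6_CAP if ipv6 else IPV4_CAP
    fallback = min(offered_bufsize, cap)
    if known_pmtu is None:
        # Assumption: with no reliably determined PMTU, only the MIN term applies.
        return fallback
    return max(known_pmtu, fallback)
```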

I'll ask Mark about this, but the PMTU determination (or rather, finding
a payload size that works, as the PMTU can be lower) happens on the
client's side. When it misses responses from the server, it lowers the
offered buffer size and retries. The server side would use this offered
buffer size (and perhaps its link MTU, to avoid IP fragmentation at the
source). The server side can't determine the PMTU as it doesn't get
ACKs.

> > 1. Each datagram is a DNS reply message with identical header field
> > values (except for section counts) and TC=1 in each of them. The ID
> > field has the same value among all reply fragments.
> >
> > 2. Each datagram contains part of the RRs that form the complete reply,
> > split on RR boundaries. The DNS header contains the appropriate section
> > counts for that datagram. The datagrams need not be equal in size.
> 
> splitting an RR-set across messages makes my skin itch. i know it's the
> right thing to do and i'm not objecting. just letting you know, somebody
> will some day not recognize the OPT code that describes this as a
> multi-message transaction, and cache a partial RR-set, and we'll google
> the message i am now typing to show them the error of their ways.

:-)

> > 3. An additional RR (plain DNS) or pseudo RR (inside OPT) called
> > FRAGMENT is present in every datagram with 2 16-bit fields containing
> > the count of fragments, and current fragment. (Though a DNS message is
> > limited to 1<<16 octets and a DNS datagram can be at least 512 octets
> > long, 16-bit fields are better for fragment count as the datagrams can
> > be of different sizes.)
> 
> i think the absence of ACK-based timing means that packet trains longer
> than 256 packets are too dangerous to contemplate. even with some kind
> of application-layer inter-record-gap that's a lot of packets to inject
> without needing to hear an OK signal from the remote end. therefore i
> suggest two 8-bit fields.

You are right. The fields are updated to 8 bits each. I've put in a
section about network considerations where congestion is
discussed. Also, with more fragments, the probability of losing at
least one of them goes up, and because there is no acknowledgement,
large fragment counts will cause problems. The draft would need to
recommend limits and behavior.
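
As a rough illustration of the two 8-bit fields, here is how the
FRAGMENT option could be packed in the usual EDNS option layout (16-bit
OPTION-CODE, 16-bit OPTION-LENGTH, then data). The option code below is
a placeholder from the local/experimental range since none is assigned,
and the field order (count, then index) and zero-based index are my
assumptions until the draft pins them down.

```python
import struct

# Placeholder only: no option code has been assigned for FRAGMENT.
FRAGMENT_OPTION_CODE = 65001


def encode_fragment_option(count: int, index: int) -> bytes:
    """Pack fragment count and current fragment index as two octets."""
    if not 1 <= count <= 255:
        raise ValueError("fragment count must fit in 8 bits and be non-zero")
    if not 0 <= index < count:
        raise ValueError("fragment index out of range")
    data = struct.pack("!BB", count, index)
    return struct.pack("!HH", FRAGMENT_OPTION_CODE, len(data)) + data


def decode_fragment_option(wire: bytes) -> tuple:
    """Unpack one FRAGMENT option; returns (count, index)."""
    if len(wire) != 6:
        raise ValueError("unexpected option size")
    code, length = struct.unpack("!HH", wire[:4])
    if code != FRAGMENT_OPTION_CODE or length != 2:
        raise ValueError("not a FRAGMENT option")
    count, index = struct.unpack("!BB", wire[4:])
    if count == 0 or index >= count:
        raise ValueError("inconsistent fragment fields")
    return count, index
```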

> > 4. A client that doesn't know about this scheme notices TC=1 and retries
> > with TCP. Datagrams other than the first one should be ignored as they
> > are duplicate replies with the same message ID.
> 
> i think that wastes end-to-end bandwidth, and should be avoided, by
> having the initiator solicit (for QUERY) or probe (for UPDATE) using an
> EDNS OPT, rather than letting the responder just spew.

It seems that the client would have to solicit with more fields anyway
(to avoid increasing the chance of a Kaminsky attack). Because more
reply datagrams are sent, there is a slightly better chance of an attack
succeeding, which we could avoid by adding extra entropy such as a
nonce. Amplification attacks also need to be considered.

> > 5. A client that is aware of this scheme finds TC=1 and the FRAGMENT RR
> > and does reassembly (similar to IP fragment reassembly such as RFC 815),
> > DNS messages being limited to 1<<16 octets too.
> 
> referencing your later message on this thread, i don't think compression
> pointers can be allowed to point out-of-message. so, each message will
> form its own string dictionary. if that's what you meant to say then i'm
> sorry for misunderstanding you.

Nod, and this would make simple implementations easier, as they can
throw away a fragment after parsing it without waiting for all of them.
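
A minimal sketch of what that could look like on the receiving side:
each fragment is parsed on its own (so its compression pointers never
leave the message), only its RRs are kept, and the reply is complete
once every index has been seen. RRs are stand-in strings here, the class
name is made up, and the zero-based index is my assumption.

```python
from typing import Dict, List, Optional


class FragmentReassembler:
    """Collect already-parsed RRs per fragment; the raw datagram can be dropped."""

    def __init__(self) -> None:
        self.count: Optional[int] = None
        self.records: Dict[int, List[str]] = {}

    def add(self, count: int, index: int, rrs: List[str]) -> Optional[List[str]]:
        """Add one fragment; return all RRs in order once the reply is complete."""
        if self.count is None:
            self.count = count
        elif count != self.count:
            raise ValueError("fragment count changed mid-reply")
        if not 0 <= index < count:
            raise ValueError("fragment index out of range")
        # A duplicate fragment is ignored; the first copy wins.
        self.records.setdefault(index, list(rrs))
        if len(self.records) < self.count:
            return None
        return [rr for i in range(self.count) for rr in self.records[i]]
```

Fragments can arrive in any order: adding fragment 1 of 2 returns None,
and adding fragment 0 afterwards returns the RRs of both, fragment 0
first.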

> i'd like to see this coupled to the cookie proposal, so that if cookies
> aren't used, then this option is not available.

Some sort of measure has to be in place to avoid the possibility of
amplification attacks.

		Mukund