Re: [Driu] Query distribution vs. query routing

Martin Thomson <martin.thomson@gmail.com> Fri, 20 July 2018 18:19 UTC

References: <CA+9kkMB+yODJARiFqyestCRtX6Od-Ryz7qkjhWmn5ToqYBOLgA@mail.gmail.com>
In-Reply-To: <CA+9kkMB+yODJARiFqyestCRtX6Od-Ryz7qkjhWmn5ToqYBOLgA@mail.gmail.com>
From: Martin Thomson <martin.thomson@gmail.com>
Date: Fri, 20 Jul 2018 14:18:55 -0400
Message-ID: <CABkgnnV32RqCf11Hqb2Op_aL_asctz7aSV5z9+jMtvymrJKJmw@mail.gmail.com>
To: Ted Hardie <ted.ietf@gmail.com>
Cc: driu@ietf.org
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
Archived-At: <https://mailarchive.ietf.org/arch/msg/driu/cZRvHvBAlXB5S9iifnhoSrWXTVA>
Subject: Re: [Driu] Query distribution vs. query routing

Thanks for sharing this, Ted.

I think that this is a conversation that this group needs to address
before it gets to the point that any new work in this area is mature
enough to be considered for a WG-forming BOF.

One caution, though: although DoH might have precipitated this
conversation, and so it should take some of the blame for these
issues, this is not a problem unique to DoH.  It's a consequence of
the notion that discovery is a fundamental part of DNS.

On Thu, Jul 19, 2018 at 6:57 PM Ted Hardie <ted.ietf@gmail.com> wrote:
>
> Howdy,
>
> The mic line interaction and related jabber traffic around Mark's draft were interesting, and they nicely illustrated just how confused I can be.  Sorry for sharing that experience with the group.
>
> Where I think it's useful to dive in a little further is in the distinction between query distribution and query routing.
>
> Let's start by making a parallel: a device on a LAN with a single router has a default next hop destination; that router may also have a default next hop destination.  In both cases, all traffic leaving one device will go through the other (LAN device to LAN router; LAN router to default upstream destination).  In the router case, it's pretty common for there to be two different upstream possibilities, with the router making the choice between them for packets as they arrive from the device.  That choice may be a simple algorithm that load balances between them, like ECMP, or it may be made by consulting a mapping that tells it which upstream is a better route to a specific destination network (that mapping being derived from data distributed by some routing protocol).
>
> At a hideously gross approximation, a host talking to a recursive resolver is in a parallel situation.  It can have a single upstream resolver it talks to, or it can have more than one.  If it has more than one, it may direct queries to them as a load balancing function, or as a function of the names each one serves, usually because one is configured to serve the local names in a split DNS situation.  That configuration is the simple routing information in the parallel here.
>
> What I heard in the jabber room as explanatory text was that this proposal suggests allowing the default upstream server to name other default upstream servers, which could share the load of the query stream.  There would be benefits to the upstreams in load-shedding and benefits to the device in reducing the information revealed to any particular upstream.  I have some serious concerns there about whether the default upstream servers it names could be scoped to avoid this becoming a DoS vector, but even putting that aside, I think the incentive is poor.  The local device already has a known good upstream with an established TCP/TLS/HTTP session going, and in a lot of cases it's going to want to avoid the latency of establishing new sessions just for load balancing.
>
> What I originally thought was in the proposal was closer to distributing routing information.  There, the server isn't distributing new default servers; it is distributing information about servers that are suited to answering queries about specific names.  The choice to use them, in other words, wouldn't be based on load balancing, but on query routing: sending a query to the best available query responder from a latency or authority perspective.  There's a whole bunch of work around securing routing information that we'd have to import if we went down this path.  Down this path, split DNS becomes fragmented DNS, in which you may find some answers are "reachable" and some are not.
>
> I can also read the document to be recommending a hybrid mode, in which every DOH responder agrees to be a default query path but still tells you which names it is well suited to resolve (giving you a situation like that in which a router gets multiple routing announcements that include both specific networks and a default route).  There are a bunch of interesting possibilities there, both in optimizations and attacks.  Ultimately, though, the incentive works only when this actually results in quicker connection setups and when both the DOH client and DOH responder care a great deal about that timing.
>
> What I think that turns out to mean, and this is from a very dusty crystal ball, is that a DOH responder will have to maintain a very robust DNS recursive resolver infrastructure if it agrees to be a default query path in order to attract query traffic bound for its network.  That means a big cache, good connections to other DNS services, the lot.  If it doesn't, its responses to "default upstream" queries will be slower than those from other potential services, and a DNS client will eventually cease using it (presuming it is still testing optimization by query response time).  If it still gets any queries, it will be along the "fragmented DNS" query model, where it gets only queries where the latency of its answers for specific networks is very important.
>
> I think those later in the line that were pointing out the risks of consolidation were right, in other words, and maybe didn't even go far enough.  I think the end game of this model is that the user has no control over where the queries go and the heuristic system underneath them ends up sending them to the site willing to offer the highest number of names (more specific routes) and the biggest DNS query infrastructure. That's going to land everything behind a few CDNs, unless I miss my guess.
>
> I'm sure I've made several additional howlers in this note, for which I apologize early,
>
> Ted
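
To make the distinction in the quoted note concrete, here is a minimal
sketch of the "hybrid mode" it describes: a client that holds
name-specific routes (query routing) and falls back to rotating over
its default upstreams (query distribution).  Nothing here comes from
Mark's draft; the class name QueryRouter, its pick() method, and the
longest-suffix-match rule are assumptions chosen to mirror the routing
analogy, and resolvers are represented as opaque strings.

```python
from itertools import cycle


def _normalize(name):
    """Lower-case a DNS name and drop any trailing root dot."""
    return name.rstrip(".").lower()


class QueryRouter:
    """Hypothetical client-side resolver selection (illustration only).

    routes   maps a name suffix to the resolver suited to answer it,
             the analogue of a specific route.
    defaults is the list of default upstream resolvers, the analogue
             of a default route shared by several next hops.
    """

    def __init__(self, defaults, routes=None):
        if not defaults:
            raise ValueError("at least one default resolver is required")
        # Round-robin stands in for load balancing across defaults.
        self._defaults = cycle(list(defaults))
        self._routes = {
            _normalize(suffix): resolver
            for suffix, resolver in (routes or {}).items()
        }

    def pick(self, qname):
        """Return the resolver to use for qname."""
        labels = _normalize(qname).split(".")
        # Try the longest suffix first, comparing whole labels so that
        # "notcorp.example" never matches a route for "corp.example".
        for start in range(len(labels)):
            suffix = ".".join(labels[start:])
            if suffix in self._routes:
                return self._routes[suffix]
        # No specific route: distribute over the default upstreams.
        return next(self._defaults)
```

With a route for "corp.example" and two defaults, a query for
"wiki.corp.example" always goes to the resolver named for that suffix,
while unrelated names alternate between the two defaults.  The sketch
deliberately leaves out the hard parts raised above: how routes would
be authenticated, and how a client would weigh latency when choosing.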