Re: [Doh] Clarification for a newbie DoH implementor

"Mark Delany" <> Sun, 09 June 2019 08:37 UTC

On 19May19, Ben Schwartz allegedly wrote:

> In practice, none of this seems to be a problem.  DoH servers simply do
> their best to fully answer the query

I know this is getting close to flogging a dead horse, but I'm still troubled by
the silent truncation implied by a DoH Server processing TC=1.

Notwithstanding clients which directly connect to DoH servers (such as some
browsers) the typical scenario is likely to be existing stub clients using UDP
to talk to a DoH proxy/client which in turn talks to a DoH server, i.e.:

 stub (UDP) ->        DoH proxy (HTTPS) -> DoH server (UDP) -> resolver
 stub (UDP) <-        DoH proxy (HTTPS) <- DoH server (UDP) <- resolver

(Sorry, you'll need to render this in a fixed-width font).

As I understand Ben's response, if the resolver returns TC=1 then a typical DoH
server (or its resolver library) will retry with TCP which makes the flow for a
large response actually look like:

 stub (UDP) ->        DoH proxy (HTTPS) -> DoH server (UDP) -> resolver
                                           DoH server (UDP) <- (TC=1) resolver
                                           DoH server (TCP) -> resolver
 stub (UDP) <- (TC=0) DoH proxy (HTTPS) <- DoH server (TCP) <- (TC=0) resolver

Which certainly meets the "do their best to fully answer the query" goal, as the
stub never has to worry about TC=1 and simply gets the "full result".

However... It makes me wonder how a large TCP response the DoH Server sends back
via HTTPS can possibly fit into the response the DoH proxy sends back via UDP to
the stub.

Let's say the DoH proxy receives a 5K response over HTTPS. Does it blindly try
to transmit this 5K response over UDP back to the stub and just hope for the
best? Or should it know this is likely to fail and act accordingly? If so, how
should it act?
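
To make the size mismatch concrete, here is a minimal sketch of the check a DoH
proxy would have to make before relaying an HTTPS response to a UDP stub. The
function names are mine and purely illustrative. It assumes the proxy has
already extracted the stub's EDNS0 advertised payload size (or None if the
query carried no OPT record); sizes below 512 are treated as 512.

```python
from typing import Optional

CLASSIC_UDP_LIMIT = 512  # DNS-over-UDP limit when the stub sends no EDNS0 OPT


def udp_payload_limit(edns_size: Optional[int]) -> int:
    """Largest response the stub says it can accept over UDP."""
    if edns_size is None:
        return CLASSIC_UDP_LIMIT
    # Advertised sizes below 512 are treated as 512.
    return max(edns_size, CLASSIC_UDP_LIMIT)


def fits_in_udp(response: bytes, edns_size: Optional[int]) -> bool:
    """True if the wire-format response fits within the stub's UDP limit."""
    return len(response) <= udp_payload_limit(edns_size)
```

With this, a 5000-byte response fails the check for a stub that advertised
1232 bytes, and the proxy is left with the question of what to do next.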

I can't think of what a DoH proxy can sensibly do in such circumstances apart
from arbitrarily truncating the response down to the UDP size indicated by the
stub and *not* marking the response with TC=1. Thus the stub loses some of the
answer and, more importantly, loses knowledge of the truncation. That seems bad
to me.

As we've discussed previously, it's pointless returning TC=1 to the stub as it
will simply re-issue an identical query as far as the DoH server is concerned.

Note that I said "identical query as far as the DoH server is concerned". The
same is not the case for the DoH proxy, as the query re-issued from the stub
*can* be disambiguated: the stub's retry reaches the proxy via TCP, not UDP.

Which offers a possible solution.

Since a proxy knows whether the inbound query has come via UDP or TCP it can
annotate the HTTPS request accordingly (let's invent a "Use-TCP"
header). The DoH server acts on this header thus the flow becomes:

 stub (UDP) ->        DoH proxy (HTTPS)           -> DoH server (UDP) -> resolver
 stub (UDP) <- (TC=1) DoH proxy (HTTPS)           <- DoH server (UDP) <- (TC=1) resolver
 stub (TCP) ->        DoH proxy (HTTPS - Use-TCP) -> DoH server (TCP) -> resolver
 stub (TCP) <- (TC=0) DoH proxy (HTTPS)           <- DoH server (TCP) <- (TC=0) resolver

In other words, TC=1 is sent all the way back to the stub for it to deal with.
If the stub re-queries with TCP, the proxy forwards the query with the
"Use-TCP" annotation to the DoH server, which in turn instructs its resolver
library to use TCP.
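
A rough sketch of both ends of that annotation follows. "Use-TCP" is the
invented header from this proposal, not part of any DoH specification; the
header value "1" and the function names are my own assumptions for
illustration. Only the application/dns-message media type comes from the DoH
spec.

```python
from typing import Dict

USE_TCP_HEADER = "Use-TCP"  # hypothetical header; not defined by any standard


def doh_request_headers(inbound_transport: str) -> Dict[str, str]:
    """Proxy side: build HTTPS request headers for a stub query.

    inbound_transport is "udp" or "tcp", i.e. how the stub reached the proxy.
    """
    headers = {
        "Content-Type": "application/dns-message",
        "Accept": "application/dns-message",
    }
    if inbound_transport == "tcp":
        headers[USE_TCP_HEADER] = "1"  # the value is an assumption
    return headers


def upstream_transport(request_headers: Dict[str, str]) -> str:
    """DoH server side: pick the transport to use towards the resolver."""
    # HTTP header names are case-insensitive, so compare in lower case.
    names = {name.lower() for name in request_headers}
    return "tcp" if USE_TCP_HEADER.lower() in names else "udp"
```

A query that arrived at the proxy over UDP yields no annotation, so the DoH
server stays on UDP and any TC=1 from the resolver is passed back unchanged.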

Not only does this relieve a DoH server of trying its best to "fully answer
the query" in the blind, it also avoids the impossibility of transmitting a
large response back to the stub over UDP. Most importantly, truncation is no
longer silently performed; rather, it is communicated back to the stub, which
can then decide what to do, as has traditionally been the case.
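
For illustration, this is roughly what explicit (rather than silent) truncation
could look like if the proxy ever does end up holding a response too large for
a UDP stub. It is a sketch under assumptions: the helper names are made up, the
reply keeps only the header and question section, and QNAMEs in the question
are assumed to be uncompressed.

```python
import struct

TC_BIT = 0x0200  # truncation flag in the DNS header flags word


def question_end(msg: bytes) -> int:
    """Return the offset just past the question section."""
    qdcount = struct.unpack_from("!H", msg, 4)[0]
    offset = 12  # fixed DNS header length
    for _ in range(qdcount):
        # Walk the QNAME labels (assumes no compression pointers here).
        while msg[offset] != 0:
            offset += 1 + msg[offset]
        offset += 1 + 4  # root label, then QTYPE and QCLASS
    return offset


def truncate_with_tc(msg: bytes) -> bytes:
    """Keep header and question only, zero the other counts, set TC=1."""
    ident, flags, qdcount = struct.unpack_from("!HHH", msg, 0)
    header = struct.pack("!HHHHHH", ident, flags | TC_BIT, qdcount, 0, 0, 0)
    return header + msg[12:question_end(msg)]


def reply_for_udp_stub(msg: bytes, stub_limit: int = 512) -> bytes:
    """Relay the response if it fits, otherwise tell the stub it was cut."""
    if len(msg) <= stub_limit:
        return msg
    return truncate_with_tc(msg)
```

The point is only that the stub sees TC=1 and can retry over TCP, at which
point the proxy adds the "Use-TCP" annotation.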

I think all stubs should work in this "Use-TCP" scenario whereas clearly they
cannot in the "DoH server transparently handles TC=1" scenario.

The one downside is that a TC=1 response incurs higher latency due to the flow
going all the way back to the stub for a retry, however given these TC=1
responses are uncommon, that seems like a minor price to pay for a consistent