Re: [Doh] [DNSOP] Do53 vs DoT vs DoH Page Load Performance Study at ANRW

Kevin Borgolte <> Fri, 19 July 2019 10:10 UTC

From: Kevin Borgolte <>
Date: Fri, 19 Jul 2019 18:10:34 +0800
Cc: DoH WG <>, dnsop WG <>
To: Rob Sayre <>

> But, I think you should add the list and the reason for the range choice to the paper. For example, I can't tell what range you actually used from your description (although that might just be due to a hurried reply).

Section 3.2.4 talks about the selection of websites:

We collect HARs (and resulting DNS lookups) for the top 1,000 websites on the Tranco top-list to understand browser performance for the average user visiting popular sites. Furthermore, we measure the bottom 1,000 of the top 100,000 websites (ranked 99,000 to 100,000) to understand browser performance for websites that are less popular. We chose to measure the tail of the top 100,000 instead of the tail of the top 1 million because we found through experimentation that many of the websites in the tail of the top 1 million were offline at the time of our measurements. Furthermore, there is significant churn in the tail of top 1 million, which means that we would not be accurately measuring browser performance for the tail across the duration of our experiment.

We didn't include the full list in the paper itself for space reasons, and because extracting the list from a paper would be cumbersome anyway. It will be part of our future open-source release, though.
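For concreteness, the two measured rank ranges could be extracted from a Tranco-style list (one `rank,domain` pair per line) along these lines. This is only an illustrative sketch: the file name, the toy domains, and the exact 99,001-100,000 boundary for the "bottom 1,000 of the top 100,000" are assumptions, not the released tooling.

```shell
# Synthesize a toy ranked list standing in for the real Tranco file
# (placeholder domains, not actual Tranco data):
seq 1 100000 | awk '{printf "%d,site%d.example\n", $1, $1}' > tranco.csv

# Top 1,000 sites:
head -n 1000 tranco.csv > top1000.csv

# Bottom 1,000 of the top 100,000 (assumed here to be ranks 99,001-100,000):
sed -n '99001,100000p' tranco.csv > tail1000.csv
```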

> Another issue is that, while your paper might accurately capture the network conditions on your local network, it probably doesn't capture network variation as well as a large-scale test along the lines of what Mozilla did. For example, if the university used a single router brand, this could skew the test. As one data point, I've never seen the various network-throttling apps match a real-user-metrics test very well, although they do catch really problematic situations.

It is true that an experiment from more diverse vantage points could capture network variation better. However, we already see that the protocols perform differently to a degree that a human could notice; that is, we already show that selecting your DNS protocol based on your network characteristics can have a non-negligible impact.

Running a large-scale test for page load times, like Mozilla did for resolution times, is also quite difficult: simple telemetry data won't answer the questions we want to ask, because websites need to be fetched multiple times (once per protocol/recursor combination) and browser caches need to be flushed between fetches. These are not things you want to do to your users. Using existing measurement platforms was not an option either, as they restrict what you can access or do not support the necessary protocols. That means you would need to set up your own measurement clients, which raises questions like "How do you select the vantage points?" (you can't use hosting/cloud providers, as that would yield little network variation). If you have an idea for how to easily run this at a larger scale, we'd love to look into it.

We set link capacity and latency to best-case values from real-world measurements by OpenSignal (Section 3.2.3), and we use iproute2's traffic control (tc) for shaping, which we verified to be accurate through latency measurements (Table 1) and multiple speed tests (via Ookla). By tuning the network setup this way (instead of using a 4G/3G modem for connectivity), we eliminated potential differences in routing that could otherwise have influenced our results and impaired comparability.
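A minimal sketch of the kind of iproute2 shaping described above: a token bucket filter (tbf) caps link capacity and a netem qdisc underneath adds fixed delay. The device name and the rate/delay values are placeholders, not the paper's actual OpenSignal-derived numbers; the commands require root and are not meant to be run as-is.

```shell
DEV=eth0  # placeholder interface name

# Cap link capacity with a token bucket filter (tbf):
tc qdisc add dev "$DEV" root handle 1: tbf rate 50mbit burst 32kbit latency 400ms

# Add fixed delay with netem as a child of the tbf qdisc:
tc qdisc add dev "$DEV" parent 1:1 handle 10: netem delay 20ms

# Inspect the configured qdiscs, then remove them when done:
tc qdisc show dev "$DEV"
tc qdisc del dev "$DEV" root
```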

-- Kevin