Re: [DNSOP] CLIENT-SUBNET bis appetite?

Warren Kumari <> Fri, 15 December 2017 19:34 UTC


On Fri, Dec 15, 2017 at 11:50 AM, Mukund Sivaraman <> wrote:
> On Thu, Dec 14, 2017 at 07:00:58PM +0100, bert hubert wrote:
>> On Thu, Dec 14, 2017 at 11:09:13PM +0530, Mukund Sivaraman wrote:
>> > Any appetite for it? Don't throw things at me.. I ask because the
>> > current thing is slowly getting more widely deployed and there are
>> > design issues that could do with an ECS2 that breaks from ECS1 protocol. I
>> > ask because I'm once again having to deal with myriad implementation
>> > cases and dislike it.
>> Could you elaborate what you dislike most?
> It is too complicated to implement ECS correctly. There are a large
> number of corner cases. The things that resolvers and authoritative
> sides have to take care of are quite different. It is more complex
> than anything else in DNS.
> I think this should be built again from scratch.
>> The biggest thing we are noticing is that while it does great things
>> to getting to a server the content provider likes, it unavoidably
>> drives down cache hitrates a lot, introducing a latency penalty.
>> The operators we see deploying ECS have tens of thousands of subnets
>> which all need to be mapped to only a few servers. But you still end up
>> with tens of thousands of cache entries and therefore tiny cache hitrates.
>> Such things could be addressed by answering with lists of subnet masks to
>> which this answer would also apply, but this makes little sense
>> operationally I think.
> Firstly, correct deaggregation is an important requirement of reducing
> cache usage. With the current design of ECS protocol, it's very
> important that correct disjoining of prefixes be done optimally to avoid
> cache pollution, yet the draft does not specify a suitable algorithm for
> it (we know how to do it, I think the draft should have stated it).
> A /n address prefix as specified by ECS option is a perfect binary
> tree of 1<<n addresses. To correctly deaggregate (scope=0)
> data from a longer prefix such as, this will result in all
> these answers to be generated:
> * answer
> * exact match answer (scope > source)
> * answer
> * exact match answer (scope > source)
> * answer
> and so on.. there are about 2n+1 answers necessary so that a
> answer does not override a /n client from receiving its specific answer.
>                   x
>            x            x
>          x   x        x   x
>         x x x x      x x y y
> If ECS option had more fields, we could have put the above pattern as
> a difference of trees (with direction bit and height, e.g., "x"s in
> the diagram above) and it would have reduced cache usage
> considerably.
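
A minimal sketch of the deaggregation described above, assuming Python's standard `ipaddress` module. The prefixes are hypothetical examples (they are not the ones elided from the original message), and `deaggregate` is an illustrative helper name, not something taken from the ECS draft. It lists the sibling prefixes that remain once a longer prefix is carved out of a shorter one, so that no remaining answer overlaps the specific one:

```python
import ipaddress

def deaggregate(covering, specific):
    """Return the prefixes that make up `covering` minus `specific`."""
    covering = ipaddress.ip_network(covering)
    specific = ipaddress.ip_network(specific)
    # address_exclude yields one sibling prefix per level between the two
    # prefix lengths, so the remainder never overlaps `specific`.
    return sorted(covering.address_exclude(specific), key=lambda n: n.prefixlen)

# Hypothetical prefixes: carve a /26 out of a /24.
remainder = deaggregate("192.0.2.0/24", "192.0.2.192/26")
for net in remainder:
    print(net)
```

Carving a /m out of a /n leaves m - n sibling prefixes; the exact-match answers mentioned in the list above would be in addition to these.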
> This can be generalized with more differences but anyway I think that
> using QUERY for ECS is a badly done idea (not even mentioning privacy
> loss).
> As an example, RPZ does not rely on queries.. it transfers all prefixes
> for matching in a zone so that the longest prefix match algorithm will
> not suffer from a previously cached shorter prefix matching and
> preventing future fetches.
> Another related problem is this: We often want to match against a GeoIP
> database (containing what may be changing but maintained prefixes) in
> associating zone data with geographic/network-topo clients. We want to
> say "serve this answer for country X or city Y or ASNNNN" and we don't
> care about managing the actual prefixes.
> GeoIP is a custom database format, where I can match against GeoIP, but
> I can't easily deaggregate all its prefixes for an ECS zone to be cached
> properly.
> I feel it would have been better to say to a downstream resolver "This
> answer is for country X", or "A is the answer for country X, B is the
> answer for country Y, the rest use answer C".

<No hats. other than working at a place that uses ECS to provide
optimized answers>
Please see the thread

There is no *best* answer for a country, nor a city or even a postal address.

There is instead a best answer for a specific IP address / subnet,
which depends upon the ISP, the peering connections with that ISP, the
utilization of the peering links with that ISP, the utilization of the
datacenter, and then much lower down, where in the physical topology
that subnet resides.

I'm within spitting distance of Ashburn Equinix (well, I cannot quite
spit there, but I could easily walk there, it's < 5 miles), but my ISP
is Comcast. This means that, instead of hitting anything in IAD
(Ashburn), I (currently) instead get sent to MRN (Lenoir, North
Carolina), which is 400 miles away. My packets also happen to go
through 111 8th Ave, NY - even though there is stuff in NY, the MRN
location is better / faster.

If I happened to have chosen a different ISP, Verizon for instance, my
optimal answers would be very different.
And no, the "obvious" answer of "just use the AS number then" fails
equally badly - my latency to (to choose something at random)
is ~12ms. If I were a Comcast customer in California and the same
location were handed out, the latency would be ~80ms.

Note that I'm in a place which is well connected - in many places, the
"less optimal" answer is much much less optimal...

> The design of ECS needs to be reconsidered. I'd prefer something like a
> zone format for it, than using QUERY. QUERY cannot give complete
> information about all prefixes and there is a possibility of incorrect
> caching, and a very high probability of redundant cache pollution.

Sure, happy to reconsider the design, but it is important to know more
about what the constraints are, and how specific and dynamic the
answers are.

For example, Akamai says they are in 1,600 networks
(  "The
Cloudflare Global Anycast Network"
( is powered by 118 data
centers around the world. "

"Google Cloud Platform has added new regions in São Paulo and Mumbai.
GCP has 13 regions, 39 zones, over 100 points of presence, and a
well-provisioned global network with 100,000s of miles of fiber optic
cable." (embarrassingly I couldn't easily find a public page with more

Dyn has a map here:

> From a resolver's point of view, a non-ECS answer (no client-subnet
> option) is different from an ECS answer with scope=0 which is different
> from an ECS answer with source=0, whereas all these may be the same from
> an authority's point of view. They all need to be cached differently
> (from an intermediate resolver's view).
> This thing of scope > source meaning for-exact-match-only is weird as
> hell when implementing longest prefix matching. It is not convenient
> to use an off-the-shelf radix tree.
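
The distinctions drawn above can be sketched as a small classifier. This is only an illustration under stated assumptions: `EcsOption`, `cache_class`, and the class labels are hypothetical names, and the "exact-match-only" rule follows the description in this message rather than a reading of the specification.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class EcsOption:
    source_len: int  # SOURCE PREFIX-LENGTH carried in the option
    scope_len: int   # SCOPE PREFIX-LENGTH set by the authoritative server

def cache_class(opt: Optional[EcsOption]) -> str:
    """Pick a (hypothetical) cache class for a response, from the resolver's view."""
    if opt is None:
        return "non-ecs"           # no client-subnet option in the response
    if opt.source_len == 0:
        return "source-zero"       # the query carried no client address bits
    if opt.scope_len == 0:
        return "scope-zero"        # the answer is valid for every client
    if opt.scope_len > opt.source_len:
        return "exact-match-only"  # usable only for the exact source prefix
    return "prefix"                # usable for every client inside the scope prefix

print(cache_class(None))
print(cache_class(EcsOption(source_len=24, scope_len=0)))
print(cache_class(EcsOption(source_len=24, scope_len=28)))
```

The "exact-match-only" branch is the case that does not fit a plain longest-prefix-match lookup, which is why an off-the-shelf radix tree needs extra bookkeeping.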
> ECS relies on the option always being returned for any kind of
> answers, as some resolvers use that as an indicator of ECS support
> (and stop using ECS if it ever stops). But ECS does not apply to
> several kinds of answers (e.g., anything but NOERROR, esp. NXDOMAIN
> and NODATA have to be consistent across all prefixes.) It doesn't
> apply to SOA, DNSKEY, NS in answer section, referrals, etc. Yet,
> many of these need to answer with SCOPE=0.
> An ACL config option about whether the NS supports ECS or not (to
> return the option or not) is different from a config option whether
> the NS passes through ECS or not: the latter would always pass through
> SOURCE=0 but return REFUSED for any ECS queries that didn't match the
> ACL; whereas the former would return a non-ECS reply for any ECS
> queries that didn't match the ACL.
> Transitivity of the option has corner cases.
> I don't have to point out how easy it is for an erroneous /16 to
> prevent queries to /24 answers shadowed by the /16.
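
A minimal sketch of that shadowing, assuming a toy resolver cache that does longest-prefix matching over cached scope prefixes. `EcsCache`, the prefixes, and the answer strings are all hypothetical; a real resolver would also track TTLs, names, and types.

```python
import ipaddress

class EcsCache:
    """Toy per-name cache keyed by scope prefix (hypothetical, no TTL handling)."""

    def __init__(self):
        self.entries = {}  # ip_network -> cached answer

    def store(self, prefix, answer):
        self.entries[ipaddress.ip_network(prefix)] = answer

    def lookup(self, client):
        """Return the answer for the longest cached prefix covering `client`."""
        addr = ipaddress.ip_address(client)
        matches = [net for net in self.entries if addr in net]
        if not matches:
            return None  # cache miss: the resolver would query upstream
        best = max(matches, key=lambda net: net.prefixlen)
        return self.entries[best]

cache = EcsCache()
cache.store("10.20.0.0/16", "answer-A")   # erroneous, overly broad scope
shadowed = cache.lookup("10.20.30.7")     # hit, so no upstream query is sent

cache.store("10.20.30.0/24", "answer-B")  # only cached if it is ever fetched
specific = cache.lookup("10.20.30.7")
print(shadowed, specific)
```

While only the /16 is cached, the client inside 10.20.30.0/24 gets a cache hit, so the resolver has no reason to ask upstream and the more specific /24 answer is never learned.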
> Some cache cases: Obviously an ECS cache is different from a
> zone.. it's not from a single zone, it is not an atomic collection of
> a single version of zone and ever changing. If there's a /24 answer in
> cache, and a newer query brings in a /16 answer that shadows it,
> should the resolver assume that the /16 has precedence because it's
> newer (hence the /24 should no longer exist) or do a
> longest-prefix-match against the older /24? What if the /16 then
> expires and the /24 hasn't expired? An NXDOMAIN answer should expire
> any previously cached prefix-specific cache entries for that name. A
> NODATA answer should expire any previously cached prefix-specific
> cache entries for that type.  Non-ECS data is different from SCOPE=0
> data. There are questions about trust ranking with usage of ECS data.
> These are just some topics that I can quickly think of. There are many
> other issues we faced and discussed during resolver ECS development.
> The draft leaves many things unspecified, such as more clarity in DNSSEC
> and handling of negative answers. Many issues were fixed during the
> draft phase, but I feel it was insufficient.
>> Can you share your ideas for ECS2?
> There are many quirks in ECS. I don't want to propose specific ideas
> now, except that we should gather requirements and start from
> scratch.

Yes, much of my soapbox rant was about just this -- understanding the
requirements is important - the reason that CDNs provide different
answers based upon the IP address is that it is a proxy for latency /
I'm sure we got many things wrong in ECS, but a redesign needs to be
informed by the use case and requirements.

(This mail was not meant to sound as grumpy as it turned out :-) )


> We have to reduce complexity of the protocol on both auth and
> caching resolver sides. I think it should be designed again from
> requirements without being a tweak of ECS1. The current protocol
> complicates DNS implementation significantly.
>                 Mukund

I don't think the execution is relevant when it was obviously a bad
idea in the first place.
This is like putting rabid weasels in your pants, and later expressing
regret at having chosen those particular rabid weasels and that pair
of pants.